This notebook makes it easy to try out finetuning Falcon-RW-1B with Axolotl + QLoRA on RunPod.
If you run into any issues, you're welcome to report them here.
To run this notebook on RunPod, use this RunPod template to deploy a pod with a GPU.
%cd axolotl
/workspace/axolotl
!accelerate config --config_file configs/accelerate/default_config.yaml default
Setting ds_accelerator to cuda (auto detect)
accelerate configuration saved at /root/.cache/huggingface/accelerate/default_config.yaml
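If you want to sanity-check what accelerate will use (number of processes, mixed precision, etc.), you can print the generated config file at the path shown above:

!cat /root/.cache/huggingface/accelerate/default_config.yaml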
!wget https://raw.githubusercontent.com/utensil/axolotl/falcon-7b-qlora/examples/falcon/config-7b-qlora.yml
--2023-06-05 14:10:49--  https://raw.githubusercontent.com/utensil/axolotl/falcon-7b-qlora/examples/falcon/config-7b-qlora.yml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.110.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2238 (2.2K) [text/plain]
Saving to: ‘config-7b-qlora.yml’

config-7b-qlora.yml 100%[===================>]   2.19K  --.-KB/s    in 0s

2023-06-05 14:10:49 (30.8 MB/s) - ‘config-7b-qlora.yml’ saved [2238/2238]
!cp config-7b-qlora.yml config-1b-qlora.yml
!sed -i -e 's/falcon-7b/falcon-rw-1b/g' -e 's/wandb_project: falcon-qlora/wandb_project: /g' config-1b-qlora.yml
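The sed call above swaps the 7B base model for tiiuae/falcon-rw-1b and blanks out wandb_project. If you'd like to confirm exactly what changed, a quick diff of the two files works too (assuming diff is available in the image):

!diff config-7b-qlora.yml config-1b-qlora.yml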
!cat config-1b-qlora.yml
# 1b: tiiuae/falcon-rw-1b
# 40b: tiiuae/falcon-40b
base_model: tiiuae/falcon-rw-1b
base_model_config: tiiuae/falcon-rw-1b
# required by falcon custom model code: https://huggingface.co/tiiuae/falcon-rw-1b/tree/main
trust_remote_code: true
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: false
# enable 4bit for QLoRA
load_in_4bit: true
gptq: false
strict: false
push_dataset_to_hub:
datasets:
  - path: QingyiSi/Alpaca-CoT
    data_files:
      - Chain-of-Thought/formatted_cot_data/gsm8k_train.json
    type: "alpaca:chat"
dataset_prepared_path: last_run_prepared
val_set_size: 0.01
# enable QLoRA
adapter: qlora
lora_model_dir:
sequence_len: 2048
max_packed_sequence_len:
# hyperparameters from QLoRA paper Appendix B.2
# "We find hyperparameters to be largely robust across datasets"
lora_r: 64
lora_alpha: 16
# 0.1 for models up to 13B
# 0.05 for 33B and 65B models
lora_dropout: 0.05
# add LoRA modules on all linear layers of the base model
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
wandb_project:
wandb_watch:
wandb_run_id:
wandb_log_model:
output_dir: ./qlora-out
# QLoRA paper Table 9
# - 16 for 7b & 13b
# - 32 for 33b, 64 for 64b
# Max size tested on A6000
# - 7b: 40
# - 40b: 4
# decrease if OOM, increase for max VRAM utilization
micro_batch_size: 1
gradient_accumulation_steps: 2
num_epochs: 3
# Optimizer for QLoRA
optimizer: paged_adamw_32bit
torchdistx_path:
lr_scheduler: cosine
# QLoRA paper Table 9
# - 2e-4 for 7b & 13b
# - 1e-4 for 33b & 64b
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: true
gradient_checkpointing: true
# stop training after this many evaluation losses have increased in a row
# https://huggingface.co/transformers/v4.2.2/_modules/transformers/trainer_callback.html#EarlyStoppingCallback
early_stopping_patience: 3
resume_from_checkpoint:
auto_resume_from_checkpoints: true
local_rank:
logging_steps: 1
xformers_attention: true
flash_attention:
gptq_groupsize:
gptq_model_v1:
warmup_steps: 10
eval_steps: 5
save_steps: 10
debug:
deepspeed:
weight_decay: 0.000001
fsdp:
fsdp_config:
special_tokens:
  pad_token: "<|endoftext|>"
  bos_token: ">>ABSTRACT<<"
  eos_token: "<|endoftext|>"
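For orientation, the load_in_4bit / adapter: qlora / lora_* settings above roughly correspond to the following transformers + peft setup. This is just a hand-written sketch of what Axolotl does internally, not code you need to run in this notebook:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit base model (load_in_4bit: true), computing in bf16 (bf16: true)
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-rw-1b",
    quantization_config=bnb_config,
    trust_remote_code=True,
)
# lora_target_linear: true means LoRA on all linear layers; the training log below
# reports these as ['dense_h_to_4h', 'dense', 'dense_4h_to_h', 'query_key_value']
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()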
%env WANDB_MODE=offline
env: WANDB_MODE=offline
This skips some extra setup steps; you can also choose to log in to your W&B account before training, as shown below.
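If you do want the run synced to Weights & Biases, log in before launching training instead of setting offline mode (this assumes you have a W&B account and API key handy):

import wandb
wandb.login()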
!accelerate launch scripts/finetune.py config-1b-qlora.yml
Setting ds_accelerator to cuda (auto detect)

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/libbitsandbytes_cuda118.so
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib'), PosixPath('/usr/local/nvidia/lib64')}
  warn(msg)
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('module'), PosixPath('//matplotlib_inline.backend_inline')}
  warn(msg)
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDvnE4umHheXhWsDJbbukYvvyc47/mC4z8syS93btA72T90WDrQagOy5O+DrhdXOvr5i/JwsTlAImy57eLRrtRFOrQq73jyi7Dzo0tvrAiNLVgX2q2dFLoplRyXDXiVYLPmPieMWQOeUCLeSb8FC5zzllcocZwjMXpxScDerZqnlAR0ccpSkGyKIod4ZMkn/29A/C5kHEb/wT8cOAq+MWJ/2okZZgbiR0AMV4DynAkrtcx9JnJnTs9chiMyH+dyCS42Ai24sHWJBkQo6TfxXkyKo9GOpu3Y2WLgrHyaot9Lk5mA1mujyIWdlReD2nvjeCQKjl3KW3xZ73m4nD97MydWSWoJfEWlr+VZvk8EWsZk3CYLZCIBLdod6xXJJ0DD0pvTIq11c8VB7XkgVjapuU/sC8M6HFzHW/NBeE+xX/txPkZkIGqrnxeQ0AtBXdN9ukyNGhGzTkPYJNliiYpY0dCvVuz/BJ2FawFTQGnD1EHOenUCRajREFGCbKoYZqi40j8= utensil@Utensils-MacBook-Pro.local')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward. Either way, this might cause trouble in the future: If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/libbitsandbytes_cuda118.so...
Setting ds_accelerator to cuda (auto detect)
WARNING:root:`trust_remote_code` is set to true. Please make sure that you reviewed the remote code/model.
INFO:root:loading tokenizer... tiiuae/falcon-rw-1b
Downloading (…)okenizer_config.json: 100%|█████| 255/255 [00:00<00:00, 48.1kB/s]
Downloading (…)olve/main/vocab.json: 100%|███| 798k/798k [00:00<00:00, 54.2MB/s]
Downloading (…)olve/main/merges.txt: 100%|███| 456k/456k [00:00<00:00, 11.4MB/s]
Downloading (…)/main/tokenizer.json: 100%|█| 2.11M/2.11M [00:00<00:00, 51.1MB/s]
Downloading (…)cial_tokens_map.json: 100%|████| 99.0/99.0 [00:00<00:00, 200kB/s]
Using pad_token, but it is not set yet.
INFO:root:Unable to find prepared dataset in last_run_prepared/0ecc5b78e3ce4254b22e749b093712b4
INFO:root:Loading raw datasets...
Downloading readme: 100%|██████████████████| 8.26k/8.26k [00:00<00:00, 23.9MB/s]
Downloading and preparing dataset json/QingyiSi--Alpaca-CoT to /root/.cache/huggingface/datasets/QingyiSi___json/QingyiSi--Alpaca-CoT-2953efcfeb19f105/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4...
Downloading data files:   0%|          | 0/1 [00:00<?, ?it/s]
Downloading data:   0%|          | 0.00/4.45M [00:00<?, ?B/s]
Downloading data: 100%|████████████████████| 4.45M/4.45M [00:00<00:00, 25.2MB/s]
Downloading data files: 100%|█████████████████████| 1/1 [00:00<00:00, 2.18it/s]
Extracting data files: 100%|████████████████████| 1/1 [00:00<00:00, 2031.14it/s]
Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/QingyiSi___json/QingyiSi--Alpaca-CoT-2953efcfeb19f105/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4. Subsequent calls will reuse this data.
100%|████████████████████████████████████████████| 1/1 [00:00<00:00, 830.88it/s]
INFO:root:tokenizing, merging, and shuffling master dataset
INFO:root:Saving merged prepared dataset to disk... last_run_prepared/0ecc5b78e3ce4254b22e749b093712b4
INFO:root:loading model and peft_config...
Downloading (…)lve/main/config.json: 100%|██████| 665/665 [00:00<00:00, 452kB/s]
Downloading (…)/configuration_RW.py: 100%|█| 2.61k/2.61k [00:00<00:00, 3.79MB/s]
A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-rw-1b:
- configuration_RW.py
Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Downloading (…)main/modelling_RW.py: 100%|█| 47.5k/47.5k [00:00<00:00, 48.6MB/s]
A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-rw-1b:
- modelling_RW.py
Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Downloading pytorch_model.bin: 100%|████████| 2.62G/2.62G [00:23<00:00, 113MB/s]
Downloading (…)neration_config.json: 100%|█████| 111/111 [00:00<00:00, 28.3kB/s]
INFO:root:converting PEFT model w/ prepare_model_for_int8_training
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/peft/utils/other.py:76: FutureWarning: prepare_model_for_int8_training is deprecated and will be removed in a future version. Use prepare_model_for_kbit_training instead.
  warnings.warn(
INFO:root:found linear modules: ['dense_h_to_4h', 'dense', 'dense_4h_to_h', 'query_key_value']
trainable params: 50331648 || all params: 757911552 || trainable%: 6.6408339953630895
INFO:root:Compiling torch model
INFO:root:Pre-saving adapter config to ./qlora-out
INFO:root:Starting trainer...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
wandb: Tracking run with wandb version 0.15.3
wandb: W&B syncing is set to `offline` in this directory.
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
  0%|          | 0/11097 [00:00<?, ?it/s]
You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
{'loss': 1.8589, 'learning_rate': 2e-05, 'epoch': 0.0}
{'loss': 1.6525, 'learning_rate': 4e-05, 'epoch': 0.0}
{'loss': 1.6585, 'learning_rate': 6e-05, 'epoch': 0.0}
{'loss': 1.8112, 'learning_rate': 8e-05, 'epoch': 0.0}
{'loss': 1.7954, 'learning_rate': 0.0001, 'epoch': 0.0}
  0%|          | 5/11097 [00:02<1:25:34,  2.16it/s]
  0%|          | 0/75 [00:00<?, ?it/s]
 97%|█████████████████████████████████████████▊ | 73/75 [00:03<00:00, 19.84it/s]
{'eval_loss': 1.9517160654067993, 'eval_runtime': 3.8321, 'eval_samples_per_second': 19.572, 'eval_steps_per_second': 19.572, 'epoch': 0.0}
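Once training finishes (or after any save_steps checkpoint), the LoRA adapter is written to ./qlora-out. At the time of writing, Axolotl's finetune script could load it for a quick interactive test with a command roughly like the one below; the exact flags may have changed since, so check the Axolotl README:

!accelerate launch scripts/finetune.py config-1b-qlora.yml --inference --lora_model_dir="./qlora-out"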