Setting ds_accelerator to cuda (auto detect)
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/libbitsandbytes_cuda118.so
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib'), PosixPath('/usr/local/nvidia/lib64')}
warn(msg)
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('module'), PosixPath('//matplotlib_inline.backend_inline')}
warn(msg)
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIK1tFOFrWbmoa2ckCJYhzgBHKTSMeR/AeuScCCzugqlI utensilcandel@gmail.com')}
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/libbitsandbytes_cuda118.so...
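Everything up to this point is bitsandbytes' CUDA setup. The odd "non-existent" directories above (a matplotlib module spec, an SSH public key) show up because the setup scans the values of all environment variables for path-like entries while hunting for libcudart. A minimal sketch of that kind of scan, not the library's actual implementation:

```python
import os
from pathlib import Path

CUDART_NAMES = ["libcudart.so", "libcudart.so.11.0", "libcudart.so.12.0"]

def find_cudart_candidates():
    """Hypothetical re-creation of the scan: treat every env-var value as a
    PATH-style list and look for the CUDA runtime in each existing dir."""
    hits = set()
    for value in os.environ.values():
        for entry in value.split(os.pathsep):
            path = Path(entry)
            if not path.is_dir():
                continue  # entries like these trigger the warnings above
            hits.update(path / n for n in CUDART_NAMES if (path / n).exists())
    return hits

print(find_cudart_candidates())
```

The duplicate hit (/usr/local/cuda/lib64/libcudart.so next to libcudart.so.11.0) is usually a symlink pair pointing at the same runtime, so the "coin flip" is harmless here.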
Setting ds_accelerator to cuda (auto detect)
INFO:root:loading tokenizer...
Using pad_token, but it is not set yet.
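`Using pad_token, but it is not set yet.` is transformers noting that the LLaMA tokenizer ships without a pad token. A common fix, sketched below with the base model assumed to be huggyllama/llama-7b (the log never names it), is to reuse the EOS token:

```python
from transformers import AutoTokenizer

# Assumed base model; LLaMA tokenizers define no pad token out of the box.
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # pad with </s>
```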
INFO:root:Loading prepared packed dataset from disk at last_run_prepared/21a0611c6c2b67b31f00097fa2a91c26...
INFO:root:Prepared packed dataset loaded from disk...
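`last_run_prepared/<hash>` is axolotl's on-disk cache of the tokenized, packed dataset; the hash appears to be derived from the dataset config, so an unchanged config skips re-tokenization. Loading it back is plain `datasets.load_from_disk`:

```python
from datasets import load_from_disk

# Cache directory printed in the log above.
dataset = load_from_disk("last_run_prepared/21a0611c6c2b67b31f00097fa2a91c26")
print(dataset)
```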
INFO:root:loading model and peft_config...
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:19<00:00, 9.86s/it]
INFO:root:converting PEFT model w/ prepare_model_for_int8_training
/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/peft/utils/other.py:76: FutureWarning: prepare_model_for_int8_training is deprecated and will be removed in a future version. Use prepare_model_for_kbit_training instead.
warnings.warn(
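The FutureWarning is PEFT's rename of its quantized-training helper; the k-bit variant does the same preparation (casting the remaining non-quantized params to fp32 and enabling input gradients for checkpointing). Migrating is a one-line change:

```python
from peft import prepare_model_for_kbit_training

# Drop-in replacement for the deprecated prepare_model_for_int8_training;
# `model` is the quantized checkpoint loaded in the step above.
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)
```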
INFO:root:found linear modules: ['v_proj', 'k_proj', 'gate_proj', 'q_proj', 'o_proj', 'down_proj', 'up_proj']
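The seven names cover every linear projection in a LLaMA decoder block (attention q/k/v/o plus the MLP gate/up/down), i.e. the "all linear layers" recipe from the QLoRA paper. A hedged sketch of how such a list can be collected (axolotl's own helper may differ):

```python
import bitsandbytes as bnb
import torch.nn as nn

def find_linear_module_names(model):
    """Hypothetical helper: collect leaf names of linear/quantized-linear
    layers to use as LoRA target_modules, leaving lm_head untouched."""
    names = set()
    for full_name, module in model.named_modules():
        if isinstance(module, (nn.Linear, bnb.nn.Linear8bitLt, bnb.nn.Linear4bit)):
            leaf = full_name.split(".")[-1]
            if leaf != "lm_head":  # keep the LM head in full precision
                names.add(leaf)
    return sorted(names)
```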
trainable params: 159907840 || all params: 3660320768 || trainable%: 4.368683788535114
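The percentage is just the ratio of the two counts: 100 × 159,907,840 / 3,660,320,768 ≈ 4.37. The total is about half the 7B you might expect, most likely because bitsandbytes packs two 4-bit weights per byte, halving `numel()` for quantized tensors. The count itself is a two-liner:

```python
def print_trainable_parameters(model):
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable params: {trainable} || all params: {total} "
          f"|| trainable%: {100 * trainable / total}")

# 100 * 159907840 / 3660320768 == 4.368683788535114, matching the log line.
```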
INFO:root:Compiling torch model
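"Compiling torch model" maps to PyTorch 2.x's `torch.compile`, applied when enabled in the axolotl config; the equivalent call, continuing from the prepared model above, is:

```python
import torch

# Wrap the PEFT model with the default inductor backend; compilation is
# lazy, so the first training step pays the tracing cost.
model = torch.compile(model)
```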
INFO:root:Pre-saving adapter config to ./qlora-out
INFO:root:Starting trainer...
INFO:root:Using Auto-resume functionality to start with checkpoint at qlora-out/checkpoint-130
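Auto-resume scans the output directory for the highest-numbered `checkpoint-*` folder and hands it to the trainer. transformers ships a helper for exactly this, sketched here with the paths from the log:

```python
from transformers.trainer_utils import get_last_checkpoint

# Returns e.g. "qlora-out/checkpoint-130", or None if no checkpoint exists.
last_checkpoint = get_last_checkpoint("qlora-out")
trainer.train(resume_from_checkpoint=last_checkpoint)
```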
wandb: Currently logged in as: utensil. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.3
wandb: Run data is saved locally in /workspace/axolotl/wandb/run-20230531_121630-p5lvijpv
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run summer-gorge-6
wandb: ⭐️ View project at https://wandb.ai/utensil/huggyllama-qlora
wandb: 🚀 View run at https://wandb.ai/utensil/huggyllama-qlora/runs/p5lvijpv
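The wandb banner comes from the Trainer's W&B integration, which calls `wandb.init` implicitly (usually reading `WANDB_PROJECT` from the environment). The explicit equivalent, with the project and entity from this run:

```python
import wandb

run = wandb.init(project="huggyllama-qlora", entity="utensil")
```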
{'loss': 0.4474, 'learning_rate': 8.952245334118414e-06, 'epoch': 2.62}
{'loss': 0.4717, 'learning_rate': 8.047222744854943e-06, 'epoch': 2.64}
{'loss': 0.4533, 'learning_rate': 7.1885011480961164e-06, 'epoch': 2.66}
{'loss': 0.4353, 'learning_rate': 6.37651293602628e-06, 'epoch': 2.68}
{'loss': 0.4545, 'learning_rate': 5.611666969163243e-06, 'epoch': 2.7}
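The logged learning rates trace a cosine decay that reaches zero at step 150. With 150 steps over 3 epochs, each 0.02 of an epoch is one step, and fitting the values suggests a peak LR near 2.29e-4 with no warmup (both inferred; neither appears in the log):

```python
import math

PEAK_LR = 2.2935e-4   # inferred by fitting the logged values; an assumption
TOTAL_STEPS = 150

def cosine_lr(step):
    return 0.5 * PEAK_LR * (1 + math.cos(math.pi * step / TOTAL_STEPS))

print(cosine_lr(131))  # ~8.96e-06, close to the 8.95e-06 logged at epoch 2.62
print(cosine_lr(135))  # ~5.61e-06, matching the entry at epoch 2.7
```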
 90%|████████████████████████████████████▉    | 135/150 [03:37<00:53,  3.56s/it]
100%|█████████████████████████████████████████████| 3/3 [00:03<00:00,  1.00s/it]
{'eval_loss': 0.4905628561973572, 'eval_runtime': 6.2813, 'eval_samples_per_second': 1.433, 'eval_steps_per_second': 0.478, 'epoch': 2.7}
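The eval numbers are internally consistent: 0.478 steps/s × 6.2813 s ≈ 3 steps (the 3/3 bar) and 1.433 samples/s × 6.2813 s ≈ 9 held-out samples:

```python
eval_runtime = 6.2813
print(round(0.478 * eval_runtime))  # 3 eval steps, matching the 3/3 bar
print(round(1.433 * eval_runtime))  # 9 eval samples
```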
{'loss': 0.4715, 'learning_rate': 4.8943483704846475e-06, 'epoch': 2.72}
{'loss': 0.4942, 'learning_rate': 4.224918331506955e-06, 'epoch': 2.74}
{'loss': 0.4535, 'learning_rate': 3.6037139304146762e-06, 'epoch': 2.76}
{'loss': 0.4396, 'learning_rate': 3.0310479623313127e-06, 'epoch': 2.78}
{'loss': 0.4367, 'learning_rate': 2.5072087818176382e-06, 'epoch': 2.8}
 93%|██████████████████████████████████████▎  | 140/150 [07:21<02:47, 16.75s/it]
100%|█████████████████████████████████████████████| 3/3 [00:03<00:00,  1.00s/it]
{'eval_loss': 0.4902206063270569, 'eval_runtime': 6.2828, 'eval_samples_per_second': 1.432, 'eval_steps_per_second': 0.477, 'epoch': 2.8}
wandb: Adding directory to artifact (./qlora-out/checkpoint-140)... Done. 8.7s
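`Adding directory to artifact` is the checkpoint directory being uploaded to W&B as an artifact. A sketch of the explicit calls behind it (the integration does this automatically when artifact logging is enabled; the artifact name is assumed):

```python
import wandb

artifact = wandb.Artifact(name="checkpoint-140", type="model")  # assumed name
artifact.add_dir("./qlora-out/checkpoint-140")
wandb.log_artifact(artifact)
```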
{'loss': 0.4674, 'learning_rate': 2.032460157676452e-06, 'epoch': 2.82}
{'loss': 0.4613, 'learning_rate': 1.6070411401370334e-06, 'epoch': 2.84}
{'loss': 0.4699, 'learning_rate': 1.231165940486234e-06, 'epoch': 2.86}
{'loss': 0.4253, 'learning_rate': 9.0502382320653e-07, 'epoch': 2.88}
{'loss': 0.436, 'learning_rate': 6.287790106757396e-07, 'epoch': 2.9}
 97%|███████████████████████████████████████▋ | 145/150 [11:19<02:57, 35.58s/it]
100%|█████████████████████████████████████████████| 3/3 [00:03<00:00,  1.00s/it]
{'eval_loss': 0.4897999167442322, 'eval_runtime': 6.2814, 'eval_samples_per_second': 1.433, 'eval_steps_per_second': 0.478, 'epoch': 2.9}
{'loss': 0.4491, 'learning_rate': 4.025706004760932e-07, 'epoch': 2.92}
{'loss': 0.4556, 'learning_rate': 2.265124953543918e-07, 'epoch': 2.94}
{'loss': 0.4459, 'learning_rate': 1.0069334586854107e-07, 'epoch': 2.96}
{'loss': 0.448, 'learning_rate': 2.5176505749346936e-08, 'epoch': 2.98}
99%|████████████████████████████████████████▋| 149/150 [14:19<00:41, 41.79s/it]
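The log ends one step short of the 150-step target. Once training finishes, what lands in ./qlora-out is only the LoRA adapter (the ~160M trainable params), not the full model weights; reloading it later means re-attaching it to the quantized base model:

```python
from peft import PeftModel

# Persist just the adapter weights and config.
model.save_pretrained("./qlora-out")

# Later, re-attach to a freshly loaded (quantized) base model:
# model = PeftModel.from_pretrained(base_model, "./qlora-out")
```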