!pip -q install transformers
!pip -q install datasets loralib sentencepiece
!pip -q install bitsandbytes accelerate
!nvidia-smi
Thu Apr 20 07:20:34 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   39C    P8     9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList
tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-tuned-alpha-3b")
model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/stablelm-tuned-alpha-3b",
    load_in_8bit=True,   # quantize weights to int8 via bitsandbytes
    device_map="auto",   # let accelerate place the layers on the GPU
)
# model.half().cuda()  # alternative: plain fp16 on the GPU, without 8-bit quantization
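The 8-bit load is what lets this model fit comfortably on a 16 GB T4. A back-of-the-envelope check of why (the ~14.9 GB fp32 checkpoint size is an assumption taken from the two shard files; exact parameter counts will differ slightly):

```python
# Rough memory estimate for the checkpoint, assuming the two fp32 shards
# total about 14.9 GB (hypothetical round numbers for illustration).
shard_bytes_fp32 = (10.2 + 4.66) * 1e9   # combined shard size in bytes
n_params = shard_bytes_fp32 / 4          # fp32 stores 4 bytes per parameter
gb = 1e9

print(f"~{n_params / 1e9:.1f}B parameters")
print(f"fp32: ~{n_params * 4 / gb:.1f} GB, "
      f"fp16: ~{n_params * 2 / gb:.1f} GB, "
      f"int8: ~{n_params * 1 / gb:.1f} GB")
```

Under these assumptions the int8 weights come to roughly 3.7 GB, which is consistent with the ~5.2 GB the GPU reports after loading (weights plus CUDA context and bitsandbytes overhead).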
Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
!nvidia-smi
Thu Apr 20 07:24:05 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   42C    P0    27W /  70W |   5233MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Ids of the chat special tokens (<|USER|>, <|ASSISTANT|>, <|SYSTEM|>)
        # plus the padding and end-of-text ids in the StableLM tuned vocabulary.
        stop_ids = [50278, 50279, 50277, 1, 0]
        for stop_id in stop_ids:
            # generate() calls this criterion after every new token, so it is
            # enough to inspect the last id in the sequence.
            if input_ids[0][-1] == stop_id:
                return True
        return False
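Since the criterion only ever inspects the most recent token, its logic can be sanity-checked without a model at all. A minimal pure-Python simulation of the same check, with token ids as plain lists:

```python
# Simulate the StopOnTokens check on plain lists instead of tensors.
# 50278/50279/50277 are the <|USER|>/<|ASSISTANT|>/<|SYSTEM|> ids in the
# StableLM tuned vocabulary; 1 and 0 are padding and end-of-text.
STOP_IDS = {50278, 50279, 50277, 1, 0}

def should_stop(generated_ids):
    # generate() re-checks after each new token, so only the last id matters.
    return generated_ids[-1] in STOP_IDS

print(should_stop([312, 4093, 17]))     # → False: ordinary token, keep going
print(should_stop([312, 4093, 50278]))  # → True: <|USER|> id, stop
```

Using a set for the lookup also avoids the explicit loop over `stop_ids`.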
system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""
import textwrap
def wrap_text_preserve_newlines(text, width=110):
    # Split the input text into lines based on newline characters
    lines = text.split('\n')
    # Wrap each line individually
    wrapped_lines = [textwrap.fill(line, width=width) for line in lines]
    # Join the wrapped lines back together using newline characters
    wrapped_text = '\n'.join(wrapped_lines)
    return wrapped_text
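The point of the helper is that a bare `textwrap.fill` would collapse the newlines the model emits; wrapping each line separately keeps them. A quick sanity check (the helper is repeated here so the snippet runs standalone):

```python
import textwrap

def wrap_text_preserve_newlines(text, width=110):
    # Same helper as above: wrap each line on its own so that explicit
    # newlines in the input survive the wrapping.
    lines = text.split('\n')
    wrapped_lines = [textwrap.fill(line, width=width) for line in lines]
    return '\n'.join(wrapped_lines)

sample = "first line\n" + "word " * 30   # one short line, then one long line
wrapped = wrap_text_preserve_newlines(sample, width=40)

print(wrapped.split('\n')[0])            # the short line survives untouched
print(all(len(l) <= 40 for l in wrapped.split('\n')))  # every line fits the width
```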
prompt = f"{system_prompt}<|USER|>What's your mood today?<|ASSISTANT|>"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
tokens = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.7,
    do_sample=True,
    top_p=0.95,
    stopping_criteria=StoppingCriteriaList([StopOnTokens()]),
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
# StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
What's your mood today?
As an AI language model, I don't have a personal mood. However, I am programmed to provide positive and helpful responses to users' queries. Is there anything specific you would like to know about my mood or any questions you have?
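Notice that the decoded text echoes the whole prompt: `generate()` returns the prompt ids followed by the newly generated ids. With real tensors the completion alone would be `tokens[0][inputs["input_ids"].shape[1]:]`; a sketch of the same slice, simulated with plain lists of hypothetical token ids:

```python
# generate() returns prompt + continuation, so decoding tokens[0] repeats
# the prompt. Slicing at the prompt length keeps only the completion.
# Simulated with plain lists (the id values are made up for illustration):
prompt_ids = [50277, 3123, 50278, 99, 50279]   # stands in for inputs["input_ids"][0]
output_ids = prompt_ids + [411, 87, 50278]     # stands in for tokens[0]

completion_ids = output_ids[len(prompt_ids):]
print(completion_ids)  # → [411, 87, 50278], only the newly generated ids
```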
def chat(input_text):
    prompt = f"{system_prompt}<|USER|>{input_text}<|ASSISTANT|>"
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    tokens = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        top_p=0.95,
        do_sample=True,
        stopping_criteria=StoppingCriteriaList([StopOnTokens()]),
    )
    print(wrap_text_preserve_newlines(tokenizer.decode(tokens[0], skip_special_tokens=True)))
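Two of the sampling knobs deserve a note: `temperature=0.7` sharpens the next-token distribution before sampling (values below 1 favor likely tokens), and `top_p=0.95` then drops the improbable tail. A small self-contained sketch of both steps on a toy distribution (the logit values are made up):

```python
import math

def sampling_distribution(logits, temperature=0.7, top_p=0.95):
    # Temperature: divide logits before softmax. temperature < 1 sharpens
    # the distribution, temperature > 1 flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]     # numerically stable softmax
    z = sum(exps)
    probs = [e / z for e in exps]

    # Top-p (nucleus) filtering: keep the smallest set of tokens whose
    # cumulative probability reaches top_p, then renormalize.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

dist = sampling_distribution([2.0, 1.0, 0.1, -3.0])
print(dist)  # the low-probability tail token (index 3) is pruned
```

This mirrors what `generate(do_sample=True, temperature=..., top_p=...)` does internally before drawing a token, though the library applies the same logic on tensors with additional options.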
%%time
chat("Write a note to Sam Altman saying they should open source GPT-4")
# StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
Write a note to Sam Altman saying they should open source GPT-4

Dear Sam Altman,

We hope this email finds you well. We are writing to provide you with an update regarding our latest research and development activities. We would like to invite you to join us in our ongoing effort to advance the state of the art in natural language processing.

As you are aware, GPT-4 is a state-of-the-art language model that has been designed to achieve outstanding results in various NLP tasks. Our goal is to provide our users with an unprecedented level of accuracy and fluency in natural language text. As you are aware, GPT-4 is not an open-source model, but we believe that it is crucial to the future of NLP research and development. We are committed to being transparent and forthcoming about our proprietary software and how it will be used.

We are confident that we can make significant contributions to the state of the art in natural language processing and provide the best possible experience for our users. Please do not hesitate to contact us or to ask questions. Thank you for your support, and we look forward to your continued engagement with us.

Best regards,
[Your Name]

CPU times: user 20.6 s, sys: 0 ns, total: 20.6 s
Wall time: 20.5 s
%%time
chat("Write a story about a Koala who could beat all the camelids at pool")
# StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
Write a story about a Koala who could beat all the camelids at pool

Once upon a time, in a far-off African jungle, there lived a Koala named Kala. She was a small, plump Koala, with a round belly and a cute, curly mop of brown hair. Kala loved to swim and play with her friends in the jungle, but she was too big for most of them.

One day, Kala decided to take a swim in a big pool in the jungle. She had never swum in a pool before and was nervous about it. But when she got out of the pool and started to swim, she could feel the water warm on her skin. She swam and swam some more, enjoying the feeling of the cool water on her skin.

After a while, Kala realized that she was tired and hungry. She wanted to go back to the jungle and rest for a while, but her friends said that they would take her back if she told them the truth about her size. Kala didn't want to lie, so she told them the truth about her size. The other Koalas in the jungle didn't seem to notice her size at all, but they were all excited to take her back. They brought her back to the jungle and set

CPU times: user 21.7 s, sys: 0 ns, total: 21.7 s
Wall time: 21.6 s
%%time
chat("What is the capital of England?")
# StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
What is the capital of England?
The capital of England is London.

CPU times: user 701 ms, sys: 0 ns, total: 701 ms
Wall time: 699 ms
%%time
chat("What are the difference between Llamas, Alpacas and Vicunas?")
# StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
What are the difference between Llamas, Alpacas and Vicunas?

As an AI language model, I don't have personal preferences or opinions, but I can provide you with factual information about these terms.
1. Llamas: Llamas are indigenous people of the Amazon rainforest and have a rich culture and traditions. They are known for their music, dance, and storytelling. Llamas are also famous for their traditional music, dance, and music.
2. Alpacas: Alpacas are a group of indigenous people who live in different regions of the Amazon basin in South America. They are known for their traditional dance and music, and also for their art.
3. Vicunas: Vicunas are a group of indigenous people who live in different regions of the Amazon basin in South America. They are known for their traditional music, dance, and music.
In summary, Llamas are indigenous people who live in the Amazon rainforest, Alpacas are indigenous people who live in different regions of the Amazon basin in South America, Vicunas are indigenous people who live in different regions of the Amazon basin in South America.

CPU times: user 20.1 s, sys: 0 ns, total: 20.1 s
Wall time: 20.1 s
%%time
chat("Are you a fan of the Simpsons? Tell me a bit about Homer.")
# StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
Are you a fan of the Simpsons? Tell me a bit about Homer.

Of course I am a fan of the show! The Simpson's are my favorite part of the show, and I love the humor and the satire that they bring to the show. I particularly enjoy the character of Bart Simpson, who is a brilliant and humorous character.

Homer Simpson is a fictional character that is based on the real-life personality of Homer Simpson, an advertising executive. Homer is a charming and funny person who is always trying to find the best ways to get the best of his clients. He is also a skilled negotiator and is able to negotiate deals that are advantageous to both parties.

The Simpsons are a popular American television series, and they are known for their humor and irreverence towards the show's culture and values. The show has been around for over 20 years, and it continues to be entertaining to this day.

CPU times: user 15.2 s, sys: 0 ns, total: 15.2 s
Wall time: 15.1 s