!pip -q install transformers
!pip -q install datasets loralib sentencepiece
!pip -q install bitsandbytes accelerate
!nvidia-smi
Thu Apr 20 07:20:34 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   39C    P8     9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList
tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-tuned-alpha-3b")
model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/stablelm-tuned-alpha-3b",
    load_in_8bit=True,   # quantize weights to int8 via bitsandbytes
    device_map="auto",   # let accelerate place the layers on the GPU
)
# model.half().cuda()  # alternative: plain fp16 on the GPU, without 8-bit quantization
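The 8-bit load is what lets this model fit comfortably on a 16 GB T4. A back-of-the-envelope check of why (the ~14.9 GB fp32 checkpoint size is an assumption taken from the two shard files; exact parameter counts will differ slightly):

```python
# Rough memory estimate for the checkpoint, assuming the two fp32 shards
# total about 14.9 GB (hypothetical round numbers for illustration).
shard_bytes_fp32 = (10.2 + 4.66) * 1e9   # combined shard size in bytes
n_params = shard_bytes_fp32 / 4          # fp32 stores 4 bytes per parameter
gb = 1e9

print(f"~{n_params / 1e9:.1f}B parameters")
print(f"fp32: ~{n_params * 4 / gb:.1f} GB, "
      f"fp16: ~{n_params * 2 / gb:.1f} GB, "
      f"int8: ~{n_params * 1 / gb:.1f} GB")
```

Under these assumptions the int8 weights come to roughly 3.7 GB, which is consistent with the ~5.2 GB the GPU reports after loading (weights plus CUDA context and bitsandbytes overhead).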
Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
!nvidia-smi
Thu Apr 20 07:24:05 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   42C    P0    27W /  70W |   5233MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Ids of the chat special tokens (<|USER|>, <|ASSISTANT|>, <|SYSTEM|>)
        # plus the padding and end-of-text ids in the StableLM tuned vocabulary.
        stop_ids = [50278, 50279, 50277, 1, 0]
        for stop_id in stop_ids:
            # generate() calls this criterion after every new token, so it is
            # enough to inspect the last id in the sequence.
            if input_ids[0][-1] == stop_id:
                return True
        return False
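Since the criterion only ever inspects the most recent token, its logic can be sanity-checked without a model at all. A minimal pure-Python simulation of the same check, with token ids as plain lists:

```python
# Simulate the StopOnTokens check on plain lists instead of tensors.
# 50278/50279/50277 are the <|USER|>/<|ASSISTANT|>/<|SYSTEM|> ids in the
# StableLM tuned vocabulary; 1 and 0 are padding and end-of-text.
STOP_IDS = {50278, 50279, 50277, 1, 0}

def should_stop(generated_ids):
    # generate() re-checks after each new token, so only the last id matters.
    return generated_ids[-1] in STOP_IDS

print(should_stop([312, 4093, 17]))     # → False: ordinary token, keep going
print(should_stop([312, 4093, 50278]))  # → True: <|USER|> id, stop
```

Using a set for the lookup also avoids the explicit loop over `stop_ids`.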
system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""
import textwrap
def wrap_text_preserve_newlines(text, width=110):
    # Split the input text into lines based on newline characters
    lines = text.split('\n')
    # Wrap each line individually
    wrapped_lines = [textwrap.fill(line, width=width) for line in lines]
    # Join the wrapped lines back together using newline characters
    wrapped_text = '\n'.join(wrapped_lines)
    return wrapped_text
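The point of the helper is that a bare `textwrap.fill` would collapse the newlines the model emits; wrapping each line separately keeps them. A quick sanity check (the helper is repeated here so the snippet runs standalone):

```python
import textwrap

def wrap_text_preserve_newlines(text, width=110):
    # Same helper as above: wrap each line on its own so that explicit
    # newlines in the input survive the wrapping.
    lines = text.split('\n')
    wrapped_lines = [textwrap.fill(line, width=width) for line in lines]
    return '\n'.join(wrapped_lines)

sample = "first line\n" + "word " * 30   # one short line, then one long line
wrapped = wrap_text_preserve_newlines(sample, width=40)

print(wrapped.split('\n')[0])            # the short line survives untouched
print(all(len(l) <= 40 for l in wrapped.split('\n')))  # every line fits the width
```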
prompt = f"{system_prompt}<|USER|>What's your mood today?<|ASSISTANT|>"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
tokens = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.7,
    do_sample=True,
    top_p=0.95,
    stopping_criteria=StoppingCriteriaList([StopOnTokens()]),
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
# StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
What's your mood today?
As an AI language model, I don't have a personal mood. However, I am programmed to provide positive and helpful responses to users' queries. Is there anything specific you would like to know about my mood or any questions you have?
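Notice that the decoded text echoes the whole prompt: `generate()` returns the prompt ids followed by the newly generated ids. With real tensors the completion alone would be `tokens[0][inputs["input_ids"].shape[1]:]`; a sketch of the same slice, simulated with plain lists of hypothetical token ids:

```python
# generate() returns prompt + continuation, so decoding tokens[0] repeats
# the prompt. Slicing at the prompt length keeps only the completion.
# Simulated with plain lists (the id values are made up for illustration):
prompt_ids = [50277, 3123, 50278, 99, 50279]   # stands in for inputs["input_ids"][0]
output_ids = prompt_ids + [411, 87, 50278]     # stands in for tokens[0]

completion_ids = output_ids[len(prompt_ids):]
print(completion_ids)  # → [411, 87, 50278], only the newly generated ids
```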
def chat(input_text):
    prompt = f"{system_prompt}<|USER|>{input_text}<|ASSISTANT|>"
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    tokens = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        top_p=0.95,
        do_sample=True,
        stopping_criteria=StoppingCriteriaList([StopOnTokens()]),
    )
    print(wrap_text_preserve_newlines(tokenizer.decode(tokens[0], skip_special_tokens=True)))
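Two of the sampling knobs deserve a note: `temperature=0.7` sharpens the next-token distribution before sampling (values below 1 favor likely tokens), and `top_p=0.95` then drops the improbable tail. A small self-contained sketch of both steps on a toy distribution (the logit values are made up):

```python
import math

def sampling_distribution(logits, temperature=0.7, top_p=0.95):
    # Temperature: divide logits before softmax. temperature < 1 sharpens
    # the distribution, temperature > 1 flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]     # numerically stable softmax
    z = sum(exps)
    probs = [e / z for e in exps]

    # Top-p (nucleus) filtering: keep the smallest set of tokens whose
    # cumulative probability reaches top_p, then renormalize.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

dist = sampling_distribution([2.0, 1.0, 0.1, -3.0])
print(dist)  # the low-probability tail token (index 3) is pruned
```

This mirrors what `generate(do_sample=True, temperature=..., top_p=...)` does internally before drawing a token, though the library applies the same logic on tensors with additional options.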
%%time
chat("Write a note to Sam Altman saying they should open source GPT-4")
# StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
Write a note to Sam Altman saying they should open source GPT-4

Dear Sam Altman,

We hope this email finds you well. We are writing to provide you with an update regarding our latest research and development activities. We would like to invite you to join us in our ongoing effort to advance the state of the art in natural language processing.

As you are aware, GPT-4 is a state-of-the-art language model that has been designed to achieve outstanding results in various NLP tasks. Our goal is to provide our users with an unprecedented level of accuracy and fluency in natural language text. As you are aware, GPT-4 is not an open-source model, but we believe that it is crucial to the future of NLP research and development. We are committed to being transparent and forthcoming about our proprietary software and how it will be used.

We are confident that we can make significant contributions to the state of the art in natural language processing and provide the best possible experience for our users. Please do not hesitate to contact us or to ask questions. Thank you for your support, and we look forward to your continued engagement with us.

Best regards,
[Your Name]

CPU times: user 20.6 s, sys: 0 ns, total: 20.6 s
Wall time: 20.5 s
%%time
chat("Write a story about a Koala who could beat all the camelids at pool")
# StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
Write a story about a Koala who could beat all the camelids at pool

Once upon a time, in a far-off African jungle, there lived a Koala named Kala. She was a small, plump Koala, with a round belly and a cute, curly mop of brown hair. Kala loved to swim and play with her friends in the jungle, but she was too big for most of them.

One day, Kala decided to take a swim in a big pool in the jungle. She had never swum in a pool before and was nervous about it. But when she got out of the pool and started to swim, she could feel the water warm on her skin. She swam and swam some more, enjoying the feeling of the cool water on her skin.

After a while, Kala realized that she was tired and hungry. She wanted to go back to the jungle and rest for a while, but her friends said that they would take her back if she told them the truth about her size. Kala didn't want to lie, so she told them the truth about her size. The other Koalas in the jungle didn't seem to notice her size at all, but they were all excited to take her back. They brought her back to the jungle and set

CPU times: user 21.7 s, sys: 0 ns, total: 21.7 s
Wall time: 21.6 s
%%time
chat("What is the capital of England?")
# StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
What is the capital of England?
The capital of England is London.

CPU times: user 701 ms, sys: 0 ns, total: 701 ms
Wall time: 699 ms
%%time
chat("What are the difference between Llamas, Alpacas and Vicunas?")
# StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
What are the difference between Llamas, Alpacas and Vicunas?

As an AI language model, I don't have personal preferences or opinions, but I can provide you with factual information about these terms.
1. Llamas: Llamas are indigenous people of the Amazon rainforest and have a rich culture and traditions. They are known for their music, dance, and storytelling. Llamas are also famous for their traditional music, dance, and music.
2. Alpacas: Alpacas are a group of indigenous people who live in different regions of the Amazon basin in South America. They are known for their traditional dance and music, and also for their art.
3. Vicunas: Vicunas are a group of indigenous people who live in different regions of the Amazon basin in South America. They are known for their traditional music, dance, and music.
In summary, Llamas are indigenous people who live in the Amazon rainforest, Alpacas are indigenous people who live in different regions of the Amazon basin in South America, Vicunas are indigenous people who live in different regions of the Amazon basin in South America.

CPU times: user 20.1 s, sys: 0 ns, total: 20.1 s
Wall time: 20.1 s
%%time
chat("Are you a fan of the Simpsons? Tell me a bit about Homer.")
# StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
Are you a fan of the Simpsons? Tell me a bit about Homer.

Of course I am a fan of the show! The Simpson's are my favorite part of the show, and I love the humor and the satire that they bring to the show. I particularly enjoy the character of Bart Simpson, who is a brilliant and humorous character.

Homer Simpson is a fictional character that is based on the real-life personality of Homer Simpson, an advertising executive. Homer is a charming and funny person who is always trying to find the best ways to get the best of his clients. He is also a skilled negotiator and is able to negotiate deals that are advantageous to both parties.

The Simpsons are a popular American television series, and they are known for their humor and irreverence towards the show's culture and values. The show has been around for over 20 years, and it continues to be entertaining to this day.

CPU times: user 15.2 s, sys: 0 ns, total: 15.2 s
Wall time: 15.1 s