!git clone https://github.com/BlinkDL/RWKV-LM.git
# The 3B model needs more VRAM than the free Colab GPU offers, so it stays commented out:
#!wget https://huggingface.co/BlinkDL/rwkv-4-pile-3b/resolve/main/RWKV-4-Pile-3B-20220915-1207.pth -O ./RWKV-LM/RWKV-v4/500.pth
!wget https://huggingface.co/BlinkDL/rwkv-4-pile-1b5/resolve/main/RWKV-4-Pile-1B5-20220903-8040.pth -O ./RWKV-LM/RWKV-v4/500.pth
Cloning into 'RWKV-LM'...
remote: Enumerating objects: 1092, done.
remote: Counting objects: 100% (211/211), done.
remote: Compressing objects: 100% (81/81), done.
remote: Total 1092 (delta 132), reused 185 (delta 130), pack-reused 881
Receiving objects: 100% (1092/1092), 5.97 MiB | 30.87 MiB/s, done.
Resolving deltas: 100% (663/663), done.
--2022-09-21 06:57:37--  https://huggingface.co/BlinkDL/rwkv-4-pile-1b5/resolve/main/RWKV-4-Pile-1B5-20220903-8040.pth
Resolving huggingface.co (huggingface.co)... 54.173.5.192, 44.195.102.200, 52.5.62.33, ...
Connecting to huggingface.co (huggingface.co)|54.173.5.192|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-lfs.huggingface.co/repos/d6/95/d69583b06567422d104d5413e7926ae97bcf0d541619db6e61fe10133d91582d/4e215be3b4f86dc2f145835b47a2c432306c373cbf625375b7721bb474512bad?response-content-disposition=attachment%3B%20filename%3D%22RWKV-4-Pile-1B5-20220903-8040.pth%22 [following]
--2022-09-21 06:57:37--  https://cdn-lfs.huggingface.co/repos/d6/95/d69583b06567422d104d5413e7926ae97bcf0d541619db6e61fe10133d91582d/4e215be3b4f86dc2f145835b47a2c432306c373cbf625375b7721bb474512bad?response-content-disposition=attachment%3B%20filename%3D%22RWKV-4-Pile-1B5-20220903-8040.pth%22
Resolving cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)... 13.226.52.13, 13.226.52.128, 13.226.52.14, ...
Connecting to cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)|13.226.52.13|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3030279587 (2.8G) [application/zip]
Saving to: ‘./RWKV-LM/RWKV-v4/500.pth’

./RWKV-LM/RWKV-v4/5 100%[===================>]   2.82G  81.0MB/s    in 35s

2022-09-21 06:58:12 (82.9 MB/s) - ‘./RWKV-LM/RWKV-v4/500.pth’ saved [3030279587/3030279587]
%cd ./RWKV-LM/RWKV-v4/
/content/RWKV-LM/RWKV-v4
!pip install transformers
!pip install ninja
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.22.1-py3-none-any.whl (4.9 MB) |████████████████████████████████| 4.9 MB 4.0 MB/s
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.7/dist-packages (from transformers) (6.0)
Collecting huggingface-hub<1.0,>=0.9.0
  Downloading huggingface_hub-0.9.1-py3-none-any.whl (120 kB) |████████████████████████████████| 120 kB 66.4 MB/s
Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from transformers) (2.23.0)
Requirement already satisfied: filelock in /usr/local/lib/python3.7/dist-packages (from transformers) (3.8.0)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.7/dist-packages (from transformers) (21.3)
Requirement already satisfied: importlib-metadata in /usr/local/lib/python3.7/dist-packages (from transformers) (4.12.0)
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB) |████████████████████████████████| 6.6 MB 48.2 MB/s
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.7/dist-packages (from transformers) (1.21.6)
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.7/dist-packages (from transformers) (4.64.1)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.7/dist-packages (from transformers) (2022.6.2)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.7/dist-packages (from huggingface-hub<1.0,>=0.9.0->transformers) (4.1.1)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging>=20.0->transformers) (3.0.9)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata->transformers) (3.8.1)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (2.10)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (2022.6.15)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (1.24.3)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (3.0.4)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.9.1 tokenizers-0.12.1 transformers-4.22.1
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting ninja
  Downloading ninja-1.10.2.3-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB) |████████████████████████████████| 108 kB 4.0 MB/s
Installing collected packages: ninja
Successfully installed ninja-1.10.2.3
########################################################################################################
# The RWKV Language Model - https://github.com/BlinkDL/RWKV-LM
########################################################################################################
import numpy as np
import math, os
import time
import types
import copy
import torch
from torch.nn import functional as F
from src.utils import TOKENIZER, Dataset
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.allow_tf32 = True
torch.backends.cuda.matmul.allow_tf32 = True
np.set_printoptions(precision=4, suppress=True, linewidth=200)
########################################################################################################
# Step 1: set model
#
# Set TOKEN_MODE to 'char' or 'bpe' if the model is trained by 'train.py' from scratch.
#
# Set TOKEN_MODE to 'pile' if you want to test pre-trained pile models.
########################################################################################################
TOKEN_MODE = 'pile'  # char / bpe / pile

n_layer = 6
n_embd = 512
ctx_len = 1024

if TOKEN_MODE == 'char':
    MODEL_NAME = 'trained-500'  # your trained model
    WORD_NAME = 'vocab'         # the .json vocab (generated by train.py)
    # set UNKNOWN_CHAR to the rarest token in your vocab.json;
    # all unknown tokens in your prompt will be denoted by it
    UNKNOWN_CHAR = ' '  # here we just set it to ' ' for simplicity
elif TOKEN_MODE == 'bpe':
    MODEL_NAME = 'trained-500'  # your trained model
    WORD_NAME = ['model-vocab.json', 'model-merges.txt']  # [vocab, merge] for your BPE model
    UNKNOWN_CHAR = None
elif TOKEN_MODE == 'pile':
    WORD_NAME = ['20B_tokenizer.json', '20B_tokenizer.json']
    UNKNOWN_CHAR = None

    # ---> you can set MODEL_NAME to your fine-tuned model <---
    MODEL_NAME = '500'

    # for the 3B model:
    # n_layer = 32
    # n_embd = 2560
    # ctx_len = 1024

    # for the 1B5 model:
    n_layer = 24
    n_embd = 2048
    ctx_len = 1024
os.environ['RWKV_FLOAT_MODE'] = 'bf16' # 'bf16' / 'fp16' / 'fp32'
os.environ['RWKV_RUN_DEVICE'] = 'cuda' # 'cpu' (already very fast) or 'cuda'
model_type = 'RWKV' # 'RWKV' or 'RWKV-ffnPre'
########################################################################################################
# Step 2: set prompt & sampling stuffs
########################################################################################################
# context = 'A'
# context = "\nIn the"
# context = '\nSugar:'
NUM_TRIALS = 5
LENGTH_PER_TRIAL = 3330
DEBUG_DEBUG = False # set to True to print the raw model output for a single step instead of generating
########################################################################################################
print(f'Loading {MODEL_NAME}...')
from src.model_run import RWKV_RNN
model = RWKV_RNN(MODEL_NAME, os.environ['RWKV_RUN_DEVICE'], model_type, n_layer, n_embd, ctx_len)
tokenizer = TOKENIZER(WORD_NAME, UNKNOWN_CHAR=UNKNOWN_CHAR)
Loading 500...
RWKV_HEAD_QK_DIM 0
Using /root/.cache/torch_extensions/py37_cu113 as PyTorch extensions root...
Creating extension directory /root/.cache/torch_extensions/py37_cu113/wkv...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py37_cu113/wkv/build.ninja...
Building extension module wkv...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Loading extension module wkv...
The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
Moving 0 files to the new cache system
0it [00:00, ?it/s]
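With the 1B5 settings above (n_layer=24, n_embd=2048, and the 20B tokenizer's 50277-token vocab), a back-of-the-envelope parameter count lands near 1.5 billion. The layout assumed below (four n_embd x n_embd time-mix matrices plus a 4x-wide channel-mix FFN per block, plus embedding and output head) is an approximation, not a count read from the checkpoint:

```python
def approx_rwkv_params(n_layer: int, n_embd: int, vocab: int = 50277) -> int:
    """Rough RWKV-4 parameter estimate (assumed layout, not exact):
    per block ~4*e^2 for the time-mix matrices plus ~9*e^2 for the
    4x-wide channel-mix FFN, plus embedding and output head."""
    per_block = 13 * n_embd * n_embd
    emb_and_head = 2 * vocab * n_embd
    return n_layer * per_block + emb_and_head

# 1B5 settings from above:
print(approx_rwkv_params(24, 2048) / 1e9)  # roughly 1.5 (billion parameters)
```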
!nvidia-smi
Wed Sep 21 06:59:29 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   50C    P0    28W /  70W |   9282MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
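The ~9 GiB shown in use is consistent with the download above: the checkpoint is 3,030,279,587 bytes, i.e. roughly 1.5e9 parameters at 2 bytes each, with the rest going to the CUDA context, buffers, and activations. The arithmetic, as a quick sanity check:

```python
def weight_bytes(n_params: float, bytes_per_param: int) -> float:
    """Size of the weights alone; runtime overhead (CUDA context,
    activations, temporary buffers) comes on top of this."""
    return n_params * bytes_per_param

# 1.5e9 parameters at 2 bytes each (bf16) = 3 GB of weights,
# matching the ~2.8G checkpoint file downloaded earlier.
print(weight_bytes(1.5e9, 2) / 1e9)  # → 3.0
```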
########################################################################################################
context = "You are an AI running a house.\ngiven the following commands: volumeUp(int amount),volumeDown(int amount),setVolume(int percent),setLights([r,g,b]),playSong(string url)\nand the given instruction 'Please make the room romantic'\nList the commands, and the parameters they should have, that should be done to fullfil the command\nGive the commands in the format [command(parameter)]\n\nTask: list the commands and a reasonable value for the parameter\nResponse:"
TEMPERATURE = 0.9
top_p = 0.8
top_p_newline = 0.9 # only used when TOKEN_MODE == 'char'
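The `tokenizer.sample_logits` call further down applies these settings via temperature scaling and nucleus (top-p) filtering. A minimal NumPy sketch of the idea; `top_p_sample` is a hypothetical stand-in written for illustration, not the RWKV helper itself:

```python
import numpy as np

def top_p_sample(logits, temperature=0.9, top_p=0.8, rng=None):
    """Nucleus sampling: keep the smallest set of highest-probability
    tokens whose cumulative mass reaches top_p, then sample from it."""
    rng = np.random.default_rng(0) if rng is None else rng
    logits = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())   # stable softmax
    probs /= probs.sum()
    order = np.argsort(-probs)              # tokens by descending probability
    cum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cum, top_p)) + 1  # smallest prefix covering top_p
    keep = order[:cutoff]
    p = probs[keep] / probs[keep].sum()     # renormalize over the kept set
    return int(rng.choice(keep, p=p))

# One dominant logit with a tight top_p collapses to greedy decoding:
print(top_p_sample([10.0, 0.0, 0.0, 0.0], top_p=0.5))  # → 0
```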
if tokenizer.charMode:
    context = tokenizer.refine_context(context)
    ctx = [tokenizer.stoi.get(s, tokenizer.UNKNOWN_CHAR) for s in context]
else:
    ctx = tokenizer.tokenizer.encode(context)
src_len = len(ctx)
src_ctx = ctx.copy()

print('\nYour prompt has ' + str(src_len) + ' tokens.')
print('\n--> Currently the first run takes a while if your prompt is long, as we are using RNN to process the prompt. Use GPT to build the hidden state for better speed. <--\n')

for TRIAL in range(1 if DEBUG_DEBUG else NUM_TRIALS):
    t_begin = time.time_ns()
    print(('-' * 30) + context, end='')
    ctx = src_ctx.copy()
    model.clear()
    if TRIAL == 0:
        init_state = types.SimpleNamespace()
        for i in range(src_len):
            x = ctx[:i+1]
            if i == src_len - 1:
                init_state.out = model.run(x)
            else:
                model.run(x)
        model.save(init_state)
    else:
        model.load(init_state)

    for i in range(src_len, src_len + (1 if DEBUG_DEBUG else LENGTH_PER_TRIAL)):
        x = ctx[:i+1]
        x = x[-ctx_len:]
        if i == src_len:
            out = copy.deepcopy(init_state.out)
        else:
            out = model.run(x)
        if DEBUG_DEBUG:
            print('model', np.array(x), '==>', np.array(out), np.max(out), np.min(out))
        if TOKEN_MODE == 'pile':
            out[0] = -999999999  # disable <|endoftext|>

        char = tokenizer.sample_logits(out, x, ctx_len, temperature=TEMPERATURE,
                                       top_p_usual=top_p, top_p_newline=top_p_newline)
        char = char.item()
        if tokenizer.charMode:
            print(tokenizer.itos[int(char)], end='', flush=True)
        else:
            print(tokenizer.tokenizer.decode(int(char)), end='', flush=True)
        ctx += [char]

    t_end = time.time_ns()
    print("\n----------", round((t_end - t_begin) / (10 ** 9), 2), end='s ')
Your prompt has 110 tokens.

--> Currently the first run takes a while if your prompt is long, as we are using RNN to process the prompt. Use GPT to build the hidden state for better speed. <--

------------------------------You are an AI running a house.
given the following commands: volumeUp(int amount),volumeDown(int amount),setVolume(int percent),setLights([r,g,b]),playSong(string url)
and the given instruction 'Please make the room romantic'
List the commands, and the parameters they should have, that should be done to fullfil the command
Give the commands in the format [command(parameter)]

Task: list the commands and a reasonable value for the parameter
Response: answers: - answer 1: - answer 2: - answer 3: - answer 4: - answer 5: The first two commands (room.increase volume and command.setVolume) work. But the others do not. The command "volumeUp(int percent)" seems to do nothing. I have tried to change the command by "volumeUp(1)" but the result is still the same. What command should I use to set the volume to full? I have an idea about what happens in the first command, but I don't know how it works. This is the full code of the game: #include <iostream> #include <vector> #include <list> #include <string> #include <
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-7-ce677f0ef442> in <module>
     41                 out = copy.deepcopy(init_state.out)
     42             else:
---> 43                 out = model.run(x)
     44             if DEBUG_DEBUG:
     45                 print('model', np.array(x), '==>', np.array(

/content/RWKV-LM/RWKV-v4/src/model_run.py in run(self, ctx)
    365                 else:
    366                     x = x + self.SA(self.LN(x, w.blocks[i].ln1), w.blocks[i].att, f'att.{i}')
--> 367                 x = x + self.FF(self.LN(x, w.blocks[i].ln2), w.blocks[i].ffn, f'ffn.{i}')
    368
    369             x = self.LN(x, w.ln_out)

/content/RWKV-LM/RWKV-v4/src/model_run.py in LN(self, xx, w)
    301
    302     def LN(self, xx, w):
--> 303         return F.layer_norm(xx, (self.n_embd,), weight=w.weight, bias=w.bias)
    304
    305     def FF(self, xx, w, name):

/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in layer_norm(input, normalized_shape, weight, bias, eps)
   2501         layer_norm, (input, weight, bias), input, normalized_shape, weight=weight, bias=bias, eps=eps
   2502     )
-> 2503     return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
   2504
   2505

KeyboardInterrupt:
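The `TRIAL == 0` branch in the generation cell runs the prompt through the RNN once, snapshots the hidden state with `model.save(init_state)`, and reloads it for every later trial instead of reprocessing the prompt. The idea in miniature; `MiniRNN` is a made-up toy, not the RWKV API:

```python
class MiniRNN:
    """Toy stand-in for a recurrent model: one scalar hidden state."""
    def __init__(self):
        self.h = 0.0

    def run(self, token):
        self.h = 0.5 * self.h + token  # stand-in for the real recurrence
        return self.h

toy = MiniRNN()
prompt = [1, 2, 3]
for t in prompt:
    out = toy.run(t)

saved_h = toy.h            # analogue of model.save(init_state)
print(saved_h)             # → 4.25

for trial in range(2):
    toy.h = saved_h        # analogue of model.load(init_state)
    # ...generate continuation tokens from here, prompt cost paid once...
```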