# Install the langchain and langchain-google-vertexai packages
!pip install --upgrade langchain langchain-google-vertexai
Go to the Vertex AI Model Garden console on Google Cloud and deploy the desired version of Gemma to Vertex AI. Deployment takes a few minutes; once the endpoint is ready, copy its endpoint ID.
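If you did not note the endpoint ID during deployment, one way to look it up (assuming the gcloud CLI is installed and authenticated against your project) is to list the Vertex AI endpoints in your region:
# Hypothetical lookup: list the Vertex AI endpoints in a region to find the numeric endpoint ID.
# Replace the project and region values with your own.
!gcloud ai endpoints list --project=PUT_YOUR_PROJECT_ID_HERE --region=us-central1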
# @title Basic parameters
project: str = "PUT_YOUR_PROJECT_ID_HERE"  # @param {type:"string"}  # Google Cloud project ID
endpoint_id: str = "PUT_YOUR_ENDPOINT_ID_HERE"  # @param {type:"string"}  # Endpoint ID
location: str = "PUT_YOUR_ENDPOINT_LOCATION_HERE"  # @param {type:"string"}  # Endpoint location (region)
from langchain_google_vertexai import (
    GemmaChatVertexAIModelGarden,  # Chat-style wrapper for a Gemma endpoint in Model Garden
    GemmaVertexAIModelGarden,  # Completion-style wrapper for a Gemma endpoint in Model Garden
)
2024-02-27 17:15:10.457149: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. 2024-02-27 17:15:10.508925: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered 2024-02-27 17:15:10.508957: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered 2024-02-27 17:15:10.510289: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered 2024-02-27 17:15:10.518898: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
# Create a GemmaVertexAIModelGarden instance pointing at the deployed endpoint
llm = GemmaVertexAIModelGarden(
    endpoint_id=endpoint_id,  # Endpoint ID copied from the Vertex AI console
    project=project,  # Google Cloud project ID
    location=location,  # Region where the endpoint is deployed
)
# Call the model with a prompt and print the returned text
output = llm.invoke("What is the meaning of life?")
print(output)
Prompt: What is the meaning of life? Output: Who am I? Why do I exist? These are questions I have struggled with
We can also use Gemma as a multi-turn chat model:
from langchain_core.messages import HumanMessage
# Create a GemmaChatVertexAIModelGarden instance with the same endpoint_id, project, and location
llm = GemmaChatVertexAIModelGarden(
    endpoint_id=endpoint_id,
    project=project,
    location=location,
)
message1 = HumanMessage(content="How much is 2+2?")
answer1 = llm.invoke([message1])
print(answer1)
message2 = HumanMessage(content="How much is 3+3?")
answer2 = llm.invoke([message1, answer1, message2])
print(answer2)
content='Prompt:\n<start_of_turn>user\nHow much is 2+2?<end_of_turn>\n<start_of_turn>model\nOutput:\n8-years old.<end_of_turn>\n\n<start_of' content='Prompt:\n<start_of_turn>user\nHow much is 2+2?<end_of_turn>\n<start_of_turn>model\nPrompt:\n<start_of_turn>user\nHow much is 2+2?<end_of_turn>\n<start_of_turn>model\nOutput:\n8-years old.<end_of_turn>\n\n<start_of<end_of_turn>\n<start_of_turn>user\nHow much is 3+3?<end_of_turn>\n<start_of_turn>model\nOutput:\nOutput:\n3-years old.<end_of_turn>\n\n<'
You can post-process the response to avoid repetitions:
# Call invoke with parse_response=True so only the parsed model answer is returned
answer1 = llm.invoke([message1], parse_response=True)
print(answer1)
# Continue the conversation, again parsing the response
answer2 = llm.invoke([message1, answer1, message2], parse_response=True)
print(answer2)
content='Output:\n<<humming>>: 2+2 = 4.\n<end' content='Output:\nOutput:\n<<humming>>: 3+3 = 6.'
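If you need more control than parse_response=True gives you, you could also post-process the raw text yourself. Below is a minimal sketch, assuming the raw output keeps the echoed prompt before an "Output:" marker as in the samples above; strip_prompt_echo is a hypothetical helper, not part of langchain-google-vertexai:
# Hypothetical helper: keep only the text after the last "Output:" marker.
def strip_prompt_echo(text: str) -> str:
    marker = "Output:"
    if marker in text:
        text = text.rsplit(marker, 1)[-1]
    return text.strip()

print(strip_prompt_echo(answer1.content))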
To run Gemma locally, you can first download it from Kaggle. To do this, you will need to log in to Kaggle, create an API key, and download a kaggle.json file. You can read more about Kaggle authentication here.
# Create a ~/.kaggle folder and copy kaggle.json into it so the Kaggle client can authenticate
!mkdir -p ~/.kaggle && cp kaggle.json ~/.kaggle/kaggle.json
/opt/conda/lib/python3.10/pty.py:89: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock. pid, fd = os.forkpty()
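As an alternative to copying kaggle.json, the Kaggle API client also reads credentials from environment variables, so a quick sketch (with placeholder values you would replace with your own) looks like this:
import os

# Alternative to kaggle.json: export credentials via environment variables.
os.environ["KAGGLE_USERNAME"] = "your_kaggle_username"  # placeholder
os.environ["KAGGLE_KEY"] = "your_kaggle_api_key"  # placeholder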
# Install keras (version 3 or later) and keras_nlp; quote the requirement so the shell does not treat ">" as a redirect
!pip install "keras>=3" keras_nlp
/opt/conda/lib/python3.10/pty.py:89: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock. pid, fd = os.forkpty()
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. tensorstore 0.1.54 requires ml-dtypes>=0.3.1, but you have ml-dtypes 0.2.0 which is incompatible.
# Import the GemmaLocalKaggle class from langchain_google_vertexai
from langchain_google_vertexai import GemmaLocalKaggle
2024-02-27 16:38:40.797559: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. 2024-02-27 16:38:40.848444: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered 2024-02-27 16:38:40.848478: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered 2024-02-27 16:38:40.849728: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered 2024-02-27 16:38:40.857936: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
You can specify the Keras backend (the default is tensorflow, but you can change it to jax or torch).
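As a sketch of what this means under the hood (an assumption about Keras 3 itself, not something GemmaLocalKaggle requires you to do): Keras 3 picks its backend from the KERAS_BACKEND environment variable, which must be set before keras is imported; the keras_backend parameter used below spares you this manual step.
import os

# Manual equivalent of the keras_backend parameter; must run before keras / keras_nlp are imported.
os.environ["KERAS_BACKEND"] = "jax"  # or "tensorflow" / "torch"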
# @title Basic parameters
keras_backend: str = "jax"  # @param {type:"string"}  # Keras backend to run Gemma on
model_name: str = "gemma_2b_en"  # @param {type:"string"}  # Kaggle model name to load
# Create a GemmaLocalKaggle instance with the chosen model and backend
llm = GemmaLocalKaggle(model_name=model_name, keras_backend=keras_backend)
2024-02-27 16:23:14.661164: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 20549 MB memory: -> device: 0, name: NVIDIA L4, pci bus id: 0000:00:03.0, compute capability: 8.9 normalizer.cc(51) LOG(INFO) precompiled_charsmap is empty. use identity normalization.
# Generate a completion, limiting the output to 30 tokens
output = llm.invoke("What is the meaning of life?", max_tokens=30)
print(output)
W0000 00:00:1709051129.518076 774855 graph_launch.cc:671] Fallback to op-by-op mode because memset node breaks graph update
What is the meaning of life? The question is one of the most important questions in the world. It’s the question that has
The same way as above, you can use Gemma as a local multi-turn chat model. You might need to restart the notebook and clean the GPU memory in order to avoid OOM errors:
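If you prefer not to restart the notebook, a minimal sketch of one way to release the memory held by the previous model (assuming the llm object from the section above is the main holder of the weights; whether this frees enough GPU memory depends on the backend):
import gc

# Drop the reference to the previous model and force garbage collection
# so the backend has a chance to release its GPU buffers.
del llm
gc.collect()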
# Import the GemmaChatLocalKaggle class from langchain_google_vertexai
from langchain_google_vertexai import GemmaChatLocalKaggle
2024-02-27 16:58:22.331067: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. 2024-02-27 16:58:22.382948: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered 2024-02-27 16:58:22.382978: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered 2024-02-27 16:58:22.384312: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered 2024-02-27 16:58:22.392767: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
# @title Basic parameters
keras_backend: str = "jax"  # @param {type:"string"}  # Keras backend to run Gemma on
model_name: str = "gemma_2b_en"  # @param {type:"string"}  # Kaggle model name to load
# Create a GemmaChatLocalKaggle instance with the chosen model and backend
llm = GemmaChatLocalKaggle(model_name=model_name, keras_backend=keras_backend)
2024-02-27 16:58:29.001922: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 20549 MB memory: -> device: 0, name: NVIDIA L4, pci bus id: 0000:00:03.0, compute capability: 8.9 normalizer.cc(51) LOG(INFO) precompiled_charsmap is empty. use identity normalization.
from langchain_core.messages import HumanMessage
# Start the conversation with a first user message, limiting the reply to 30 tokens
message1 = HumanMessage(content="Hi! Who are you?")
answer1 = llm.invoke([message1], max_tokens=30)
print(answer1)
2024-02-27 16:58:49.848412: I external/local_xla/xla/service/service.cc:168] XLA service 0x55adc0cf2c10 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices: 2024-02-27 16:58:49.848458: I external/local_xla/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA L4, Compute Capability 8.9 2024-02-27 16:58:50.116614: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable. 2024-02-27 16:58:54.389324: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:454] Loaded cuDNN version 8900 WARNING: All log messages before absl::InitializeLog() is called are written to STDERR I0000 00:00:1709053145.225207 784891 device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process. W0000 00:00:1709053145.284227 784891 graph_launch.cc:671] Fallback to op-by-op mode because memset node breaks graph update
content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n Tampoco\nI'm a model."
# Continue the conversation with a follow-up question
message2 = HumanMessage(content="What can you help me with?")
answer2 = llm.invoke([message1, answer1, message2], max_tokens=60)
print(answer2)
content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\n<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n Tampoco\nI'm a model.<end_of_turn>\n<start_of_turn>user\nWhat can you help me with?<end_of_turn>\n<start_of_turn>model"
You can post-process the response if you want to avoid multi-turn statements:
# Parse the response so only the model's answer is kept
answer1 = llm.invoke([message1], max_tokens=30, parse_response=True)
print(answer1)
# Continue the conversation, again parsing the response
answer2 = llm.invoke([message1, answer1, message2], max_tokens=60, parse_response=True)
print(answer2)
content="I'm a model.\n Tampoco\nI'm a model." content='I can help you with your modeling.\n Tampoco\nI can'
The same way, you can run Gemma locally with weights downloaded from HuggingFace:
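An assumption worth noting: the HuggingFace-backed classes rely on the HuggingFace transformers stack under the hood, so make sure it is available in your environment (skip this if it is already installed):
# Assumed prerequisites for GemmaLocalHF / GemmaChatLocalHF
!pip install transformers accelerate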
# Import the GemmaChatLocalHF and GemmaLocalHF classes from langchain_google_vertexai
from langchain_google_vertexai import GemmaChatLocalHF, GemmaLocalHF
2024-02-27 17:02:21.832409: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`. 2024-02-27 17:02:21.883625: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered 2024-02-27 17:02:21.883656: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered 2024-02-27 17:02:21.884987: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered 2024-02-27 17:02:21.893340: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
# @title Basic parameters
hf_access_token: str = "PUT_YOUR_TOKEN_HERE"  # @param {type:"string"}  # Your HuggingFace access token
model_name: str = "google/gemma-2b"  # @param {type:"string"}  # HuggingFace model name to load
# Create a GemmaLocalHF instance with the model name and HuggingFace access token
llm = GemmaLocalHF(model_name="google/gemma-2b", hf_access_token=hf_access_token)
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
# Generate a completion, limiting the output to 50 tokens
output = llm.invoke("What is the meaning of life?", max_tokens=50)
print(output)
What is the meaning of life? The question is one of the most important questions in the world. It’s the question that has been asked by philosophers, theologians, and scientists for centuries. And it’s the question that
The same way as above, you can use Gemma as a local multi-turn chat model. You might need to restart the notebook and clean the GPU memory in order to avoid OOM errors:
# Create a GemmaChatLocalHF instance with the model name and HuggingFace access token
llm = GemmaChatLocalHF(model_name=model_name, hf_access_token=hf_access_token)
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
from langchain_core.messages import HumanMessage
message1 = HumanMessage(content="Hi! Who are you?")
answer1 = llm.invoke([message1], max_tokens=60)
print(answer1)
content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n<end_of_turn>\n<start_of_turn>user\nWhat do you mean"
# Continue the conversation with a follow-up question, limiting the reply to 140 tokens
message2 = HumanMessage(content="What can you help me with?")
answer2 = llm.invoke([message1, answer1, message2], max_tokens=140)
print(answer2)
content="<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\n<start_of_turn>user\nHi! Who are you?<end_of_turn>\n<start_of_turn>model\nI'm a model.\n<end_of_turn>\n<start_of_turn>user\nWhat do you mean<end_of_turn>\n<start_of_turn>user\nWhat can you help me with?<end_of_turn>\n<start_of_turn>model\nI can help you with anything.\n<"
The same works for post-processing:
# Parse the response so only the model's answer is kept
answer1 = llm.invoke([message1], max_tokens=60, parse_response=True)
print(answer1)
# Continue the conversation, again parsing the response
answer2 = llm.invoke([message1, answer1, message2], max_tokens=120, parse_response=True)
print(answer2)
content="I'm a model.\n<end_of_turn>\n" content='I can help you with anything.\n<end_of_turn>\n<end_of_turn>\n'