RunGPT is an open-source, cloud-native framework for serving large multimodal models (LMMs). It is designed to simplify the deployment and management of large language models on distributed GPU clusters. RunGPT aims to be a centralized, easily accessible one-stop solution that gathers techniques for optimizing large multimodal models and makes them usable by everyone. RunGPT already supports many LLMs, such as LLaMA, Pythia, StableLM, Vicuna, and MOSS, as well as large multimodal models (LMMs) like MiniGPT-4 and OpenFlamingo.
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-llms-rungpt
!pip install llama-index
You need to install the rungpt package in your Python environment with pip.
!pip install rungpt
Once installed, any model supported by RunGPT can be deployed with a single command. This downloads the target language model from an open-source platform and serves it on a local port, where it can be accessed via HTTP or gRPC requests. Note that this command is meant to be run from the command line, not inside a Jupyter notebook.
!rungpt serve decapoda-research/llama-7b-hf --precision fp16 --device_map balanced
from llama_index.llms.rungpt import RunGptLLM
llm = RunGptLLM()
prompt = "What public transportation might be available in a city?"
response = llm.complete(prompt)
print(response)
I don't want to go to work, so what should I do? I have a job interview on Monday. What can I wear that will make me look professional but not too stuffy or boring?
Chat
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.llms.rungpt import RunGptLLM
messages = [
    ChatMessage(
        role=MessageRole.USER,
        content="Now, I want you to do some math for me.",
    ),
    ChatMessage(
        role=MessageRole.ASSISTANT, content="Sure, I would like to help you."
    ),
    ChatMessage(
        role=MessageRole.USER,
        content="How many points determine a straight line?",
    ),
]
llm = RunGptLLM()
response = llm.chat(messages=messages, temperature=0.8, max_tokens=15)
print(response)
Streaming processes data as soon as it arrives instead of waiting for the complete result. For LLMs, this means you can consume the response token by token while it is still being generated, which is especially useful for long outputs and interactive applications.
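The idea can be sketched with a plain Python generator, no model server required (the token list below is made up purely for illustration):

```python
def fake_stream():
    # Yield pieces of a response as they "arrive",
    # mimicking how a streaming endpoint delivers output.
    for token in ["Buses, ", "subways, ", "and trams."]:
        yield token


# Consume each chunk as soon as it is produced,
# instead of waiting for the full string.
for chunk in fake_stream():
    print(chunk, end="")
```

The loops over `stream_complete` and `stream_chat` below follow exactly this pattern, printing each chunk as it is received.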
Using the stream_complete endpoint
prompt = "What public transportation might be available in a city?"
response = RunGptLLM().stream_complete(prompt)
for item in response:
    print(item.text)
Using the stream_chat endpoint
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.llms.rungpt import RunGptLLM
messages = [
    ChatMessage(
        role=MessageRole.USER,
        content="Now, I want you to do some math for me.",
    ),
    ChatMessage(
        role=MessageRole.ASSISTANT, content="Sure, I would like to help you."
    ),
    ChatMessage(
        role=MessageRole.USER,
        content="How many points determine a straight line?",
    ),
]
response = RunGptLLM().stream_chat(messages=messages)
for item in response:
    print(item.message)