
Hugging Face LLMs

There are many ways to interface with LLMs from Hugging Face. Hugging Face itself provides several Python packages for access, and LlamaIndex wraps them as LLM entities:

The transformers package: use llama_index.llms.HuggingFaceLLM
The Hugging Face Inference API, wrapped by huggingface_hub[inference]: use llama_index.llms.HuggingFaceInferenceAPI

There are many possible permutations of these two, so this notebook only details a few. Let's use Hugging Face's Text Generation task as our example.

The code below installs the packages this demo needs:

transformers[torch] is needed for HuggingFaceLLM
huggingface_hub[inference] is needed for HuggingFaceInferenceAPI
The quotes are for Z shell (zsh)

In [ ]:

%pip install llama-index-llms-huggingface

In [ ]:

!pip install "transformers[torch]" "huggingface_hub[inference]"

Now that we're set up, let's play around:

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [ ]:

!pip install llama-index

In [ ]:

import os
from typing import List, Optional

from llama_index.llms.huggingface import (
    HuggingFaceInferenceAPI,
    HuggingFaceLLM,
)

# SEE: https://huggingface.co/docs/hub/security-tokens
# We only need a token with read permissions for this demo
HF_TOKEN: Optional[str] = os.getenv("HUGGING_FACE_TOKEN")
# NOTE: when this token is used within HuggingFaceInferenceAPI,
# a value of None falls back on Hugging Face's token storage.
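
The token lookup above can also be spelled out as a small function. The sketch below assumes a hypothetical helper, `resolve_hf_token`, which is not part of LlamaIndex or huggingface_hub: it prefers an explicitly passed token, then the `HUGGING_FACE_TOKEN` environment variable used in this notebook, and otherwise returns None so that the library can fall back on its own token storage. The token strings are placeholders.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical helper: choose a token without ever raising.

    Order: explicit argument, then the HUGGING_FACE_TOKEN environment
    variable, then None (leaving any further fallback to the library).
    """
    env = os.environ if env is None else env
    if explicit:
        return explicit
    # Treat an empty variable the same as an unset one.
    return env.get("HUGGING_FACE_TOKEN") or None


print(resolve_hf_token("hf_explicit", env={}))
print(resolve_hf_token(env={"HUGGING_FACE_TOKEN": "hf_from_env"}))
print(resolve_hf_token(env={}))
```

Passing the environment in as a mapping keeps the helper easy to exercise without touching the real process environment.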

In [ ]:

# This uses https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha
# On the first call, the model is downloaded to the local Hugging Face model cache,
# and it then runs on your local machine
locally_run = HuggingFaceLLM(model_name="HuggingFaceH4/zephyr-7b-alpha")

# This uses the same model, but runs it remotely on Hugging Face's servers,
# accessed via the Hugging Face Inference API
# Note that using your token will not incur charges:
# the Inference API is free, it is just rate limited
remotely_run = HuggingFaceInferenceAPI(
    model_name="HuggingFaceH4/zephyr-7b-alpha", token=HF_TOKEN
)

# Or you can skip providing a token and use the Hugging Face Inference API anonymously
remotely_run_anon = HuggingFaceInferenceAPI(
    model_name="HuggingFaceH4/zephyr-7b-alpha"
)

# If you don't provide a model_name to HuggingFaceInferenceAPI,
# Hugging Face's recommended model is used (thanks to huggingface_hub)
remotely_run_recommended = HuggingFaceInferenceAPI(token=HF_TOKEN)

A completion with HuggingFaceInferenceAPI is built on Hugging Face's Text Generation task.

In [ ]:

completion_response = remotely_run_recommended.complete("To infinity, and")
print(completion_response)

 beyond!
The Infinity Wall Clock is a unique and stylish way to keep track of time. The clock is made of a durable, high-quality plastic and features a bright LED display. The Infinity Wall Clock is powered by batteries and can be mounted on any wall. It is a great addition to any home or office.

If you change the LLM, you should also change the global tokenizer to match!

In [ ]:

from llama_index.core import set_global_tokenizer
from transformers import AutoTokenizer

set_global_tokenizer(
    AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-alpha").encode
)
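
The global tokenizer is a callable that maps a string to a list of tokens, which is why the `.encode` method is passed above. As a minimal sketch of that contract, the stand-in below uses a whitespace split instead of a real model tokenizer. `whitespace_tokenizer` and `count_tokens` are hypothetical names used only for illustration, and real token counts for a model such as zephyr-7b-alpha will differ.

```python
from typing import Callable, List


def whitespace_tokenizer(text: str) -> List[str]:
    """Stand-in tokenizer: one token per whitespace-separated word."""
    return text.split()


def count_tokens(text: str, tokenizer: Callable[[str], list]) -> int:
    """Count tokens by measuring the length of the tokenizer's output."""
    return len(tokenizer(text))


print(count_tokens("To infinity, and beyond!", whitespace_tokenizer))
```

Any callable with the same shape can be swapped in, so a tokenizer that does not match the model will still run but will give misleading counts.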

If you're curious, other Hugging Face Inference API tasks are wrapped as follows:

llama_index.llms.HuggingFaceInferenceAPI.chat: the Conversational task
llama_index.embeddings.HuggingFaceInferenceAPIEmbedding: the Feature Extraction task

And yes, Hugging Face embedding models are supported through:

transformers[torch]: wrapped by HuggingFaceEmbedding
huggingface_hub[inference]: wrapped by HuggingFaceInferenceAPIEmbedding

Both of the above subclass llama_index.embeddings.base.BaseEmbedding.

Using Hugging Face `text-generation-inference`

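The new TextGenerationInference class lets you interface with endpoints running text-generation-inference (TGI). In addition to fast inference, it supports tool usage starting from version 2.0.1.

To initialize an instance of TextGenerationInference, you need to provide the endpoint URL (either a self-hosted TGI instance or a public Inference Endpoint created on Hugging Face). For a private Inference Endpoint, you must also provide your HF token, either as an initialization argument or as an environment variable.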

In [ ]:

# Import the required libraries
import os
from typing import List, Optional

from llama_index.llms.huggingface import (
    TextGenerationInference,
)

# Define the endpoint URL
URL = "your_tgi_endpoint"
model = TextGenerationInference(
    model_url=URL, token=False
)  # set token to False for a public endpoint

# Ask the model to generate text
completion_response = model.complete("To infinity, and")
print(completion_response)

 beyond! This phrase is a reference to the famous line from the movie "Toy Story" when Buzz Lightyear, a toy astronaut, exclaims "To infinity and beyond!" as he soars through space. It has since become a catchphrase for reaching for the stars and striving for greatness. However, if you meant to ask a mathematical question, "To infinity" refers to a very large, infinite number, and "and beyond" could be interpreted as continuing infinitely in a certain direction. For example, "2 to the power of infinity" would represent a very large, infinite number.
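
For orientation, a completion against a TGI endpoint comes down to an HTTP POST. The sketch below builds (but does not send) such a request with the standard library, assuming TGI's `/generate` route and its `inputs`/`parameters` JSON body. `build_generate_request` is a hypothetical helper, and the URL and `max_new_tokens` value are placeholders. It is not a description of how `TextGenerationInference` is implemented.

```python
import json
import urllib.request
from typing import Union


def build_generate_request(
    base_url: str,
    prompt: str,
    token: Union[str, bool, None] = False,
    max_new_tokens: int = 64,
) -> urllib.request.Request:
    """Hypothetical helper: build a POST request for a TGI /generate route."""
    body = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    headers = {"Content-Type": "application/json"}
    # Only a non-empty string is sent as a bearer token; False or None
    # means the endpoint is treated as public.
    if isinstance(token, str) and token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Placeholder URL; nothing is sent over the network here.
req = build_generate_request("http://localhost:8080", "To infinity, and")
print(req.full_url)
print(json.loads(req.data))
```

Building the request separately from sending it makes the public versus private distinction visible: the only difference is whether an `Authorization` header is attached.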

To use tools with TextGenerationInference, you can either use an existing tool or define your own:

In [ ]:

from typing import List, Literal

from llama_index.core.bridge.pydantic import BaseModel, Field
from llama_index.core.tools import FunctionTool
from llama_index.core.base.llms.types import (
    ChatMessage,
    MessageRole,
)


def get_current_weather(location: str, format: str):
    """Get the current weather.

    Args:
        location (str): The city and state, e.g. San Francisco, CA
        format (str): The temperature unit to use ('celsius' or 'fahrenheit'). Infer this from the user's location.
    """
    ...


class WeatherArgs(BaseModel):
    location: str = Field(
        description="The city and region, e.g. Paris, Ile-de-France"
    )
    format: Literal["fahrenheit", "celsius"] = Field(
        description="The temperature unit to use ('fahrenheit' or 'celsius'). Infer this from the location.",
    )


weather_tool = FunctionTool.from_defaults(
    fn=get_current_weather,
    name="get_current_weather",
    description="Get the current weather",
    fn_schema=WeatherArgs,
)


def get_current_weather_n_days(location: str, format: str, num_days: int):
    """Get the weather forecast for the next N days.

    Args:
        location (str): The city and state, e.g. San Francisco, CA
        format (str): The temperature unit to use ('celsius' or 'fahrenheit'). Infer this from the user's location.
        num_days (int): The number of days for the weather forecast.
    """
    ...


class ForecastArgs(BaseModel):
    location: str = Field(
        description="The city and region, e.g. Paris, Ile-de-France"
    )
    format: Literal["fahrenheit", "celsius"] = Field(
        description="The temperature unit to use ('fahrenheit' or 'celsius'). Infer this from the location.",
    )
    num_days: int = Field(
        description="The duration of the weather forecast in days.",
    )


forecast_tool = FunctionTool.from_defaults(
    fn=get_current_weather_n_days,
    name="get_current_weather_n_days",
    description="Get the weather forecast for the next N days",
    fn_schema=ForecastArgs,
)

usr_msg = ChatMessage(
    role=MessageRole.USER,
    content="What's the weather like in Paris over the next week?",
)

response = model.chat_with_tools(
    user_msg=usr_msg,
    tools=[
        weather_tool,
        forecast_tool,
    ],
    tool_choice="get_current_weather_n_days",
)

print(response.message.additional_kwargs)

{'tool_calls': [{'id': 0, 'type': 'function', 'function': {'description': None, 'name': 'get_current_weather_n_days', 'arguments': {'format': 'celsius', 'location': 'Paris, Ile-de-France', 'num_days': 7}}}]}
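
The response only names the tool and its arguments; running the tool is left to the caller. As a minimal sketch, the code below dispatches a dictionary with the same shape as the output above to a local Python function. `TOOL_REGISTRY` and `run_tool_calls` are hypothetical names, and the weather function is a placeholder that returns a formatted string instead of querying a real service.

```python
from typing import Any, Callable, Dict, List


def get_current_weather_n_days(location: str, format: str, num_days: int) -> str:
    """Placeholder implementation; a real one would query a weather service."""
    return f"{num_days}-day forecast for {location} in {format}"


# Map tool names to the Python callables that implement them.
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {
    "get_current_weather_n_days": get_current_weather_n_days,
}


def run_tool_calls(additional_kwargs: Dict[str, Any]) -> List[Any]:
    """Execute each tool call found in a response's additional_kwargs."""
    results = []
    for call in additional_kwargs.get("tool_calls", []):
        function = call["function"]
        tool = TOOL_REGISTRY[function["name"]]  # raises KeyError for unknown tools
        results.append(tool(**function["arguments"]))
    return results


# Same shape as the output printed above.
additional_kwargs = {
    "tool_calls": [
        {
            "id": 0,
            "type": "function",
            "function": {
                "description": None,
                "name": "get_current_weather_n_days",
                "arguments": {
                    "format": "celsius",
                    "location": "Paris, Ile-de-France",
                    "num_days": 7,
                },
            },
        }
    ]
}

print(run_tool_calls(additional_kwargs))
```

Looking the tool up by name in an explicit registry means the model can only trigger functions that were deliberately registered.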
