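Rewrite-Retrieve-Read

Rewrite-Retrieve-Read is a method proposed in the paper Query Rewriting for Retrieval-Augmented Large Language Models.

Because the original query is not always optimal for the LLM to retrieve with, especially in the real world... we first prompt an LLM to rewrite the query, then conduct retrieval-augmented reading.

We will show how this can be implemented easily with LangChain Expression Language.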

Baseline

Baseline RAG (retrieve-and-read) can be done as follows:

In [1]:

from langchain_community.utilities import DuckDuckGoSearchAPIWrapper
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

In [2]:

# Template string for the question-answering prompt
template = """Answer the users question based only on the following context:

<context>
{context}
</context>

Question: {question}
"""
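# Build a chat prompt template from the template string
prompt = ChatPromptTemplate.from_template(template)

# Create the OpenAI chat model with temperature 0
model = ChatOpenAI(temperature=0)

# Create the DuckDuckGo search API wrapper
search = DuckDuckGoSearchAPIWrapper()

# Retriever: run the query through the search API and return the results as text
def retriever(query):
    return search.run(query)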

In [3]:

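# Build the chain: retrieve context, fill the prompt, call the model, parse the output
chain = (
    {"context": retriever, "question": RunnablePassthrough()}  # retriever fetches the context; RunnablePassthrough forwards the question unchanged
    | prompt  # fill the prompt template with the context and the question
    | model  # send the prompt to the chat model
    | StrOutputParser()  # extract the model's reply as a plain string
)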
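
The dict at the head of the chain is coerced by LangChain into a parallel step: each value receives the same input, and the results are collected under the dict's keys. The plain-Python sketch below mimics only that data flow. `fake_retriever`, `passthrough`, and `fan_out` are hypothetical stand-ins rather than LangChain APIs, and no search or model call is made.

```python
# Conceptual sketch of the first chain step; not LangChain's implementation.

def fake_retriever(query: str) -> str:
    """Hypothetical stand-in for the DuckDuckGo-backed retriever."""
    return f"<search results for: {query}>"


def passthrough(value):
    """Return the input unchanged, like RunnablePassthrough()."""
    return value


def fan_out(steps: dict, value):
    """Call every step with the same input and collect the results by key."""
    return {key: step(value) for key, step in steps.items()}


prompt_inputs = fan_out(
    {"context": fake_retriever, "question": passthrough},
    "what is langchain?",
)
print(prompt_inputs)
```

The resulting dict has exactly the two keys, `context` and `question`, that the prompt template expects.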

In [4]:

# A simple, well-formed query
simple_query = "what is langchain?"

In [5]:

# Invoke the chain with simple_query as input
chain.invoke(simple_query)

Out[5]:

"LangChain is a powerful and versatile Python library that enables developers and researchers to create, experiment with, and analyze language models and agents. It simplifies the development of language-based applications by providing a suite of features for artificial general intelligence. It can be used to build chatbots, perform document analysis and summarization, and streamline interaction with various large language model providers. LangChain's unique proposition is its ability to create logical links between one or more language models, known as Chains. It is an open-source library that offers a generic interface to foundation models and allows prompt management and integration with other components and tools."

While this works for well-formed queries, it can break down for more complicated ones.

In [6]:

# A query with a distracting, irrelevant sentence in front of the actual question
distracted_query = "man that sam bankman fried trial was crazy! what is langchain?"

In [7]:

# Invoke the chain with distracted_query as input
chain.invoke(distracted_query)

Out[7]:

'Based on the given context, there is no information provided about "langchain."'

This is because the retriever does a poor job with these "distracted" queries.

In [8]:

# Call the retriever directly with distracted_query to inspect what it returns
retriever(distracted_query)

Out[8]:

'Business She\'s the star witness against Sam Bankman-Fried. Her testimony was explosive Gary Wang, who co-founded both FTX and Alameda Research, said Bankman-Fried directed him to change a... The Verge, following the trial\'s Oct. 4 kickoff: "Is Sam Bankman-Fried\'s Defense Even Trying to Win?". CBS Moneywatch, from Thursday: "Sam Bankman-Fried\'s Lawyer Struggles to Poke ... Sam Bankman-Fried, FTX\'s founder, responded with a single word: "Oof.". Less than a year later, Mr. Bankman-Fried, 31, is on trial in federal court in Manhattan, fighting criminal charges ... July 19, 2023. A U.S. judge on Wednesday overruled objections by Sam Bankman-Fried\'s lawyers and allowed jurors in the FTX founder\'s fraud trial to see a profane message he sent to a reporter days ... Sam Bankman-Fried, who was once hailed as a virtuoso in cryptocurrency trading, is on trial over the collapse of FTX, the financial exchange he founded. Bankman-Fried is accused of...'
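
A plausible intuition for this failure is that the terms about the trial outnumber the single term about LangChain, so they dominate the match. The toy term-overlap scorer below illustrates that effect under strong simplifying assumptions. The two documents and the scoring function are made up for illustration and say nothing about how DuckDuckGo actually ranks results.

```python
import re

# Two made-up documents, for illustration only.
DOCS = {
    "trial": "sam bankman fried trial testimony in federal court",
    "langchain": "langchain is a framework for building llm applications",
}


def tokens(text: str) -> set:
    """Lowercase the text and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def best_match(query: str) -> str:
    """Return the key of the document sharing the most words with the query."""
    query_terms = tokens(query)
    return max(DOCS, key=lambda name: len(query_terms & tokens(DOCS[name])))


print(best_match("what is langchain?"))
print(best_match("man that sam bankman fried trial was crazy! what is langchain?"))
```

In this toy setup the clean query matches the `langchain` document, while the distracted query matches the `trial` document instead.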

Rewrite-Retrieve-Read Implementation

The main component is a rewriter that rewrites the search query.

In [9]:

# Template string that asks the model for a better search-engine query
template = """Provide a better search query for \
web search engine to answer the given question, end \
the queries with ’**’. Question: \
{x} Answer:"""

# Build a chat prompt template from the template string
rewrite_prompt = ChatPromptTemplate.from_template(template)

In [10]:

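from langchain import hub

# Alternatively, pull the rewrite prompt from the LangChain Hub (this overwrites the rewrite_prompt defined above)
rewrite_prompt = hub.pull("langchain-ai/rewrite")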

In [11]:

# Print the template text of rewrite_prompt
print(rewrite_prompt.template)

Provide a better search query for web search engine to answer the given question, end the queries with ’**’.  Question {x} Answer:

In [12]:

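# Parser that removes the surrounding quotes and the `**` terminator

def _parse(text):
    # str.strip treats its argument as a set of characters, so strip("**") removes any leading or trailing asterisks
    return text.strip('"').strip("**")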

In [13]:

# Rewriter: fill rewrite_prompt, call ChatOpenAI(temperature=0), extract the reply as a string, then clean it with _parse
rewriter = rewrite_prompt | ChatOpenAI(temperature=0) | StrOutputParser() | _parse

In [14]:

# Invoke the rewriter; the prompt expects the question under the key "x"
rewriter.invoke({"x": distracted_query})

Out[14]:

'What is the definition and purpose of Langchain?'

In [15]:

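# Define the rewrite-retrieve-read chain
rewrite_retrieve_read_chain = (
    {
        "context": {"x": RunnablePassthrough()} | rewriter | retriever,  # wrap the input as {"x": ...}, rewrite it, then search with the rewritten query
        "question": RunnablePassthrough(),  # forward the original question unchanged
    }
    | prompt  # fill the prompt template with the context and the question
    | model  # send the prompt to the chat model
    | StrOutputParser()  # extract the model's reply as a plain string
)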
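
The data flow of this chain can be traced without any network or model calls. The sketch below uses hypothetical stubs (`stub_rewriter`, `stub_retriever`) in place of the LLM rewriter and the search wrapper. It only shows that retrieval sees the rewritten query while the reader prompt still receives the original question.

```python
def stub_rewriter(inputs: dict) -> str:
    """Hypothetical rewriter: keep only the last sentence of the question."""
    parts = inputs["x"].replace("!", ".").split(".")
    sentences = [part.strip() for part in parts if part.strip()]
    return sentences[-1]


def stub_retriever(query: str) -> str:
    """Hypothetical retriever that just echoes the query it was given."""
    return f"<search results for: {query}>"


def rewrite_retrieve_read_inputs(question: str) -> dict:
    """Build the reader prompt inputs: rewrite, retrieve, and pass the question through."""
    rewritten = stub_rewriter({"x": question})
    return {"context": stub_retriever(rewritten), "question": question}


result = rewrite_retrieve_read_inputs(
    "man that sam bankman fried trial was crazy! what is langchain?"
)
print(result["context"])
print(result["question"])
```

Since the original question is passed through unchanged, the reader may still comment on the distracting sentence even though the retrieved context is about LangChain only.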

In [16]:

# Invoke the rewrite-retrieve-read chain with distracted_query as input
rewrite_retrieve_read_chain.invoke(distracted_query)

Out[16]:

'Based on the given context, LangChain is an open-source framework designed to simplify the creation of applications using large language models (LLMs). It enables LLM models to generate responses based on up-to-date online information and simplifies the organization of large volumes of data for easy access by LLMs. LangChain offers a standard interface for chains, integrations with other tools, and end-to-end chains for common applications. It is a robust library that streamlines interaction with various LLM providers. LangChain\'s unique proposition is its ability to create logical links between one or more LLMs, known as Chains. It is an AI framework with features that simplify the development of language-based applications and offers a suite of features for artificial general intelligence. However, the context does not provide any information about the "sam bankman fried trial" mentioned in the question.'
