OpenAI函数允许对回答输出进行结构化处理。这在问答中经常很有用,因为你不仅想要得到最终答案,还想要支持证据、引用等。
在这个笔记本中,我们展示了如何使用LLM链,该链使用OpenAI函数作为整个检索流程的一部分。
from langchain.chains import RetrievalQA
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
# Load the source document, split it into chunks, tag each chunk with a
# citation id, and index everything in a Chroma vector store.
loader = TextLoader("../../state_of_the_union.txt", encoding="utf-8")
documents = loader.load()

# Split into ~1000-character chunks with no overlap between adjacent chunks.
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

# Tag every chunk with a "source" id of the form "<index>-pl" so the QA
# chain can cite which chunks an answer came from.
for idx, doc in enumerate(texts):
    doc.metadata["source"] = f"{idx}-pl"

# Embed the chunks and build the searchable vector index.
embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_documents(texts, embeddings)
# 导入所需的模块
from langchain.chains import create_qa_with_sources_chain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
# Deterministic chat model (temperature=0). NOTE(review): the pinned snapshot
# "gpt-3.5-turbo-0613" has been retired by OpenAI and API calls against it
# fail; the current alias also supports function calling.
llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo")

# LLMChain that answers via an OpenAI function call, producing a structured
# payload of the form {"answer": ..., "sources": [...]}.
qa_chain = create_qa_with_sources_chain(llm)

# How each retrieved chunk is rendered before being stuffed into the prompt:
# the chunk text plus the "source" id assigned during indexing.
doc_prompt = PromptTemplate(
    template="Content: {page_content}\nSource: {source}",
    input_variables=["page_content", "source"],
)

# "Stuff" all formatted chunks into the single {context} variable of qa_chain.
final_qa_chain = StuffDocumentsChain(
    llm_chain=qa_chain,
    document_variable_name="context",
    document_prompt=doc_prompt,
)

# End-to-end retrieval QA: retrieve relevant chunks from the Chroma index,
# then answer with the sources-aware combine chain.
retrieval_qa = RetrievalQA(
    retriever=docsearch.as_retriever(), combine_documents_chain=final_qa_chain
)

query = "总统对俄罗斯说了什么"
retrieval_qa.run(query)
'{\n "answer": "The President expressed strong condemnation of Russia\'s actions in Ukraine and announced measures to isolate Russia and provide support to Ukraine. He stated that Russia\'s invasion of Ukraine will have long-term consequences for Russia and emphasized the commitment to defend NATO countries. The President also mentioned taking robust action through sanctions and releasing oil reserves to mitigate gas prices. Overall, the President conveyed a message of solidarity with Ukraine and determination to protect American interests.",\n "sources": ["0-pl", "4-pl", "5-pl", "6-pl"]\n}'
如果我们想要的话,我们可以在 Pydantic 中设置链条返回。请注意,如果下游链条消耗此链条的输出(包括内存),它们通常会期望它以字符串格式呈现,因此您应该仅在它是最终链条时使用此链条。
# Same pipeline, but the chain parses the function-call output into a
# Pydantic object (AnswerWithSources) instead of returning a JSON string.
qa_chain_pydantic = create_qa_with_sources_chain(llm, output_parser="pydantic")

# Stuff the formatted chunks into the {context} variable, as before.
final_qa_chain_pydantic = StuffDocumentsChain(
    llm_chain=qa_chain_pydantic,
    document_variable_name="context",
    document_prompt=doc_prompt,
)

# Retrieval QA backed by the Pydantic-parsing combine chain.
retrieval_qa_pydantic = RetrievalQA(
    retriever=docsearch.as_retriever(),
    combine_documents_chain=final_qa_chain_pydantic,
)

# Re-run the same query; the result is now a Pydantic model, not a string.
retrieval_qa_pydantic.run(query)
AnswerWithSources(answer="The President expressed strong condemnation of Russia's actions in Ukraine and announced measures to isolate Russia and provide support to Ukraine. He stated that Russia's invasion of Ukraine will have long-term consequences for Russia and emphasized the commitment to defend NATO countries. The President also mentioned taking robust action through sanctions and releasing oil reserves to mitigate gas prices. Overall, the President conveyed a message of solidarity with Ukraine and determination to protect American interests.", sources=['0-pl', '4-pl', '5-pl', '6-pl'])
我们还可以展示在 ConversationalRetrievalChain 中使用它的情况。请注意,由于这个链条涉及到内存,我们将不使用 Pydantic 的返回类型。
from langchain.chains import ConversationalRetrievalChain, LLMChain
from langchain.memory import ConversationBufferMemory

# Conversation memory: turns are stored under the "chat_history" key and
# returned as message objects rather than a flat string.
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# Prompt that rewrites a follow-up question into a standalone question using
# the chat history. The trailing backslash joins the first two lines of the
# template into one sentence.
_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.\
Make sure to avoid using any unclear pronouns.
Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)

# Chain that performs the question-condensing step.
condense_question_chain = LLMChain(
    llm=llm,
    prompt=CONDENSE_QUESTION_PROMPT,
)
# Conversational retrieval chain: condense the follow-up into a standalone
# question, retrieve supporting chunks, answer with the sources-aware combine
# chain, and record the exchange in memory.
qa = ConversationalRetrievalChain(
    question_generator=condense_question_chain,
    retriever=docsearch.as_retriever(),
    memory=memory,
    combine_docs_chain=final_qa_chain,
)

# First turn of the conversation.
query = "What did the president say about Ketanji Brown Jackson"
result = qa({"question": query})
result
{'question': 'What did the president say about Ketanji Brown Jackson', 'chat_history': [HumanMessage(content='What did the president say about Ketanji Brown Jackson', additional_kwargs={}, example=False), AIMessage(content='{\n "answer": "The President nominated Ketanji Brown Jackson as a Circuit Court of Appeals Judge and praised her as one of the nation\'s top legal minds who will continue Justice Breyer\'s legacy of excellence.",\n "sources": ["31-pl"]\n}', additional_kwargs={}, example=False)], 'answer': '{\n "answer": "The President nominated Ketanji Brown Jackson as a Circuit Court of Appeals Judge and praised her as one of the nation\'s top legal minds who will continue Justice Breyer\'s legacy of excellence.",\n "sources": ["31-pl"]\n}'}
# Follow-up turn: "her predecessor" is resolved against the chat history by
# the condense-question step before retrieval happens.
query = "what did he say about her predecessor?"
result = qa({"question": query})
result
{'question': 'what did he say about her predecessor?', 'chat_history': [HumanMessage(content='What did the president say about Ketanji Brown Jackson', additional_kwargs={}, example=False), AIMessage(content='{\n "answer": "The President nominated Ketanji Brown Jackson as a Circuit Court of Appeals Judge and praised her as one of the nation\'s top legal minds who will continue Justice Breyer\'s legacy of excellence.",\n "sources": ["31-pl"]\n}', additional_kwargs={}, example=False), HumanMessage(content='what did he say about her predecessor?', additional_kwargs={}, example=False), AIMessage(content='{\n "answer": "The President honored Justice Stephen Breyer for his service as an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court.",\n "sources": ["31-pl"]\n}', additional_kwargs={}, example=False)], 'answer': '{\n "answer": "The President honored Justice Stephen Breyer for his service as an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court.",\n "sources": ["31-pl"]\n}'}
我们可以通过传入自定义模式来更改链条的输出。该模式中各字段的名称和描述会影响我们传递给OpenAI API的函数定义,这意味着它不仅会影响我们如何解析输出,还会改变OpenAI的输出本身。例如,我们可以在模式中添加一个countries_referenced
参数,并描述我们希望该参数表示什么,这将促使OpenAI在响应中额外列出来源中提到的所有国家。
除了上面的例子,我们还可以在链条中添加自定义提示。这将允许您为响应添加额外的上下文,对于问答非常有用。
# Imports for the custom-schema example. The original cell imported
# `List` and `BaseModel`/`Field` twice; duplicates removed and the
# remaining imports grouped stdlib-first.
from typing import List

from langchain.chains.openai_functions import create_qa_with_structure_chain
from langchain.prompts.chat import ChatPromptTemplate, HumanMessagePromptTemplate
from langchain_core.messages import HumanMessage, SystemMessage
from pydantic import BaseModel, Field
class CustomResponseSchema(BaseModel):
    """An answer to the question being asked, with sources."""

    # The answer text itself. (The docstring above and the Field descriptions
    # below are sent to the OpenAI function-calling API, so they shape the
    # model's output as well as the parsing.)
    answer: str = Field(..., description="Answer to the question that was asked")
    # Every country named in the retrieved source chunks.
    countries_referenced: List[str] = Field(
        ..., description="All of the countries mentioned in the sources"
    )
    # Citation ids of the chunks used to produce the answer.
    sources: List[str] = Field(
        ..., description="List of sources used to answer the question"
    )
# Custom chat prompt: a system role, the retrieved context, the question,
# and a final formatting reminder (uppercase country names).
prompt_messages = [
    SystemMessage(
        content=(
            "You are a world class algorithm to answer "
            "questions in a specific format."
        )
    ),
    HumanMessage(content="Answer question using the following context"),
    HumanMessagePromptTemplate.from_template("{context}"),
    HumanMessagePromptTemplate.from_template("Question: {question}"),
    HumanMessage(
        content="Tips: Make sure to answer in the correct format. Return all of the countries mentioned in the sources in uppercase characters."
    ),
]
chain_prompt = ChatPromptTemplate(messages=prompt_messages)

# QA chain that fills CustomResponseSchema via an OpenAI function call and
# parses the result into a Pydantic instance.
qa_chain_pydantic = create_qa_with_structure_chain(
    llm, CustomResponseSchema, output_parser="pydantic", prompt=chain_prompt
)
# Stuff the retrieved chunks into {context} and wire the custom-schema chain
# into an end-to-end retrieval QA pipeline.
final_qa_chain_pydantic = StuffDocumentsChain(
    llm_chain=qa_chain_pydantic,
    document_variable_name="context",
    document_prompt=doc_prompt,
)
retrieval_qa_pydantic = RetrievalQA(
    retriever=docsearch.as_retriever(),
    combine_documents_chain=final_qa_chain_pydantic,
)

query = "What did he say about russia"
retrieval_qa_pydantic.run(query)
CustomResponseSchema(answer="He announced that American airspace will be closed off to all Russian flights, further isolating Russia and adding an additional squeeze on their economy. The Ruble has lost 30% of its value and the Russian stock market has lost 40% of its value. He also mentioned that Putin alone is to blame for Russia's reeling economy. The United States and its allies are providing support to Ukraine in their fight for freedom, including military, economic, and humanitarian assistance. The United States is giving more than $1 billion in direct assistance to Ukraine. He made it clear that American forces are not engaged and will not engage in conflict with Russian forces in Ukraine, but they are deployed to defend NATO allies in case Putin decides to keep moving west. He also mentioned that Putin's attack on Ukraine was premeditated and unprovoked, and that the West and NATO responded by building a coalition of freedom-loving nations to confront Putin. The free world is holding Putin accountable through powerful economic sanctions, cutting off Russia's largest banks from the international financial system, and preventing Russia's central bank from defending the Russian Ruble. The U.S. Department of Justice is also assembling a task force to go after the crimes of Russian oligarchs.", countries_referenced=['AMERICA', 'RUSSIA', 'UKRAINE'], sources=['4-pl', '5-pl', '2-pl', '3-pl'])