In this guide, you will learn how to set up a multi-document agent over the LlamaIndex documentation.
This is an extension of the V0 multi-document agent, with the following additional features:
- reranking during document (tool) retrieval
- a query planning tool that the agent can use
We implement this with the following architecture:
- set up a "document agent" over each Document: each document agent can perform QA and summarization within its own document
- set up a top-level agent over the set of document agents: it performs tool retrieval first, then reasons over the retrieved tools to answer the query
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-core
%pip install llama-index-agent-openai
%pip install llama-index-readers-file
%pip install llama-index-postprocessor-cohere-rerank
%pip install llama-index-llms-openai
%pip install llama-index-embeddings-openai
%pip install unstructured[html]
%load_ext autoreload
%autoreload 2
In this section, we load the LlamaIndex docs.
domain = "docs.llamaindex.ai"
docs_url = "https://docs.llamaindex.ai/en/latest/"
!wget -e robots=off --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains {domain} --no-parent {docs_url}
from llama_index.readers.file import UnstructuredReader
reader = UnstructuredReader()
from pathlib import Path
all_files_gen = Path("./docs.llamaindex.ai/").rglob("*")
all_files = [f.resolve() for f in all_files_gen]
all_html_files = [f for f in all_files if f.suffix.lower() == ".html"]
len(all_html_files)
1219
from llama_index.core import Document

# TODO: set this to a higher value if you want more documents
doc_limit = 100

docs = []
for idx, f in enumerate(all_html_files):
    if idx > doc_limit:
        break
    print(f"Idx {idx}/{len(all_html_files)}")
    loaded_docs = reader.load_data(file=f, split_documents=True)
    # Hardcoded index. Everything before this is the table of contents shared by all pages
    start_idx = 72
    loaded_doc = Document(
        text="\n\n".join([d.get_content() for d in loaded_docs[start_idx:]]),
        metadata={"path": str(f)},
    )
    print(loaded_doc.metadata["path"])
    docs.append(loaded_doc)
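Before moving on, it's worth a quick look at what was actually loaded. A minimal sketch (the count and paths depend on your crawl and on `doc_limit`):

# Inspect the loaded documents (output depends on your crawl and doc_limit).
print(len(docs))
print(docs[0].metadata["path"])
print(docs[0].get_content()[:200])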
In this notebook, we define a global LLM (large language model) and a global embedding model. The LLM answers queries and produces summaries, while the embedding model converts text into dense vector representations used for semantic retrieval.
import os
os.environ["OPENAI_API_KEY"] = "sk-..."
import nest_asyncio
nest_asyncio.apply()
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings
llm = OpenAI(model="gpt-3.5-turbo")
Settings.llm = llm
Settings.embed_model = OpenAIEmbedding(
model="text-embedding-3-small", embed_batch_size=256
)
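Before building any indexes, a quick sanity check that the global settings are wired up can save a failed run later. A minimal sketch (assumes the API key above is valid; `get_text_embedding` is the standard llama-index embedding call):

# Optional sanity check: embed a short string with the global embedding model.
test_embedding = Settings.embed_model.get_text_embedding("hello world")
print(len(test_embedding))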
In this section, we show how to build the multi-document agent. We first build a document agent for each document, and then define the top-level parent agent with an object index.
In this section, we define a "document agent" for each document.
For each document we define both a vector index (for semantic search) and a summary index (for summarization). The two query engines are then converted into tools that are passed to an OpenAI function-calling agent.
This document agent can dynamically choose to perform semantic search or summarization within its given document.
We create a separate document agent for each document.
from llama_index.agent.openai import OpenAIAgent
from llama_index.core import (
    load_index_from_storage,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.core import SummaryIndex
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.node_parser import SentenceSplitter
import os
from tqdm.notebook import tqdm
import pickle


async def build_agent_per_doc(nodes, file_base):
    print(file_base)

    vi_out_path = f"./data/llamaindex_docs/{file_base}"
    summary_out_path = f"./data/llamaindex_docs/{file_base}_summary.pkl"
    if not os.path.exists(vi_out_path):
        Path("./data/llamaindex_docs/").mkdir(parents=True, exist_ok=True)
        # build vector index
        vector_index = VectorStoreIndex(nodes)
        vector_index.storage_context.persist(persist_dir=vi_out_path)
    else:
        vector_index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=vi_out_path),
        )

    # build summary index
    summary_index = SummaryIndex(nodes)

    # define query engines
    vector_query_engine = vector_index.as_query_engine(llm=llm)
    summary_query_engine = summary_index.as_query_engine(
        response_mode="tree_summarize", llm=llm
    )

    # extract a summary
    if not os.path.exists(summary_out_path):
        Path(summary_out_path).parent.mkdir(parents=True, exist_ok=True)
        summary = str(
            await summary_query_engine.aquery(
                "Extract a concise 1-2 line summary of this document"
            )
        )
        pickle.dump(summary, open(summary_out_path, "wb"))
    else:
        summary = pickle.load(open(summary_out_path, "rb"))

    # define tools
    query_engine_tools = [
        QueryEngineTool(
            query_engine=vector_query_engine,
            metadata=ToolMetadata(
                name=f"vector_tool_{file_base}",
                description="Useful for questions related to specific facts",
            ),
        ),
        QueryEngineTool(
            query_engine=summary_query_engine,
            metadata=ToolMetadata(
                name=f"summary_tool_{file_base}",
                description="Useful for summarization questions",
            ),
        ),
    ]

    # build agent
    function_llm = OpenAI(model="gpt-4")
    agent = OpenAIAgent.from_tools(
        query_engine_tools,
        llm=function_llm,
        verbose=True,
        system_prompt=f"""\
You are a specialized agent designed to answer queries about the `{file_base}.html` part of the LlamaIndex docs.
You must ALWAYS use at least one of the tools provided when answering a question; do NOT rely on prior knowledge.\
""",
    )

    return agent, summary


async def build_agents(docs):
    node_parser = SentenceSplitter()

    # build agents dictionary
    agents_dict = {}
    extra_info_dict = {}

    # # this is for the baseline
    # all_nodes = []

    for idx, doc in enumerate(tqdm(docs)):
        nodes = node_parser.get_nodes_from_documents([doc])
        # all_nodes.extend(nodes)

        # ID will be base + parent
        file_path = Path(doc.metadata["path"])
        file_base = str(file_path.parent.stem) + "_" + str(file_path.stem)
        agent, summary = await build_agent_per_doc(nodes, file_base)

        agents_dict[file_base] = agent
        extra_info_dict[file_base] = {"summary": summary, "nodes": nodes}

    return agents_dict, extra_info_dict
agents_dict, extra_info_dict = await build_agents(docs)
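Before wiring up the top-level agent, it can help to exercise one per-document agent in isolation. A minimal sketch (the dictionary keys are derived from the crawled file paths, so yours will differ):

# Query a single per-document agent directly (keys depend on your crawl).
sample_key = next(iter(agents_dict))
print(sample_key)
print(agents_dict[sample_key].query("What is this page about?"))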
We build a top-level agent that can orchestrate across the different document agents to answer any user query.
This `RetrieverOpenAIAgent` performs tool retrieval before tool use (unlike the default agent, which tries to fit all tools into the prompt).
Improvements over V0: compared to the "base" version in V0, we make the following improvements:
- Reranking: retrieved document tools are reranked with Cohere Rerank before being handed to the agent.
- Query planning: a SubQuestionQueryEngine-backed compare_tool is created on the fly from the retrieved tools, so the agent can plan and answer comparison queries across documents.
# define a tool for each document agent
all_tools = []
for file_base, agent in agents_dict.items():
    summary = extra_info_dict[file_base]["summary"]
    doc_tool = QueryEngineTool(
        query_engine=agent,
        metadata=ToolMetadata(
            name=f"tool_{file_base}",
            description=summary,
        ),
    )
    all_tools.append(doc_tool)
print(all_tools[0].metadata)
ToolMetadata(description='This document provides examples and documentation for an agent on the llama index platform.', name='tool_latest_index', fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>)
# define an "object" index and retriever over these tools
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import (
    ObjectIndex,
    ObjectRetriever,
)
from llama_index.postprocessor.cohere_rerank import CohereRerank
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.schema import QueryBundle
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4-0613")

obj_index = ObjectIndex.from_objects(
    all_tools,
    index_cls=VectorStoreIndex,
)
vector_node_retriever = obj_index.as_node_retriever(
    similarity_top_k=10,
)


# define a custom object retriever that adds in a query planning tool
class CustomObjectRetriever(ObjectRetriever):
    def __init__(
        self,
        retriever,
        object_node_mapping,
        node_postprocessors=None,
        llm=None,
    ):
        self._retriever = retriever
        self._object_node_mapping = object_node_mapping
        self._llm = llm or OpenAI("gpt-4-0613")
        self._node_postprocessors = node_postprocessors or []

    def retrieve(self, query_bundle):
        if isinstance(query_bundle, str):
            query_bundle = QueryBundle(query_str=query_bundle)

        nodes = self._retriever.retrieve(query_bundle)
        for processor in self._node_postprocessors:
            nodes = processor.postprocess_nodes(
                nodes, query_bundle=query_bundle
            )
        tools = [self._object_node_mapping.from_node(n.node) for n in nodes]

        sub_question_engine = SubQuestionQueryEngine.from_defaults(
            query_engine_tools=tools, llm=self._llm
        )
        sub_question_description = """\
Useful for any queries that involve comparing multiple documents. \
ALWAYS use this tool for comparison queries - make sure to call this tool with the original query. \
Do NOT use the other tools for any queries involving multiple documents.
"""
        sub_question_tool = QueryEngineTool(
            query_engine=sub_question_engine,
            metadata=ToolMetadata(
                name="compare_tool", description=sub_question_description
            ),
        )

        return tools + [sub_question_tool]
# wrap it with ObjectRetriever to return objects
custom_obj_retriever = CustomObjectRetriever(
    vector_node_retriever,
    obj_index.object_node_mapping,
    node_postprocessors=[CohereRerank(top_n=5)],
    llm=llm,
)
tmps = custom_obj_retriever.retrieve("hello")

# should be 5 + 1 -- 5 from the reranker, 1 from the subquestion tool
print(len(tmps))
6
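To see what the retriever actually returned, you can print each tool's name; the last entry should be the dynamically injected compare_tool (a sketch; the other names depend on which files were crawled):

# List the retrieved tool names; the compare_tool is appended last
# by CustomObjectRetriever.retrieve().
for tool in tmps:
    print(tool.metadata.name)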
from llama_index.agent.openai import OpenAIAgent
from llama_index.core.agent import ReActAgent

top_agent = OpenAIAgent.from_tools(
    tool_retriever=custom_obj_retriever,
    system_prompt=""" \
You are an agent designed to answer queries about the documentation.
Please always use the tools provided to answer a question. Do not rely on prior knowledge.\
""",
    llm=llm,
    verbose=True,
)

# top_agent = ReActAgent.from_tools(
#     tool_retriever=custom_obj_retriever,
#     system_prompt=""" \
# You are an agent designed to answer queries about the documentation.
# Please always use the tools provided to answer a question. Do not rely on prior knowledge.\
# """,
#     llm=llm,
#     verbose=True,
# )
all_nodes = [
n for extra_info in extra_info_dict.values() for n in extra_info["nodes"]
]
base_index = VectorStoreIndex(all_nodes)
base_query_engine = base_index.as_query_engine(similarity_top_k=4)
Let's run some example queries, ranging from QA / summarization over a single document to QA / summarization across multiple documents.
response = top_agent.query(
"What types of agents are available in LlamaIndex?",
)
Added user message to memory: What types of agents are available in LlamaIndex? === Calling Function === Calling function: tool_agents_index with args: {"input":"types of agents"} Added user message to memory: types of agents === Calling Function === Calling function: vector_tool_agents_index with args: { "input": "types of agents" } Got output: The types of agents mentioned in the provided context are ReActAgent, Native OpenAIAgent, OpenAIAgent with Query Engine Tools, OpenAIAgent Query Planning, OpenAI Assistant, OpenAI Assistant Cookbook, Forced Function Calling, Parallel Function Calling, and Context Retrieval. ======================== Got output: The types of agents mentioned in the `agents_index.html` part of the LlamaIndex docs are: 1. ReActAgent 2. Native OpenAIAgent 3. OpenAIAgent with Query Engine Tools 4. OpenAIAgent Query Planning 5. OpenAI Assistant 6. OpenAI Assistant Cookbook 7. Forced Function Calling 8. Parallel Function Calling 9. Context Retrieval ========================
print(response)
The types of agents available in LlamaIndex include ReActAgent, Native OpenAIAgent, OpenAIAgent with Query Engine Tools, OpenAIAgent Query Planning, OpenAI Assistant, OpenAI Assistant Cookbook, Forced Function Calling, Parallel Function Calling, and Context Retrieval.
# baseline
response = base_query_engine.query(
    "What types of agents are available in LlamaIndex?",
)
print(str(response))
The types of agents available in LlamaIndex are ReActAgent, Native OpenAIAgent, and OpenAIAgent.
response = top_agent.query(
"Compare the content in the agents page vs. tools page."
)
Added user message to memory: Compare the content in the agents page vs. tools page. === Calling Function === Calling function: compare_tool with args: {"input":"agents vs tools"} Generated 2 sub questions. [tool_understanding_index] Q: What are the functionalities of agents in the Llama Index platform? Added user message to memory: What are the functionalities of agents in the Llama Index platform? [tool_understanding_index] Q: How do agents differ from tools in the Llama Index platform? Added user message to memory: How do agents differ from tools in the Llama Index platform? === Calling Function === Calling function: vector_tool_understanding_index with args: { "input": "difference between agents and tools" } === Calling Function === Calling function: vector_tool_understanding_index with args: { "input": "functionalities of agents" } Got output: Agents are typically individuals or entities that act on behalf of others, making decisions and taking actions based on predefined rules or instructions. On the other hand, tools are instruments or devices used to carry out specific functions or tasks, often under the control or direction of an agent. ======================== Got output: Agents typically have a range of functionalities that allow them to perform tasks autonomously or semi-autonomously. These functionalities may include data collection, analysis, decision-making, communication with other systems or users, and executing specific actions based on predefined rules or algorithms. ======================== [tool_understanding_index] A: In the context of the Llama Index platform, agents are entities that make decisions and take actions based on predefined rules or instructions. They are designed to interact with users, understand their queries, and provide appropriate responses. On the other hand, tools are instruments or devices that are used to perform specific functions or tasks. They are typically controlled or directed by an agent and do not make decisions on their own. They are used to assist the agents in providing accurate and relevant responses to user queries. [tool_understanding_index] A: In the Llama Index platform, agents have a variety of functionalities. They can perform tasks autonomously or semi-autonomously. These tasks include data collection and analysis, making decisions, communicating with other systems or users, and executing specific actions. These actions are based on predefined rules or algorithms. Got output: Agents in the Llama Index platform are responsible for making decisions and taking actions based on predefined rules or instructions. They interact with users, understand queries, and provide appropriate responses. On the other hand, tools in the platform are instruments or devices used to perform specific functions or tasks. Unlike agents, tools are typically controlled or directed by an agent and do not make decisions independently. Their role is to assist agents in delivering accurate and relevant responses to user queries. ========================
print(response)
The comparison between the content in the agents page and the tools page highlights the difference in their roles and functionalities. Agents on the Llama Index platform are responsible for decision-making and interacting with users, while tools are instruments used to perform specific functions or tasks, controlled by agents to assist in providing responses.
response = top_agent.query(
"Can you compare the compact and tree_summarize response synthesizer response modes at a very high-level?"
)
Added user message to memory: Can you compare the compact and tree_summarize response synthesizer response modes at a very high-level? === Calling Function === Calling function: compare_tool with args: {"input":"Compare the compact and tree_summarize response synthesizer response modes at a very high-level."} Generated 4 sub questions. [tool_querying_index] Q: What are the key differences between the compact and tree_summarize response synthesizer response modes? Added user message to memory: What are the key differences between the compact and tree_summarize response synthesizer response modes? [tool_querying_index] Q: How does the compact response synthesizer response mode optimize query logic and response quality? Added user message to memory: How does the compact response synthesizer response mode optimize query logic and response quality? [tool_querying_index] Q: How does the tree_summarize response synthesizer response mode optimize query logic and response quality? Added user message to memory: How does the tree_summarize response synthesizer response mode optimize query logic and response quality? [tool_evaluating_index] Q: What are the guidelines for evaluating retrievals in the context of response synthesizer response modes? Added user message to memory: What are the guidelines for evaluating retrievals in the context of response synthesizer response modes? === Calling Function === Calling function: vector_tool_querying_index with args: { "input": "compact response synthesizer response mode" } === Calling Function === Calling function: summary_tool_querying_index with args: { "input": "tree_summarize response synthesizer response mode" } === Calling Function === Calling function: vector_tool_querying_index with args: { "input": "compact vs tree_summarize response synthesizer response modes" } === Calling Function === Calling function: vector_tool_evaluating_index with args: { "input": "evaluating retrievals response synthesizer response modes" } Got output: The response modes for the response synthesizer include "compact" and "tree_summarize". ======================== Got output: The response mode "tree_summarize" in the response synthesizer configures the system to recursively construct a tree from a set of Node objects and the query, returning the root node as the final response. This mode is particularly useful for summarization purposes. ======================== Got output: "compact" the prompt during each LLM call by stuffing as many Node text chunks that can fit within the maximum prompt size. If there are too many chunks to stuff in one prompt, "create and refine" an answer by going through multiple prompts. ======================== === Calling Function === Calling function: summary_tool_querying_index with args: { "input": "compact vs tree_summarize response synthesizer response modes" } Got output: Response synthesizer response modes can be evaluated by comparing what was retrieved for a query to a set of nodes that were expected to be retrieved. This evaluation process typically involves analyzing metrics such as Mean Reciprocal Rank (MRR) and Hit Rate. It is important to evaluate a batch of retrievals to get a comprehensive understanding of the performance. If you are making calls to a hosted, remote LLM, you may also want to consider analyzing the cost implications of your application. ======================== Got output: The response modes for the response synthesizer include "compact" and "tree_summarize". 
======================== [tool_querying_index] A: The compact response synthesizer response mode optimizes query logic and response quality by compacting the prompt during each LLM call. It does this by stuffing as many Node text chunks that can fit within the maximum prompt size. If there are too many chunks to fit in one prompt, it will "create and refine" an answer by going through multiple prompts. This approach allows for a more efficient use of the prompt space and can lead to more refined and accurate responses. [tool_querying_index] A: The "tree_summarize" response synthesizer response mode optimizes query logic and response quality by recursively constructing a tree from a set of Node objects and the query. This approach allows the system to handle complex queries and generate comprehensive responses. The root node, which is returned as the final response, contains a summarized version of the information, making it easier for users to understand the response. This mode is particularly useful for summarization purposes, where the goal is to provide a concise yet comprehensive answer to a query. [tool_evaluating_index] A: When evaluating retrievals in the context of response synthesizer response modes, you should compare what was retrieved for a query to a set of nodes that were expected to be retrieved. This evaluation process typically involves analyzing metrics such as Mean Reciprocal Rank (MRR) and Hit Rate. It's crucial to evaluate a batch of retrievals to get a comprehensive understanding of the performance. If you are making calls to a hosted, remote LLM, you may also want to consider analyzing the cost implications of your application. [tool_querying_index] A: The "compact" and "tree_summarize" are two different response modes for the response synthesizer in LlamaIndex. The "compact" mode provides a more concise response, focusing on delivering the most relevant information in a compact format. This mode is useful when you want a brief and direct answer to your query. On the other hand, the "tree_summarize" mode provides a more detailed and structured response. It breaks down the information into a tree-like structure, making it easier to understand the relationships and hierarchy of the information. This mode is useful when you want a comprehensive understanding of the query topic. Got output: The "compact" response synthesizer mode focuses on providing a concise and direct response, while the "tree_summarize" mode offers a more detailed and structured response by breaking down information into a tree-like structure. The compact mode aims to deliver the most relevant information in a compact format, suitable for brief answers, whereas the tree_summarize mode is designed to provide a comprehensive understanding of the query topic by presenting information in a hierarchical manner. ========================
print(str(response))
The "compact" response synthesizer mode provides concise and direct responses, while the "tree_summarize" mode offers detailed and structured responses in a tree-like format. The compact mode is suitable for brief answers, while the tree_summarize mode presents information hierarchically for a comprehensive understanding of the query topic.