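Defining a Custom Query Engine

You can (and should) define your own custom query engine so you can plug it into your downstream LlamaIndex workflows, whether you're building RAG, agents, or other applications.

We provide a `CustomQueryEngine` that makes it easy to define your own queries.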

Setup

We first load some sample data and index it.

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [ ]:

%pip install llama-index-llms-openai

In [ ]:

!pip install llama-index

In [ ]:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

Download Data

In [ ]:

!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

In [ ]:

# Load the documents
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()

In [ ]:

index = VectorStoreIndex.from_documents(documents)
retriever = index.as_retriever()
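`as_retriever()` turns the index into a retriever that scores stored chunks against the query and returns the most similar ones. As a rough, dependency-free illustration of that idea only, the sketch below swaps real embeddings for word-count vectors and ranks chunks by cosine similarity. `ToyRetriever`, `ToyNode`, `top_k`, and the bag-of-words scoring are all hypothetical; LlamaIndex itself uses an embedding model and a vector store.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class ToyNode:
    """Hypothetical stand-in for a retrieved chunk plus its similarity score."""
    text: str
    score: float = 0.0


def _vectorize(text: str) -> Counter:
    # Bag-of-words "embedding" (an assumption, not a real embedding model).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyRetriever:
    """Hypothetical in-memory retriever returning the top_k most similar chunks."""

    def __init__(self, chunks: list[str], top_k: int = 2):
        self.top_k = top_k
        # "Index" each chunk once, up front.
        self._index = [(chunk, _vectorize(chunk)) for chunk in chunks]

    def retrieve(self, query_str: str) -> list[ToyNode]:
        q = _vectorize(query_str)
        scored = [ToyNode(text, _cosine(q, vec)) for text, vec in self._index]
        scored.sort(key=lambda n: n.score, reverse=True)
        return scored[: self.top_k]


chunks = [
    "The author wrote short stories before college.",
    "The author programmed an IBM 1401 in Fortran.",
    "Later the author studied painting.",
]
demo_retriever = ToyRetriever(chunks, top_k=2)
results = demo_retriever.retrieve("What did the author program in Fortran?")
```

The chunk mentioning Fortran shares the most words with the query, so it should rank first.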

Building a Custom Query Engine

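We build a custom query engine that simulates a RAG pipeline: it first performs retrieval, then synthesis.

To define a `CustomQueryEngine`, you just need to declare some initialization parameters as attributes and implement the `custom_query` function.

By default, `custom_query` can return a `Response` object (which is what the response synthesizer returns), but it can also just return a string. These are options 1 and 2, respectively.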
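To make this contract concrete without depending on LlamaIndex internals, here is a plain-Python sketch of the same pattern: subclasses declare their dependencies as fields and implement `custom_query`, while the base class's `query` calls it and wraps a bare string in a response object. `ToyCustomQueryEngine`, `ToyResponse`, `KeywordRetriever`, and `ConcatSynthesizer` are hypothetical; the real `CustomQueryEngine` appears to be a Pydantic model (its fields are declared as class annotations), and its internals may differ.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class ToyResponse:
    """Hypothetical response object: answer text plus the chunks it used."""
    response: str
    source_nodes: List[str] = field(default_factory=list)

    def __str__(self) -> str:
        return self.response


class ToyCustomQueryEngine(ABC):
    """Hypothetical base class mirroring the custom_query contract."""

    @abstractmethod
    def custom_query(self, query_str: str) -> Union[ToyResponse, str]:
        ...

    def query(self, query_str: str) -> ToyResponse:
        raw = self.custom_query(query_str)
        # A bare string (option 2) is wrapped so callers always get a response object.
        return ToyResponse(raw) if isinstance(raw, str) else raw


class KeywordRetriever:
    """Stub retriever: returns chunks sharing any whitespace-separated word with the query."""

    def __init__(self, chunks: List[str]):
        self.chunks = chunks

    def retrieve(self, query_str: str) -> List[str]:
        words = set(query_str.lower().split())
        return [c for c in self.chunks if words & set(c.lower().split())]


class ConcatSynthesizer:
    """Stub synthesizer: joins the retrieved chunks instead of calling an LLM."""

    def synthesize(self, query_str: str, nodes: List[str]) -> ToyResponse:
        return ToyResponse(" ".join(nodes), source_nodes=nodes)


@dataclass
class ToyRAGEngine(ToyCustomQueryEngine):
    # Option 1: dependencies declared as fields; returns a response object.
    retriever: KeywordRetriever
    response_synthesizer: ConcatSynthesizer

    def custom_query(self, query_str: str) -> ToyResponse:
        nodes = self.retriever.retrieve(query_str)
        return self.response_synthesizer.synthesize(query_str, nodes)


@dataclass
class ToyRAGStringEngine(ToyCustomQueryEngine):
    # Option 2: same retrieval step, but returns a plain string.
    retriever: KeywordRetriever

    def custom_query(self, query_str: str) -> str:
        nodes = self.retriever.retrieve(query_str)
        return f"Found {len(nodes)} chunk(s)."
```

Either way the caller receives the same kind of object from `query`, which is why the two options can be swapped without changing downstream code.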

In [ ]:

from llama_index.core.query_engine import CustomQueryEngine
from llama_index.core.retrievers import BaseRetriever
from llama_index.core import get_response_synthesizer
from llama_index.core.response_synthesizers import BaseSynthesizer

Option 1 (`RAGQueryEngine`)

In [ ]:

class RAGQueryEngine(CustomQueryEngine):
    """RAG Query Engine."""

    retriever: BaseRetriever
    response_synthesizer: BaseSynthesizer

    def custom_query(self, query_str: str):
        # Step 1: retrieve the nodes most relevant to the query.
        nodes = self.retriever.retrieve(query_str)
        # Step 2: synthesize a Response object from the query and retrieved nodes.
        response_obj = self.response_synthesizer.synthesize(query_str, nodes)
        return response_obj

Option 2 (`RAGStringQueryEngine`)

In [ ]:

# Option 2: return a string (we use a raw LLM call for illustration)
from llama_index.llms.openai import OpenAI
from llama_index.core import PromptTemplate

qa_prompt = PromptTemplate(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)


class RAGStringQueryEngine(CustomQueryEngine):
    """RAG String Query Engine."""

    retriever: BaseRetriever
    response_synthesizer: BaseSynthesizer  # not used by this engine
    llm: OpenAI
    qa_prompt: PromptTemplate

    def custom_query(self, query_str: str):
        # Retrieve nodes and join their text into a single context string.
        nodes = self.retriever.retrieve(query_str)
        context_str = "\n\n".join([n.node.get_content() for n in nodes])
        # Fill the engine's own prompt template and call the LLM directly.
        response = self.llm.complete(
            self.qa_prompt.format(context_str=context_str, query_str=query_str)
        )
        return str(response)
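Before calling the LLM, the string-returning engine does two things: it joins the retrieved chunks with blank lines between them, and it substitutes that context and the query into the template. The sketch below reproduces just those two steps with the standard library's `str.format`, assuming `PromptTemplate.format` behaves equivalently for simple `{placeholder}` variables. `build_prompt`, `QA_TEMPLATE`, and the sample chunks are hypothetical.

```python
# Same wording as the QA template used by the engine above.
QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)


def build_prompt(chunks: list[str], query_str: str) -> str:
    """Join retrieved chunks with blank lines and fill the QA template."""
    context_str = "\n\n".join(chunks)
    return QA_TEMPLATE.format(context_str=context_str, query_str=query_str)


prompt = build_prompt(
    ["The author wrote short stories.", "The author programmed an IBM 1401."],
    "What did the author do growing up?",
)
print(prompt)
```

Inspecting the filled prompt like this is a cheap way to check what context the LLM actually sees before paying for a completion call.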

Trying It Out

We now try it out on our sample data.

Trying Option 1 (`RAGQueryEngine`)

In [ ]:

synthesizer = get_response_synthesizer(response_mode="compact")
query_engine = RAGQueryEngine(
    retriever=retriever, response_synthesizer=synthesizer
)

In [ ]:

response = query_engine.query("What did the author do growing up?")

In [ ]:

print(str(response))

The author worked on writing and programming outside of school before college. They wrote short stories and tried writing programs on an IBM 1401 computer using an early version of Fortran. They also mentioned getting a microcomputer, building it themselves, and writing simple games and programs on it.

In [ ]:

print(response.source_nodes[0].get_content())

Trying Option 2 (`RAGStringQueryEngine`)

In [ ]:

llm = OpenAI(model="gpt-3.5-turbo")

query_engine = RAGStringQueryEngine(
    retriever=retriever,
    response_synthesizer=synthesizer,
    llm=llm,
    qa_prompt=qa_prompt,
)

In [ ]:

response = query_engine.query("What did the author do growing up?")

In [ ]:

print(str(response))

The author worked on writing and programming before college. They wrote short stories and started programming on the IBM 1401 computer in 9th grade. They later got a microcomputer and continued programming, writing simple games and a word processor.
