这个演示展示了如何使用Evaporate论文(Arora等人)从原始文本中提取DataFrame:https://arxiv.org/abs/2304.09433%E3%80%82
灵感是首先在一组训练文本上进行“拟合”。拟合过程使用LLM从文本生成一组解析函数。 这些拟合的函数然后在推理时应用于文本。
如果您在colab上打开这个笔记本,您可能需要安装LlamaIndex 🦙。
%pip install llama-index-llms-openai
%pip install llama-index-program-evaporate
!pip install llama-index
%load_ext autoreload
%autoreload 2
DFEvaporateProgram
¶DFEvaporateProgram
将从一组数据点中提取出一个二维数据框,给定一组字段和一些训练数据以在其上“拟合”一些函数。
在这里,我们从维基百科加载了一组城市数据。
wiki_titles = ["Toronto", "Seattle", "Chicago", "Boston", "Houston"]
from pathlib import Path
import requests
for title in wiki_titles:
response = requests.get(
"https://en.wikipedia.org/w/api.php",
params={
"action": "query",
"format": "json",
"titles": title,
"prop": "extracts",
# 'exintro': True,
"explaintext": True,
},
).json()
page = next(iter(response["query"]["pages"].values()))
wiki_text = page["extract"]
data_path = Path("data")
if not data_path.exists():
Path.mkdir(data_path)
with open(data_path / f"{title}.txt", "w") as fp:
fp.write(wiki_text)
from llama_index.core import SimpleDirectoryReader
# 加载所有维基文档
city_docs = {}
for wiki_title in wiki_titles:
city_docs[wiki_title] = SimpleDirectoryReader(
input_files=[f"data/{wiki_title}.txt"]
).load_data()
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
# 设置设置
Settings.llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
Settings.chunk_size = 512
# 为每个文档获取节点
city_nodes = {}
for wiki_title in wiki_titles:
docs = city_docs[wiki_title]
nodes = Settings.node_parser.get_nodes_from_documents(docs)
city_nodes[wiki_title] = nodes
在这里,我们演示了如何使用我们的DFEvaporateProgram
提取数据点。给定一组字段,DFEvaporateProgram
可以首先在一组训练数据上拟合函数,然后在推断数据上运行提取操作。
from llama_index.program.evaporate import DFEvaporateProgram
# 定义程序
program = DFEvaporateProgram.from_defaults(
fields_to_extract=["population"],
)
在本节中,我们将讨论如何使用Python中的scipy.optimize.curve_fit
函数来拟合函数到数据。拟合函数是一种将数学模型与实际数据相匹配的方法,它可以用于预测、分析和理解数据。我们将使用curve_fit
函数来拟合一个简单的线性函数,并讨论如何处理非线性函数的拟合。
program.fit_fields(city_nodes["Toronto"][:1])
{'population': 'def get_population_field(text: str):\n """\n Function to extract population. \n """\n \n # Use regex to find the population field\n pattern = r\'(?<=population of )(\\d+,?\\d*)\'\n population_field = re.search(pattern, text).group(1)\n \n # Return the population field as a single value\n return int(population_field.replace(\',\', \'\'))'}
# 查看提取的函数
print(program.get_function_str("population"))
def get_population_field(text: str): """ Function to extract population. """ # Use regex to find the population field pattern = r'(?<=population of )(\d+,?\d*)' population_field = re.search(pattern, text).group(1) # Return the population field as a single value return int(population_field.replace(',', ''))
seattle_df = program(nodes=city_nodes["Seattle"][:1])
seattle_df
DataFrameRowsOnly(rows=[DataFrameRow(row_values=[749256])])
MultiValueEvaporateProgram
¶与假设输出符合2D表格格式(每个节点一行)的 DFEvaporateProgram
相比,MultiValueEvaporateProgram
返回一个 DataFrameRow
对象列表 - 每个对象对应一列,并且可以包含可变长度的值。如果我们想要从给定的文本中提取一个字段的多个值,这将会很有帮助。
在这个例子中,我们使用这个程序来解析金牌计数。
Settings.llm = OpenAI(temperature=0, model="gpt-4")
Settings.chunk_size = 1024
Settings.chunk_overlap = 0
from llama_index.core.data_structs import Node
# 奥运会奖牌总数:https://en.wikipedia.org/wiki/All-time_Olympic_Games_medal_table
train_text = """
<table class="wikitable sortable" style="margin-top:0; text-align:center; font-size:90%;">
<tbody><tr>
<th>队伍(IOC代码)
</th>
<th>夏季次数
</th>
<th>冬季次数
</th>
<th>总次数
</th></tr>
<tr>
<td align="left"><span id="ALB"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/3/36/Flag_of_Albania.svg/22px-Flag_of_Albania.svg.png" decoding="async" width="22" height="16" class="thumbborder" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/3/36/Flag_of_Albania.svg/33px-Flag_of_Albania.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/3/36/Flag_of_Albania.svg/44px-Flag_of_Albania.svg.png 2x" data-file-width="980" data-file-height="700" /> <a href="/wiki/Albania_at_the_Olympics" title="Albania at the Olympics">阿尔巴尼亚</a> <span style="font-size:90%;">(ALB)</span></span>
</td>
<td style="background:#f2f2ce;">9</td>
<td style="background:#cedff2;">5</td>
<td>14
</td></tr>
<tr>
<td align="left"><span id="ASA"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/8/87/Flag_of_American_Samoa.svg/22px-Flag_of_American_Samoa.svg.png" decoding="async" width="22" height="11" class="thumbborder" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/8/87/Flag_of_American_Samoa.svg/33px-Flag_of_American_Samoa.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/8/87/Flag_of_American_Samoa.svg/44px-Flag_of_American_Samoa.svg.png 2x" data-file-width="1000" data-file-height="500" /> <a href="/wiki/American_Samoa_at_the_Olympics" title="American Samoa at the Olympics">美属萨摩亚</a> <span style="font-size:90%;">(ASA)</span></span>
</td>
<td style="background:#f2f2ce;">9</td>
<td style="background:#cedff2;">2</td>
<td>11
</td></tr>
<tr>
<td align="left"><span id="AND"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/1/19/Flag_of_Andorra.svg/22px-Flag_of_Andorra.svg.png" decoding="async" width="22" height="15" class="thumbborder" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/1/19/Flag_of_Andorra.svg/33px-Flag_of_Andorra.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/1/19/Flag_of_Andorra.svg/44px-Flag_of_Andorra.svg.png 2x" data-file-width="1000" data-file-height="700" /> <a href="/wiki/Andorra_at_the_Olympics" title="Andorra at the Olympics">安道尔</a> <span style="font-size:90%;">(AND)</span></span>
</td>
<td style="background:#f2f2ce;">12</td>
<td style="background:#cedff2;">13</td>
<td>25
</td></tr>
<tr>
<td align="left"><span id="ANG"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/9/9d/Flag_of_Angola.svg/22px-Flag_of_Angola.svg.png" decoding="async" width="22" height="15" class="thumbborder" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/9/9d/Flag_of_Angola.svg/33px-Flag_of_Angola.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/9/9d/Flag_of_Angola.svg/44px-Flag_of_Angola.svg.png 2x" data-file-width="900" data-file-height="600" /> <a href="/wiki/Angola_at_the_Olympics" title="Angola at the Olympics">安哥拉</a> <span style="font-size:90%;">(ANG)</span></span>
</td>
<td style="background:#f2f2ce;">10</td>
<td style="background:#cedff2;">0</td>
<td>10
</td></tr>
<tr>
<td align="left"><span id="ANT"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/8/89/Flag_of_Antigua_and_Barbuda.svg/22px-Flag_of_Antigua_and_Barbuda.svg.png" decoding="async" width="22" height="15" class="thumbborder" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/8/89/Flag_of_Antigua_and_Barbuda.svg/33px-Flag_of_Antigua_and_Barbuda.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/8/89/Flag_of_Antigua_and_Barbuda.svg/44px-Flag_of_Antigua_and_Barbuda.svg.png 2x" data-file-width="900" data-file-height="600" /> <a href="/wiki/Antigua_and_Barbuda_at_the_Olympics" title="Antigua and Barbuda at the Olympics">安提瓜和巴布达</a> <span style="font-size:90%;">(ANT)</span></span>
</td>
<td style="background:#f2f2ce;">11</td>
<td style="background:#cedff2;">0</td>
<td>11
</td></tr>
<tr>
<td align="left"><span id="ARU"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/f/f6/Flag_of_Aruba.svg/22px-Flag_of_Aruba.svg.png" decoding="async" width="22" height="15" class="thumbborder" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/f/f6/Flag_of_Aruba.svg/33px-Flag_of_Aruba.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/f/f6/Flag_of_Aruba.svg/44px-Flag_of_Aruba.svg.png 2x" data-file-width="900" data-file-height="600" /> <a href="/wiki/Aruba_at_the_Olympics" title="Aruba at the Olympics">阿鲁巴</a> <span style="font-size:90%;">(ARU)</span></span>
</td>
<td style="background:#f2f2ce;">9</td>
<td style="background:#cedff2;">0</td>
<td>9
</td></tr>
"""
train_nodes = [Node(text=train_text)]
infer_text = """
<td align="left"><span id="BAN"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/f/f9/Flag_of_Bangladesh.svg/22px-Flag_of_Bangladesh.svg.png" decoding="async" width="22" height="13" class="thumbborder" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/f/f9/Flag_of_Bangladesh.svg/33px-Flag_of_Bangladesh.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/f/f9/Flag_of_Bangladesh.svg/44px-Flag_of_Bangladesh.svg.png 2x" data-file-width="1000" data-file-height="600" /> <a href="/wiki/Bangladesh_at_the_Olympics" title="Bangladesh at the Olympics">Bangladesh</a> <span style="font-size:90%;">(BAN)</span></span>
</td>
<td style="background:#f2f2ce;">10</td>
<td style="background:#cedff2;">0</td>
<td>10
</td></tr>
<tr>
<td align="left"><span id="BIZ"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/e/e7/Flag_of_Belize.svg/22px-Flag_of_Belize.svg.png" decoding="async" width="22" height="13" class="thumbborder" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/e/e7/Flag_of_Belize.svg/33px-Flag_of_Belize.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/e/e7/Flag_of_Belize.svg/44px-Flag_of_Belize.svg.png 2x" data-file-width="1000" data-file-height="600" /> <a href="/wiki/Belize_at_the_Olympics" title="Belize at the Olympics">Belize</a> <span style="font-size:90%;">(BIZ)</span></span> <sup class="reference" id="ref_BIZBIZ"><a href="#endnote_BIZBIZ">[BIZ]</a></sup>
</td>
<td style="background:#f2f2ce;">13</td>
<td style="background:#cedff2;">0</td>
<td>13
</td></tr>
<tr>
<td align="left"><span id="BEN"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/0/0a/Flag_of_Benin.svg/22px-Flag_of_Benin.svg.png" decoding="async" width="22" height="15" class="thumbborder" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/0/0a/Flag_of_Benin.svg/33px-Flag_of_Benin.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/0/0a/Flag_of_Benin.svg/44px-Flag_of_Benin.svg.png 2x" data-file-width="900" data-file-height="600" /> <a href="/wiki/Benin_at_the_Olympics" title="Benin at the Olympics">Benin</a> <span style="font-size:90%;">(BEN)</span></span> <sup class="reference" id="ref_BENBEN"><a href="#endnote_BENBEN">[BEN]</a></sup>
</td>
<td style="background:#f2f2ce;">12</td>
<td style="background:#cedff2;">0</td>
<td>12
</td></tr>
<tr>
<td align="left"><span id="BHU"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/9/91/Flag_of_Bhutan.svg/22px-Flag_of_Bhutan.svg.png" decoding="async" width="22" height="15" class="thumbborder" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/9/91/Flag_of_Bhutan.svg/33px-Flag_of_Bhutan.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/9/91/Flag_of_Bhutan.svg/44px-Flag_of_Bhutan.svg.png 2x" data-file-width="900" data-file-height="600" /> <a href="/wiki/Bhutan_at_the_Olympics" title="Bhutan at the Olympics">Bhutan</a> <span style="font-size:90%;">(BHU)</span></span>
</td>
<td style="background:#f2f2ce;">10</td>
<td style="background:#cedff2;">0</td>
<td>10
</td></tr>
<tr>
<td align="left"><span id="BOL"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/4/48/Flag_of_Bolivia.svg/22px-Flag_of_Bolivia.svg.png" decoding="async" width="22" height="15" class="thumbborder" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/4/48/Flag_of_Bolivia.svg/33px-Flag_of_Bolivia.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/4/48/Flag_of_Bolivia.svg/44px-Flag_of_Bolivia.svg.png 2x" data-file-width="1100" data-file-height="750" /> <a href="/wiki/Bolivia_at_the_Olympics" title="Bolivia at the Olympics">Bolivia</a> <span style="font-size:90%;">(BOL)</span></span>
</td>
<td style="background:#f2f2ce;">15</td>
<td style="background:#cedff2;">7</td>
<td>22
</td></tr>
<tr>
<td align="left"><span id="BIH"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/b/bf/Flag_of_Bosnia_and_Herzegovina.svg/22px-Flag_of_Bosnia_and_Herzegovina.svg.png" decoding="async" width="22" height="11" class="thumbborder" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/b/bf/Flag_of_Bosnia_and_Herzegovina.svg/33px-Flag_of_Bosnia_and_Herzegovina.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/b/bf/Flag_of_Bosnia_and_Herzegovina.svg/44px-Flag_of_Bosnia_and_Herzegovina.svg.png 2x" data-file-width="800" data-file-height="400" /> <a href="/wiki/Bosnia_and_Herzegovina_at_the_Olympics" title="Bosnia and Herzegovina at the Olympics">Bosnia and Herzegovina</a> <span style="font-size:90%;">(BIH)</span></span>
</td>
<td style="background:#f2f2ce;">8</td>
<td style="background:#cedff2;">8</td>
<td>16
</td></tr>
<tr>
<td align="left"><span id="IVB"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/4/42/Flag_of_the_British_Virgin_Islands.svg/22px-Flag_of_the_British_Virgin_Islands.svg.png" decoding="async" width="22" height="11" class="thumbborder" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/4/42/Flag_of_the_British_Virgin_Islands.svg/33px-Flag_of_the_British_Virgin_Islands.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/4/42/Flag_of_the_British_Virgin_Islands.svg/44px-Flag_of_the_British_Virgin_Islands.svg.png 2x" data-file-width="1200" data-file-height="600" /> <a href="/wiki/British_Virgin_Islands_at_the_Olympics" title="British Virgin Islands at the Olympics">British Virgin Islands</a> <span style="font-size:90%;">(IVB)</span></span>
</td>
<td style="background:#f2f2ce;">10</td>
<td style="background:#cedff2;">2</td>
<td>12
</td></tr>
<tr>
<td align="left"><span id="BRU"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/9/9c/Flag_of_Brunei.svg/22px-Flag_of_Brunei.svg.png" decoding="async" width="22" height="11" class="thumbborder" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/9/9c/Flag_of_Brunei.svg/33px-Flag_of_Brunei.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/9/9c/Flag_of_Brunei.svg/44px-Flag_of_Brunei.svg.png 2x" data-file-width="1440" data-file-height="720" /> <a href="/wiki/Brunei_at_the_Olympics" title="Brunei at the Olympics">Brunei</a> <span style="font-size:90%;">(BRU)</span></span> <sup class="reference" id="ref_AA"><a href="#endnote_AA">[A]</a></sup>
</td>
<td style="background:#f2f2ce;">6</td>
<td style="background:#cedff2;">0</td>
<td>6
</td></tr>
<tr>
<td align="left"><span id="CAM"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/8/83/Flag_of_Cambodia.svg/22px-Flag_of_Cambodia.svg.png" decoding="async" width="22" height="14" class="thumbborder" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/8/83/Flag_of_Cambodia.svg/33px-Flag_of_Cambodia.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/8/83/Flag_of_Cambodia.svg/44px-Flag_of_Cambodia.svg.png 2x" data-file-width="1000" data-file-height="640" /> <a href="/wiki/Cambodia_at_the_Olympics" title="Cambodia at the Olympics">Cambodia</a> <span style="font-size:90%;">(CAM)</span></span>
</td>
<td style="background:#f2f2ce;">10</td>
<td style="background:#cedff2;">0</td>
<td>10
</td></tr>
<tr>
<td align="left"><span id="CPV"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/3/38/Flag_of_Cape_Verde.svg/22px-Flag_of_Cape_Verde.svg.png" decoding="async" width="22" height="13" class="thumbborder" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/3/38/Flag_of_Cape_Verde.svg/33px-Flag_of_Cape_Verde.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/3/38/Flag_of_Cape_Verde.svg/44px-Flag_of_Cape_Verde.svg.png 2x" data-file-width="1020" data-file-height="600" /> <a href="/wiki/Cape_Verde_at_the_Olympics" title="Cape Verde at the Olympics">Cape Verde</a> <span style="font-size:90%;">(CPV)</span></span>
</td>
<td style="background:#f2f2ce;">7</td>
<td style="background:#cedff2;">0</td>
<td>7
</td></tr>
<tr>
<td align="left"><span id="CAY"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/0/0f/Flag_of_the_Cayman_Islands.svg/22px-Flag_of_the_Cayman_Islands.svg.png" decoding="async" width="22" height="11" class="thumbborder" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/0/0f/Flag_of_the_Cayman_Islands.svg/33px-Flag_of_the_Cayman_Islands.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/0/0f/Flag_of_the_Cayman_Islands.svg/44px-Flag_of_the_Cayman_Islands.svg.png 2x" data-file-width="1200" data-file-height="600" /> <a href="/wiki/Cayman_Islands_at_the_Olympics" title="Cayman Islands at the Olympics">Cayman Islands</a> <span style="font-size:90%;">(CAY)</span></span>
</td>
<td style="background:#f2f2ce;">11</td>
<td style="background:#cedff2;">2</td>
<td>13
</td></tr>
<tr>
<td align="left"><span id="CAF"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/6/6f/Flag_of_the_Central_African_Republic.svg/22px-Flag_of_the_Central_African_Republic.svg.png" decoding="async" width="22" height="15" class="thumbborder" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/6/6f/Flag_of_the_Central_African_Republic.svg/33px-Flag_of_the_Central_African_Republic.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/6/6f/Flag_of_the_Central_African_Republic.svg/44px-Flag_of_the_Central_African_Republic.svg.png 2x" data-file-width="900" data-file-height="600" /> <a href="/wiki/Central_African_Republic_at_the_Olympics" title="Central African Republic at the Olympics">Central African Republic</a> <span style="font-size:90%;">(CAF)</span></span>
</td>
<td style="background:#f2f2ce;">11</td>
<td style="background:#cedff2;">0</td>
<td>11
</td></tr>
<tr>
<td align="left"><span id="CHA"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/4/4b/Flag_of_Chad.svg/22px-Flag_of_Chad.svg.png" decoding="async" width="22" height="15" class="thumbborder" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/4/4b/Flag_of_Chad.svg/33px-Flag_of_Chad.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/4/4b/Flag_of_Chad.svg/44px-Flag_of_Chad.svg.png 2x" data-file-width="900" data-file-height="600" /> <a href="/wiki/Chad_at_the_Olympics" title="Chad at the Olympics">Chad</a> <span style="font-size:90%;">(CHA)</span></span>
</td>
<td style="background:#f2f2ce;">13</td>
<td style="background:#cedff2;">0</td>
<td>13
</td></tr>
<tr>
<td align="left"><span id="COM"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/9/94/Flag_of_the_Comoros.svg/22px-Flag_of_the_Comoros.svg.png" decoding="async" width="22" height="13" class="thumbborder" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/9/94/Flag_of_the_Comoros.svg/33px-Flag_of_the_Comoros.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/9/94/Flag_of_the_Comoros.svg/44px-Flag_of_the_Comoros.svg.png 2x" data-file-width="1000" data-file-height="600" /> <a href="/wiki/Comoros_at_the_Olympics" title="Comoros at the Olympics">Comoros</a> <span style="font-size:90%;">(COM)</span></span>
</td>
<td style="background:#f2f2ce;">7</td>
<td style="background:#cedff2;">0</td>
<td>7
</td></tr>
<tr>
<td align="left"><span id="CGO"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/9/92/Flag_of_the_Republic_of_the_Congo.svg/22px-Flag_of_the_Republic_of_the_Congo.svg.png" decoding="async" width="22" height="15" class="thumbborder" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/9/92/Flag_of_the_Republic_of_the_Congo.svg/33px-Flag_of_the_Republic_of_the_Congo.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/9/92/Flag_of_the_Republic_of_the_Congo.svg/44px-Flag_of_the_Republic_of_the_Congo.svg.png 2x" data-file-width="900" data-file-height="600" /> <a href="/wiki/Republic_of_the_Congo_at_the_Olympics" title="Republic of the Congo at the Olympics">Republic of the Congo</a> <span style="font-size:90%;">(CGO)</span></span>
</td>
<td style="background:#f2f2ce;">13</td>
<td style="background:#cedff2;">0</td>
<td>13
</td></tr>
<tr>
<td align="left"><span id="COD"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/6/6f/Flag_of_the_Democratic_Republic_of_the_Congo.svg/22px-Flag_of_the_Democratic_Republic_of_the_Congo.svg.png" decoding="async" width="22" height="17" class="thumbborder" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/6/6f/Flag_of_the_Democratic_Republic_of_the_Congo.svg/33px-Flag_of_the_Democratic_Republic_of_the_Congo.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/6/6f/Flag_of_the_Democratic_Republic_of_the_Congo.svg/44px-Flag_of_the_Democratic_Republic_of_the_Congo.svg.png 2x" data-file-width="800" data-file-height="600" /> <a href="/wiki/Democratic_Republic_of_the_Congo_at_the_Olympics" title="Democratic Republic of the Congo at the Olympics">Democratic Republic of the Congo</a> <span style="font-size:90%;">(COD)</span></span> <sup class="reference" id="ref_CODCOD"><a href="#endnote_CODCOD">[COD]</a></sup>
</td>
<td style="background:#f2f2ce;">11</td>
<td style="background:#cedff2;">0</td>
<td>11
</td></tr>
"""
infer_nodes = [Node(text=infer_text)]
from llama_index.core.program.predefined import MultiValueEvaporateProgram
program = MultiValueEvaporateProgram.from_defaults(
fields_to_extract=["countries", "medal_count"],
)
program.fit_fields(train_nodes[:1])
{'countries': 'def get_countries_field(text: str):\n """\n Function to extract countries. \n """\n \n # Use regex to extract the countries field\n countries_field = re.findall(r\'<a href=".*">(.*)</a>\', text)\n \n # Return the result as a list\n return countries_field', 'medal_count': 'def get_medal_count_field(text: str):\n """\n Function to extract medal_count. \n """\n \n # Use regex to extract the medal count field\n medal_count_field = re.findall(r\'<td style="background:#f2f2ce;">(.*?)</td>\', text)\n \n # Return the result as a list\n return medal_count_field'}
print(program.get_function_str("countries"))
def get_countries_field(text: str): """ Function to extract countries. """ # Use regex to extract the countries field countries_field = re.findall(r'<a href=".*">(.*)</a>', text) # Return the result as a list return countries_field
print(program.get_function_str("medal_count"))
def get_medal_count_field(text: str): """ Function to extract medal_count. """ # Use regex to extract the medal count field medal_count_field = re.findall(r'<td style="background:#f2f2ce;">(.*?)</td>', text) # Return the result as a list return medal_count_field
result = program(nodes=infer_nodes[:1])
# 输出国家
print(f"国家: {result.columns[0].row_values}\n")
# 输出奖牌计数
print(f"奖牌计数: {result.columns[0].row_values}\n")
Countries: ['Bangladesh', '[BIZ]', '[BEN]', 'Bhutan', 'Bolivia', 'Bosnia and Herzegovina', 'British Virgin Islands', '[A]', 'Cambodia', 'Cape Verde', 'Cayman Islands', 'Central African Republic', 'Chad', 'Comoros', 'Republic of the Congo', '[COD]'] Medal Counts: ['Bangladesh', '[BIZ]', '[BEN]', 'Bhutan', 'Bolivia', 'Bosnia and Herzegovina', 'British Virgin Islands', '[A]', 'Cambodia', 'Cape Verde', 'Cayman Islands', 'Central African Republic', 'Chad', 'Comoros', 'Republic of the Congo', '[COD]']
EvaporateExtractor
¶底层的EvaporateExtractor
提供了一些额外的功能,例如实际上帮助在一组文本中识别字段。
在这里,我们展示了如何使用identify_fields
来确定围绕一个通用的topic
字段的相关字段。
# 一个节点列表,每个城市对应一个节点,对应于介绍段落
# 城市人口节点 = []
城市人口节点 = [城市节点["多伦多"][0], 城市节点["西雅图"][0]]
extractor = program.extractor
# 尝试使用多伦多和西雅图(应该提取“人口”)进行测试
existing_fields = extractor.identify_fields(
city_pop_nodes, topic="population", fields_top_k=4
)
existing_fields
["seattle metropolitan area's population"]