!pip install -qq llama-index llama-index-postprocessor-presidio llama-index-vector-stores-milvus
from google.colab import userdata
import os
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
from llama_index.core.postprocessor import NERPIINodePostprocessor
from llama_index.core.schema import TextNode, NodeWithScore
text = """
Hi, I'm Sarah Mitchell, and I just got a new credit card with the number 3714-496089-47322.
My personal email is sarah.mitchell@mailbox.com, and I'm currently based in Sydney.
By the way, I tried paying my utility bill with card number 6011-5832-9109-1726, but it didn't work.
For my bank transactions, I use this IBAN: NL91ABNA0417164300.
Also, can you help me with my Wi-Fi issues? I keep getting blocked by IP address 203.0.113.15.
I've shared a family photo on my personal blog at https://www.sarahs-lifediary.org/.
Oh, and my grandfather, George Stone, was born in 1921, while my grandmother, Emily Clarkson, was born in 1925.
Last question—what's the spending limit on my main card, the one ending in 8473?
"""
# Wrap the text in a node, then mask named entities with the default Hugging Face NER pipeline
node = TextNode(text=text)
processor = NERPIINodePostprocessor()
new_nodes = processor.postprocess_nodes([NodeWithScore(node=node)])
print(new_nodes[0].node.get_text())
No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english). Using a pipeline without specifying a model name and revision in production is not recommended.
Hi, I'm [PER_9], and I just got a new credit card with the number 3714-496089-47322.
My personal email is sarah.mitchell@mailbox.com, and I'm currently based in [LOC_169].
By the way, I tried paying my utility bill with card number 6011-5832-9109-1726, but it didn't work.
For my bank transactions, I use this IBAN: NL91ABNA0417164300.
Also, can you help me with my Wi-[MISC_374] issues? I keep getting blocked by IP address 203.0.113.15.
I've shared a family photo on my personal blog at https://www.sarahs-lifediary.org/.
Oh, and my grandfather, [PER_545], was born in 1921, while my grandmother, [PER_599], was born in 1925.
Last question—what's the spending limit on my main card, the one ending in 8473?
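The NER output above masks only names, a location, and one miscellaneous token; the card numbers, email address, IBAN, IP address, and URL pass through unchanged. That is consistent with the default model named in the log, which is fine-tuned on CoNLL-03, a dataset whose labels cover only persons, locations, organizations, and miscellaneous entities. One way to complement it is a regex pass for structured identifiers. The following is a minimal sketch in plain Python, not part of LlamaIndex: `PATTERNS` and `mask_structured_pii` are hypothetical names, and the patterns are deliberately simple rather than production-grade.

```python
import re

# Hypothetical, deliberately simple patterns for structured identifiers.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    # 13 to 19 digits, optionally separated by single spaces or hyphens.
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


def mask_structured_pii(text: str) -> tuple[str, dict[str, str]]:
    """Replace regex matches with numbered placeholders.

    Returns the masked text and a placeholder -> original value mapping.
    """
    mapping: dict[str, str] = {}
    for label, pattern in PATTERNS.items():

        def replace(match: re.Match, label: str = label) -> str:
            # Number placeholders globally so each one is unique.
            placeholder = f"[{label}_{len(mapping) + 1}]"
            mapping[placeholder] = match.group(0)
            return placeholder

        text = pattern.sub(replace, text)
    return text, mapping


sample = (
    "My personal email is sarah.mitchell@mailbox.com. "
    "I keep getting blocked by IP address 203.0.113.15. "
    "Card number 6011-5832-9109-1726 didn't work."
)
masked, mapping = mask_structured_pii(sample)
print(masked)
print(mapping)
```

Such a pass could run before or after the NER postprocessor; which order works better depends on the data and is not something this notebook tests.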
from llama_index.core.postprocessor import PIINodePostprocessor
from llama_index.core.schema import TextNode, NodeWithScore
from llama_index.llms.openai import OpenAI
text = """
Hi, I'm Sarah Mitchell, and I just got a new credit card with the number 3714-496089-47322.
My personal email is sarah.mitchell@mailbox.com, and I'm currently based in Sydney.
By the way, I tried paying my utility bill with card number 6011-5832-9109-1726, but it didn't work.
For my bank transactions, I use this IBAN: NL91ABNA0417164300.
Also, can you help me with my Wi-Fi issues? I keep getting blocked by IP address 203.0.113.15.
I've shared a family photo on my personal blog at https://www.sarahs-lifediary.org/.
Oh, and my grandfather, George Stone, was born in 1921, while my grandmother, Emily Clarkson, was born in 1925.
Last question—what's the spending limit on my main card, the one ending in 8473?
"""
# Same input, masked by prompting an LLM (OpenAI here) instead of a local NER model
node = TextNode(text=text)
processor = PIINodePostprocessor(llm=OpenAI())
new_nodes = processor.postprocess_nodes([NodeWithScore(node=node)])
print(new_nodes[0].node.get_text())
Hi, I'm [NAME1] [NAME2], and I just got a new credit card with the number [CREDIT_CARD_NUMBER1].
My personal email is [EMAIL], and I'm currently based in [CITY].
By the way, I tried paying my utility bill with card number [CREDIT_CARD_NUMBER2], but it didn't work.
For my bank transactions, I use this IBAN: [IBAN].
Also, can you help me with my Wi-Fi issues? I keep getting blocked by IP address [IP_ADDRESS].
I've shared a family photo on my personal blog at [URL].
Oh, and my grandfather, [NAME3] [NAME4], was born in [DATE1], while my grandmother, [NAME5] [NAME6], was born in [DATE2].
Last question—what's the spending limit on my main card, the one ending in [CREDIT_CARD_NUMBER3]?
from llama_index.postprocessor.presidio import PresidioPIINodePostprocessor
from llama_index.core.schema import TextNode, NodeWithScore
text = """
Hi, I'm Sarah Mitchell, and I just got a new credit card with the number 3714-496089-47322.
My personal email is sarah.mitchell@mailbox.com, and I'm currently based in Sydney.
By the way, I tried paying my utility bill with card number 6011-5832-9109-1726, but it didn't work.
For my bank transactions, I use this IBAN: NL91ABNA0417164300.
Also, can you help me with my Wi-Fi issues? I keep getting blocked by IP address 203.0.113.15.
I've shared a family photo on my personal blog at https://www.sarahs-lifediary.org/.
Oh, and my grandfather, George Stone, was born in 1921, while my grandmother, Emily Clarkson, was born in 1925.
Last question—what's the spending limit on my main card, the one ending in 8473?
"""
# Same input, masked with Microsoft Presidio's analyzer and anonymizer
node = TextNode(text=text)
processor = PresidioPIINodePostprocessor()
new_nodes = processor.postprocess_nodes([NodeWithScore(node=node)])
print(new_nodes[0].node.get_text())
Hi, I'm <PERSON_3>, and I just got a new credit card with the number 3714-<US_DRIVER_LICENSE_1>-47322.
My personal email is <EMAIL_ADDRESS_1>, and I'm currently based in <LOCATION_1>.
By the way, I tried paying my utility bill with card number <IN_PAN_1>9109-1726, but it didn't work.
For my bank transactions, I use this IBAN: <IBAN_CODE_1>.
Also, can you help me with my Wi-Fi issues? I keep getting blocked by IP address <IP_ADDRESS_1>.
I've shared a family photo on my personal blog at <URL_1>
Oh, and my grandfather, <PERSON_2>, was born in <DATE_TIME_2>, while my grandmother, <PERSON_1>, was born in <DATE_TIME_1>.
Last question—what's the spending limit on my main card, the one ending in 8473?
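In the Presidio output above, fragments of both card numbers are labelled `US_DRIVER_LICENSE` and `IN_PAN` rather than being recognized as credit cards. One plausible explanation, offered as an assumption rather than a confirmed diagnosis, is that card-number recognizers commonly validate candidates with the Luhn checksum, and neither sample number satisfies it. The sketch below is standalone Python, not Presidio or LlamaIndex code; `luhn_valid` is a hypothetical helper name.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` satisfy the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


# The two card numbers from the sample text, plus a widely used test number.
for candidate in (
    "3714-496089-47322",
    "6011-5832-9109-1726",
    "4111 1111 1111 1111",
):
    print(candidate, luhn_valid(candidate))
```

The two numbers from the sample text fail the check, while the test number `4111 1111 1111 1111` passes. If the goal is to exercise credit-card detection specifically, sample data with Luhn-valid test numbers may give a fairer picture.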
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.milvus import MilvusVectorStore
# dim must match the embedding model; 1536 is the size of OpenAI's text-embedding-ada-002 vectors
vector_store = MilvusVectorStore(
    uri="./milvus_demo.db", dim=1536, overwrite=True
)
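storage_context = StorageContext.from_defaults(vector_store=vector_store)
# Build the index from the Presidio-masked nodes (the most recent `new_nodes`),
# so only placeholder text is embedded and stored
index = VectorStoreIndex([n.node for n in new_nodes], storage_context=storage_context)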
response = index.as_query_engine().query(
    "What is the name of the person?"
)
print(str(response))
The name of the person is <PERSON_3>.
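The answer above contains only the placeholder, which is the intended effect: the LLM and the vector store never see the real name. If the application needs to show the original value to an authorized user, the placeholder can be swapped back locally after the response arrives. The sketch below assumes a placeholder-to-original mapping is available as a plain dict. The mapping here is written by hand to mirror the masked output, and `restore_placeholders` is a hypothetical helper; where the postprocessor keeps its own mapping is not shown in this notebook and should be checked against the library.

```python
def restore_placeholders(text: str, mapping: dict[str, str]) -> str:
    """Replace each placeholder in `text` with its original value."""
    # Longest placeholders first, so a short key never clobbers part of a longer one.
    for placeholder in sorted(mapping, key=len, reverse=True):
        text = text.replace(placeholder, mapping[placeholder])
    return text


# Hypothetical mapping, written by hand to mirror the masked output above.
example_mapping = {"<PERSON_3>": "Sarah Mitchell", "<LOCATION_1>": "Sydney"}

print(restore_placeholders("The name of the person is <PERSON_3>.", example_mapping))
# prints: The name of the person is Sarah Mitchell.
```

Restoring on the client side keeps the raw values out of both the prompt and the stored embeddings.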
response = index.as_query_engine().query(
    "What is the number of credit card number?"
)
print(str(response))
The credit card number is 3714-<US_DRIVER_LICENSE_1>-47322.
response = index.as_query_engine().query(
    "What is the name of credit card number?"
)
print(str(response))
The name of the credit card number is Visa.