from collections import Counter
from pprint import pprint
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from ipywidgets import interact
import transformers
from transformers import pipeline
from transformers import set_seed
set_seed(42) # Set the seed to get reproducible results
import os
import gzip
import tqdm as tq
from tqdm.notebook import tqdm
tqdm.pandas()
import networkx as nx
import watermark
%load_ext watermark
%matplotlib inline
We start by printing out the versions of the libraries we're using, for future reference:
%watermark -n -v -m -g -iv
Python implementation: CPython
Python version       : 3.11.7
IPython version      : 8.12.3

Compiler    : Clang 14.0.6
OS          : Darwin
Release     : 23.6.0
Machine     : arm64
Processor   : arm
CPU cores   : 16
Architecture: 64bit

Git hash: b44900a26f10de8fbaf559b307e69185828b77b4

watermark   : 2.4.3
numpy       : 1.26.4
pandas      : 2.2.3
transformers: 4.41.1
matplotlib  : 3.8.0
tqdm        : 4.66.4
networkx    : 3.3
json        : 2.0.9
Load the default figure style.
plt.style.use('d4sci.mplstyle')
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
unmasker = pipeline('fill-mask', model='bert-base-uncased')
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
unmasker("Artificial Intelligence [MASK] take over the world.")
[{'score': 0.31824299693107605, 'token': 2064, 'token_str': 'can', 'sequence': 'artificial intelligence can take over the world.'},
 {'score': 0.18299730122089386, 'token': 2097, 'token_str': 'will', 'sequence': 'artificial intelligence will take over the world.'},
 {'score': 0.0560012087225914, 'token': 2000, 'token_str': 'to', 'sequence': 'artificial intelligence to take over the world.'},
 {'score': 0.045194774866104126, 'token': 2015, 'token_str': '##s', 'sequence': 'artificial intelligences take over the world.'},
 {'score': 0.045152731239795685, 'token': 2052, 'token_str': 'would', 'sequence': 'artificial intelligence would take over the world.'}]
?unmasker
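By default the fill-mask pipeline returns the five most likely completions. The top_k argument (visible in the help above) controls how many candidates come back; for example:

# Ask for the ten most likely fill-ins instead of the default five.
unmasker("Artificial Intelligence [MASK] take over the world.", top_k=10)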
unmasker("The man worked as a [MASK].")
[{'score': 0.09747567027807236, 'token': 10533, 'token_str': 'carpenter', 'sequence': 'the man worked as a carpenter.'},
 {'score': 0.05238327011466026, 'token': 15610, 'token_str': 'waiter', 'sequence': 'the man worked as a waiter.'},
 {'score': 0.04962737113237381, 'token': 13362, 'token_str': 'barber', 'sequence': 'the man worked as a barber.'},
 {'score': 0.03788601979613304, 'token': 15893, 'token_str': 'mechanic', 'sequence': 'the man worked as a mechanic.'},
 {'score': 0.037680596113204956, 'token': 18968, 'token_str': 'salesman', 'sequence': 'the man worked as a salesman.'}]
unmasker("The woman worked as a [MASK].")
[{'score': 0.21981653571128845, 'token': 6821, 'token_str': 'nurse', 'sequence': 'the woman worked as a nurse.'},
 {'score': 0.1597415953874588, 'token': 13877, 'token_str': 'waitress', 'sequence': 'the woman worked as a waitress.'},
 {'score': 0.11547262966632843, 'token': 10850, 'token_str': 'maid', 'sequence': 'the woman worked as a maid.'},
 {'score': 0.03796852380037308, 'token': 19215, 'token_str': 'prostitute', 'sequence': 'the woman worked as a prostitute.'},
 {'score': 0.030423782765865326, 'token': 5660, 'token_str': 'cook', 'sequence': 'the woman worked as a cook.'}]
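The two prompts above expose the gender bias BERT absorbed from its training data. A small sketch (same prompts as above, nothing new assumed) lines the predictions up side by side:

# Collect the predicted tokens for each gendered template into one table
# so the skew is easy to eyeball.
templates = ["The man worked as a [MASK].",
             "The woman worked as a [MASK]."]
preds = {t: [d["token_str"] for d in unmasker(t)] for t in templates}
pd.DataFrame(preds)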
text = """Dear Amazon, last week I ordered an Optimus Prime action figure \
from your online store in Germany. Unfortunately, when I opened the package, \
I discovered to my horror that I had been sent an action figure of Megatron \
instead! As a lifelong enemy of the Decepticons, I hope you can understand my \
dilemma. To resolve the issue, I demand an exchange of Megatron for the \
Optimus Prime figure I ordered. Enclosed are copies of my records concerning \
this purchase. I expect to hear from you soon. Sincerely, Bumblebee."""
ner_tagger = pipeline("ner", aggregation_strategy="simple")
No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
/opt/anaconda3/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
?ner_tagger
outputs = ner_tagger(text)
outputs
[{'entity_group': 'ORG', 'score': 0.8790102, 'word': 'Amazon', 'start': 5, 'end': 11},
 {'entity_group': 'MISC', 'score': 0.9908588, 'word': 'Optimus Prime', 'start': 36, 'end': 49},
 {'entity_group': 'LOC', 'score': 0.9997547, 'word': 'Germany', 'start': 90, 'end': 97},
 {'entity_group': 'MISC', 'score': 0.5565716, 'word': 'Mega', 'start': 208, 'end': 212},
 {'entity_group': 'PER', 'score': 0.59025526, 'word': '##tron', 'start': 212, 'end': 216},
 {'entity_group': 'ORG', 'score': 0.66969275, 'word': 'Decept', 'start': 253, 'end': 259},
 {'entity_group': 'MISC', 'score': 0.4983484, 'word': '##icons', 'start': 259, 'end': 264},
 {'entity_group': 'MISC', 'score': 0.7753625, 'word': 'Megatron', 'start': 350, 'end': 358},
 {'entity_group': 'MISC', 'score': 0.98785394, 'word': 'Optimus Prime', 'start': 367, 'end': 380},
 {'entity_group': 'PER', 'score': 0.8120968, 'word': 'Bumblebee', 'start': 502, 'end': 511}]
pd.DataFrame(outputs)
|   | entity_group | score    | word          | start | end |
|---|--------------|----------|---------------|-------|-----|
| 0 | ORG          | 0.879010 | Amazon        | 5     | 11  |
| 1 | MISC         | 0.990859 | Optimus Prime | 36    | 49  |
| 2 | LOC          | 0.999755 | Germany       | 90    | 97  |
| 3 | MISC         | 0.556572 | Mega          | 208   | 212 |
| 4 | PER          | 0.590255 | ##tron        | 212   | 216 |
| 5 | ORG          | 0.669693 | Decept        | 253   | 259 |
| 6 | MISC         | 0.498348 | ##icons       | 259   | 264 |
| 7 | MISC         | 0.775362 | Megatron      | 350   | 358 |
| 8 | MISC         | 0.987854 | Optimus Prime | 367   | 380 |
| 9 | PER          | 0.812097 | Bumblebee     | 502   | 511 |
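Notice how names outside the tokenizer's vocabulary get split into sub-word fragments ("Mega" / "##tron", "Decept" / "##icons") that the "simple" strategy leaves as separate entities. A sketch that pins the model and revision explicitly (as the warning above recommends) and uses the "max" aggregation strategy, which aggregates scores at the word level so fragments of the same word share one label:

# Pin the checkpoint and merge sub-tokens back into whole words.
ner_tagger_max = pipeline("ner",
                          model="dbmdz/bert-large-cased-finetuned-conll03-english",
                          revision="f2482bf",
                          aggregation_strategy="max")
pd.DataFrame(ner_tagger_max(text))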
reader = pipeline("question-answering")
No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
/opt/anaconda3/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
question = "What does the customer want?"
outputs = reader(question=question, context=text)
pd.DataFrame([outputs])
|   | score    | start | end | answer                  |
|---|----------|-------|-----|-------------------------|
| 0 | 0.631292 | 335   | 358 | an exchange of Megatron |
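The reader returns only the single best span by default. Asking for a few alternatives with top_k (a standard argument of the question-answering pipeline) is a cheap sanity check on the answer:

# Get the three highest-scoring candidate spans instead of just the best one.
candidates = reader(question=question, context=text, top_k=3)
pd.DataFrame(candidates)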
translator = pipeline("translation_en_to_it",
model="Helsinki-NLP/opus-mt-en-it")
outputs = translator(text, clean_up_tokenization_spaces=True, min_length=100)
print(outputs[0]['translation_text'])
Cara Amazon, la scorsa settimana ho ordinato una figura d'azione Optimus Prime dal tuo negozio online in Germania. Purtroppo, quando ho aperto il pacchetto, ho scoperto al mio orrore che ero stato inviato una figura d'azione di Megatron invece! Come un nemico per tutta la vita dei Decepticon, spero che si può capire il mio dilemma. Per risolvere il problema, chiedo uno scambio di Megatron per la figura di Optimus Prime ho ordinato. In allegato sono copie dei miei record riguardanti questo acquisto. Mi aspetto di sentire da voi presto. Cordialmente, Bumblebee.
For comparison, a more natural Italian rendering of the same letter; note how much more literal the model's output above is (e.g. "figura d'azione" for "action figure"):

Caro Amazon, la settimana scorsa ho ordinato un action figure di Optimus Prime dal tuo negozio online in Germania. Sfortunatamente, quando ho aperto il pacco, ho scoperto con orrore che mi era stata invece inviata una action figure di Megatron! Essendo un nemico da sempre dei Decepticon, spero che tu possa capire il mio dilemma. Per risolvere il problema, chiedo uno scambio di Megatron con la figura di Optimus Prime che ho ordinato. In allegato sono presenti copie dei miei documenti relativi a questo acquisto. Mi aspetto di sentirti presto. Cordiali saluti, Bombo.
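Helsinki-NLP publishes opus-mt checkpoints for hundreds of language pairs, so switching the target language is just a matter of swapping the model; a sketch for English to German, same call pattern as above:

# Same pipeline pattern with a different opus-mt checkpoint.
translator_de = pipeline("translation_en_to_de",
                         model="Helsinki-NLP/opus-mt-en-de")
print(translator_de(text, clean_up_tokenization_spaces=True)[0]['translation_text'])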
generator = pipeline("text-generation")
No model was supplied, defaulted to openai-community/gpt2 and revision 6c0e608 (https://huggingface.co/openai-community/gpt2). Using a pipeline without specifying a model name and revision in production is not recommended. /opt/anaconda3/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`. warnings.warn(
response = "Dear Bumblebee, I am sorry to hear that your order was mixed up."
prompt = text + "\n\nCustomer service response:\n" + response
print(prompt)
Dear Amazon, last week I ordered an Optimus Prime action figure from your online store in Germany. Unfortunately, when I opened the package, I discovered to my horror that I had been sent an action figure of Megatron instead! As a lifelong enemy of the Decepticons, I hope you can understand my dilemma. To resolve the issue, I demand an exchange of Megatron for the Optimus Prime figure I ordered. Enclosed are copies of my records concerning this purchase. I expect to hear from you soon. Sincerely, Bumblebee.

Customer service response:
Dear Bumblebee, I am sorry to hear that your order was mixed up.
outputs = generator(prompt, max_length=1000)
print(outputs[0]['generated_text'])
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Dear Amazon, last week I ordered an Optimus Prime action figure from your online store in Germany. Unfortunately, when I opened the package, I discovered to my horror that I had been sent an action figure of Megatron instead! As a lifelong enemy of the Decepticons, I hope you can understand my dilemma. To resolve the issue, I demand an exchange of Megatron for the Optimus Prime figure I ordered. Enclosed are copies of my records concerning this purchase. I expect to hear from you soon. Sincerely, Bumblebee.

Customer service response:
Dear Bumblebee, I am sorry to hear that your order was mixed up. This isn't a problem for me. My order simply came with an incorrect name. I ordered as "Megatron, Bumblebee". I was able to get back all the Transformers figures I already purchased from your online store. The instructions on the front of the package says this. All I can say is that I'm sure Megatron was not included with the Optimus Prime figure but just one (not a single) part. It's quite a shock when you know as you are one of us. Thank you. Bumblebee! I am hoping that the shipping will be less than $5.00. Please contact me on the comments section below. I'm sure that you will be getting the exact items I received. I am simply told that I received the correct order. I think that you are a bit mistaken! Unfortunately you are not receiving the packaging we have in our possession. Please contact me after you find out why this is.
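The rambling continuation above comes from GPT-2's default decoding settings. Generation is controllable through the standard generate() arguments, which the pipeline passes through; a sketch with explicit sampling parameters:

# Sample two alternative continuations with explicit decoding settings.
outputs = generator(prompt,
                    max_new_tokens=200,      # bound the continuation, not the total length
                    do_sample=True,          # sample instead of greedy decoding
                    temperature=0.7,         # soften the next-token distribution
                    top_p=0.9,               # nucleus sampling
                    num_return_sequences=2)  # draw several continuations to compare
for i, out in enumerate(outputs):
    print(f"--- continuation {i} ---\n{out['generated_text']}\n")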