Open book and Closed book question answering with Google's T5¶
With the latest NLU release and Google's T5, you can answer general knowledge questions without any context and also answer questions on text databases.
These questions can be asked in natural human language and answered in just 1 line with NLU!
What is an open book question?¶
You can think of an open book question as an exam where you are allowed to bring text documents or cheat sheets that help you answer the questions, a bit like bringing a history book to a history exam.
In T5's terms, this means the model is given a question and an additional piece of textual information, the so-called context.
This enables the T5 model to answer questions on textual datasets like medical records, news articles, wiki databases, stories and movie scripts, product descriptions, legal documents and many more.
You can answer an open book question in 1 line of code, leveraging the latest NLU release and Google's T5.
All it takes is:
nlu.load('answer_question').predict("""
Where did Jebe die?
context: Ghenkis Khan recalled Subtai back to Mongolia soon afterwards,
and Jebe died on the road back to Samarkand""")
>>> Output: Samarkand
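The open book input is a single string: the question, then a `context:` tag, then the context text. Below is a minimal sketch of a helper that assembles such a string. `build_open_book_prompt` is a hypothetical name used only for illustration and is not part of NLU; the resulting string would be passed to `predict` exactly as in the example above.

```python
def build_open_book_prompt(question: str, context: str) -> str:
    """Join a question and its context with the 'context:' tag.

    Hypothetical helper for illustration; not part of the NLU library.
    """
    # Mirror the layout of the example: question first, then the tagged context.
    return f"{question.strip()}\ncontext: {context.strip()}"


prompt = build_open_book_prompt(
    "Where did Jebe die?",
    "Ghenkis Khan recalled Subtai back to Mongolia soon afterwards, "
    "and Jebe died on the road back to Samarkand",
)
print(prompt)
```

Keeping the tag inside one helper makes it harder to forget the `context:` prefix when questions and contexts come from different sources.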
Example of answering a medical question based on medical context:
question ='''
What does increased oxygen concentrations in the patient’s lungs displace?
context: Hyperbaric (high-pressure) medicine uses special oxygen chambers to increase the partial pressure of O 2 around the patient and, when needed, the medical staff.
Carbon monoxide poisoning, gas gangrene, and decompression sickness (the ’bends’) are sometimes treated using these devices. Increased O 2 concentration in the lungs helps to displace carbon monoxide from the heme group of hemoglobin.
Oxygen gas is poisonous to the anaerobic bacteria that cause gas gangrene, so increasing its partial pressure helps kill them. Decompression sickness occurs in divers who decompress too quickly after a dive, resulting in bubbles of inert gas, mostly nitrogen and helium, forming in their blood. Increasing the pressure of O 2 as soon as possible is part of the treatment.
'''
#Predict on text data with T5
nlu.load('answer_question').predict(question)
>>> Output: carbon monoxide
Take a look at this example on a recent news article snippet:
question1 = 'Who is Jack ma?'
question2 = 'Who is founder of Alibaba Group?'
question3 = 'When did Jack Ma re-appear?'
question4 = 'How did Alibaba stocks react?'
question5 = 'Whom did Jack Ma meet?'
question6 = 'Who did Jack Ma hide from?'
# from https://www.bbc.com/news/business-55728338
news_article_snippet = """ context:
Alibaba Group founder Jack Ma has made his first appearance since Chinese regulators cracked down on his business empire.
His absence had fuelled speculation over his whereabouts amid increasing official scrutiny of his businesses.
The billionaire met 100 rural teachers in China via a video meeting on Wednesday, according to local government media.
Alibaba shares surged 5% on Hong Kong's stock exchange on the news.
"""
# Join each question with the context; this works with a Pandas DataFrame as well!
questions = [
question1+ news_article_snippet,
question2+ news_article_snippet,
question3+ news_article_snippet,
question4+ news_article_snippet,
question5+ news_article_snippet,
question6+ news_article_snippet,]
nlu.load('answer_question').predict(questions)
This will output a Pandas DataFrame similar to this:
Answer | Question |
---|---|
Alibaba Group founder | Who is Jack ma? |
Jack Ma | Who is founder of Alibaba Group? |
Wednesday | When did Jack Ma re-appear? |
surged 5% | How did Alibaba stocks react? |
100 rural teachers | Whom did Jack Ma meet? |
Chinese regulators | Who did Jack Ma hide from? |
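When many questions share one context, the list of inputs can be built programmatically instead of concatenating each entry by hand. The following is a small sketch under that assumption. `pair_questions_with_context` is a hypothetical helper name, and the context here is shortened to two sentences of the snippet above.

```python
from typing import List


def pair_questions_with_context(questions: List[str], context: str) -> List[str]:
    """Append the same tagged context to every question.

    Hypothetical helper for illustration; not part of the NLU library.
    """
    tagged_context = " context: " + context.strip()
    return [question.strip() + tagged_context for question in questions]


questions = [
    "Who is founder of Alibaba Group?",
    "Whom did Jack Ma meet?",
]
context = (
    "Alibaba Group founder Jack Ma has made his first appearance since "
    "Chinese regulators cracked down on his business empire. "
    "The billionaire met 100 rural teachers in China via a video meeting "
    "on Wednesday, according to local government media."
)

prompts = pair_questions_with_context(questions, context)
for prompt in prompts:
    print(prompt)
```

The resulting list has the same shape as the `questions` list in the example and could be handed to `predict` in the same way.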
What is a closed book question?¶
A closed book question is the exact opposite of an open book question. In an exam scenario, you are only allowed to use what you have memorized and nothing else.
In T5's terms, this means that T5 can only use its stored weights to answer a question and is given no additional context.
T5 was pre-trained on the C4 dataset (Colossal Clean Crawled Corpus), roughly 750 GB of cleaned English web text derived from Common Crawl.
This gives T5 broad knowledge of the web, stored in its weights, which it can use to answer various closed book questions.
You can answer a closed book question in 1 line of code, leveraging the latest NLU release and Google's T5.
You need to pass one string to NLU that contains only the question; unlike the open book case, no context: tag or context contents follow it.
All it takes is:
nlu.load('en.t5').predict('Who is president of Nigeria?')
>>> Muhammadu Buhari
nlu.load('en.t5').predict('What is the most spoken language in India?')
>>> Hindi
nlu.load('en.t5').predict('What is the capital of Germany?')
>>> Berlin
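In the examples in this article, the two modes differ in two ways: whether the input string carries a `context:` tag, and which model reference is loaded (`'answer_question'` for open book, `'en.t5'` for closed book). A minimal sketch of that decision follows. `pick_model_reference` is a hypothetical helper, and the simple substring check is an assumption that would misclassify a closed book question that happens to contain the text `context:`.

```python
def pick_model_reference(prompt: str) -> str:
    """Return the model reference this article uses for the given prompt.

    Hypothetical helper for illustration; not part of the NLU library.
    """
    # Open book prompts carry a 'context:' tag; closed book prompts do not.
    if "context:" in prompt:
        return "answer_question"
    return "en.t5"


closed = pick_model_reference("What is the capital of Germany?")
opened = pick_model_reference(
    "Where did Jebe die?\ncontext: Jebe died on the road back to Samarkand"
)
print(closed)
print(opened)
```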
import os
! apt-get update -qq > /dev/null
# Install java
! apt-get install -y openjdk-8-jdk-headless -qq > /dev/null
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]
! pip install nlu pyspark==2.4.7 > /dev/null
import nlu
t5_closed_book = nlu.load('en.t5')
google_t5_small_ssm_nq download started this may take some time. Approximate size to download 139 MB [OK!]
t5_closed_book.predict('What is the capital of Germany?')
origin_index | document | T5 |
---|---|---|
0 | What is the capital of Germany? | Berlin |
t5_closed_book.predict('Who is president of Nigeria?')
origin_index | document | T5 |
---|---|---|
0 | Who is president of Nigeria? | Muhammadu Buhari |
t5_closed_book.predict('What is the most spoken language in India?')
origin_index | document | T5 |
---|---|---|
0 | What is the most spoken language in India? | Hindi |
Your context must be prefixed with context:
t5_open_book = nlu.load('answer_question')
t5_base download started this may take some time. Approximate size to download 446 MB [OK!]
t5_open_book.predict("""Where did Jebe die?
context: Ghenkis Khan recalled Subtai back to Mongolia soon afterwards, and Jebe died on the road back to Samarkand""" )
origin_index | document | T5 |
---|---|---|
0 | Where did Jebe die? context: Ghenkis Khan reca... | Samarkand |
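The `document` column holds the whole input string, question and context together. Here is a small sketch for separating the two again, for example to show only the question next to the answer. `split_prompt` is a hypothetical helper, and it assumes the first `context:` occurrence marks the boundary.

```python
from typing import Tuple


def split_prompt(prompt: str) -> Tuple[str, str]:
    """Split an input string into (question, context) at the first 'context:' tag.

    Hypothetical helper for illustration; not part of the NLU library.
    Returns an empty context for closed book prompts that carry no tag.
    """
    question, _tag, context = prompt.partition("context:")
    return question.strip(), context.strip()


question_part, context_part = split_prompt(
    "Where did Jebe die?\n"
    "context: Ghenkis Khan recalled Subtai back to Mongolia soon afterwards, "
    "and Jebe died on the road back to Samarkand"
)
print(question_part)
print(context_part)
```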
question1 = 'Who is Jack ma?'
question2 = 'Who is founder of Alibaba Group?'
question3 = 'When did Jack Ma re-appear?'
question4 = 'How did Alibaba stocks react?'
question5 = 'Whom did Jack Ma meet?'
question6 = 'Who did Jack Ma hide from?'
# from https://www.bbc.com/news/business-55728338
news_article_snippet = """ context:
Alibaba Group founder Jack Ma has made his first appearance since Chinese regulators cracked down on his business empire.
His absence had fuelled speculation over his whereabouts amid increasing official scrutiny of his businesses.
The billionaire met 100 rural teachers in China via a video meeting on Wednesday, according to local government media.
Alibaba shares surged 5% on Hong Kong's stock exchange on the news.
"""
questions = [
question1+ news_article_snippet,
question2+ news_article_snippet,
question3+ news_article_snippet,
question4+ news_article_snippet,
question5+ news_article_snippet,
question6+ news_article_snippet,]
t5_open_book.predict(questions)
origin_index | document | T5 |
---|---|---|
0 | Who is Jack ma? context: Alibaba Group founder... | Alibaba Group founder |
1 | Who is founder of Alibaba Group? context: Alib... | Jack Ma |
2 | When did Jack Ma re-appear? context: Alibaba G... | Wednesday |
3 | How did Alibaba stocks react? context: Alibaba... | surged 5% |
4 | Whom did Jack Ma meet? context: Alibaba Group ... | 100 rural teachers |
5 | Who did Jack Ma hide from? context: Alibaba Gr... | Chinese regulators |
# Define the data: the question, then the context: tag, then the context text
question ='''
What does increased oxygen concentrations in the patient’s lungs displace?
context: Hyperbaric (high-pressure) medicine uses special oxygen
chambers to increase the partial pressure of O 2 around the patient and,
when needed, the medical staff. Carbon monoxide poisoning, gas gangrene,
and decompression sickness (the ’bends’) are sometimes treated using these devices.
Increased O 2 concentration in the lungs helps to displace carbon monoxide from the
heme group of hemoglobin. Oxygen gas is poisonous to the anaerobic bacteria that cause gas gangrene, so increasing its partial pressure helps kill them. Decompression sickness occurs in divers who decompress too quickly after a dive, resulting in bubbles of inert gas, mostly nitrogen and helium, forming in their blood. Increasing the pressure of O 2 as soon as possible is part of the treatment.
'''
#Predict on text data with T5
t5_open_book.predict(question)
origin_index | document | T5 |
---|---|---|
0 | What does increased oxygen concentrations in t... | carbon monoxide |
t5_sum = nlu.load('en.t5.base')
t5_base download started this may take some time. Approximate size to download 446 MB [OK!]
# Set the task on T5
t5_sum['t5'].setTask('summarize ')
# Define the data: a list of texts to summarize
data = [
'''
The belgian duo took to the dance floor on monday night with some friends . manchester united face newcastle in the premier league on wednesday . red devils will be looking for just their second league away win in seven . louis van gaal’s side currently sit two points clear of liverpool in fourth .
''',
''' Calculus, originally called infinitesimal calculus or "the calculus of infinitesimals", is the mathematical study of continuous change, in the same way that geometry is the study of shape and algebra is the study of generalizations of arithmetic operations. It has two major branches, differential calculus and integral calculus; the former concerns instantaneous rates of change, and the slopes of curves, while integral calculus concerns accumulation of quantities, and areas under or between curves. These two branches are related to each other by the fundamental theorem of calculus, and they make use of the fundamental notions of convergence of infinite sequences and infinite series to a well-defined limit.[1] Infinitesimal calculus was developed independently in the late 17th century by Isaac Newton and Gottfried Wilhelm Leibniz.[2][3] Today, calculus has widespread uses in science, engineering, and economics.[4] In mathematics education, calculus denotes courses of elementary mathematical analysis, which are mainly devoted to the study of functions and limits. The word calculus (plural calculi) is a Latin word, meaning originally "small pebble" (this meaning is kept in medicine – see Calculus (medicine)). Because such pebbles were used for calculation, the meaning of the word has evolved and today usually means a method of computation. It is therefore used for naming specific methods of calculation and related theories, such as propositional calculus, Ricci calculus, calculus of variations, lambda calculus, and process calculus.'''
]
#Predict on text data with T5
t5_sum.predict(data)
origin_index | document | T5 |
---|---|---|
0 | The belgian duo took to the dance floor on mon... | manchester united face newcastle in the premie... |
1 | Calculus, originally called infinitesimal calc... | calculus, originally called infinitesimal calc... |
text = """(Reuters) - Mastercard Inc said on Wednesday it was planning to offer support for some cryptocurrencies on its network this year, joining a string of big-ticket firms that have pledged similar support.
The credit-card giant’s announcement comes days after Elon Musk’s Tesla Inc revealed it had purchased $1.5 billion of bitcoin and would soon accept it as a form of payment.
Asset manager BlackRock Inc and payments companies Square and PayPal have also recently backed cryptocurrencies.
Mastercard already offers customers cards that allow people to transact using their cryptocurrencies, although without going through its network.
"Doing this work will create a lot more possibilities for shoppers and merchants, allowing them to transact in an entirely new form of payment. This change may open merchants up to new customers who are already flocking to digital assets," Mastercard said. (mstr.cd/3tLaPZM)
Mastercard specified that not all cryptocurrencies will be supported on its network, adding that many of the hundreds of digital assets in circulation still need to tighten their compliance measures.
Many cryptocurrencies have struggled to win the trust of mainstream investors and the general public due to their speculative nature and potential for money laundering.
"""
short = t5_sum.predict(text)
short
origin_index | document | T5 |
---|---|---|
0 | (Reuters) - Mastercard Inc said on Wednesday i... | mastercard said on Wednesday it was planning t... |
short.T5.iloc[0]
'mastercard said on Wednesday it was planning to offer support for some cryptocurrencies on its network this year . the credit-card giant’s announcement comes days after Elon Musk’s Tesla Inc revealed it had purchased $1.5 billion of bitcoin . asset manager blackrock and payments companies Square and PayPal have also recently backed cryptocurrencies .'
len('mastercard said on Wednesday it was planning to offer support for some cryptocurrencies on its network this year . the credit-card giant’s announcement comes days after Elon Musk’s Tesla Inc revealed it had purchased $1.5 billion of bitcoin . asset manager blackrock and payments companies Square and PayPal have also recently backed cryptocurrencies .')
352
len(text)
1284
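The two character counts above can be turned into a single figure for how much the summary shortens the article. This is a minimal sketch using only the reported lengths (352 and 1284); `length_ratio` is a hypothetical helper name.

```python
def length_ratio(summary_length: int, original_length: int) -> float:
    """Return the summary length as a fraction of the original length."""
    if original_length <= 0:
        raise ValueError("original_length must be positive")
    return summary_length / original_length


# Character counts reported above: 352 for the summary, 1284 for the article.
ratio = length_ratio(352, 1284)
print(f"{ratio:.1%}")  # prints 27.4%
```

By this measure the summary keeps a little over a quarter of the original characters.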