This notebook shows how to use the cdQA pipeline to answer questions on a custom dataset.
*Note:* If you are using Colab, first install cdQA by running `!pip install cdqa` in a cell.
import os
import pandas as pd
from ast import literal_eval
from cdqa.utils.filters import filter_paragraphs
from cdqa.pipeline import QAPipeline
from cdqa.utils.download import download_model, download_bnpp_data
# Fetch the BNP Paribas newsroom dataset and the pre-trained BERT reader into local folders
download_bnpp_data(dir='./data/bnpp_newsroom_v1.1/')
download_model(model='bert-squad_1.1', dir='./models')
Downloading BNP data...
Downloading trained model...
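# The 'paragraphs' column is stored as a stringified list, so literal_eval parses it back into a Python list
df = pd.read_csv('./data/bnpp_newsroom_v1.1/bnpp_newsroom-v1.1.csv', converters={'paragraphs': literal_eval})
# Apply cdQA's paragraph filter before fitting the retriever
df = filter_paragraphs(df)
df.head()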
| | date | title | category | link | abstract | paragraphs |
---|---|---|---|---|---|---|
0 | 13.05.2019 | The banking jobs : Assistant Vice President – ... | Careers | https://group.bnpparibas/en/news/banking-jobs-... | Within the Group’s Corporate and Institutional... | [I manage a team in charge of designing and im... |
1 | 13.05.2019 | BNP Paribas at #VivaTech : discover the progra... | Innovation | https://group.bnpparibas/en/news/bnp-paribas-v... | From Thursday 16 to Saturday 18 May 2019, join... | [With François Hollande, Chairman of French fo... |
2 | 13.05.2019 | "The bank with an IT budget of more than EUR6 ... | Group | https://group.bnpparibas/en/news/the-bank-budg... | Interview with Jean-Laurent Bonnafé, Director ... | [We did the groundwork between 2012 and 2016, ... |
3 | 10.05.2019 | BNP Paribas at #VivaTech : discover the progra... | Innovation | https://group.bnpparibas/en/news/bnp-paribas-v... | From Thursday 16 to Saturday 18 May 2019, join... | [As part of the ‘United Tech of Europe’ theme,... |
4 | 10.05.2019 | When Artificial Intelligence participates in r... | Careers | https://group.bnpparibas/en/news/artificial-in... | As the competition to attract talent intensifi... | [Online recruitment is already the norm. Accor... |
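The `converters={'paragraphs': literal_eval}` argument used when loading the CSV matters because a list written to CSV comes back as plain text. The sketch below illustrates that round trip with the standard library only (`csv` instead of pandas, so it runs without extra dependencies). The two rows are made-up sample data, not taken from the BNP Paribas dataset.

```python
import csv
import io
from ast import literal_eval

# Hypothetical two-row CSV whose 'paragraphs' column holds stringified lists.
raw = io.StringIO()
writer = csv.writer(raw)
writer.writerow(["title", "paragraphs"])
writer.writerow(["First article", str(["Paragraph one.", "Paragraph two."])])
writer.writerow(["Second article", str(["Only paragraph."])])
raw.seek(0)

rows = list(csv.DictReader(raw))

as_text = rows[0]["paragraphs"]   # still a str after reading the CSV
as_list = literal_eval(as_text)   # parsed back into a list of str

print(type(as_text).__name__, type(as_list).__name__)
print(as_list)
```

`literal_eval` only accepts Python literals (strings, numbers, lists, dicts, and so on), which makes it a safer choice than `eval` for parsing cell contents.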
# Load the downloaded reader, then fit the retriever on the dataset
cdqa_pipeline = QAPipeline(reader='./models/bert_qa.joblib')
cdqa_pipeline.fit_retriever(df=df)
QAPipeline(reader=BertQA(bert_model='bert-base-uncased', do_lower_case=True, fp16=False, gradient_accumulation_steps=1, learning_rate=3e-05, local_rank=-1, loss_scale=0, max_answer_length=30, n_best_size=20, no_cuda=False, null_score_diff_threshold=0.0, num_train_epochs=2, output_dir=None, predict_batch_size=8, seed=42, server_ip='', server_port='', train_batch_size=12, verbose_logging=False, version_2_with_negative=False, warmup_proportion=0.1))
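To give a rough idea of what fitting a retriever and then querying it can involve, here is a minimal pure-Python sketch of TF-IDF ranking with cosine similarity. It rests on an assumption: this notebook does not state which retrieval method cdQA uses, and the helper names (`tokenize`, `fit_tfidf`, `cosine`, `rank`) and the three toy documents are hypothetical. It is not cdQA's implementation.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (deliberately naive)."""
    return text.lower().split()


def fit_tfidf(docs):
    """Compute smoothed IDF weights and one sparse TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in doc_freq.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized
    ]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, idf, vectors, top_n=3):
    """Return the indices of the top_n documents most similar to the query."""
    counts = Counter(tokenize(query))
    q_vec = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scores = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    return [i for _, i in sorted(scores, reverse=True)[:top_n]]


# Hypothetical toy corpus standing in for the article paragraphs.
docs = [
    "the excellence program started in january 2016",
    "the bank invests in technology",
    "online recruitment is the norm",
]
idf, vectors = fit_tfidf(docs)
print(rank("excellence program", idf, vectors, top_n=1))
```

In a retriever-reader setup of this kind, the retriever narrows the corpus down to a few candidate documents and the reader then extracts the answer span from them.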
query = 'Since when does the Excellence Program of BNP Paribas exist?'
prediction = cdqa_pipeline.predict(query)
3it [00:00, 931.93it/s]
The pre-trained model you are loading is an uncased model but you have set `do_lower_case` to False. We are setting `do_lower_case=True` for you but you may want to check this behavior.
# The prediction holds the answer, the title of the source article, and the supporting paragraph
print('query: {}'.format(query))
print('answer: {}'.format(prediction[0]))
print('title: {}'.format(prediction[1]))
print('paragraph: {}'.format(prediction[2]))
query: Since when does the Excellence Program of BNP Paribas exist?
answer: January 2016
title: BNP Paribas’ commitment to universities and schools
paragraph: Since January 2016, BNP Paribas has offered an Excellence Program targeting new Master’s level graduates (BAC+5) who show high potential. The aid program lasts 18 months and comprises three assignments of six months each. It serves as a strong career accelerator that enables participants to access high-level management positions at a faster rate. The program allows participants to discover the BNP Paribas Group and its various entities in France and abroad, build an internal and external network by working on different assignments and receive personalized assistance from a mentor and coaching firm at every step along the way.