Notebook [1]: First steps with cdQA

This notebook shows how to use the cdQA pipeline to perform question answering on a custom dataset.

Note: If you are using Colab, install cdQA first by running !pip install cdqa in a cell.

In [1]:
import os
import pandas as pd
from ast import literal_eval

from cdqa.utils.filters import filter_paragraphs
from cdqa.pipeline import QAPipeline

Download pre-trained reader model and example dataset

In [2]:
from cdqa.utils.download import download_model, download_bnpp_data

download_bnpp_data(dir='./data/bnpp_newsroom_v1.1/')
download_model(model='bert-squad_1.1', dir='./models')
Downloading BNP data...

Downloading trained model...
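
Before moving on, it can help to confirm that both downloads landed where the later cells expect them. The following is a minimal sketch, assuming the file paths used further down in this notebook; the helper name missing_files is hypothetical and not part of cdQA.

```python
import os

# Paths taken from the cells further down; adjust them if you changed `dir`.
EXPECTED_FILES = [
    './models/bert_qa.joblib',
    './data/bnpp_newsroom_v1.1/bnpp_newsroom-v1.1.csv',
]

def missing_files(paths):
    """Return the paths that do not exist on disk (hypothetical helper)."""
    return [p for p in paths if not os.path.isfile(p)]

missing = missing_files(EXPECTED_FILES)
if missing:
    print('Missing files: {}'.format(missing))
else:
    print('All expected files are present.')
```

If the list is not empty, re-running the download cell is usually the first thing to try.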

Visualize the dataset

In [3]:
df = pd.read_csv('./data/bnpp_newsroom_v1.1/bnpp_newsroom-v1.1.csv', converters={'paragraphs': literal_eval})
df = filter_paragraphs(df)
df.head()
   date        title                                              category    link  abstract                                           paragraphs
0  13.05.2019  The banking jobs : Assistant Vice President – ...  Careers           Within the Group’s Corporate and Institutional...  [I manage a team in charge of designing and im...
1  13.05.2019  BNP Paribas at #VivaTech : discover the progra...  Innovation        From Thursday 16 to Saturday 18 May 2019, join...  [With François Hollande, Chairman of French fo...
2  13.05.2019  "The bank with an IT budget of more than EUR6 ...  Group             Interview with Jean-Laurent Bonnafé, Director ...  [We did the groundwork between 2012 and 2016, ...
3  10.05.2019  BNP Paribas at #VivaTech : discover the progra...  Innovation        From Thursday 16 to Saturday 18 May 2019, join...  [As part of the ‘United Tech of Europe’ theme,...
4  10.05.2019  When Artificial Intelligence participates in r...  Careers           As the competition to attract talent intensifi...  [Online recruitment is already the norm. Accor...
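
The converters argument is needed because a CSV file stores each list of paragraphs as a single string. The sketch below uses made-up cell content and only the standard library to illustrate what literal_eval does to such a cell. It shows the idea only and does not use the actual BNP Paribas data.

```python
from ast import literal_eval

# Made-up example of how a list of paragraphs looks once written to CSV:
# the whole list is one string.
raw_cell = "['First paragraph.', 'Second paragraph.']"

# literal_eval parses the string back into a Python list. It only accepts
# Python literals, so arbitrary code in the cell is not executed.
paragraphs = literal_eval(raw_cell)

print(type(raw_cell).__name__)    # str
print(type(paragraphs).__name__)  # list
print(len(paragraphs))            # 2
```

Without the converter, the paragraphs column would hold plain strings, and code that expects a list of paragraphs per document would iterate over characters instead.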

Instantiate the cdQA pipeline from a pre-trained reader model

In [4]:
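cdqa_pipeline = QAPipeline(reader='./models/bert_qa.joblib')
# Fit the retriever on the dataset so that queries can be matched to documents
cdqa_pipeline.fit_retriever(df)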
QAPipeline(reader=BertQA(bert_model='bert-base-uncased', do_lower_case=True,
                         fp16=False, gradient_accumulation_steps=1,
                         learning_rate=3e-05, local_rank=-1, loss_scale=0,
                         max_answer_length=30, n_best_size=20, no_cuda=False,
                         null_score_diff_threshold=0.0, num_train_epochs=2,
                         output_dir=None, predict_batch_size=8, seed=42,
                         server_ip='', server_port='', train_batch_size=12,
                         verbose_logging=False, version_2_with_negative=False,

Execute a query

In [5]:
query = 'Since when does the Excellence Program of BNP Paribas exist?'
prediction = cdqa_pipeline.predict(query)
3it [00:00, 931.93it/s]
The pre-trained model you are loading is an uncased model but you have set `do_lower_case` to False. We are setting `do_lower_case=True` for you but you may want to check this behavior.

Explore predictions

In [6]:
print('query: {}'.format(query))
print('answer: {}'.format(prediction[0]))
print('title: {}'.format(prediction[1]))
print('paragraph: {}'.format(prediction[2]))
query: Since when does the Excellence Program of BNP Paribas exist?
answer: January 2016
title: BNP Paribas’ commitment to universities and schools
paragraph: Since January 2016, BNP Paribas has offered an Excellence Program targeting new Master’s level graduates (BAC+5) who show high potential. The aid program lasts 18 months and comprises three assignments of six months each. It serves as a strong career accelerator that enables participants to access high-level management positions at a faster rate. The program allows participants to discover the BNP Paribas Group and its various entities in France and abroad, build an internal and external network by working on different assignments and receive personalized assistance from a mentor and coaching firm at every step along the way.
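
The indexing in the cell above suggests that the prediction behaves like a sequence of (answer, title, paragraph). Assuming that layout, tuple unpacking gives the fields readable names. The sketch uses a hand-written stand-in tuple rather than a real pipeline call, and describe_prediction is a hypothetical helper, not part of cdQA.

```python
def describe_prediction(query, prediction, max_chars=80):
    """Format an (answer, title, paragraph) prediction as readable text.

    Assumes the three-field layout used in the cell above; the helper
    itself is hypothetical and not part of cdQA.
    """
    answer, title, paragraph = prediction
    # Shorten long paragraphs so the summary stays compact.
    if len(paragraph) > max_chars:
        paragraph = paragraph[:max_chars].rstrip() + '...'
    return 'query: {}\nanswer: {}\ntitle: {}\nparagraph: {}'.format(
        query, answer, title, paragraph)

# Stand-in values copied from the output above (paragraph shortened by hand).
example = (
    'January 2016',
    'BNP Paribas’ commitment to universities and schools',
    'Since January 2016, BNP Paribas has offered an Excellence Program.',
)
print(describe_prediction(
    'Since when does the Excellence Program of BNP Paribas exist?', example))
```

Unpacking fails loudly with a ValueError if the prediction ever has a different number of fields, which is easier to diagnose than a wrong index.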