This notebook demonstrates how to use the responsibleai API to assess a Hugging Face transformers text classification model trained on the blbooksgenre dataset (see https://huggingface.co/datasets/blbooksgenre for more information about the dataset). It walks through the API calls needed to create a widget with model analysis insights, then guides a visual analysis of the model.
The following section examines the code necessary to create datasets and a model. It then generates insights using the responsibleai API that can be visually analyzed.
The following section can be skipped. It loads a dataset and fetches a pre-trained model for illustrative purposes.
First, import all necessary dependencies.
import datasets
import pandas as pd
import zipfile
from sklearn.model_selection import train_test_split
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          pipeline)
from raiutils.common.retries import retry_function

try:
    from urllib import urlretrieve
except ImportError:
    from urllib.request import urlretrieve
Next, load the blbooksgenre dataset from Hugging Face datasets.
NUM_TEST_SAMPLES = 20


def load_dataset(split):
    # the config name is spelled "classifiction" in the dataset itself
    config_kwargs = {"name": "title_genre_classifiction"}
    dataset = datasets.load_dataset("blbooksgenre", split=split, **config_kwargs)
    return pd.DataFrame({"text": dataset["title"], "label": dataset["label"]})


pd_data = load_dataset("train")

# hold out 20% of the rows for validation
pd_data, pd_valid_data = train_test_split(
    pd_data, test_size=0.2, random_state=0)

START_INDEX = 0
# skip the first NUM_TEST_SAMPLES training rows and keep only the first
# NUM_TEST_SAMPLES validation rows as the test set
train_data = pd_data[NUM_TEST_SAMPLES:].reset_index(drop=True)
test_data = pd_valid_data[:NUM_TEST_SAMPLES].reset_index(drop=True)
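The slicing above can be easy to misread, so here is a minimal standard-library sketch of the same row bookkeeping. It is only an illustration: `split_rows` and the made-up `rows` list are hypothetical stand-ins, and the shuffle does not reproduce the exact ordering that `train_test_split` would produce.

```python
import random

NUM_TEST_SAMPLES = 20


def split_rows(rows, holdout_fraction=0.2, seed=0):
    """Shuffle rows and hold out a fraction (a stand-in for train_test_split)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(round(len(shuffled) * holdout_fraction))
    # first return value: training rows, second: held-out rows
    return shuffled[n_holdout:], shuffled[:n_holdout]


rows = [f"title {i}" for i in range(100)]  # made-up stand-in for book titles
train_rows, valid_rows = split_rows(rows)

# Mirror the notebook's slicing: skip the first NUM_TEST_SAMPLES training rows
# and keep only the first NUM_TEST_SAMPLES held-out rows for testing.
train_subset = train_rows[NUM_TEST_SAMPLES:]
test_subset = valid_rows[:NUM_TEST_SAMPLES]

print(len(train_rows), len(valid_rows), len(train_subset), len(test_subset))
```

With 100 rows, the training side shrinks from 80 to 60 rows while the test set is capped at 20 rows, which keeps the later insight computation fast.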
Fetch a Hugging Face model that was pre-trained on the blbooksgenre dataset.
BLBOOKSGENRE_MODEL_NAME = "blbooksgenre_model"
NUM_LABELS = 2


class FetchModel(object):
    def __init__(self):
        pass

    def fetch(self):
        # download the zipped model and extract it into a local folder
        zipfilename = BLBOOKSGENRE_MODEL_NAME + '.zip'
        url = ('https://publictestdatasets.blob.core.windows.net/models/' +
               BLBOOKSGENRE_MODEL_NAME + '.zip')
        urlretrieve(url, zipfilename)
        with zipfile.ZipFile(zipfilename, 'r') as unzip:
            unzip.extractall(BLBOOKSGENRE_MODEL_NAME)


def retrieve_blbooksgenre_model():
    fetcher = FetchModel()
    action_name = "Model download"
    err_msg = "Failed to download model"
    max_retries = 4
    retry_delay = 60
    # retry the download a few times in case of transient network failures
    retry_function(fetcher.fetch, action_name, err_msg,
                   max_retries=max_retries,
                   retry_delay=retry_delay)
    model = AutoModelForSequenceClassification.from_pretrained(
        BLBOOKSGENRE_MODEL_NAME, num_labels=NUM_LABELS)
    return model


model = retrieve_blbooksgenre_model()
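The download above is wrapped in a retry helper so that a transient network failure does not abort the notebook. The following is a rough sketch of that retry-with-delay pattern using only the standard library. `retry_with_delay` and `FlakyDownload` are hypothetical names introduced for illustration; this is not the implementation of `retry_function` in raiutils, which may behave differently.

```python
import time


def retry_with_delay(func, action_name, err_msg, max_retries=4, retry_delay=60):
    """Call func until it succeeds or max_retries attempts have failed."""
    for attempt in range(1, max_retries + 1):
        try:
            print(f"{action_name} attempt {attempt} of {max_retries}")
            return func()
        except Exception as exc:
            if attempt == max_retries:
                # out of attempts: surface the caller's error message
                raise RuntimeError(err_msg) from exc
            time.sleep(retry_delay)


class FlakyDownload:
    """Hypothetical stand-in for FetchModel: fails twice, then succeeds."""

    def __init__(self):
        self.calls = 0

    def fetch(self):
        self.calls += 1
        if self.calls < 3:
            raise IOError("simulated network failure")
        return "downloaded"


flaky = FlakyDownload()
# retry_delay=0 keeps the demonstration instant
result = retry_with_delay(flaky.fetch, "Model download",
                          "Failed to download model",
                          max_retries=4, retry_delay=0)
```

In this sketch the third attempt succeeds, so no error is raised. A real download would normally keep a non-zero delay between attempts.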
Load the tokenizer and build a prediction pipeline around the model.
# load the tokenizer that matches the model
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# -1 runs on CPU; set to a GPU index (>= 0) to run on CUDA
device = -1
if device >= 0:
    model = model.cuda()

# build a pipeline object to do predictions
pred = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    device=device,
    return_all_scores=True
)
from ml_wrappers import wrap_model

wrapped_model = wrap_model(pred, test_data, 'text_classification')
predictions = wrapped_model.predict(test_data['text'].tolist())
num_errors = sum(predictions != test_data['label'].tolist())
print("number of errors on test dataset: " + str(num_errors))
classes = train_data["label"].unique()
classes.sort()
from responsibleai_text import RAITextInsights, ModelTask
from raiwidgets import ResponsibleAIDashboard
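The error count printed earlier compares predicted class indices with the true labels. As a rough illustration of how per-class scores can be reduced to a class index, here is a standard-library sketch. It assumes each input yields a list of `{"label", "score"}` dictionaries (the shape a text-classification pipeline returns when all scores are requested) and that position `i` in that list corresponds to class `i`. The helper names and the score values are made up, and this is not a description of how ml_wrappers works internally.

```python
def scores_to_label_index(scores):
    """Return the position of the highest-scoring entry for one input."""
    return max(range(len(scores)), key=lambda i: scores[i]["score"])


def count_errors(all_scores, true_labels):
    """Count inputs whose top-scoring class differs from the true label."""
    predicted = [scores_to_label_index(s) for s in all_scores]
    return sum(p != t for p, t in zip(predicted, true_labels))


# Made-up scores for three inputs and two classes
example_scores = [
    [{"label": "LABEL_0", "score": 0.9}, {"label": "LABEL_1", "score": 0.1}],
    [{"label": "LABEL_0", "score": 0.3}, {"label": "LABEL_1", "score": 0.7}],
    [{"label": "LABEL_0", "score": 0.6}, {"label": "LABEL_1", "score": 0.4}],
]
example_labels = [0, 1, 1]

num_example_errors = count_errors(example_scores, example_labels)
print("number of errors in the example: " + str(num_example_errors))
```

Only the third input is misclassified here, since its top score belongs to class 0 while its label is 1.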
To use the Responsible AI Dashboard, initialize a RAITextInsights object upon which different components can be loaded.
RAITextInsights accepts the model, the test dataset, the name of the target column, the task type and the classes as its arguments.
rai_insights = RAITextInsights(pred, test_data,
                               "label",
                               task_type=ModelTask.TEXT_CLASSIFICATION,
                               classes=classes)
Add the components of the toolbox for model assessment.
rai_insights.explainer.add()
rai_insights.error_analysis.add()
Once all the desired components have been loaded, compute insights on the test set.
rai_insights.compute()
Finally, visualize and explore the model insights. Use the resulting widget or follow the link to view this in a new tab.
ResponsibleAIDashboard(rai_insights)