This notebook demonstrates the use of the responsibleai API to assess an OpenAI endpoint on the SQuAD dataset (see https://huggingface.co/datasets/squad for more information about the dataset). It walks through the API calls necessary to create a widget with model analysis insights, then guides a visual analysis of the OpenAI model.
The following section examines the code necessary to create the dataset and model, and then generates insights using the responsibleai API that can be visually analyzed.
The following section can be skipped. It loads a dataset and wraps an OpenAI endpoint for illustrative purposes.
First we import all necessary dependencies.
import datasets
import pandas as pd
from responsibleai_text import RAITextInsights, ModelTask
Next we load the SQuAD dataset from Hugging Face datasets.
dataset = datasets.load_dataset("squad", split="train")
dataset
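Each record in the split is a dictionary with id, title, context, question and answers fields, where answers itself holds parallel text and answer_start lists. A quick check of the structure before reformatting:

# inspect the structure of a single record
sample = dataset[0]
print(sample.keys())      # dict_keys(['id', 'title', 'context', 'question', 'answers'])
print(sample['answers'])  # {'text': [...], 'answer_start': [...]}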
def replace_error_chars(message):
    # remove backtick characters from the answer text
    message = message.replace('`', '')
    return message
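For example:

replace_error_chars('a `quoted` answer')  # returns 'a quoted answer'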
Reformat the dataset into a pandas DataFrame with three columns: context, questions and answers.
questions = []
context = []
answers = []
for row in dataset:
    context.append(row['context'])
    questions.append(row['question'])
    # keep the first listed answer and strip backticks
    answers.append(replace_error_chars(row['answers']['text'][0]))
data = pd.DataFrame({'context': context, 'questions': questions, 'answers': answers})
data.iloc[:5]['answers'][0]
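Preview the first few rows of the reformatted DataFrame:

data.head()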
from ml_wrappers.model import OpenaiWrapperModel
Create an OpenAI endpoint wrapper through the ml-wrappers library. Please enter the api_base and api_key for your OpenAI endpoint below:
api_type = "azure"
api_base = "https://your.openai.azure.com/"
api_version = "2023-03-15-preview"
api_key = "put_your_secret_key_here"
# OpenAI wrapper that calls the endpoint with the given questions
openai_model = OpenaiWrapperModel(api_type, api_base, api_version, api_key, engine='gpt-4')
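Rather than hard-coding the secret key in the notebook, you may prefer to read it from an environment variable. A minimal sketch, assuming the key is stored under the (illustrative) name OPENAI_API_KEY:

import os

# read the secret from the environment instead of embedding it in the notebook
api_key = os.environ.get('OPENAI_API_KEY', api_key)
openai_model = OpenaiWrapperModel(api_type, api_base, api_version, api_key, engine='gpt-4')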
Here we modify the template for the request sent to OpenAI by wrapping the OpenaiWrapperModel. Please remove the patch below and only return the predictions when running on your own OpenAI endpoint.
# modify the template for the questions passed to openai
import pandas as pd
import numpy as np
from unittest.mock import patch

class template(object):
    def __init__(self, model):
        self.model = model

    def predict(self, dataset):
        template = 'Answer the question given the context.'
        for i, (context, question) in enumerate(zip(dataset['context'], dataset['questions'])):
            templated_question = template + '\n\ncontext: ' + context + '\nquestion: ' + question
            if isinstance(dataset, pd.DataFrame):
                # assign through loc to avoid pandas chained-assignment issues
                dataset.loc[dataset.index[i], 'questions'] = templated_question
            else:
                dataset['questions'] = templated_question
        # NOTE: Remove this patch when calling your openai model
        with patch('ml_wrappers.model.OpenaiWrapperModel.predict') as mock_predict:
            # mock the endpoint by returning the ground-truth answers
            if isinstance(dataset, pd.DataFrame):
                mock_predict.return_value = np.array(data['answers'][dataset.index])
            else:
                mock_predict.return_value = np.array(data['answers'][0])
            # Note: When calling a real openai model just return this line here
            return self.model.predict(dataset)
train_data = data
test_data = data[:5]
pipeline = template(openai_model)
pipeline.predict(test_data)
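Because the endpoint is patched above, the returned predictions are simply the ground-truth answers for the first five rows; a quick sanity check of the mocked setup:

import numpy as np

predictions = pipeline.predict(test_data)
# with the patch in place, the predictions match the ground-truth answers
assert np.array_equal(predictions, np.array(data['answers'][:5]))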
from responsibleai_text import RAITextInsights, ModelTask
from raiwidgets import ResponsibleAIDashboard
To use Responsible AI Dashboard, initialize a RAITextInsights object upon which different components can be loaded.
RAITextInsights accepts the model, the test dataset, the target column name, and the task type as its arguments.
rai_insights = RAITextInsights(pipeline, test_data,
"answers",
task_type=ModelTask.QUESTION_ANSWERING)
Add the components of the toolbox for model assessment.
rai_insights.error_analysis.add()
# Note: explanations are not yet available for openai models
# rai_insights.explainer.add()
Once all the desired components have been loaded, compute insights on the test set.
rai_insights.compute()
Finally, visualize and explore the model insights. Use the resulting widget or follow the link to view this in a new tab.
ResponsibleAIDashboard(rai_insights)
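Computed insights can also be saved to disk and reloaded later instead of being recomputed. A minimal sketch, assuming RAITextInsights follows the same save/load pattern as the core responsibleai API (the './rai_insights' path is an arbitrary example):

# persist the computed insights and reload them in a later session
rai_insights.save('./rai_insights')
loaded_insights = RAITextInsights.load('./rai_insights')
ResponsibleAIDashboard(loaded_insights)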