Analyze Errors and Explore Interpretability of Models¶

This notebook demonstrates how to use the Error Analysis dashboard from Responsible AI Widgets to understand a regression model trained on the superconductivity dataset.

For more information on the dataset, please see the UCI repository:

https://archive.ics.uci.edu/ml/datasets/Superconductivty+Data

Paper citation:

Hamidieh, Kam, A data-driven statistical model for predicting the critical temperature of a superconductor, Computational Materials Science, Volume 154, November 2018, Pages 346-354.

The goal of this sample notebook is to predict the critical temperature with scikit-learn and explore model errors and explanations:

Train an SVM regression model (SVR) using scikit-learn.
Use Interpret-Community's MimicExplainer to generate model explanations.
Visualize model errors and global and local explanations with the Error Analysis visualization dashboard.

Install Required Packages¶

In [ ]:

# %pip install --upgrade interpret-community
# %pip install --upgrade raiwidgets

Import required packages¶

In [ ]:

from sklearn import svm
from sklearn.model_selection import train_test_split
import pandas as pd

from urllib.request import urlretrieve
import zipfile

# Imports for SHAP MimicExplainer with LightGBM surrogate model
from interpret.ext.blackbox import MimicExplainer
from interpret.ext.glassbox import LGBMExplainableModel
from interpret_community.common.constants import ModelTask

from raiwidgets import ErrorAnalysisDashboard

Load the dataset from UCI Repository: https://archive.ics.uci.edu/ml/datasets/Superconductivty+Data ¶

In [ ]:

outdirname = 'superconduct'
zipfilename = outdirname + '.zip'
urlretrieve('https://archive.ics.uci.edu/ml/machine-learning-databases/00464/superconduct.zip', zipfilename)
with zipfile.ZipFile(zipfilename, 'r') as unzip:
    unzip.extractall('.')
df = pd.read_csv(r'./train.csv')
y = df['critical_temp'].values
X = df.drop(columns='critical_temp')
feature_names = list(X.columns)
X = X.values

In [ ]:

# Keep a random half of the data, then split it into train (70%) and test (30%)
X, _, y, _ = train_test_split(X, y, test_size=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

In [ ]:

# Keep a 30% sample of the test set; it is used later to compute explanations
X_test_sample, _, y_test_sample, _ = train_test_split(X_test, y_test, test_size=0.7, random_state=0)

In [ ]:

print("X_train size: ")
print(X_train.shape)
print("X_test size: ")
print(X_test.shape)
print("X_test_sample size: ")
print(X_test_sample.shape)
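The row counts printed above follow from the three splits and can be checked by hand. The sketch below assumes that train.csv has 21,263 rows (the figure reported for this dataset, not stated in this notebook) and that a fractional `test_size` rounds the test portion up, which is our understanding of how `train_test_split` behaves. `split_sizes` is a hypothetical helper, not part of scikit-learn.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_first, n_second) for a fractional test_size.

    Assumption: the second (test) part is rounded up and the first part
    gets the remainder, mirroring train_test_split with a float test_size.
    """
    n_second = math.ceil(test_size * n_samples)
    return n_samples - n_second, n_second

n_rows = 21263  # assumed row count of train.csv

# First split: keep half of the rows, discard the rest
n_kept, _ = split_sizes(n_rows, 0.5)
# Second split: 70% train, 30% test
n_train, n_test = split_sizes(n_kept, 0.3)
# Third split: keep 30% of the test set as the explanation sample
n_test_sample, _ = split_sizes(n_test, 0.7)

print("X_train rows:", n_train)
print("X_test rows:", n_test)
print("X_test_sample rows:", n_test_sample)
```

Under these assumptions the model is trained on roughly 35% of the original rows and evaluated on roughly 15%.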

Train an SVM regression model¶

In [ ]:

clf = svm.SVR(gamma=0.001, C=100., tol=0.1)
model = clf.fit(X_train, y_train)

In [ ]:

# Note that the model still makes fairly large errors on the test set
print("average abs error on test dataset: " + str(sum(abs(model.predict(X_test) - y_test))/y_test.shape[0]))
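The expression in the cell above is the mean absolute error (MAE): the average of |prediction - true value| over the test set, in the same units as `critical_temp`. A minimal pure-Python sketch on made-up numbers (the values are illustrative and do not come from the superconductivity data; `mean_abs_error` is a hypothetical helper):

```python
def mean_abs_error(y_true, y_pred):
    """Average absolute difference between predictions and true values."""
    return sum(abs(p - t) for p, t in zip(y_pred, y_true)) / len(y_true)

# Illustrative values only
y_true_toy = [10.0, 20.0, 30.0]
y_pred_toy = [12.0, 18.0, 33.0]

mae = mean_abs_error(y_true_toy, y_pred_toy)
print("average abs error on toy data:", mae)
```

scikit-learn's `sklearn.metrics.mean_absolute_error` computes the same quantity and could replace the hand-written sum in the notebook cell.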

Load simple ErrorAnalysis view without explanations¶

In [ ]:

predictions = model.predict(X_test)
ErrorAnalysisDashboard(dataset=X_test, true_y=y_test, features=feature_names,
                       pred_y=predictions, model_task='regression',
                       max_depth=3)

Train a surrogate model to explain the original blackbox model¶

In [ ]:

# Train the LightGBM surrogate model using MimicExplainer
model_task = ModelTask.Regression
explainer = MimicExplainer(model, X_train, LGBMExplainableModel,
                           augment_data=True, max_num_of_augmentations=10,
                           features=feature_names, model_task=model_task)
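The idea behind a mimic (global surrogate) explainer is to fit an interpretable model to the blackbox model's predictions rather than to the true labels, and then read explanations off the surrogate. The sketch below illustrates this with a one-feature toy: `blackbox_predict` is a hypothetical stand-in for `model.predict`, and a least-squares line replaces the LightGBM surrogate used in this notebook. It is an illustration of the concept, not of how `MimicExplainer` is implemented.

```python
import math

def blackbox_predict(x):
    """Hypothetical stand-in for an opaque model's predict function."""
    return 2.0 * x + 0.5 * math.sin(x)

# Evaluation points and the blackbox predictions the surrogate will mimic
xs = [0.1 * i for i in range(100)]
targets = [blackbox_predict(x) for x in xs]

# Fit the surrogate y = slope * x + intercept by ordinary least squares
n = len(xs)
mean_x = sum(xs) / n
mean_t = sum(targets) / n
slope = (sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, targets))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_t - slope * mean_x

# Fidelity: how well the surrogate reproduces the blackbox (R^2)
ss_res = sum((t - (slope * x + intercept)) ** 2 for x, t in zip(xs, targets))
ss_tot = sum((t - mean_t) ** 2 for t in targets)
fidelity = 1.0 - ss_res / ss_tot

print("surrogate slope:", slope)
print("surrogate fidelity (R^2):", fidelity)
```

A surrogate is only as trustworthy as its fidelity to the blackbox, so it is worth checking how closely it reproduces the original predictions before relying on its explanations.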

Generate global explanations¶

Explain overall model predictions (global explanation)

In [ ]:

# Pass the test sample as evaluation examples; it must be a representative sample of the original data
# X_train can be passed instead; more examples may give more accurate explanations but take longer to compute
global_explanation = explainer.explain_global(X_test_sample)

In [ ]:

# Print out a dictionary that holds the sorted feature importance names and values
print('global importance rank: {}'.format(global_explanation.get_feature_importance_dict()))
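The printed dictionary maps feature names to importance values. To look at only the most important features, a small helper like the one below can be used. Both `top_k_features` and the example dictionary are hypothetical; the dictionary stands in for the real output of `get_feature_importance_dict()`.

```python
def top_k_features(importance_dict, k=3):
    """Return the k (name, value) pairs with the largest importance."""
    ranked = sorted(importance_dict.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

# Made-up importances for illustration only
example_importances = {
    "feature_a": 0.8,
    "feature_b": 3.1,
    "feature_c": 1.7,
    "feature_d": 0.2,
}

for name, value in top_k_features(example_importances, k=2):
    print(name, value)
```

In the notebook, the same helper could be applied to `global_explanation.get_feature_importance_dict()` in place of the example dictionary.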

Visualize¶

Analyze model errors and explanations using Error Analysis dashboard¶

In [ ]:

ErrorAnalysisDashboard(global_explanation, model, dataset=X_test,
                       true_y=y_test_sample, true_y_dataset=y_test,
                       model_task='regression', max_depth=3)
