This notebook demonstrates how to use the Responsible AI Widgets' Error Analysis dashboard to understand a model trained on the breast cancer dataset. The goal of this sample notebook is to train a LightGBM classifier on scikit-learn's breast cancer diagnosis data and then explore the model's errors and explanations.
# %pip install --upgrade interpret-community
# %pip install --upgrade raiwidgets
from sklearn.datasets import load_breast_cancer
from lightgbm import LGBMClassifier
# Explainers:
# SHAP Tabular Explainer
from interpret.ext.blackbox import TabularExplainer
breast_cancer_data = load_breast_cancer()
classes = breast_cancer_data.target_names.tolist()
# Split data into train and test
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(breast_cancer_data.data, breast_cancer_data.target, test_size=0.2, random_state=0)
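Before training, it can help to sanity-check the split; a minimal sketch using numpy (already a scikit-learn dependency) to confirm the shapes and class balance:
import numpy as np
# Expect roughly an 80/20 split of the 569 examples across 30 features
print('train shape: {}, test shape: {}'.format(X_train.shape, X_test.shape))
# Class counts for malignant/benign in each split
print('train class counts: {}'.format(np.bincount(y_train)))
print('test class counts: {}'.format(np.bincount(y_test)))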
# Train a very simple model (a single boosting iteration) - this leaves enough misclassifications for the Error Analysis dashboard to explore
clf = LGBMClassifier(n_estimators=1)
model = clf.fit(X_train, y_train)
from raiwidgets import ErrorAnalysisDashboard
predictions = clf.predict(X_test)
features = breast_cancer_data.feature_names
ErrorAnalysisDashboard(dataset=X_test, true_y=y_test, features=features, pred_y=predictions)
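With only one boosting iteration the model will make some mistakes; a minimal sketch with scikit-learn metrics to quantify how many errors the dashboard has to work with:
from sklearn.metrics import accuracy_score, confusion_matrix
# Overall test accuracy, plus a confusion matrix summarizing which kinds of errors occur
print('test accuracy: {}'.format(accuracy_score(y_test, predictions)))
print('confusion matrix:\n{}'.format(confusion_matrix(y_test, predictions)))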
# 1. Using SHAP TabularExplainer
explainer = TabularExplainer(model,
                             X_train,
                             features=breast_cancer_data.feature_names,
                             classes=classes)
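interpret-community also provides other blackbox explainers, such as the PFIExplainer referenced below. A hedged sketch of swapping it in (assuming it mirrors TabularExplainer's features/classes arguments and that its explain_global takes the true labels) is left commented out, since the per-class and local explanation cells below do not apply to it:
# from interpret.ext.blackbox import PFIExplainer
# # Assumption: PFIExplainer accepts the same features/classes arguments and needs true labels
# pfi_explainer = PFIExplainer(model,
#                              features=breast_cancer_data.feature_names,
#                              classes=classes)
# global_pfi_explanation = pfi_explainer.explain_global(X_test, true_labels=y_test)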
Explain overall model predictions (global explanation)
# Passing in the test dataset for evaluation examples - note it must be a representative sample of the original data
# X_train can be passed as well, but explanations will take longer to compute with more examples (though they may be more accurate)
global_explanation = explainer.explain_global(X_test)
# Sorted SHAP values
print('ranked global importance values: {}'.format(global_explanation.get_ranked_global_values()))
# Corresponding feature names
print('ranked global importance names: {}'.format(global_explanation.get_ranked_global_names()))
# Feature indices ranked by importance (indices refer to the original feature order)
print('global importance rank: {}'.format(global_explanation.global_importance_rank))
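To make the ranked output easier to scan, the names and values returned by the accessors above can be paired up:
# Pair the ten most important feature names with their global importance values
top_names = global_explanation.get_ranked_global_names()[:10]
top_values = global_explanation.get_ranked_global_values()[:10]
for name, value in zip(top_names, top_values):
    print('{}: {:.4f}'.format(name, value))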
# Note: do not run this cell if using PFIExplainer, since it does not support per-class explanations
# Per class feature names
print('ranked per class feature names: {}'.format(global_explanation.get_ranked_per_class_names()))
# Per class feature importance values
print('ranked per class feature values: {}'.format(global_explanation.get_ranked_per_class_values()))
# Print out a dictionary that holds the sorted feature importance names and values
print('global importance dictionary: {}'.format(global_explanation.get_feature_importance_dict()))
# Per-feature SHAP values for all features and all data points in the evaluation data (X_test)
print('local importance values: {}'.format(global_explanation.local_importance_values))
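For this classification explainer the local importance values are expected to be nested as [class][example][feature]; a quick sketch to confirm the dimensions (the nesting is an assumption, so verify the printed sizes before relying on it):
local_vals = global_explanation.local_importance_values
# Expected here: 2 classes x len(X_test) examples x 30 features
print('number of classes: {}'.format(len(local_vals)))
print('examples per class: {}'.format(len(local_vals[0])))
print('features per example: {}'.format(len(local_vals[0][0])))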
Explain local data points (individual instances)
# You can pass a specific data point or a group of data points to the explain_local function
# E.g., explain the first data point in the test set
instance_num = 0
local_explanation = explainer.explain_local(X_test[instance_num:instance_num + 1])
# Get the prediction for that same data point and explain why the model made that prediction
prediction_value = clf.predict(X_test)[instance_num]
sorted_local_importance_values = local_explanation.get_ranked_local_values()[prediction_value]
sorted_local_importance_names = local_explanation.get_ranked_local_names()[prediction_value]
print('local importance values: {}'.format(sorted_local_importance_values))
print('local importance names: {}'.format(sorted_local_importance_names))
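As with the global explanation, pairing the sorted names and values makes the output easier to read; this sketch assumes the example dimension is preserved even for a single explained row, so [0] selects that one example:
# Ten features with the largest impact on the prediction for this instance
for name, value in zip(sorted_local_importance_names[0][:10],
                       sorted_local_importance_values[0][:10]):
    print('{}: {:.4f}'.format(name, value))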
from raiwidgets import ExplanationDashboard
# Explanation dashboard: interactively explore the global and local feature importance values
ExplanationDashboard(global_explanation, model, dataset=X_test, true_y=y_test)
# Error Analysis dashboard with the explanation attached, so error cohorts can be explored alongside the features that drive them
ErrorAnalysisDashboard(global_explanation, model, dataset=X_test, true_y=y_test)