This notebook shows how to get started with Quantus using tabular data. For this purpose, we use the classic Titanic tabular dataset (Frank E. Harrell Jr., Thomas Cason):
https://www.openml.org/d/40945
The model in this notebook is taken from the "Getting started with Captum - Titanic Data Analysis" tutorial provided by Captum: https://captum.ai/tutorials/Titanic_Basic_Interpret
from IPython.display import clear_output
!pip install quantus torch captum tensorflow-datasets pandas scikit-learn
clear_output()
import pathlib
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import quantus
from captum.attr import IntegratedGradients
import torch
import torch.nn as nn
torch.manual_seed(27)
clear_output()
np.random.seed(27)
We load the dataset from a local CSV file using pandas. Alternatively, it can be downloaded directly from the OpenML website: https://www.openml.org/d/40945
# Load datasets
df = pd.read_csv("assets/titanic3.csv")
df = df[["age", "embarked", "fare", "parch", "pclass", "sex", "sibsp", "survived"]]
df["age"] = df["age"].fillna(df["age"].mean())
df["fare"] = df["fare"].fillna(df["fare"].mean())
# Data statistics
df.describe()
|       | age | fare | parch | pclass | sibsp | survived |
|-------|-----|------|-------|--------|-------|----------|
| count | 1309.000000 | 1309.000000 | 1309.000000 | 1309.000000 | 1309.000000 | 1309.000000 |
| mean  | 29.881138 | 33.295479 | 0.385027 | 2.294882 | 0.498854 | 0.381971 |
| std   | 12.883193 | 51.738879 | 0.865560 | 0.837836 | 1.041658 | 0.486055 |
| min   | 0.170000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 |
| 25%   | 22.000000 | 7.895800 | 0.000000 | 2.000000 | 0.000000 | 0.000000 |
| 50%   | 29.881138 | 14.454200 | 0.000000 | 3.000000 | 0.000000 | 0.000000 |
| 75%   | 35.000000 | 31.275000 | 0.000000 | 3.000000 | 1.000000 | 1.000000 |
| max   | 80.000000 | 512.329200 | 9.000000 | 3.000000 | 8.000000 | 1.000000 |
# One-hot encode categorical variables
df_enc = pd.get_dummies(df, columns=["embarked", "pclass", "sex"]).sample(frac=1)
# Pandas dataframes to numpy arrays
X = df_enc.drop(["survived"], axis=1).values
Y = df_enc["survived"].values
# Create train and test set
train_features, test_features, train_labels, test_labels = train_test_split(
X, Y, test_size=0.3
)
The model is based on "Getting started with Captum - Titanic Data Analysis" provided by Captum:
class TitanicSimpleNNModel(nn.Module):
def __init__(self):
super().__init__()
self.linear1 = nn.Linear(12, 12)
self.sigmoid1 = nn.Sigmoid()
self.linear2 = nn.Linear(12, 8)
self.sigmoid2 = nn.Sigmoid()
self.linear3 = nn.Linear(8, 2)
self.softmax = nn.Softmax(dim=1)
def forward(self, x):
lin1_out = self.linear1(x)
sigmoid_out1 = self.sigmoid1(lin1_out)
sigmoid_out2 = self.sigmoid2(self.linear2(sigmoid_out1))
return self.softmax(self.linear3(sigmoid_out2))
net = TitanicSimpleNNModel()
criterion = nn.CrossEntropyLoss()
num_epochs = 200
optimizer = torch.optim.Adam(net.parameters(), lr=0.1)
input_tensor = torch.from_numpy(train_features).type(torch.FloatTensor)
label_tensor = torch.from_numpy(train_labels)
for epoch in range(num_epochs):
output = net(input_tensor)
loss = criterion(output, label_tensor)
optimizer.zero_grad()
loss.backward()
optimizer.step()
if epoch % 20 == 0:
print("Epoch {}/{} => Loss: {:.2f}".format(epoch + 1, num_epochs, loss.item()))
Epoch 1/200 => Loss: 0.72
Epoch 21/200 => Loss: 0.57
Epoch 41/200 => Loss: 0.50
Epoch 61/200 => Loss: 0.48
Epoch 81/200 => Loss: 0.48
Epoch 101/200 => Loss: 0.47
Epoch 121/200 => Loss: 0.47
Epoch 141/200 => Loss: 0.49
Epoch 161/200 => Loss: 0.47
Epoch 181/200 => Loss: 0.47
out_probs = net(input_tensor).detach().numpy()
out_classes = np.argmax(out_probs, axis=1)
print("Train Accuracy:", sum(out_classes == train_labels) / len(train_labels))
Train Accuracy: 0.8384279475982532
test_input_tensor = torch.from_numpy(test_features).type(torch.FloatTensor)
out_probs = net(test_input_tensor).detach().numpy()
out_classes = np.argmax(out_probs, axis=1)
print("Test Accuracy:", sum(out_classes == test_labels) / len(test_labels))
Test Accuracy: 0.7684478371501272
In this example, we rely on the captum library and use the Integrated Gradients method to compute the attributions.
ig = IntegratedGradients(net)
test_input_tensor.requires_grad_()
attr, delta = ig.attribute(test_input_tensor, target=1, return_convergence_delta=True)
attr = attr.detach().numpy()
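Before evaluating the explanations, it can help to inspect them briefly. The following optional snippet (not part of the original tutorial) averages the absolute attribution per feature, using the one-hot encoded column names from df_enc.
# Optional sanity check: mean absolute attribution per encoded feature
feature_names = df_enc.drop(["survived"], axis=1).columns
mean_abs_attr = np.mean(np.abs(attr), axis=0)
print(pd.Series(mean_abs_attr, index=feature_names).sort_values(ascending=False))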
We can evaluate our explanations against a variety of quantitative criteria. As a motivating example, we compute the ModelParameterRandomisation scores by Adebayo et al., 2018 and the Complexity metric by Bhatt et al., 2020.
The ModelParameterRandomisation metric measures the distance between the original attribution and attributions that are recomputed while the model parameters are randomised one layer at a time, either cascadingly or independently.
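To make the randomisation idea concrete, here is a minimal sketch of a single "independent" randomisation step. This is not Quantus' implementation, and the helper name layer_randomisation_similarity is made up for illustration: a copy of the model has one named layer re-initialised, the attribution is recomputed, and its Spearman correlation with the original attribution is returned.
# Minimal sketch of one layer-randomisation step (illustrative only, not Quantus' code)
import copy
from scipy.stats import spearmanr

def layer_randomisation_similarity(model, layer_name, inputs, original_attr, target=1):
    randomised = copy.deepcopy(model)
    # Re-initialise the parameters of this one layer only ("independent" order)
    dict(randomised.named_modules())[layer_name].reset_parameters()
    new_attr = IntegratedGradients(randomised).attribute(inputs, target=target)
    # A faithful attribution should lose similarity once the model is randomised
    return spearmanr(original_attr.flatten(), new_attr.detach().numpy().flatten()).correlation

# e.g. layer_randomisation_similarity(net, "linear3", test_input_tensor, attr)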
The Complexity of an attribution is defined as the entropy of the fractional contribution of each feature x_i to the total magnitude of the attribution.
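As a sketch of this definition (again illustrative, not Quantus' implementation), the fractional contributions are the normalised absolute attributions of one instance, and the Complexity is their Shannon entropy. A sparse attribution concentrated on a few features yields a low score, while an attribution spread evenly over all features yields the maximum score.
# Minimal sketch of the Complexity definition for a single attribution vector
from scipy.stats import entropy

def complexity_score(attribution):
    fractional = np.abs(attribution) / np.abs(attribution).sum()
    return entropy(fractional)

# e.g. complexity_score(attr[0]) for the first test instance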
# Return ModelParameterRandomisation scores for Integrated Gradients.
scores_intgrad = quantus.ModelParameterRandomisation(
similarity_func=quantus.similarity_func.correlation_spearman,
return_sample_correlation=True,
return_aggregate=True,
aggregate_func=np.mean,
layer_order="independent",
disable_warnings=True,
normalise=True,
abs=True,
display_progressbar=True,
)(
model=net,
x_batch=test_features,
y_batch=test_labels,
a_batch=None,
explain_func=quantus.explain,
explain_func_kwargs={"method": "IntegratedGradients", "reduce_axes": ()},
)
print(
f"ModelParameterRandomisation scores by Adebayo et al., 2018\n"
f"\n • Integrated Gradient = ",
scores_intgrad,
)
ModelParameterRandomisation scores by Adebayo et al., 2018

 • Integrated Gradient =  [0.9428662149762941]
complexity_intgrad = quantus.Complexity(
normalise=True,
abs=True,
disable_warnings=True,
display_progressbar=True,
return_aggregate=True
)(
model=net,
x_batch=test_features,
y_batch=test_labels,
a_batch=None,
explain_func=quantus.explain,
explain_func_kwargs={"method": "IntegratedGradients", "reduce_axes": ()},
)
print(
f"Complexity Bhatt et al., 2020.\n"
f"\n • Integrated Gradient = ",
complexity_intgrad,
)
Complexity Bhatt et al., 2020.

 • Integrated Gradient =  [1.3059633285078698]