Exact explainers

We define the same dataset as the one exposed in gh-2345.
import numpy as np
X = np.vstack([
    [[0, 0]] * 400,
    [[0, 1]] * 100,
    [[1, 0]] * 100,
    [[1, 1]] * 400,
])
y = np.array(
    [0] * 400 +
    [50] * 100 +
    [50] * 100 +
    [100] * 400
)
We vary the random_state to make sure that:

- tree_1 considers X[0] as a root split
- tree_2 considers X[1] as a root split

Note that the two trees compute the same prediction on any data point. They are two distinct implementations of the same mathematical decision function.
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import plot_tree
_, ax = plt.subplots(figsize=(10, 6))
tree_1 = DecisionTreeRegressor(random_state=2)
tree_1.fit(X, y)
_ = plot_tree(tree_1, ax=ax)
_, ax = plt.subplots(figsize=(10, 6))
tree_2 = DecisionTreeRegressor(random_state=0)
tree_2.fit(X, y)
_ = plot_tree(tree_2, ax=ax)
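The claim that the two trees implement the same decision function can be verified directly (a small check, not in the original notebook, repeating the fitting code so the snippet runs standalone):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.vstack([
    [[0, 0]] * 400,
    [[0, 1]] * 100,
    [[1, 0]] * 100,
    [[1, 1]] * 400,
])
y = np.array([0] * 400 + [50] * 100 + [50] * 100 + [100] * 400)

tree_1 = DecisionTreeRegressor(random_state=2).fit(X, y)
tree_2 = DecisionTreeRegressor(random_state=0).fit(X, y)

# Both trees predict the per-pattern means: 0, 50, 50, 100.
grid = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
assert (tree_1.predict(grid) == tree_2.predict(grid)).all()
print(tree_1.predict(grid))
```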
X_test = np.array([[1, 1]])
from shap import Explainer
tree_1_explainer = Explainer(model=tree_1, algorithm="tree")
tree_1_explainer(X_test)
.values = array([[32.5, 17.5]]) .base_values = array([[50.]]) .data = array([[1, 1]])
tree_2_explainer = Explainer(model=tree_2, algorithm="tree")
tree_2_explainer(X_test)
.values = array([[17.5, 32.5]]) .base_values = array([[50.]]) .data = array([[1, 1]])
This reproduces the bug reported in the original issue linked above: the SHAP values are asymmetric even though the two features play exactly symmetric roles.
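To see where the asymmetry comes from, here is a minimal re-implementation (an illustrative sketch, not shap's actual code) of the path-dependent conditional expectation that TreeExplainer uses by default: when a feature is outside the coalition, the children are averaged by their training cover. Applied to one of the trees, it reproduces the unequal attributions:

```python
import numpy as np
from itertools import permutations
from sklearn.tree import DecisionTreeRegressor

X = np.vstack([
    [[0, 0]] * 400,
    [[0, 1]] * 100,
    [[1, 0]] * 100,
    [[1, 1]] * 400,
])
y = np.array([0] * 400 + [50] * 100 + [50] * 100 + [100] * 400)
tree = DecisionTreeRegressor(random_state=2).fit(X, y)

x = np.array([1.0, 1.0])
t = tree.tree_

def cond_exp(node, S):
    # Follow the split when the feature is in the coalition S; otherwise
    # average the children weighted by their training cover.
    if t.children_left[node] == -1:  # leaf
        return t.value[node][0, 0]
    f = t.feature[node]
    left, right = t.children_left[node], t.children_right[node]
    if f in S:
        child = left if x[f] <= t.threshold[node] else right
        return cond_exp(child, S)
    wl, wr = t.n_node_samples[left], t.n_node_samples[right]
    return (wl * cond_exp(left, S) + wr * cond_exp(right, S)) / (wl + wr)

phi = np.zeros(2)
for order in permutations(range(2)):
    S = []
    for j in order:
        before = cond_exp(0, S)
        S.append(j)
        phi[j] += (cond_exp(0, S) - before) / 2  # average over the 2 orderings
print(phi)  # one feature gets 32.5, the other 17.5
```

The feature chosen for the root split receives the larger share, which is exactly the asymmetry observed above.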
Exact explainer

from shap.explainers import Exact
from shap.maskers import Independent
explainer = Exact(tree_1.predict, Independent(X, max_samples=X.shape[0]))
explainer(X_test)
.values = array([[25., 25.]]) .base_values = array([50.]) .data = array([[1, 1]])
explainer = Exact(tree_2.predict, Independent(X, max_samples=X.shape[0]))
explainer(X_test)
.values = array([[25., 25.]]) .base_values = array([50.]) .data = array([[1, 1]])
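As an independent cross-check (a sketch, not shap's implementation), a brute-force interventional Shapley computation over the two possible feature orderings also yields the symmetric attributions. Since it only queries the prediction function, the result is the same for both trees:

```python
import numpy as np
from itertools import permutations
from sklearn.tree import DecisionTreeRegressor

X = np.vstack([
    [[0, 0]] * 400,
    [[0, 1]] * 100,
    [[1, 0]] * 100,
    [[1, 1]] * 400,
])
y = np.array([0] * 400 + [50] * 100 + [50] * 100 + [100] * 400)
tree = DecisionTreeRegressor(random_state=2).fit(X, y)

x = np.array([1, 1])

def value(S):
    # Interventional coalition value: fix the features in S to x and
    # average the predictions over the background data.
    Xs = X.copy()
    for j in S:
        Xs[:, j] = x[j]
    return tree.predict(Xs).mean()

phi = np.zeros(2)
for order in permutations(range(2)):
    S = []
    for j in order:
        before = value(S)
        S.append(j)
        phi[j] += (value(S) - before) / 2  # average over the 2 orderings
print(phi)  # [25. 25.]
```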
The Exact explainer is not subject to the asymmetry problem.

We can also check that a Python implementation of the same decision function leads to the same explanation.
def my_predict_one(x):
    if x[0] < 0.5 and x[1] < 0.5:
        return 0
    elif x[0] > 0.5 and x[1] > 0.5:
        return 100
    else:
        return 50

def my_predict(X):
    return np.array([my_predict_one(x) for x in X])
explainer = Exact(my_predict, masker=Independent(X, max_samples=X.shape[0]))
explainer(X_test)
.values = array([[25., 25.]]) .base_values = array([50.]) .data = array([[1, 1]])
The ACV implementation of tree SHAP values also produces symmetric attributions for both trees:
from acv_explainers import ACVTree
ACVTree(tree_1, X).shap_values(X_test)
array([[[25.], [25.]]])
ACVTree(tree_2, X).shap_values(X_test)
array([[[25.], [25.]]])
We now observe that this bug is also present in the FastTreeSHAP implementation.
%pip install -q fasttreeshap
import fasttreeshap
fasttreeshap.TreeExplainer(tree_1, algorithm="v2")(X_test)
.values = array([[32.5, 17.5]]) .base_values = array([[50.]]) .data = array([[1, 1]])
fasttreeshap.TreeExplainer(tree_2, algorithm="v2")(X_test)
.values = array([[17.5, 32.5]]) .base_values = array([[50.]]) .data = array([[1, 1]])