Exact explainers

We define the same dataset as the one exposed in gh-2345.
import numpy as np
X = np.vstack([
    [[0, 0]] * 400,
    [[0, 1]] * 100,
    [[1, 0]] * 100,
    [[1, 1]] * 400,
])
y = np.array(
    [0] * 400 +
    [50] * 100 +
    [50] * 100 +
    [100] * 400
)
We vary the random_state to make sure that:

- tree_1 considers X[0] as a root split
- tree_2 considers X[1] as a root split

Note that the two trees compute the same prediction on any data point. They are two distinct implementations of the same mathematical decision function.
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import plot_tree
_, ax = plt.subplots(figsize=(10, 6))
tree_1 = DecisionTreeRegressor(random_state=2)
tree_1.fit(X, y)
_ = plot_tree(tree_1, ax=ax)
_, ax = plt.subplots(figsize=(10, 6))
tree_2 = DecisionTreeRegressor(random_state=0)
tree_2.fit(X, y)
_ = plot_tree(tree_2, ax=ax)
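The claim that the two trees implement the same decision function can be verified directly (a small check, not in the original notebook, repeating the fitting code so the snippet runs standalone):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.vstack([
    [[0, 0]] * 400,
    [[0, 1]] * 100,
    [[1, 0]] * 100,
    [[1, 1]] * 400,
])
y = np.array([0] * 400 + [50] * 100 + [50] * 100 + [100] * 400)

tree_1 = DecisionTreeRegressor(random_state=2).fit(X, y)
tree_2 = DecisionTreeRegressor(random_state=0).fit(X, y)

# Both trees predict the per-pattern means: 0, 50, 50, 100.
grid = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
assert (tree_1.predict(grid) == tree_2.predict(grid)).all()
print(tree_1.predict(grid))
```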
X_test = np.array([[1, 1]])
from shap import Explainer
tree_1_explainer = Explainer(model=tree_1, algorithm="tree")
tree_1_explainer(X_test)
.values = array([[32.5, 17.5]]) .base_values = array([[50.]]) .data = array([[1, 1]])
tree_2_explainer = Explainer(model=tree_2, algorithm="tree")
tree_2_explainer(X_test)
.values = array([[17.5, 32.5]]) .base_values = array([[50.]]) .data = array([[1, 1]])
This reproduces the bug reported in the original issue linked above: the SHAP values are asymmetric even though the two features play exactly symmetric roles.
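To see where the asymmetry comes from, here is a minimal re-implementation (an illustrative sketch, not shap's actual code) of the path-dependent conditional expectation that TreeExplainer uses by default: when a feature is outside the coalition, the children are averaged by their training cover. Applied to one of the trees, it reproduces the unequal attributions:

```python
import numpy as np
from itertools import permutations
from sklearn.tree import DecisionTreeRegressor

X = np.vstack([
    [[0, 0]] * 400,
    [[0, 1]] * 100,
    [[1, 0]] * 100,
    [[1, 1]] * 400,
])
y = np.array([0] * 400 + [50] * 100 + [50] * 100 + [100] * 400)
tree = DecisionTreeRegressor(random_state=2).fit(X, y)

x = np.array([1.0, 1.0])
t = tree.tree_

def cond_exp(node, S):
    # Follow the split when the feature is in the coalition S; otherwise
    # average the children weighted by their training cover.
    if t.children_left[node] == -1:  # leaf
        return t.value[node][0, 0]
    f = t.feature[node]
    left, right = t.children_left[node], t.children_right[node]
    if f in S:
        child = left if x[f] <= t.threshold[node] else right
        return cond_exp(child, S)
    wl, wr = t.n_node_samples[left], t.n_node_samples[right]
    return (wl * cond_exp(left, S) + wr * cond_exp(right, S)) / (wl + wr)

phi = np.zeros(2)
for order in permutations(range(2)):
    S = []
    for j in order:
        before = cond_exp(0, S)
        S.append(j)
        phi[j] += (cond_exp(0, S) - before) / 2  # average over the 2 orderings
print(phi)  # one feature gets 32.5, the other 17.5
```

The feature chosen for the root split receives the larger share, which is exactly the asymmetry observed above.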
Exact explainer

from shap.explainers import Exact
from shap.maskers import Independent
explainer = Exact(tree_1.predict, Independent(X, max_samples=X.shape[0]))
explainer(X_test)
.values = array([[25., 25.]]) .base_values = array([50.]) .data = array([[1, 1]])
explainer = Exact(tree_2.predict, Independent(X, max_samples=X.shape[0]))
explainer(X_test)
.values = array([[25., 25.]]) .base_values = array([50.]) .data = array([[1, 1]])
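As an independent cross-check (a sketch, not shap's implementation), a brute-force interventional Shapley computation over the two possible feature orderings also yields the symmetric attributions. Since it only queries the prediction function, the result is the same for both trees:

```python
import numpy as np
from itertools import permutations
from sklearn.tree import DecisionTreeRegressor

X = np.vstack([
    [[0, 0]] * 400,
    [[0, 1]] * 100,
    [[1, 0]] * 100,
    [[1, 1]] * 400,
])
y = np.array([0] * 400 + [50] * 100 + [50] * 100 + [100] * 400)
tree = DecisionTreeRegressor(random_state=2).fit(X, y)

x = np.array([1, 1])

def value(S):
    # Interventional coalition value: fix the features in S to x and
    # average the predictions over the background data.
    Xs = X.copy()
    for j in S:
        Xs[:, j] = x[j]
    return tree.predict(Xs).mean()

phi = np.zeros(2)
for order in permutations(range(2)):
    S = []
    for j in order:
        before = value(S)
        S.append(j)
        phi[j] += (value(S) - before) / 2  # average over the 2 orderings
print(phi)  # [25. 25.]
```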
The Exact explainer is not subject to the asymmetry problem.

We can also check that a Python implementation of the same decision function leads to the same explanation.
def my_predict_one(x):
    if x[0] < 0.5 and x[1] < 0.5:
        return 0
    elif x[0] > 0.5 and x[1] > 0.5:
        return 100
    else:
        return 50

def my_predict(X):
    return np.array([my_predict_one(x) for x in X])
explainer = Exact(my_predict, masker=Independent(X, max_samples=X.shape[0]))
explainer(X_test)
.values = array([[25., 25.]]) .base_values = array([50.]) .data = array([[1, 1]])
The ACV implementation of tree SHAP values also produces symmetric attributions for both trees:
from acv_explainers import ACVTree
ACVTree(tree_1, X).shap_values(X_test)
array([[[25.], [25.]]])
ACVTree(tree_2, X).shap_values(X_test)
array([[[25.], [25.]]])
We now observe that this bug is also present in the FastTreeSHAP implementation.
%pip install -q fasttreeshap
import fasttreeshap
fasttreeshap.TreeExplainer(tree_1, algorithm="v2")(X_test)
.values = array([[32.5, 17.5]]) .base_values = array([[50.]]) .data = array([[1, 1]])
fasttreeshap.TreeExplainer(tree_2, algorithm="v2")(X_test)
.values = array([[17.5, 32.5]]) .base_values = array([[50.]]) .data = array([[1, 1]])