Note, $f$ is NOT the real function that generates the data; rather, it is e.g. a trained ML model (the model we want to explain).
with $\mathbf{x}'$ being the simplified features that can be mapped to the original features via
$$\mathbf{x} = h_\mathbf{x}(\mathbf{x}'),$$
we try to ensure the property of the explanation model $g$ that
$$g(\mathbf{z}') \approx f(h_\mathbf{x}(\mathbf{z}'))$$
whenever $\mathbf{z}' \approx \mathbf{x}'$.
Additive feature attribution methods use an explanation model that is a linear function of binary variables,
$$g(\mathbf{z}') = \phi_0 + \sum_{i=1}^M \phi_i z_i',$$
where $\mathbf{z}' \in \{0, 1\}^M$, $M$ is the number of simplified input features, and $\phi_i \in \mathbb{R}$. Several existing methods fall into this class:
| Method | Model to explain | Simplified inputs | Note |
|---|---|---|---|
| LIME | blackbox | interpretable inputs | |
| DeepLIFT | DNN | | |
| Layer-wise relevance propagation | DNN | | |
| Classic Shapley Value Estimation | | | Shapley regression values; needs retraining models for all subsets of features |
| Shapley sampling values | | | applies a sampling approximation to Shapley regression values |
| Quantitative Input Influence | | | another way of applying a sampling approximation to Shapley regression values |
The first property, local accuracy, requires
$$f(\mathbf{x}) = g(\mathbf{x}') = \phi_0 + \sum_{i=1}^M \phi_i x_i',$$
which means the explanation model output should match the prediction model output when $\mathbf{z}' = \mathbf{x}'$, and hence $\mathbf{x} = h_\mathbf{x}(\mathbf{x}') = h_\mathbf{x}(\mathbf{z}')$.
The second property, missingness, requires that when $x_i' = 0$, then $\phi_i = 0$, i.e. a feature that's not included in the feature vector shouldn't have any impact on the prediction.
Let $y(\mathbf{z}') = f(h_\mathbf{x}(\mathbf{z}'))$, and let $\mathbf{z}'_{\backslash i}$ denote setting $z_i' = 0$ in the simplified binary feature vector. The third property, consistency, requires that for two models $y_A$ and $y_B$, if
$$y_A(\mathbf{z}') - y_A(\mathbf{z}'_{\backslash i}) \ge y_B(\mathbf{z}') - y_B(\mathbf{z}'_{\backslash i}) \quad \text{for all } \mathbf{z}' \in \{0, 1\}^M,$$
which can be expanded to $f_A(h_\mathbf{x}(\mathbf{z}')) - f_A(h_\mathbf{x}(\mathbf{z}'_{\backslash i})) \ge f_B(h_\mathbf{x}(\mathbf{z}')) - f_B(h_\mathbf{x}(\mathbf{z}'_{\backslash i}))$
then the corresponding impacts of the $i$th feature in the two models should satisfy
$$\phi_{i,A} \ge \phi_{i,B},$$
where the subscripts $_A$ and $_B$ identify which model each $\phi_i$ belongs to.
In words, consistency means that for two models, if the exclusion of a feature results in a larger reduction in the predicted value in model A than in model B, then this feature should have a bigger impact in model A than in model B, too.
Only one explanation model in the class of additive feature attribution methods can satisfy all three properties:
$$\phi_i(f, \mathbf{x}) = \sum_{\mathbf{z}' \subseteq \mathbf{x}'} \frac{|\mathbf{z}'|!\,(M - |\mathbf{z}'| - 1)!}{M!} \left[ f_\mathbf{x}(\mathbf{z}') - f_\mathbf{x}(\mathbf{z}'_{\backslash i}) \right]$$
Note on symbols: $|\mathbf{z}'|$ is the number of non-zero entries in $\mathbf{z}'$; $\mathbf{z}' \subseteq \mathbf{x}'$ ranges over all vectors whose non-zero entries are a subset of the non-zero entries of $\mathbf{x}'$; and $f_\mathbf{x}(\mathbf{z}') = f(h_\mathbf{x}(\mathbf{z}'))$.
The equation can be interpreted as a weighted sum of the marginal contributions brought by feature $i$ over all possible feature vectors with the $i$th feature set to $0$: the contributions at each subset size are averaged, and the factor $\frac{1}{M}$ then averages over the $M$ possible sizes.
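As a sanity check on this formula, here is a minimal brute-force sketch (function and variable names are my own; it enumerates every subset, so it is $O(2^M)$ and for illustration only):

```python
from itertools import combinations
from math import factorial

def shapley_values(f_x, M):
    """Brute-force Shapley values for a set function f_x over M features,
    enumerating every subset S that excludes feature i."""
    phi = [0.0] * M
    for i in range(M):
        others = [j for j in range(M) if j != i]
        for k in range(M):
            for S in combinations(others, k):
                S = set(S)
                # Shapley weight |S|!(M - |S| - 1)!/M! for this subset size
                w = factorial(len(S)) * factorial(M - len(S) - 1) / factorial(M)
                phi[i] += w * (f_x(S | {i}) - f_x(S))
    return phi

# hypothetical additive "model": each present feature adds its own weight
contrib = [1.0, 2.0, 3.0]
f_x = lambda S: sum(contrib[j] for j in S)
print(shapley_values(f_x, 3))  # ≈ [1.0, 2.0, 3.0]: attributions recover the weights
```

For an additive model every marginal contribution of feature $i$ equals its own weight, and the Shapley weights sum to one, so the attributions recover the weights exactly.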
The above equation can also be written as
$$\phi_i = \sum_{S \subseteq N \backslash \{i\}} \frac{|S|!\,(M - |S| - 1)!}{M!} \left[ f_x(S \cup \{i\}) - f_x(S) \right]$$
or
$$\phi_i = \frac{1}{M} \sum_{S \subseteq N \backslash \{i\}} \binom{M-1}{|S|}^{-1} \left[ f_x(S \cup \{i\}) - f_x(S) \right],$$
where $N$ is the set of all $M$ features and $f_x(S) = E[f(\mathbf{x}) \mid \mathbf{x}_S]$ is the expected model output conditioned on the features in $S$.
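For a black-box model, $f_x(S)$ is typically approximated by averaging over background data for the features outside $S$. A sketch (the helper name is my own; it assumes feature independence, as the sampling approximations above do):

```python
def f_x_of_S(f, x, S, background):
    """Estimate f_x(S) = E[f(z) | z_S = x_S]: for each background row,
    overwrite the features in S with their values from x, then average
    the model outputs."""
    total = 0.0
    for row in background:
        z = list(row)
        for i in S:
            z[i] = x[i]
        total += f(z)
    return total / len(background)

# hypothetical model and data: f sums its two inputs
f = lambda z: z[0] + z[1]
background = [[0.0, 0.0], [2.0, 2.0]]
x = [1.0, 5.0]
print(f_x_of_S(f, x, {0}, background))  # → ((1+0) + (1+2)) / 2 = 2.0
```

With $S$ equal to the full feature set this reduces to $f(\mathbf{x})$ itself; with $S = \emptyset$ it is the average model output over the background data.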
Complexity of computing this exhaustively over all feature subsets: $O(TLM2^M)$
Notations:
- $T$: number of trees
- $D$: maximum depth of any tree
- $L$: number of leaves
- $M$: number of features
- $\mathbf{v}$: vector of node values; $v_j \in \mathbb{R} \cup \{\text{internal}\}$, i.e. internal nodes take the placeholder value "internal"
- $\mathbf{a}$: vector of indices representing the left child of each internal node
- $\mathbf{b}$: vector of indices representing the right child of each internal node
- $\mathbf{t}$: vector of thresholds for each internal node
- $\mathbf{d}$: vector of indices of the features used for splitting in each internal node; $d_j \in \text{feature set}$
- $\mathbf{r}$: vector of covers (i.e. how many data points in the training set fall into the corresponding sub-tree) of each node

All vectors are of length $N$, the number of nodes in the tree.
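Under this notation, the conditional expectation $E[f(\mathbf{x}) \mid \mathbf{x}_S]$ for a single tree can be sketched as a short recursion (a minimal Python rendering with my own function names; `None` stands in for the "internal" placeholder, so leaf values live in `v`):

```python
def exp_value(x, S, v, a, b, t, r, d):
    """Estimate E[f(x) | x_S] for a single tree: follow the decision path
    for features in S, and average over both children (weighted by the
    training cover r) for features not in S."""
    def g(j):
        if v[j] is not None:               # leaf node: return its value
            return v[j]
        if d[j] in S:                      # feature known: follow the split
            return g(a[j]) if x[d[j]] <= t[j] else g(b[j])
        # feature unknown: cover-weighted average of both children
        return (r[a[j]] * g(a[j]) + r[b[j]] * g(b[j])) / r[j]
    return g(0)

# hypothetical tiny tree: root splits on feature 0 at threshold 0.5;
# left leaf (cover 4) predicts 1.0, right leaf (cover 6) predicts 3.0
v = [None, 1.0, 3.0]
a = [1, None, None]; b = [2, None, None]
t = [0.5, None, None]; d = [0, None, None]
r = [10, 4, 6]
print(exp_value([0.2], {0}, v, a, b, t, r, d))    # → 1.0 (follows the left branch)
print(exp_value([0.2], set(), v, a, b, t, r, d))  # → (4*1.0 + 6*3.0)/10 = 2.2
```

Running this once per subset for every tree and leaf is what yields the exponential cost above; the faster algorithm below avoids enumerating subsets explicitly.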
Complexity of the faster Tree SHAP algorithm: $O(TLD^2)$
$m$ is the path of unique features we have split on so far; each element of $m$ contains four attributes:
- $p_z$: fraction of "zero" paths (paths where the feature is not in the conditioning set $S$) that are going to extend the subsets
- $p_o$: fraction of "one" paths (paths where the feature is in $S$) that are going to extend the subsets
- $p_i$: index of the feature used to make the last split
- $p_w$ (the fourth attribute; named here by analogy with the others): proportion of the subsets of a given cardinality that are present, used to weight the contributions
hot child: the child followed by the tree when given the input $\mathbf{x}$.
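A minimal sketch of how the hot child of an internal node $j$ might be determined (my own helper, reusing the notation above; the child not followed is the "cold" child):

```python
def hot_cold_children(j, x, a, b, t, d):
    """Return (hot, cold) children of internal node j: the hot child is
    the branch the decision path for input x actually follows."""
    if x[d[j]] <= t[j]:
        return a[j], b[j]
    return b[j], a[j]

# root node splitting on feature 0 at threshold 0.5, children 1 and 2
a = [1]; b = [2]; t = [0.5]; d = [0]
print(hot_cold_children(0, [0.2], a, b, t, d))  # → (1, 2): left child is hot
```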