Note, $f$ is NOT the true function that generated the data; it is the model to be explained, e.g. a trained ML model.
with $\mathbf{x}'$ being the simplified features that can be mapped to the original features via
$$\mathbf{x} = h_\mathbf{x}(\mathbf{x}')$$
we try to ensure that the explanation model $g$ satisfies
$$g(\mathbf{z}') \approx f(h_\mathbf{x}(\mathbf{z}'))$$
whenever $\mathbf{z}' \approx \mathbf{x}'$.
The explanation model is an additive feature attribution method, i.e. a linear function of binary variables
$$g(\mathbf{z}') = \phi_0 + \sum_{i=1}^M \phi_i z_i'$$
where $\mathbf{z}' \in \{0, 1\}^M$, $M$ is the number of simplified input features, and $\phi_i \in \mathbb{R}$. Several existing explanation methods fit this form:
Method | Model to explain | Note
---|---|---
LIME | black box | uses interpretable (simplified) inputs
DeepLIFT | DNN |
Layer-wise relevance propagation | DNN |
Classic Shapley value estimation (Shapley regression values) | | needs retraining models for all subsets of features
Shapley sampling values | | applies a sampling approximation to Shapley regression values
Quantitative Input Influence | | another way of applying a sampling approximation to Shapley regression values
The first property is local accuracy:
$$f(\mathbf{x}) = g(\mathbf{x}') = \phi_0 + \sum_{i=1}^M \phi_i x_i'$$
which means the explanation model output should match the prediction model output when $\mathbf{z}' = \mathbf{x}'$, and hence $\mathbf{x} = h_\mathbf{x}(\mathbf{x}') = h_\mathbf{x}(\mathbf{z}')$.
The second property is missingness: when $x_i' = 0$, then $\phi_i = 0$, i.e. a feature that is not included in the simplified feature vector should have no impact on the prediction.
The third property is consistency. Let $y(\mathbf{z}') = f(h_\mathbf{x}(\mathbf{z}'))$, and let $\mathbf{z}'_{\backslash i}$ denote setting $z_i' = 0$ in the simplified binary feature vector. For two models $y_A$ and $y_B$, if
$$y_A(\mathbf{z}') - y_A(\mathbf{z}'_{\backslash i}) \ge y_B(\mathbf{z}') - y_B(\mathbf{z}'_{\backslash i})$$
for all inputs $\mathbf{z}' \in \{0, 1\}^M$, which can be expanded to $f_A(h_\mathbf{x}(\mathbf{z}')) - f_A(h_\mathbf{x}(\mathbf{z}'_{\backslash i})) \ge f_B(h_\mathbf{x}(\mathbf{z}')) - f_B(h_\mathbf{x}(\mathbf{z}'_{\backslash i}))$, then the corresponding impact of the $i$th feature in the two models should satisfy
$$\phi_{i, A} \ge \phi_{i, B}$$
where the subscripts $A$ and $B$ identify which model the $\phi_i$ belongs to.
In words, consistency means that for two models, if the exclusion of a feature results in a larger reduction in the predicted value in model A than in model B, then this feature should have a bigger impact in model A than in model B, too.
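For a tiny worked instance (the toy models here are mine, not from the paper), take $M = 1$ with $y_A(\mathbf{z}') = 2 z_1'$ and $y_B(\mathbf{z}') = z_1'$. Excluding feature 1 reduces $y_A$ by $2$ but $y_B$ only by $1$, so consistency demands
$$\phi_{1, A} = 2 \ge \phi_{1, B} = 1$$
which is indeed what the Shapley values below assign.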
Only one explanation model in the class of additive feature attribution methods can satisfy all three properties:
$$\phi_i(f, \mathbf{x}) = \sum_{\mathbf{z}' \subseteq \mathbf{x}',\, z_i' = 1} \frac{(|\mathbf{z}'| - 1)!\,(M - |\mathbf{z}'|)!}{M!} \left[ y(\mathbf{z}') - y(\mathbf{z}'_{\backslash i}) \right]$$
These $\phi_i$ are the Shapley values.
Note on symbols: $|\mathbf{z}'|$ is the number of non-zero entries in $\mathbf{z}'$, and $\mathbf{z}' \subseteq \mathbf{x}'$ ranges over all vectors whose non-zero entries are a subset of the non-zero entries in $\mathbf{x}'$.
The equation can be interpreted as a weighted sum of all marginal contributions brought by feature $i$ over all possible feature vectors with the $i$th feature being $0$: the marginal contributions are averaged within each subset size, and each of the $M$ sizes is scaled by $\frac{1}{M}$.
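For example, with $M = 2$ and writing $y(z_1', z_2')$ for the binary vector $(z_1', z_2')$, the formula for feature 1 reduces to one marginal contribution per subset size, each scaled by $\frac{1}{2}$:
$$\phi_1 = \frac{1}{2}\left[y(1, 0) - y(0, 0)\right] + \frac{1}{2}\left[y(1, 1) - y(0, 1)\right]$$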
The above equation can also be written as
$$\phi_i(f, \mathbf{x}) = \frac{1}{M} \sum_{\mathbf{z}' \subseteq \mathbf{x}',\, z_i' = 1} \binom{M - 1}{|\mathbf{z}'| - 1}^{-1} \left[ y(\mathbf{z}') - y(\mathbf{z}'_{\backslash i}) \right]$$
or, as an average over all $M!$ orderings of the features,
$$\phi_i(f, \mathbf{x}) = \frac{1}{M!} \sum_{R} \left[ y(P_i^R \cup \{i\}) - y(P_i^R) \right]$$
where $R$ ranges over the orderings and $P_i^R$ is the set of features that precede feature $i$ in ordering $R$ (treating a feature set as the simplified vector with those entries set to $1$).
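As a sanity check on the formula, here is a minimal brute-force sketch in Python; the value function, the per-feature contributions, and the interaction bonus are toy assumptions of mine, not from the paper:

```python
import itertools
import math

def shapley_values(value, M):
    """Brute-force Shapley values for a set-valued function `value`.

    value: maps a frozenset of feature indices S to a real number y(S).
    M: total number of features. Cost is O(2^M) evaluations per feature,
    which is why exact computation is only feasible for small M.
    """
    phi = [0.0] * M
    for i in range(M):
        others = [j for j in range(M) if j != i]
        for size in range(M):
            for S in itertools.combinations(others, size):
                S = frozenset(S)
                # weight |S|! (M - |S| - 1)! / M! from the formula above
                w = math.factorial(len(S)) * math.factorial(M - len(S) - 1) / math.factorial(M)
                phi[i] += w * (value(S | {i}) - value(S))
    return phi

# Toy value function: additive contributions plus one interaction bonus.
contrib = {0: 1.0, 1: 2.0, 2: 0.5}
def value(S):
    bonus = 1.0 if {0, 1} <= S else 0.0  # features 0 and 1 interact
    return sum(contrib[j] for j in S) + bonus

phi = shapley_values(value, M=3)
print(phi)  # ~[1.5, 2.5, 0.5]: the bonus is split evenly between features 0 and 1
print(sum(phi), value(frozenset(range(3))) - value(frozenset()))  # local accuracy: both ~4.5
```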
Complexity: $O(TLM2^M)$
Notations
$T$: number of trees
$D$: maximum depth of any tree
$L$: number of leaves
$M$: number of features
$\mathbf{v}$: vector of node values, $v_j \in \mathbb{R} \cup \{\text{internal}\}$, i.e. leaves hold values and internal nodes hold the marker "internal"
$\mathbf{a}$: vector of indices representing the left child of each internal node
$\mathbf{b}$: vector of indices representing the right child of each internal node
$\mathbf{t}$: vector of thresholds for each internal node
$\mathbf{d}$: vector of indices of the features used for splitting at each internal node, $d_j \in \text{feature set}$
$\mathbf{r}$: vector of covers (i.e. how many data points in the training set fall into the corresponding sub-tree) for each node
All vectors are of length $N$, the number of nodes in the tree; a minimal sketch using this representation follows.
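Here is a minimal Python sketch of estimating the conditional expectation $E[f(\mathbf{x}) \mid \mathbf{x}_S]$ with this array representation, following the tree recursion in the Tree SHAP paper; the dict layout, 0-based indexing, and using `None` in `v` to mark internal nodes are assumptions of this sketch:

```python
def exp_value(x, S, tree):
    """Estimate E[f(x) | x_S] for a single tree by recursing from the root.

    x: the input vector, indexed by feature.
    S: the set of feature indices we condition on (the "present" features).
    tree: dict of the parallel arrays described above, 0-indexed;
          v[j] is None for internal nodes and the leaf value otherwise.
    """
    v, a, b = tree["v"], tree["a"], tree["b"]
    t, d, r = tree["t"], tree["d"], tree["r"]

    def g(j, w):
        if v[j] is not None:                 # leaf: return its weighted value
            return w * v[j]
        if d[j] in S:                        # feature known: follow the split
            child = a[j] if x[d[j]] <= t[j] else b[j]
            return g(child, w)
        # feature unknown: average both children, weighted by their covers
        return g(a[j], w * r[a[j]] / r[j]) + g(b[j], w * r[b[j]] / r[j])

    return g(0, 1.0)
```

Computing exact SHAP values this way requires one such traversal per feature subset, which is where the $2^M$ factor in the complexity above comes from.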
Complexity: $O(TLD^2)$
$m$ is the path of unique features we have split on so far; each entry of the path contains four attributes (a sketch of this bookkeeping follows the list):
$p_z$: the fraction of "zero" paths (where the feature is not in the conditioning set $S$) that flow through this branch
$p_o$: the fraction of "one" paths (where the feature is in $S$) that flow through this branch
$p_i$: the index of the feature used to make the last split
$w$: the proportion of feature subsets of a given cardinality that are present
Separately, the hot child of an internal node is the child followed by the tree when given the input $\mathbf{x}$.
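Below is a Python sketch of this path bookkeeping, porting the EXTEND update from the Tree SHAP paper to 0-based indexing; the class and function names are mine:

```python
from dataclasses import dataclass

@dataclass
class PathElement:
    d: int      # index of the feature used for the last split (p_i)
    z: float    # fraction of "zero" paths flowing through the branch (p_z)
    o: float    # fraction of "one" paths flowing through the branch (p_o)
    w: float    # proportion of feature subsets of a given cardinality present

def extend(m, p_z, p_o, p_i):
    """Return a copy of path m grown by one feature split.

    A copy is made because each recursive call down the tree extends
    its own version of the path.
    """
    l = len(m)
    m = [PathElement(e.d, e.z, e.o, e.w) for e in m]
    m.append(PathElement(p_i, p_z, p_o, 1.0 if l == 0 else 0.0))
    # update the subset-proportion weights for the new path length l + 1
    for i in range(l - 1, -1, -1):
        m[i + 1].w += p_o * m[i].w * (i + 1) / (l + 1)
        m[i].w = p_z * m[i].w * (l - i) / (l + 1)
    return m
```

Each call to `extend` costs time proportional to the current path length, which is bounded by the depth $D$; doing this along every root-to-leaf path is where the $D^2$ factor in the complexity above comes from.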