In this notebook, we'll derive a few equivalent formulations of the Shapley value, review its axiomatic properties, and reproduce a small worked example.
Let $N$ be a set of $n$ players and $v$ a value function that assigns a real number $v(S)$ to every coalition $S \subseteq N$. For a permutation $P$ of the players, write $P[:d]$ for the first $d$ players in $P$.
One way of formulating the Shapley value is

$$\phi_i(v) = \frac{1}{n!} \sum_{P} \left[ v(P[:d + 1]) - v(P[:d]) \right],$$

where the sum ranges over all $n!$ permutations $P$ of the players and $d$ denotes the position of player $i$ in $P$ (so $P[:d]$ contains exactly the players before $i$).
So the Shapley value can be interpreted as the average marginal contribution of player $i$ over all permutations.
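To make the permutation view concrete, here's a minimal sketch (a hypothetical helper, not the `shapley` module used later in this notebook) that averages player $i$'s marginal contribution over all $n!$ orderings, tried on a made-up symmetric game:

```python
import itertools
import math

def shapley_by_permutations(players, v, i):
    """Average marginal contribution of player i over all orderings."""
    players = sorted(players)
    n = len(players)
    total = 0.0
    for perm in itertools.permutations(players):
        d = perm.index(i)             # position of i in this ordering
        before = frozenset(perm[:d])  # S: the players that precede i
        total += v(before | {i}) - v(before)
    return total / math.factorial(n)

# Made-up symmetric game v(S) = |S|^2: by symmetry every player gets v(N)/3 = 3.
print(shapley_by_permutations({"a", "b", "c"}, lambda s: len(s) ** 2, "a"))  # 3.0
```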
Let $S$ be the set of players in $P[:d]$, so that $S \cup \{i\}$ is the set of players in $P[:d + 1]$. Then $v(S) = v(P[:d])$ and $v(S \cup \{i\}) = v(P[:d + 1])$. Note that $P$ is ordered while $S$ is not.
Notice that once $S$ is fixed, the ordering of the players in $S$ does not affect $v(S)$, and there are $s!$ ways of ordering them, where $s = |S|$. Similarly, the ordering of the players after $i$ does not matter either, and there are $(n - s - 1)!$ such orderings. Therefore, the Shapley value can also be written in a more common form,

$$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{s!\,(n - s - 1)!}{n!} \left[ v(S \cup \{i\}) - v(S) \right].$$
In this form, the Shapley value is interpreted as the weighted average of marginal contributions from $i$ over all possible subsets of players before $i$. The weight of each subset is $\frac{s!(n - s - 1)!}{n!}$.
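The subset form can be sketched the same way (again a hypothetical helper for illustration): enumerate subsets $S$ of the other players and weight each marginal contribution by $\frac{s!(n - s - 1)!}{n!}$:

```python
import itertools
import math

def shapley_by_subsets(players, v, i):
    """Weighted average of marginal contributions over subsets of the others."""
    others = [p for p in players if p != i]
    n = len(players)
    total = 0.0
    for s in range(len(others) + 1):
        # Each subset of size s carries weight s!(n - s - 1)!/n!.
        weight = math.factorial(s) * math.factorial(n - s - 1) / math.factorial(n)
        for combo in itertools.combinations(others, s):
            S = frozenset(combo)
            total += weight * (v(S | {i}) - v(S))
    return total

# Made-up symmetric game v(S) = |S|^2: each player should get v(N)/3 = 3,
# up to floating-point error.
phi_a = shapley_by_subsets({"a", "b", "c"}, lambda s: len(s) ** 2, "a")
```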
A third form of the Shapley value comes from directly transforming the factorials into combinatorial (binomial coefficient) notation,

$$\phi_i(v) = \frac{1}{n} \sum_{t = 0}^{n - 1} \binom{n - 1}{t}^{-1} \sum_{\substack{S \subseteq N \setminus \{i\} \\ |S| = t}} \left[ v(S \cup \{i\}) - v(S) \right].$$
Note,

$$
\begin{aligned}
\phi_i(v) &= \sum_{S \subseteq N \setminus \{i\}} \frac{s!\,(n - s - 1)!}{n!} \left[ v(S \cup \{i\}) - v(S) \right] \\
&= \sum_{S \subseteq N \setminus \{i\}} \frac{1}{n} \cdot \frac{s!\,(n - s - 1)!}{(n - 1)!} \left[ v(S \cup \{i\}) - v(S) \right] \\
&= \frac{1}{n} \sum_{S \subseteq N \setminus \{i\}} \binom{n - 1}{s}^{-1} \left[ v(S \cup \{i\}) - v(S) \right] \\
&= \frac{1}{n} \sum_{t = 0}^{n - 1} \binom{n - 1}{t}^{-1} \sum_{\substack{S \subseteq N \setminus \{i\} \\ |S| = t}} \left[ v(S \cup \{i\}) - v(S) \right]
\end{aligned}
$$

We used $t$ instead of $s$ in the last equality to avoid confusion, because by definition $s$ is always equal to $|S|$. More specifically, the difference between the last two expressions is that in the third we pick $S$ first and then compute its size $s$, while in the fourth we fix the size $t$ first and then sum over subsets $S$ of exactly that size.
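As a quick sanity check on the algebra (just a sketch for reassurance), we can verify numerically that the factorial weight equals $\frac{1}{n}\binom{n-1}{s}^{-1}$ for small $n$:

```python
import math

# Check s!(n - s - 1)!/n! == 1/(n * C(n - 1, s)) for all 0 <= s <= n - 1.
for n in range(1, 8):
    for s in range(n):
        lhs = math.factorial(s) * math.factorial(n - s - 1) / math.factorial(n)
        rhs = 1 / (n * math.comb(n - 1, s))
        assert math.isclose(lhs, rhs), (n, s)
print("weight identity holds for n = 1..7")
```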
Based on https://www.youtube.com/watch?v=9OFMRiAVH-w, Shapley values satisfy the following properties (axioms):
Compared to the supplemental material of *A Unified Approach to Interpreting Model Predictions*, the latter also adds an Efficiency property, $\sum_{i \in N} \phi_i(N, v) = v(N) - v(\emptyset)$, which is quite straightforward. In the latter, the Dummy player property is known as Null effects, and Additivity is known as Linearity.
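Efficiency is easy to check numerically on a tiny two-player game (all coalition values below are invented for illustration):

```python
import itertools
import math

# Invented coalition values for a 2-player game with a nonzero v(empty set).
v = {
    frozenset(): 1.0,
    frozenset({"x"}): 2.0,
    frozenset({"y"}): 4.0,
    frozenset({"x", "y"}): 10.0,
}

def phi(i):
    """Shapley value of player i via the permutation form."""
    total = 0.0
    for perm in itertools.permutations(["x", "y"]):
        d = perm.index(i)
        before = frozenset(perm[:d])
        total += v[before | {i}] - v[before]
    return total / math.factorial(2)

# Efficiency: the Shapley values sum to v(N) - v(empty set) = 9.
assert math.isclose(phi("x") + phi("y"), v[frozenset({"x", "y"})] - v[frozenset()])
```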
TODO: It'd be nice to learn the proofs
Based on the interpretable ML community (e.g., https://hughchen.github.io/its_blog/index.html#shapley_values):
Read *Monotonic solutions of cooperative games* (Young, 1985) for more related ideas.
Based on the supplemental material, it looks like this: the Shapley value has four properties (efficiency, symmetry, null effects, and linearity); Young's 1985 paper shows that monotonicity implies linearity and null effects; and the *A Unified Approach to Interpreting Model Predictions* paper shows that monotonicity also implies symmetry. The properties of the Shapley value can therefore be summarized as just two: efficiency and monotonicity.
For some reason, *A Unified Approach to Interpreting Model Predictions* keeps the missingness property, which seems identical to the null effects property. The paper says the property is required to adapt the Shapley proofs to the class of additive feature attribution methods. So to understand why it's kept, it's essential to go through all the proofs!
More references: Question about missingness (https://github.com/slundberg/shap/issues/175).
We use this implementation to reproduce the calculations at https://hughchen.github.io/its_blog/index.html#shapley_values for Preset D. We'll abbreviate the names Ava, Ben, and Cat as a, b, and c in the variable names below.
%load_ext autoreload
%autoreload 2

from typing import Set

import numpy as np

import shapley
def value_func(players: Set[shapley.Player]) -> float:
    """Value function for Preset D, keyed by sorted player tuples."""
    values = {
        (): 1,
        ("Ava",): 0,
        ("Ben",): 1,
        ("Cat",): 1,
        ("Ava", "Ben"): 1,
        ("Ava", "Cat"): 1,
        ("Ben", "Cat"): 2,
        ("Ava", "Ben", "Cat"): 2,
    }
    return values[tuple(sorted(players))]
coalition = {"Ava", "Ben", "Cat"}
ϕ_a = shapley.shapley(coalition, value_func, "Ava")
ϕ_a
-0.3333333333333333
ϕ_b = shapley.shapley(coalition, value_func, "Ben")
ϕ_b
0.6666666666666666
ϕ_c = shapley.shapley(coalition, value_func, "Cat")
ϕ_c
0.6666666666666666
np.testing.assert_allclose(ϕ_a, -1/3)
np.testing.assert_allclose(ϕ_b, 2/3)
np.testing.assert_allclose(ϕ_c, 2/3)
# Efficiency: the Shapley values sum to v(N) - v(empty set).
# Use assert_allclose rather than ==, since the float sum may be 0.999...9.
np.testing.assert_allclose(
    ϕ_a + ϕ_b + ϕ_c, value_func({"Ava", "Ben", "Cat"}) - value_func(set())
)