In this notebook we experimentally review some representative scenarios of confounding. The concept of confounders is an important tool for understanding data and giving it a causal interpretation. We start from the fun and illustrative case study of the greedy casino designed in [Bareinboim2015]. We implement and study this scenario, and then move on to other famous and representative case studies, which we use to exemplify and expound important ideas related to confounders and causality.
Greedy casino: we start with the greedy casino, a scenario affected by (quasi-)hidden confounders. We perform observational and interventional studies, obtaining different results because of the confounders. Contrary to the possible expectation that exact interventional quantities are preferable, this scenario shows that our specific concerns determine which quantity is to be preferred. Furthermore, the discrepancy between the observed and interventional quantities leads us to suspect unethical behaviour.
Ungreedy casino: we consider a second casino, in which no confounders are present. Observational and interventional studies again produce diverging results, which are not distinguishable from the greedy casino case. This hints at the fact that our discussion of ethical evaluation may not depend only on the existence of confounders, but requires deeper considerations.
Effectiveness of a new medicine: we then move (by graphical analogy) to the standard case of evaluating the effectiveness of a new medicine. This is a standard scenario which may be found in classical causality texts like [Pearl2009]. Here, we deal with a hidden confounder. Again, we perform observational and interventional studies, which end up disagreeing because of the unobserved confounder. In this case, interventional quantities provide the measure of causal strength (ATE) we are looking for.
Policymaking for a new medicine: we keep playing with the new medicine scenario, wondering whether the interventional studies we have performed are sufficient to justify the adoption of a given policy. The realization that mere knowledge of the causal effect of a medicine is not enough to find an optimal policy leads us to consider how other quantities (CATE, counterfactuals) allow us to work around confounders.
Simpson's paradox: effectiveness of a new medicine: we return to our question about the effectiveness of a new medicine, and provide a new interpretation of the discrepancy between observational and interventional results. In particular, we show how wrong causal statements may be explained by a misspecified causal model; inconsistencies may arise among interventional quantities estimated via do-calculus on a model that misspecifies or ignores confounders.
Simpson's paradox: university admission: we finish with a classical instance of Simpson's paradox. Here, we deal with an observable confounder. Apparent inconsistencies arise even among observational data as a consequence of dealing improperly with confounders and causal interpretations.
We conclude with a few appendices providing variations of the scenarios we examined.
This notebook assumes basic knowledge of causality and structural causal models (SCMs) [Pearl2009].
Let us start by importing a couple of generic libraries.
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx
from tqdm import tqdm
We then define the number of simulations in our experiments.
n_trials = 10000
Finally, we set a random seed for reproducibility.
np.random.seed(1985)
Let us re-implement the model of the greedy casino presented in [Bareinboim2015] and play around with it.
Welcome to the Greedy Casino! [Bareinboim2015]
In the greedy casino, players interact with sneaky slot machines that set their reward probability according to the state of the players. In this scenario we assume that it is well known that the choices of players are affected by: (a) their level of drunkenness, and (b) whether a slot machine is blinking or not. The sneaky machines in the greedy casino are equipped with cameras and face-detection software, and they will set their reward probability as a function of the drunkenness of the player and the blinking level of the machine itself.
Formally, we set up a 2-armed bandit with unobserved confounders: a machine (bandit) has two levers (two arms); the choice ($X$) of the player and the outcome ($Y$) are affected by drunkenness and blinking (unobserved confounders).
The problem we want to consider here is whether the reward levels of the machines are within the parameters of the law, that is, whether the percentage of victories is sufficiently high. We will take $0.3$ as the required threshold, meaning that at least 3 times out of 10 the machines are expected to reward the players.
This investigation will lead us to discover a discrepancy between the expected reward in an observational setting and the expected reward in an interventional setting. We will explain this discrepancy in terms of different underlying structural causal models.
Let us take a computational approach and build a simulation of our casino. We use the concrete model presented in [Bareinboim2015].
We start with two functions, one randomly defining the state of a player (0: sober, 1: drunk) and one randomly defining the state of a machine (0: subdued, 1: blinking).
def random_select_blinking():
    return np.random.binomial(1,p=.5)

def random_select_drunkedness():
    return np.random.binomial(1,p=.5)
We then introduce a sneaky function that determines the reward level of the machine by exploiting knowledge of the drunkenness of the player and the blinking of the machine. Notice that we return an array: since every machine has two levers (two arms), the first value is the winning probability when pulling the first lever, and the second is the winning probability when pulling the second lever.
def sneaky_select_reward(blinking,drunkedness):
    if drunkedness==0:
        if blinking==0:
            return np.array([.1,.5])
        else:
            return np.array([.5,.1])
    else:
        if blinking==0:
            return np.array([.4,.2])
        else:
            return np.array([.2,.4])
Finally, we define the play function which returns the outcome of playing (0: loss, 1: victory) taking into consideration the reward probability and the chosen lever.
def play(lever,reward):
    return np.random.binomial(1,reward[int(lever)])
In order to get an intuition on whether the casino complies with the law, we run an observational study. We track players and record their results.
As the managers of the casino have understood, the choice of which lever to play is not completely random. The natural choice of a player depends on her level of drunkenness and the blinking of the machine. We encode this function next (0: first lever, 1: second lever):
def observational_select_lever(blinking,drunkedness):
    return np.logical_xor(blinking,drunkedness)
We define a function representing a set of observations (generating the player and the machine, setting the reward, choosing which arm to pull, playing) and we collect the outcomes.
def run_observational_simulation(n_trials=n_trials):
    results = {'lever': np.zeros(n_trials),
               'outcome': np.zeros(n_trials)}
    for i in tqdm(range(n_trials)):
        blinking = random_select_blinking()
        drunkedness = random_select_drunkedness()
        reward = sneaky_select_reward(blinking,drunkedness)
        lever = observational_select_lever(blinking,drunkedness)
        outcome = play(lever,reward)
        results['lever'][i] = lever
        results['outcome'][i] = outcome
    return results
As good statisticians, we collect many samples.
obs_res = run_observational_simulation()
100%|█████████████████████████████████| 10000/10000 [00:00<00:00, 178230.74it/s]
We look then at the probability of victory $P(Y=1)$, where $Y$ is the outcome.
print('Probability of victory: {0}'.format(np.sum(obs_res['outcome']==1) / len(obs_res['outcome'])))
Probability of victory: 0.1442
The probability is very low, way under the legal threshold of $0.3$, suggesting that the casino is doing something fishy.
This is confirmed also by conditioning on the chosen lever. Let us compute $P(Y=1 \vert X=0)$ and $P(Y=1 \vert X=1)$ where $X$ is the chosen lever.
n_victory_lever0 = np.sum(obs_res['outcome'][obs_res['lever']==0])
n_lever0 = np.sum(obs_res['lever']==0)
print('Probability of victory given lever 0: {0}'.format(n_victory_lever0 / n_lever0))
Probability of victory given lever 0: 0.147192118226601
n_victory_lever1 = np.sum(obs_res['outcome'][obs_res['lever']==1])
n_lever1 = np.sum(obs_res['lever']==1)
print('Probability of victory given lever 1: {0}'.format(n_victory_lever1 / n_lever1))
Probability of victory given lever 1: 0.14111675126903553
Both results confirm that something illegal may be going on.
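These empirical estimates can also be cross-checked analytically. The sketch below (with the reward table restated inline, so the snippet is self-contained) enumerates the four equally likely (blinking, drunkenness) states, applies the XOR lever choice, and obtains an exact observational winning probability of $0.15$:

```python
# Exact observational winning probability: enumerate the four equally
# likely (blinking, drunkenness) states, apply the XOR lever choice,
# and read off the corresponding reward probability.
rewards = {(0, 0): [.1, .5], (1, 0): [.5, .1],   # keyed by (blinking, drunkenness)
           (0, 1): [.4, .2], (1, 1): [.2, .4]}

p_win = 0.0
for blink in (0, 1):
    for drunk in (0, 1):
        lever = blink ^ drunk                    # the natural (XOR) choice
        p_win += .25 * rewards[(blink, drunk)][lever]

print(p_win)  # ≈ 0.15, matching the simulated estimate above
```

The simulated value is this exact $0.15$ up to sampling error: the sneaky machine always steers the natural choice towards the worse arm.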
Let us stop for a moment and try to provide a graphical representation of our model.
Computational graph. We will first draw out an intuitive computational graph of our scenario.
G = nx.DiGraph()
G.graph['dpi'] = 120
nodes = ['X', 'Y', 'Z1', 'Z2', 'r',]
edges = [('Z1','X'), ('Z2','X'), ('Z1','r'), ('Z2','r'), ('X','Y'),('r','Y')]
nodes_pos = {'Z1':(-1, 1), 'Z2':(1, 1), 'X':(-1, -1), 'r':(1, -1), 'Y':(0, -3)}
nodes_lbl = {'Z1':'Drunk', 'Z2':'Blink', 'X':'Lever', 'r':'Reward', 'Y':'Victory'}
G.add_nodes_from(nodes)
G.add_edges_from(edges)
nx.draw(G,nodes_pos,node_size=800,node_color='white')
_ = nx.draw_networkx_labels(G,nodes_pos,nodes_lbl)
We have here a one-to-one correspondence between nodes and functions: notice how the arrows capture the arguments of the functions.
SCM graph. We can actually turn the above intuitive computational graph into a rigorous structural causal model (SCM) [Pearl2009] defined by a set of exogenous variables $\mathcal{E}$, a set of endogenous variables $\mathcal{X}$, a set of structural functions $\mathcal{F}$ associated with the endogenous variables, and a set of probability distributions $\mathcal{P}$ associated with the exogenous variables.
Let us redefine the nodes of our graph as follows:
G = nx.DiGraph()
G.graph['dpi'] = 120
nodes = ['X', 'Y', 'Z1', 'Z2', 'r',]
edges = [('Z1','X'), ('Z2','X'), ('Z1','r'), ('Z2','r'), ('X','Y'),('r','Y')]
nodes_pos = {'Z1':(-1, 1), 'Z2':(1, 1), 'X':(-1, -1), 'r':(1, -1), 'Y':(0, -3)}
nodes_lbl = {'Z1':'D', 'Z2':'B', 'X':'L', 'r':'R', 'Y':'V'}
G.add_nodes_from(nodes)
G.add_edges_from(edges)
nx.draw(G,nodes_pos,node_size=800,node_color='white')
_ = nx.draw_networkx_labels(G,nodes_pos,nodes_lbl)
where the set of endogenous variables is: $$ \mathcal{X} = \{ D,B,L,R,V \}, $$ with obvious meanings, the set of exogenous variables (not depicted) is: $$ \mathcal{E} = \{ U_D,U_B,U_L,U_R,U_V \}, $$ the set of probability distributions $\mathcal{P}$ consists of: \begin{align*} U_{D} & \sim\mathtt{Bern}(.5)\\ U_{B} & \sim\mathtt{Bern}(.5)\\ U_{R} & \sim\mathtt{1}\\ U_{L} & \sim\mathtt{1}\\ U_{V} & \sim\mathtt{Unif}(0,1) \end{align*} where $\mathtt{Bern}(p)$ is a Bernoulli distribution with probability $p$, $\mathtt{Unif}(a,b)$ is a continuous uniform distribution between $a$ and $b$, and $\mathtt{1}$ is just an identity, and the set of structural functions $\mathcal{F}$ is defined by: \begin{align*} D & =f_{D}(U_{D})=U_{D}\\ B & =f_{B}(U_{B})=U_{B}\\ R & =f_{R}(D,B,U_{R})=\mathtt{if...}\\ L & =f_{L}(D,B,U_{L})=D\oplus B\\ V & =f_{V}(L,R,U_{V})=\mathtt{if...} \end{align*} where $\mathtt{if...}$ is a shorthand for the if-statement encoded in the respective functions.
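This SCM can be written down compactly in code: draw the exogenous variables from $\mathcal{P}$, then evaluate the structural functions $\mathcal{F}$ in topological order. A minimal sketch (the names mirror the math above, not the simulation functions defined earlier):

```python
import numpy as np

# Sample one world from the greedy-casino SCM.
def sample_scm():
    # P: exogenous variables (U_R and U_L are degenerate, so they are omitted)
    U_D = np.random.binomial(1, .5)
    U_B = np.random.binomial(1, .5)
    U_V = np.random.uniform(0, 1)
    # F: endogenous variables, evaluated in topological order
    D = U_D
    B = U_B
    R = {(0, 0): [.1, .5], (0, 1): [.5, .1],   # f_R, keyed by (D, B)
         (1, 0): [.4, .2], (1, 1): [.2, .4]}[(D, B)]
    L = D ^ B                                   # f_L
    V = int(U_V < R[L])                         # f_V
    return dict(D=D, B=B, L=L, V=V)
```

Sampling this function many times reproduces the observational distribution generated by the simulation loop above.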
Simplified SCM graph. Notice that the node $R$ defining the reward probability of each lever has no stochastic component (indeed $U_R \sim \mathtt{1}$) and just performs a deterministic computation. Instead of performing this computation in the node $R$, we can forward the inputs and integrate the computation directly in $V$, obtaining the smaller model:
G = nx.DiGraph()
G.graph['dpi'] = 120
nodes = ['X', 'Y', 'Z1', 'Z2']
edges = [('Z1','X'), ('Z2','X'), ('Z1','Y'), ('Z2','Y'), ('X','Y')]
nodes_pos = {'Z1':(-1, 1), 'Z2':(1, 1), 'X':(-1, -1), 'Y':(0, -3)}
nodes_lbl = {'Z1':'D', 'Z2':'B', 'X':'L', 'Y':'V'}
G.add_nodes_from(nodes)
G.add_edges_from(edges)
nx.draw(G,nodes_pos,node_size=800,node_color='white')
_ = nx.draw_networkx_labels(G,nodes_pos,nodes_lbl)
This model has a very familiar shape; using standard causality notation and lexicon we can identify a treatment node ($X$), an outcome node ($Y$), and two confounders ($Z_1, Z_2$):
G = nx.DiGraph()
G.graph['dpi'] = 120
nodes = ['X', 'Y', 'Z1', 'Z2']
edges = [('Z1','X'), ('Z2','X'), ('Z1','Y'), ('Z2','Y'), ('X','Y')]
nodes_pos = {'Z1':(-1, 1), 'Z2':(0, 1), 'X':(-1, -1), 'Y':(0, -1)}
nodes_lbl = {'Z1':'Z1', 'Z2':'Z2', 'X':'X', 'Y':'Y'}
G.add_nodes_from(nodes)
G.add_edges_from(edges)
nx.draw(G,nodes_pos,node_size=800,node_color='white')
_ = nx.draw_networkx_labels(G,nodes_pos,nodes_lbl)
A couple of observations:
The reasoning we used to eliminate the node $R$ could be used to eliminate the node $L$ as well, since the choice of the lever is purely deterministic given $D$ and $B$. The reason we do not eliminate $L$ is that we will intervene on this node later.
The presence of two confounders $Z_1,Z_2$ is not necessary. A 'greedy casino' may be set up with a single confounder (see one-confounder example in the Appendix).
Let us go back to our scenario. Tipped off by our recordings, public officers are now coming over to validate our complaint against the casino. Following good statistical practice, they run an experimental study: they pick a customer and assign her a random lever to pull.
We represent this scenario with a function in which the chosen lever is not the result of the player's natural choice, but of a random coin toss.
def interventional_select_lever():
    return np.random.binomial(1,p=.5)
We define a function representing a set of experiments (generating the player and the machine, setting the reward, randomly choosing which arm to pull, playing) and we collect the outcomes.
def run_interventional_simulation(n_trials=n_trials):
    results = {'lever': np.zeros(n_trials),
               'outcome': np.zeros(n_trials)}
    for i in tqdm(range(n_trials)):
        blinking = random_select_blinking()
        drunkedness = random_select_drunkedness()
        reward = sneaky_select_reward(blinking,drunkedness)
        lever = interventional_select_lever()
        outcome = play(lever,reward)
        results['lever'][i] = lever
        results['outcome'][i] = outcome
    return results
As before, we collect many samples.
int_res = run_interventional_simulation()
100%|█████████████████████████████████| 10000/10000 [00:00<00:00, 163073.68it/s]
Looking at the probability of victory, it seems the casino is doing an okay job, with a winning percentage at the threshold required by law (up to sampling error).
print('Probability of victory: {0}'.format(np.sum(int_res['outcome']==1) / len(int_res['outcome'])))
Probability of victory: 0.303
Beware that the estimated quantity is not $P(Y=1)$ but $P_{do(X=x)}(Y=1) = P(Y=1 \vert do(X=x)) $, that is, the probability of victory in the model in which we intervened on the choice of the lever.
A similar conclusion follows by computing probabilities for individual levers:
n_victory_lever0 = np.sum(int_res['outcome'][int_res['lever']==0])
n_lever0 = np.sum(int_res['lever']==0)
print('Probability of victory acting on lever 0: {0}'.format(n_victory_lever0 / n_lever0))
Probability of victory acting on lever 0: 0.3046623794212219
n_victory_lever1 = np.sum(int_res['outcome'][int_res['lever']==1])
n_lever1 = np.sum(int_res['lever']==1)
print('Probability of victory acting on lever 1: {0}'.format(n_victory_lever1 / n_lever1))
Probability of victory acting on lever 1: 0.3013535031847134
Again, beware that these quantities are not the observational conditionals $P(Y=1 \vert X=0)$ and $P(Y=1 \vert X=1)$, but the interventional ones $P(Y=1 \vert do(X=0))$ and $P(Y=1 \vert do(X=1))$.
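The officials' numbers can also be checked analytically (restating the reward table inline): under the intervention the lever is uniform and independent of the state, so the winning probability is the average of the two arm probabilities, which the sneaky reward function keeps at exactly $0.3$ in every state:

```python
# Under do(X) the lever is uniform and independent of the state, so the
# winning probability is the average of the two arm probabilities, further
# averaged over the four equally likely (blinking, drunkenness) states.
rewards = {(0, 0): [.1, .5], (1, 0): [.5, .1],   # keyed by (blinking, drunkenness)
           (0, 1): [.4, .2], (1, 1): [.2, .4]}

p_win_do = sum(.25 * (r[0] + r[1]) / 2 for r in rewards.values())
print(p_win_do)  # ≈ 0.3: every state's rewards average to the legal threshold
```

This is exactly how the casino stays within the letter of the law under randomized inspection while paying out only $0.15$ in the natural regime.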
Let us try to understand what the officials have done, and why their results substantially differ from the results we collected.
Computational graph. Let us first compare the officials' computational graph with ours.
G = nx.DiGraph()
G.graph['dpi'] = 120
nodes = ['X', 'Y', 'Z1', 'Z2', 'r',]
edges = [('Z1','r'), ('Z2','r'), ('X','Y'),('r','Y')]
nodes_pos = {'Z1':(-1, 1), 'Z2':(1, 1), 'X':(-1, -1), 'r':(1, -1), 'Y':(0, -3)}
nodes_lbl = {'Z1':'Drunk', 'Z2':'Blink', 'X':'Lever', 'r':'Reward', 'Y':'Victory'}
G.add_nodes_from(nodes)
G.add_edges_from(edges)
nx.draw(G,nodes_pos,node_size=800,node_color='white')
_ = nx.draw_networkx_labels(G,nodes_pos,nodes_lbl)
The graph is the same as before with the exception of one node/function: since the function associated with the lever has no arguments, the node has no incoming arrows.
SCM graph. Let us translate the computational graph into an SCM:
G = nx.DiGraph()
G.graph['dpi'] = 120
nodes = ['X', 'Y', 'Z1', 'Z2', 'r',]
edges = [('Z1','r'), ('Z2','r'), ('X','Y'),('r','Y')]
nodes_pos = {'Z1':(-1, 1), 'Z2':(1, 1), 'X':(-1, -1), 'r':(1, -1), 'Y':(0, -3)}
nodes_lbl = {'Z1':'D', 'Z2':'B', 'X':'L', 'r':'R', 'Y':'V'}
G.add_nodes_from(nodes)
G.add_edges_from(edges)
nx.draw(G,nodes_pos,node_size=800,node_color='white')
_ = nx.draw_networkx_labels(G,nodes_pos,nodes_lbl)
where the set of endogenous variables $\mathcal{X}$ and exogenous variables $\mathcal{E}$ are the same, the set of probability distributions $\mathcal{P}$ is identical with the exception of: \begin{align*} U_{L} & \sim\mathtt{Bern}(.5),\\ \end{align*} and similarly the set of structural functions $\mathcal{F}$ is the same except for: \begin{align*} L & =f_{L}(D,B,U_{L})=U_{L}.\\ \end{align*} This represents what we would normally refer to as the post-interventional model in which the incoming arrows on the intervened nodes have been removed.
Simplified SCM graph. Simplifying the model as we did before, we get:
G = nx.DiGraph()
G.graph['dpi'] = 120
nodes = ['X', 'Y', 'Z1', 'Z2']
edges = [('Z1','Y'), ('Z2','Y'), ('X','Y')]
nodes_pos = {'Z1':(-1, 1), 'Z2':(1, 1), 'X':(-1, -1), 'Y':(0, -3)}
nodes_lbl = {'Z1':'D', 'Z2':'B', 'X':'L', 'Y':'V'}
G.add_nodes_from(nodes)
G.add_edges_from(edges)
nx.draw(G,nodes_pos,node_size=800,node_color='white')
_ = nx.draw_networkx_labels(G,nodes_pos,nodes_lbl)
and in standard causality notation:
G = nx.DiGraph()
G.graph['dpi'] = 120
nodes = ['X', 'Y', 'Z1', 'Z2']
edges = [('Z1','Y'), ('Z2','Y'), ('X','Y')]
nodes_pos = {'Z1':(-1, 1), 'Z2':(0, 1), 'X':(-1, -1), 'Y':(0, -1)}
nodes_lbl = {'Z1':'Z1', 'Z2':'Z2', 'X':'X', 'Y':'Y'}
G.add_nodes_from(nodes)
G.add_edges_from(edges)
nx.draw(G,nodes_pos,node_size=800,node_color='white')
_ = nx.draw_networkx_labels(G,nodes_pos,nodes_lbl)
Through our graphical modelling we can explain the discrepancy between our results and the results collected by the state officials. Notice that this discrepancy can be maximized by the casino owner as shown in the maximally greedy casino example in the Appendix.
What is the meaning of the observational results? Our results were collected in an observational setting, that is, a scenario in which treatment (lever) and outcome (victory) were confounded.
What we observed was $P(Y=1 \vert X=x)$, that is, the probability that a customer walks out of the casino with a reward in the natural setting, without us doing anything.
What is the meaning of the interventional results? The results collected by the state officials came from an interventional setting; by requiring the players to choose a lever completely at random, they broke the connection between the confounders (drunkenness, blinking) and treatment (lever). They then collected results from a world in which treatment (lever) and outcome (victory) are no longer confounded.
What they observed was $P(Y=1 \vert do(X=x))$, that is, the probability that a customer walks out of the casino with a reward if a certain choice is taken, or imposed, independently of any other factor.
What is the point of performing an interventional study? Notice that the interventional study makes perfect sense if the aim is to characterize the causal mechanism of the outcome $Y$ as a function of $X$. The state officials do indeed observe the true distribution of the reward as a simple function of the chosen lever: the average reward, for any choice of lever, is $0.3$, as we encoded in our program. The state officials recovered that value, so the actual causal dynamics between treatment and outcome are reconstructed. What is missed is the fact that, because of external factors, the choice of the lever has a particular distribution; and knowledge of the factors that shape this distribution is exploited by the casino managers to trick their customers.
Indeed, from another perspective, we can explain the interventional study as a way to make the population at each $X=x$ identical: we do not want our assessment of the causal link $X\rightarrow Y$ to be affected by different distributions at different levels $X=x$. By making the population at each $X=x$ identical, we retrieve the causal effect, but we lose the fact that, because of such a difference, the proportion of players leaving the casino winning is lower than $0.3$.
The above scenario may suggest that the discrepancy between the observational quantity and the interventional quantity implies in itself that something unethical is being done by the casino managers. However, the judgement of fairness does not necessarily proceed from the numbers that we collected. Let's explore an alternative scenario. We will see that our conclusions have to be based on a graphical model and, therefore, on our assumptions about the role of confounders.
After our unhappy experience at the greedy casino, we decide to move on to the nearby assuredly ungreedy casino. In this new gaming hall, it is guaranteed that rewards are selected randomly, independently of any invasive analysis of the traits of the clients or any other environmental signal. However, it happens that drunk players normally opt for the unlucky lever; and it also happens that drunk players are more numerous than sober ones.
Formally, we still have a 2-armed bandit problem: a machine (bandit) with two levers (two arms); the choice ($X$) of the player is affected by drunkenness ($Z$); the outcome ($Y$) is affected only by the chosen lever. We can represent the scenario with the following graph:
G = nx.DiGraph()
G.graph['dpi'] = 120
nodes = ['X', 'Y', 'Z']
edges = [('Z','X'), ('X','Y')]
nodes_pos = {'Z':(-.7, 1), 'X':(-1, -1), 'Y':(0, -1)}
nodes_lbl = {'Z':'Z', 'X':'X', 'Y':'Y'}
G.add_nodes_from(nodes)
G.add_edges_from(edges)
nx.draw(G,nodes_pos,node_size=800,node_color='white')
_ = nx.draw_networkx_labels(G,nodes_pos,nodes_lbl)
Again, we want to evaluate if the ungreedy casino complies with the legal prescription of having a winning percentage of at least $0.3$.
We implement this scenario following the same structure we used for the greedy casino.
def random_select_drunkedness():
    return np.random.binomial(1,p=.1)

def random_select_reward():
    return np.array([.2,.4])

def observational_select_lever(drunkedness):
    return drunkedness

def interventional_select_lever():
    return np.random.binomial(1,p=.5)

def play(lever,reward):
    return np.random.binomial(1,reward[int(lever)])
We have similarly named functions, but notice how the arguments have changed; in particular, the select_reward() function (which has been aptly re-characterized as random instead of sneaky) receives no input, consistent with the corresponding node having no incoming arrow in the graph above.
We can now perform an observational study as before:
def run_observational_simulation(n_trials=n_trials):
    results = {'x': np.zeros(n_trials),
               'z': np.zeros(n_trials),
               'y': np.zeros(n_trials)}
    for i in tqdm(range(n_trials)):
        drunkedness = random_select_drunkedness()
        reward = random_select_reward()
        lever = observational_select_lever(drunkedness)
        victory = play(lever,reward)
        results['x'][i] = lever
        results['z'][i] = drunkedness
        results['y'][i] = victory
    return results
res = run_observational_simulation()
print('Probability of victory: {0}'.format(np.sum(res['y']==1) / len(res['y'])))
100%|█████████████████████████████████| 10000/10000 [00:00<00:00, 271303.90it/s]
Probability of victory: 0.2155
Similarly, we can call over state officials to perform an interventional study:
def run_interventional_simulation(n_trials=n_trials):
    results = {'x': np.zeros(n_trials),
               'z': np.zeros(n_trials),
               'y': np.zeros(n_trials)}
    for i in tqdm(range(n_trials)):
        drunkedness = random_select_drunkedness()
        reward = random_select_reward()
        lever = interventional_select_lever()
        victory = play(lever,reward)
        results['x'][i] = lever
        results['z'][i] = drunkedness
        results['y'][i] = victory
    return results
res = run_interventional_simulation()
print('Probability of victory: {0}'.format(np.sum(res['y']==1) / len(res['y'])))
100%|█████████████████████████████████| 10000/10000 [00:00<00:00, 205170.67it/s]
Probability of victory: 0.2923
The results we obtained look very similar to the ones we observed at the greedy casino: there is a gap between observational and interventional quantities that casts doubt on the casino, albeit a notoriously ungreedy one.
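Both figures can be checked analytically; a small sketch restating the ungreedy casino's parameters:

```python
# Exact probabilities for the ungreedy casino: the reward pair [.2, .4] is
# fixed, drunkenness ~ Bern(.1), and the natural choice is lever = drunkenness.
reward = [.2, .4]
p_drunk = .1

p_obs = (1 - p_drunk) * reward[0] + p_drunk * reward[1]   # sober -> lever 0
p_int = .5 * reward[0] + .5 * reward[1]                   # uniform lever

print(p_obs, p_int)  # ≈ 0.22 and 0.30, matching the two simulations
```

So the ungreedy casino produces a gap of the same kind as the greedy one, even though no function inspects the state of the player.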
Let us try to compare and understand the results at the greedy and ungreedy casino.
Is the greedy casino unfair? How about the ungreedy casino? Just looking at observational and interventional quantities, it seems impossible to distinguish between the greedy and the ungreedy casino, and thus to cast a judgment about their fairness. An evaluation of fairness requires understanding how the discrepancy we observed is generated: in the greedy casino there is a sneaky function that takes (potentially illegal) advantage of side information, while in the ungreedy casino there is a purely random function that happens to combine positively (for the casino owner) with the state of the customers. Of course, it would be up to the owner of the ungreedy casino to prove that the random function is actually random and not intentionally set up to sneakily exploit previous knowledge about the distribution of customers.
Such a fairness judgement, which relies on the assumption that the reward levels are independent of the behaviour and distribution of drunkenness, requires knowledge of the graph (or more refined interventional studies that may assess the existence and strength of the arrow $Z \rightarrow Y$).
Let us now consider a very classic and basic confounding scenario.
We have seen above that the (observational) greedy casino scenario, stripped to its bare bones and reduced to a single confounder, may be represented by the following graph:
G = nx.DiGraph()
G.graph['dpi'] = 120
nodes = ['X', 'Y', 'Z']
edges = [('Z','X'), ('Z','Y'), ('X','Y')]
nodes_pos = {'Z':(-.7, 1), 'X':(-1, -1), 'Y':(0, -1)}
nodes_lbl = {'Z':'Z', 'X':'X', 'Y':'Y'}
G.add_nodes_from(nodes)
G.add_edges_from(edges)
nx.draw(G,nodes_pos,node_size=800,node_color='white')
_ = nx.draw_networkx_labels(G,nodes_pos,nodes_lbl)
where $X$ is the chosen lever, $Z$ is the level of drunkenness and blinking, and $Y$ is the outcome. We have also seen that observational and interventional studies provide different results, and we have provided an interpretation for their meaning.
The above graph is also a very common model for the study of the causal effect of a treatment ($X$) on an outcome ($Y$). In the medical and pharmacological field, we would read $X$ as the provision of a medicine, $Z$ as a set of external factors affecting treatment and outcome (e.g., socio-economic status), and $Y$ as the possible recovery. From the causal literature we also know that performing an interventional study is the gold standard for assessing the effectiveness of a medicine.
We will treat this problem formally as an instance of a 2-armed bandit in which there are only two alternatives (two arms) with respect to the medicine $X$.
Let us take again a computational approach and build a simulation of our medical scenario. For simplicity we will consider a binary medicine treatment $X$ (0: medicine A, 1: medicine B), a binary socio-economic status $Z$ (0: low, 1: high) and a binary outcome (0: death, 1: recovery).
def SEstatus():
    return np.random.binomial(1,p=.5)

def recovery(medicine,status):
    if status==0:
        if medicine==0:
            return np.random.binomial(1,.4)
        else:
            return np.random.binomial(1,.25)
    else:
        if medicine==0:
            return np.random.binomial(1,.4)
        else:
            return np.random.binomial(1,.45)
In this implementation the single function for our socio-economic status replaces the greedy casino functions random_select_blinking() and random_select_drunkedness(), while the outcome function integrates the previous functions sneaky_select_reward() and play().
Notice how in our definition the effectiveness of medicine A is $0.4=\frac{0.4+0.4}{2}$ while the effectiveness of medicine B is $0.35=\frac{0.25+0.45}{2}$.
Suppose we perform an observational study. As the socio-economic status confounds the choice of the medicine and its outcome, we define the following function:
def observational_medicine(status):
    if status==0:
        return np.random.binomial(1,.1)
    else:
        return np.random.binomial(1,.9)
This function specifies that patients from a low socio-economic background are much more likely to select medicine A, while patients from a high socio-economic background are much more likely to select medicine B.
Let us now bring all our functions together and collect a set of observations.
def run_observational_simulation(n_trials=n_trials):
    results = {'x': np.zeros(n_trials),
               'z': np.zeros(n_trials),
               'y': np.zeros(n_trials)}
    for i in tqdm(range(n_trials)):
        status = SEstatus()
        medicine = observational_medicine(status)
        outcome = recovery(medicine,status)
        results['x'][i] = medicine
        results['z'][i] = status
        results['y'][i] = outcome
    return results
obs_res = run_observational_simulation()
100%|█████████████████████████████████| 10000/10000 [00:00<00:00, 228369.57it/s]
Let us estimate the effectiveness of a medicine as the probability of leading to survival by computing $P(Y=1 \vert X=0)$ and $P(Y=1 \vert X=1)$.
n_recovery_medicineA = np.sum(obs_res['y'][obs_res['x']==0])
n_medicineA = np.sum(obs_res['x']==0)
print('Probability of recovery given medicine A: {0}'.format(n_recovery_medicineA / n_medicineA))
Probability of recovery given medicine A: 0.3997200559888022
n_recovery_medicineB = np.sum(obs_res['y'][obs_res['x']==1])
n_medicineB = np.sum(obs_res['x']==1)
print('Probability of recovery given medicine B: {0}'.format(n_recovery_medicineB / n_medicineB))
Probability of recovery given medicine B: 0.4282856571314263
It would seem that medicine B is more effective than A, when we have actually encoded that medicine A has a higher chance of leading to survival!
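The reversal can be reproduced exactly from the probabilities we encoded: conditioning on the chosen medicine skews the socio-economic mix of each group. A sketch of the exact computation (the parameter dictionaries and `p_y_given_x` are illustrative names restating the model above):

```python
# Exact observational conditionals P(Y=1 | X=x) for the encoded model:
# Z ~ Bern(.5); P(X=1 | Z=0) = .1, P(X=1 | Z=1) = .9;
# recovery probabilities indexed by (medicine, status).
p_z = {0: .5, 1: .5}
p_x1_given_z = {0: .1, 1: .9}
p_y_given_xz = {(0, 0): .40, (0, 1): .40,   # medicine A
                (1, 0): .25, (1, 1): .45}   # medicine B

p_y_given_x = {}
for x in (0, 1):
    # joint P(X=x, Z=z), then Bayes' rule to get P(Z=z | X=x)
    p_xz = {z: p_z[z] * (p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z])
            for z in (0, 1)}
    p_x = sum(p_xz.values())
    p_y_given_x[x] = sum(p_y_given_xz[(x, z)] * p_xz[z] / p_x for z in (0, 1))

print(p_y_given_x)  # ≈ {0: 0.4, 1: 0.43}: B looks better only observationally
```

Since 90% of those who take B come from the high socio-economic group, B inherits that group's better baseline, masking its lower effectiveness.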
Let us now perform an interventional study, as taught in the medical literature. Each patient is now unable to choose which medicine she wants, and is instead given a randomly assigned medicine. This randomization severs the link between socio-economic status and the choice of medicine, thus eliminating the confounding.
Practically, we will define an experimental scenario that can be described by the graph below:
G = nx.DiGraph()
G.graph['dpi'] = 120
nodes = ['X', 'Y', 'Z']
edges = [('Z','Y'), ('X','Y')]
nodes_pos = {'Z':(-.7, 1), 'X':(-1, -1), 'Y':(0, -1)}
nodes_lbl = {'Z':'Z', 'X':'X', 'Y':'Y'}
G.add_nodes_from(nodes)
G.add_edges_from(edges)
nx.draw(G,nodes_pos,node_size=800,node_color='white')
_ = nx.draw_networkx_labels(G,nodes_pos,nodes_lbl)
Let us then define a function randomly assigning a medicine to a patient:
def interventional_medicine():
    return np.random.binomial(1,.5)
And now we set up the loop to collect a set of interventions.
def run_interventional_simulation(n_trials=n_trials):
    results = {'x': np.zeros(n_trials),
               'z': np.zeros(n_trials),
               'y': np.zeros(n_trials)}
    for i in tqdm(range(n_trials)):
        status = SEstatus()
        medicine = interventional_medicine()
        outcome = recovery(medicine,status)
        results['x'][i] = medicine
        results['z'][i] = status
        results['y'][i] = outcome
    return results
int_res = run_interventional_simulation()
100%|█████████████████████████████████| 10000/10000 [00:00<00:00, 234131.62it/s]
Let us estimate the effectiveness of a medicine as the probability of leading to survival by computing $P(Y=1 \vert do(X=0))$ and $P(Y=1 \vert do(X=1))$.
n_recovery_medicineA = np.sum(int_res['y'][int_res['x']==0])
n_medicineA = np.sum(int_res['x']==0)
print('Probability of recovery acting on medicine A: {0}'.format(n_recovery_medicineA / n_medicineA))
Probability of recovery acting on medicine A: 0.411611088790679
n_recovery_medicineB = np.sum(int_res['y'][int_res['x']==1])
n_medicineB = np.sum(int_res['x']==1)
print('Probability of recovery acting on medicine B: {0}'.format(n_recovery_medicineB / n_medicineB))
Probability of recovery acting on medicine B: 0.35663082437275984
We have now obtained a better estimate of the effectiveness of the medicines; in particular, we find that medicine A is more effective than medicine B, as we would expect.
Suppose we could not perform experiments (for legal or ethical reasons). Given observational data and a causal model, we could still estimate the interventional quantity of interest using do-calculus. Knowing that the causal graph is:
G = nx.DiGraph()
G.graph['dpi'] = 120
nodes = ['X', 'Y', 'Z']
edges = [('Z','X'), ('Z','Y'), ('X','Y')]
nodes_pos = {'Z':(-.7, 1), 'X':(-1, -1), 'Y':(0, -1)}
nodes_lbl = {'Z':'Z', 'X':'X', 'Y':'Y'}
G.add_nodes_from(nodes)
G.add_edges_from(edges)
nx.draw(G,nodes_pos,node_size=800,node_color='white')
_ = nx.draw_networkx_labels(G,nodes_pos,nodes_lbl)
Then do-calculus teaches us that we can estimate: $$ P(Y=1 \vert do(X=x)) = \sum_Z P(Y=1 \vert X,Z)P(Z). $$ Let us verify this on our simulation. We decompose the formula as: $$ P(Y=1 \vert do(X=x)) = \sum_Z \frac{P(Y=1,X,Z)}{P(X,Z)}P(Z). $$
P_z0 = np.sum(obs_res['z']==0) / len(obs_res['z'])
P_z1 = np.sum(obs_res['z']==1) / len(obs_res['z'])
# Note: the following are raw counts; the common normalization cancels in the conditional ratios below
P_y1x0z0 = np.sum(np.logical_and(obs_res['y']==1, np.logical_and(obs_res['x']==0,obs_res['z']==0)))
P_y1x0z1 = np.sum(np.logical_and(obs_res['y']==1, np.logical_and(obs_res['x']==0,obs_res['z']==1)))
P_y1x1z0 = np.sum(np.logical_and(obs_res['y']==1, np.logical_and(obs_res['x']==1,obs_res['z']==0)))
P_y1x1z1 = np.sum(np.logical_and(obs_res['y']==1, np.logical_and(obs_res['x']==1,obs_res['z']==1)))
P_x0z0 = np.sum(np.logical_and(obs_res['x']==0,obs_res['z']==0))
P_x0z1 = np.sum(np.logical_and(obs_res['x']==0,obs_res['z']==1))
P_x1z0 = np.sum(np.logical_and(obs_res['x']==1,obs_res['z']==0))
P_x1z1 = np.sum(np.logical_and(obs_res['x']==1,obs_res['z']==1))
P_y1_givenx0z0 = P_y1x0z0 / P_x0z0
P_y1_givenx0z1 = P_y1x0z1 / P_x0z1
P_y1_givenx1z0 = P_y1x1z0 / P_x1z0
P_y1_givenx1z1 = P_y1x1z1 / P_x1z1
P_y1_dox0 = P_z0 * P_y1_givenx0z0 + P_z1 * P_y1_givenx0z1
P_y1_dox1 = P_z0 * P_y1_givenx1z0 + P_z1 * P_y1_givenx1z1
print('Probability of recovery acting on medicine A: {0}'.format(P_y1_dox0))
print('Probability of recovery acting on medicine B: {0}'.format(P_y1_dox1))
Probability of recovery acting on medicine A: 0.4160700690220282
Probability of recovery acting on medicine B: 0.3477268345864935
We indeed obtain a result very close to the actual experiments without performing any invasive intervention, but using only observational data together with knowledge of the underlying causal model.
Let us try again to explain the discrepancy between observational and interventional results, now in the context of the scenario of evaluating a new medicine.
What is the meaning of the observational results? In the observational scenario we observed $P(Y \vert X=x)$, that is, the probability that a patient recovers when she is left free to choose her medicine.
Now it happens that patients with a low socio-economic background tend to choose medicine A (maybe because it is cheaper), while patients with a high socio-economic background normally opt for medicine B (maybe because of advertising). It also happens that patients with a high socio-economic background recover, in general, at a higher rate than patients with a low socio-economic background (maybe because of a richer diet). This higher base rate, combined with a predilection for medicine B, makes medicine B look more effective.
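We can make this mechanism concrete with a small back-of-the-envelope calculation. The parameter values below are hypothetical, not read from the notebook's earlier definitions; they are merely chosen to be consistent with the estimates reported above (recovery around $0.4$ for A in both groups, lower and higher rates for B, and a strong status-dependent preference):

```python
# Hypothetical parameters, chosen to be consistent with the estimates above
p_z   = {0: .5, 1: .5}                     # prior over socio-economic status
p_rec = {0: {'A': .40, 'B': .25},          # recovery probability per status and medicine
         1: {'A': .40, 'B': .45}}
p_B   = {0: .1, 1: .9}                     # probability that each status group chooses B

def obs(med):
    # Observational P(Y=1 | X=med): mixture weighted by who self-selects med
    w = {z: p_z[z] * (p_B[z] if med == 'B' else 1 - p_B[z]) for z in (0, 1)}
    return sum(w[z] * p_rec[z][med] for z in (0, 1)) / sum(w.values())

def do(med):
    # Interventional P(Y=1 | do(X=med)): mixture weighted by the status prior
    return sum(p_z[z] * p_rec[z][med] for z in (0, 1))

print(round(obs('A'), 3), round(obs('B'), 3))  # 0.4 0.43 -> B looks better observationally
print(round(do('A'), 3), round(do('B'), 3))    # 0.4 0.35 -> A is actually more effective
```

The higher base rate of the group that prefers B dominates the observational mixture, while the interventional mixture weights both groups by their prior.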
What is the meaning of the interventional results? In the interventional scenario we observed $P(Y \vert do(X=x))$, that is, the probability that a patient recovers when she is forced to take a randomly selected medicine.
Through randomization we break the association between low socio-economic background and medicine A, and high socio-economic background and medicine B. This allows us to probe the mechanism of recovery without the bias introduced by the confounder. And in this way we recover the actual effectiveness of the two treatments, leading us to conclude that, independently from external factors, medicine A is on average more effective than medicine B.
We have also shown that, when randomization is not possible, we may still be able to reconstruct interventional results from observational data, given the underlying causal model and relying on do-calculus.
It is worth pointing out that, in the scenarios above, while we are always struggling with confounders in an identical model, we are trying to answer different questions:
In the casino case, the state officials are concerned not only with the absolute causal effect of the lever: they should also ask how the discrepancy with the observational distribution is generated, and whether something unlawful is in place to induce it (thus determining whether the casino is greedy or ungreedy).
In the medical case, the doctor wants to be certain of the absolute causal effect of the medicine. Independently from whether the medicine will be prescribed, whether the patient will choose it, or whether the state will approve it, we want to know if the medicine is effective against a certain disease, and thus more likely to cause a recovery.
Let us further develop our example about the new medicine.
Let us consider the same scenario we have worked on before. Having shown that medicine A is more effective than medicine B, we may wonder whether this justifies policymaking in favor of adopting medicine A. To evaluate this point, let us run further simulations.
Let us assume that two different teams conducted the observational and the interventional studies we performed above. Let us further imagine that each goes on to enforce a medical protocol that always provides the medicine it found most effective.
Let us start with the observational team. Observing that medicine B seems more effective, they will make every patient use medicine B. Let us compute the recovery rate $P(Y=1\vert \pi_B)$, where $\pi_B$ is the policy that always enforces $do(X=1)$.
def interventional_medicine():
return 1
int_res = run_interventional_simulation()
100%|█████████████████████████████████| 10000/10000 [00:00<00:00, 321989.84it/s]
print('Probability of recovery for policy B: {0}'.format(np.sum(int_res['y']==1) / len(int_res['y'])))
Probability of recovery for policy B: 0.3581
Let us now move to the interventional team. Based on their estimate, they will instead make every patient take medicine A. Let us compute the recovery rate in their case, that is, $P(Y=1\vert \pi_A)$, where $\pi_A$ is the policy that always enforces $do(X=0)$.
def interventional_medicine():
return 0
int_res = run_interventional_simulation()
100%|█████████████████████████████████| 10000/10000 [00:00<00:00, 324981.13it/s]
print('Probability of recovery for policy A: {0}'.format(np.sum(int_res['y']==1) / len(int_res['y'])))
Probability of recovery for policy A: 0.401
The interventional team achieved, as expected, a higher recovery rate.
Is that the best solution? It turns out that if we let things run their natural course, without any intervention, we would obtain a higher recovery rate; let us compute $P(Y=1\vert \pi_\emptyset)$, where $\pi_\emptyset$ is a null policy that reduces to observing $P(Y=1)$.
obs_res = run_observational_simulation()
100%|█████████████████████████████████| 10000/10000 [00:00<00:00, 229693.98it/s]
print('Probability of recovery for no policy: {0}'.format(np.sum(obs_res['y']==1) / len(obs_res['y'])))
Probability of recovery for no policy: 0.4111
The better result is due to the fact that patients left to their own devices select the medicine that happens to work better given their own status. However, it is important to explain what happens: the confounders led the patients to make a better choice than any blanket interventional protocol. This, of course, is not always the case: in other scenarios it may well happen that patients left to their own devices make sub-optimal choices, and that an interventional policy would produce better results.
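The three policy values can be sketched analytically. As before, the parameters below are hypothetical reconstructions chosen to be consistent with the simulation results, not the notebook's actual definitions:

```python
# Hypothetical parameters consistent with the simulations above
p_rec = {0: {'A': .40, 'B': .25}, 1: {'A': .40, 'B': .45}}
p_B   = {0: .1, 1: .9}   # how often each status group picks medicine B on its own

v_piA  = .5 * p_rec[0]['A'] + .5 * p_rec[1]['A']   # blanket do(X=0)
v_piB  = .5 * p_rec[0]['B'] + .5 * p_rec[1]['B']   # blanket do(X=1)
v_null = sum(.5 * ((1 - p_B[z]) * p_rec[z]['A'] + p_B[z] * p_rec[z]['B'])
             for z in (0, 1))                       # patients choose for themselves

print(round(v_piA, 3), round(v_piB, 3), round(v_null, 3))  # 0.4 0.35 0.415
```

Under these numbers the null policy beats both blanket policies, because patients' preferences happen to correlate with the medicine that works better for their status.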
The results of the above simulations may not be intuitive, so let us try to explain them further.
What is the meaning of the studies we carried out in relation to policymaking? Our studies allowed us to evaluate our system under different conditions:
Should we intervene or not? Performing an interventional study allowed us to find which medicine is, on average, more effective. This finding, however, does not immediately justify the adoption of one medicine over the other (medicine A over medicine B).
In the real world, when we deal with actual confounders, we care about how the outcome ($Y$) is affected not only by the causal power of the new medicine ($X$), but also by the interactions between the medicine itself and the confounders ($Z$).
Thus an interventional study cannot guarantee that, once we take confounding factors into account, a different interventional policy or a purely observational policy would not produce a higher recovery rate.
Ideally the best policy would be an individualized policy that does not rely simply on the average treatment effect (ATE) of the medicine. More refined policies taking into account external factors may achieve better results than a blanket interventional policy ($\pi_A, \pi_B$) or a blind observational no-intervention policy ($\pi_\emptyset$). One first option is relying on the conditional average treatment effect (CATE), computed by considering features correlated to an unobserved confounder. This could allow us to tailor more effective policies.
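For reference, the two quantities discussed here can be written, in the notation used throughout this notebook, as:

$$ \text{ATE} = P(Y=1 \vert do(X=1)) - P(Y=1 \vert do(X=0)) $$

$$ \text{CATE}(w) = P(Y=1 \vert do(X=1), W=w) - P(Y=1 \vert do(X=0), W=w) $$

where $W$ is an observable feature correlated with the unobserved confounder.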
Let us consider the medicine scenario again, where outcomes are confounded by the unobserved socio-economic status ($Z$). Let us now assume that we can observe an income variable ($W$) which is highly correlated with socio-economic status.
We implement our new observable in the following function:
def income(status):
if status==0:
return np.random.binomial(1,.1)
else:
return np.random.binomial(1,.9)
Simply put, income() reports the value of socio-economic status correctly with probability $0.9$. Our new interventional scenario can be described by the following graph:
G = nx.DiGraph()
G.graph['dpi'] = 120
nodes = ['X', 'Y', 'Z', 'W']
edges = [('Z','Y'), ('X','Y'), ('Z','W')]
nodes_pos = {'Z':(-.7, 1), 'X':(-1, -1), 'Y':(0, -1), 'W':(0, 1)}
nodes_lbl = {'Z':'Z', 'X':'X', 'Y':'Y', 'W':'W'}
G.add_nodes_from(nodes)
G.add_edges_from(edges)
nx.draw(G,nodes_pos,node_size=800,node_color='white')
_ = nx.draw_networkx_labels(G,nodes_pos,nodes_lbl)
Let us now run interventional simulations and estimate some CATE quantities.
def run_CATE_simulation(n_trials=n_trials):
results = {'x': np.zeros(n_trials),
'z': np.zeros(n_trials),
'y': np.zeros(n_trials),
'w': np.zeros(n_trials)}
for i in tqdm(range(n_trials)):
status = SEstatus()
proxy = income(status)
medicine = interventional_medicine()
outcome = recovery(medicine,status)
results['x'][i] = medicine
results['z'][i] = status
results['y'][i] = outcome
results['w'][i] = proxy
return results
We start by considering the intervention $do(X=0)$.
def interventional_medicine():
return 0
We collect our samples.
cate_res = run_CATE_simulation()
100%|█████████████████████████████████| 10000/10000 [00:00<00:00, 218756.40it/s]
As before we can simply estimate the probability of recovery $P(Y\vert do(X=0))$:
P_doX0 = np.sum(cate_res['y']==1) / len(cate_res['y'])
print('Probability of recovery acting on medicine A: {0}'.format(P_doX0))
Probability of recovery acting on medicine A: 0.4069
However, we are now interested in estimating how this probability varies conditioned on the income variable, which acts as a proxy for socio-economic status. We estimate first $P(Y\vert do(X=0),W=0)$:
P_doX0_givenW0 = np.sum(np.logical_and(cate_res['y']==1,cate_res['w']==0)) / np.sum(cate_res['w']==0)
print('Probability of recovery acting on medicine A given income 0: {0}'.format(P_doX0_givenW0))
Probability of recovery acting on medicine A given income 0: 0.41284403669724773
And then $P(Y\vert do(X=0),W=1)$:
P_doX0_givenW1 = np.sum(np.logical_and(cate_res['y']==1,cate_res['w']==1)) / np.sum(cate_res['w']==1)
print('Probability of recovery acting on medicine A given income 1: {0}'.format(P_doX0_givenW1))
Probability of recovery acting on medicine A given income 1: 0.4009225832330525
Notice that these values are close to the ground truth of $0.4$ that we encoded for both the high and the low socio-economic status. Deviations are partly explained by our noisy proxy, which agrees with the unobserved confounder only with probability $0.9$.
Let us then move to considering the intervention $do(X=1)$:
def interventional_medicine():
return 1
We collect samples for this interventional setting as well:
cate_res = run_CATE_simulation()
100%|█████████████████████████████████| 10000/10000 [00:00<00:00, 223064.49it/s]
We first compute the generic recovery probability $P(Y\vert do(X=1))$:
P_doX1 = np.sum(cate_res['y']==1) / len(cate_res['y'])
print('Probability of recovery acting on medicine B: {0}'.format(P_doX1))
Probability of recovery acting on medicine B: 0.3507
Then we evaluate conditional quantities $P(Y\vert do(X=1),W=0)$:
P_doX1_givenW0 = np.sum(np.logical_and(cate_res['y']==1,cate_res['w']==0)) / np.sum(cate_res['w']==0)
print('Probability of recovery acting on medicine B given income 0: {0}'.format(P_doX1_givenW0))
Probability of recovery acting on medicine B given income 0: 0.27400793650793653
And $P(Y\vert do(X=1),W=1)$:
P_doX1_givenW1 = np.sum(np.logical_and(cate_res['y']==1,cate_res['w']==1)) / np.sum(cate_res['w']==1)
print('Probability of recovery acting on medicine B given income 1: {0}'.format(P_doX1_givenW1))
Probability of recovery acting on medicine B given income 1: 0.42862903225806454
Again these values come close to the hard-coded values in our simulation.
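We can sanity-check these estimates analytically. With equal priors on $Z$ and a proxy that agrees with $Z$ with probability $0.9$, Bayes' rule gives $P(Z=w \vert W=w) = 0.9$. The recovery probabilities for medicine B used below ($0.25$ and $0.45$) are inferred from the estimates above, not read from the notebook's earlier code:

```python
# Posterior P(Z=w | W=w) = 0.9: equal priors combined with a 90%-accurate proxy
# Recovery probabilities for medicine B per status (inferred, hypothetical)
p_B = {0: .25, 1: .45}

exp_W0 = .9 * p_B[0] + .1 * p_B[1]   # expected P(Y=1 | do(X=1), W=0)
exp_W1 = .1 * p_B[0] + .9 * p_B[1]   # expected P(Y=1 | do(X=1), W=1)
print(round(exp_W0, 3), round(exp_W1, 3))  # 0.27 0.43
```

These match the empirical values above; for medicine A the expectation is $0.4$ in both groups, which explains its nearly constant conditional estimates.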
Without information about the conditioning variable, all we can do is compute the average treatment effect (ATE), that is, $P(Y\vert do(X=1)) - P(Y\vert do(X=0))$:
ate = P_doX1 - P_doX0
print(ate)
-0.05619999999999997
This is the difference (of about $5\%$) that led us to conclude in the previous interventional study that medicine A has a stronger causal effect on recovery. It is also the quantity that led us to suggest a blanket policy $\pi_A$ for the adoption of medicine A that turned out to be suboptimal.
Now, however, we can compute the conditional average treatment effect (CATE), taking advantage of our knowledge of income. This allows us to compute $P(Y\vert do(X=1),W=0) - P(Y\vert do(X=0),W=0)$:
cate0 = P_doX1_givenW0 - P_doX0_givenW0
print(cate0)
-0.1388361001893112
and $P(Y\vert do(X=1),W=1) - P(Y\vert do(X=0),W=1)$:
cate1 = P_doX1_givenW1 - P_doX0_givenW1
print(cate1)
0.027706449025012014
This reveals that for a low income (strongly correlated with a low socio-economic status) medicine A is more effective, while for a high income (strongly correlated with a high socio-economic status) medicine B is actually slightly more effective.
This knowledge allows us to define a better policy $\pi_{income}$ in which we exploit knowledge of income (and, indirectly, of the true confounder) to select which medicine to offer to a patient.
def CATE_medicine(income):
if income==0:
return 0
else:
return 1
Let us now set up a simulation using this policy $\pi_{income}$:
def run_CATE_simulation(n_trials=n_trials):
results = {'x': np.zeros(n_trials),
'z': np.zeros(n_trials),
'y': np.zeros(n_trials),
'w': np.zeros(n_trials)}
for i in tqdm(range(n_trials)):
status = SEstatus()
proxy = income(status)
medicine = CATE_medicine(proxy)
outcome = recovery(medicine,status)
results['x'][i] = medicine
results['z'][i] = status
results['y'][i] = outcome
results['w'][i] = proxy
return results
We collect the samples:
cate_res = run_CATE_simulation()
100%|█████████████████████████████████| 10000/10000 [00:00<00:00, 222722.18it/s]
And evaluate our recovery probability $P(Y\vert \pi_{income})$:
print('Probability of recovery for policy CATE: {0}'.format(np.sum(cate_res['y']==1) / len(cate_res['y'])))
Probability of recovery for policy CATE: 0.4095
The recovery rate is now higher than under the trivial blanket interventional policies ($\pi_A,\pi_B$) or the observational policy ($\pi_\emptyset$).
Of course, this result depends on how well our proxy variable (income) tracks the underlying, unobserved confounder (socio-economic status). The CATE sensitivity example in the Appendix shows that the quality of the CATE estimate depends on this relationship. Often proxies may have complicated dynamics, or there may simply be no way to observe them.
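A quick sketch of this sensitivity, under the same hypothetical recovery probabilities used above: the value of the proxy-based policy degrades smoothly as proxy accuracy drops, collapsing to a coin flip between the two blanket policies at accuracy $0.5$:

```python
# Value of the proxy policy as a function of proxy accuracy (hypothetical parameters)
p_rec = {0: {'A': .40, 'B': .25}, 1: {'A': .40, 'B': .45}}

def policy_value(acc):
    # status 0 -> medicine A with prob acc, B otherwise; status 1 -> the reverse
    v0 = acc * p_rec[0]['A'] + (1 - acc) * p_rec[0]['B']
    v1 = acc * p_rec[1]['B'] + (1 - acc) * p_rec[1]['A']
    return .5 * (v0 + v1)

for acc in (1.0, .9, .7, .5):
    print(acc, round(policy_value(acc), 4))
```

With a perfect proxy the policy value rises to $0.425$; at accuracy $0.9$ it matches the simulation above.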
An even finer-grained analysis is offered by a counterfactual evaluation, in which we can assess the individual treatment effect (ITE) for each subject. Estimating an ITE requires evaluating counterfactual quantities, specifically the probability of recovery under the actual choice, $P(Y=1 \vert X=x)$, and the probability of recovery in the counterfactual scenario in which the patient would have been given a different medicine, $P(Y_{X=\bar{x}}=1 \vert X=x)$. (The latter quantity is also used in [Bareinboim2015] to deal with the greedy casino.)
To work with counterfactuals, we return to our basic causal graph:
G = nx.DiGraph()
G.graph['dpi'] = 120
nodes = ['X', 'Y', 'Z']
edges = [('Z','X'), ('Z','Y'), ('X','Y')]
nodes_pos = {'Z':(-.7, 1), 'X':(-1, -1), 'Y':(0, -1)}
nodes_lbl = {'Z':'Z', 'X':'X', 'Y':'Y'}
G.add_nodes_from(nodes)
G.add_edges_from(edges)
nx.draw(G,nodes_pos,node_size=800,node_color='white')
_ = nx.draw_networkx_labels(G,nodes_pos,nodes_lbl)
Estimating counterfactual quantities is usually practically impossible. Here, we take advantage of our computational approach and set up a counterfactual simulation.
The idea is that, given an individual (a patient, a unit), we want to compute the effectiveness of the medicine for this particular individual using counterfactual quantities. Our estimate is not over a population (ATE), but for a single individual (ITE).
def run_CF_simulation(n_trials=n_trials):
results = {'x': np.zeros(n_trials),
'z': np.zeros(n_trials),
'y': np.zeros(n_trials),
'cfy': np.zeros(n_trials),}
status = SEstatus()
for i in tqdm(range(n_trials)):
medicine = interventional_medicine()
outcome = recovery(medicine,status)
cf_medicine = np.logical_not(medicine)
cf_outcome = recovery(cf_medicine,status)
results['x'][i] = medicine
results['z'][i] = status
results['y'][i] = outcome
results['cfy'][i] = cf_outcome
return results
Notice how this simulation differs from the previous ones: the socio-economic status is sampled once, outside the loop, so all trials refer to the same individual; and for each trial we record both the factual outcome and the counterfactual outcome under the opposite medicine.
Let us run the simulation. For simplicity, let us assume the factual intervention to be always $do(X=0)$:
def interventional_medicine():
return 0
cf_res = run_CF_simulation()
100%|█████████████████████████████████| 10000/10000 [00:00<00:00, 230356.27it/s]
Notice we run a single simulation, corresponding to a single individual.
We can now compute the ITE for this individual as $$P(Y_{X=1}=1 \vert do(X=0)) - P(Y=1 \vert do(X=0))$$ where the first quantity is the counterfactual result of providing medicine B, while the second is the factual result of providing medicine A.
np.sum((cf_res['cfy']-cf_res['y'])) / len(cf_res['y'])
0.062
In this case, we found that for this specific individual medicine B is more effective. However, if we consider a few more individuals, we will stumble upon subjects for whom medicine A is more effective:
for i in range(10):
cf_res = run_CF_simulation()
print('ITE for individual {0}: {1}'.format(i,np.sum((cf_res['cfy']-cf_res['y'])) / len(cf_res['y'])))
100%|█████████████████████████████████| 10000/10000 [00:00<00:00, 232714.36it/s]
ITE for individual 0: 0.051
100%|█████████████████████████████████| 10000/10000 [00:00<00:00, 234515.18it/s]
ITE for individual 1: 0.0429
100%|█████████████████████████████████| 10000/10000 [00:00<00:00, 234846.08it/s]
ITE for individual 2: 0.0442
100%|█████████████████████████████████| 10000/10000 [00:00<00:00, 237710.35it/s]
ITE for individual 3: 0.049
100%|█████████████████████████████████| 10000/10000 [00:00<00:00, 235144.95it/s]
ITE for individual 4: 0.0565
100%|█████████████████████████████████| 10000/10000 [00:00<00:00, 235787.39it/s]
ITE for individual 5: -0.1395
100%|█████████████████████████████████| 10000/10000 [00:00<00:00, 233742.79it/s]
ITE for individual 6: 0.0555
100%|█████████████████████████████████| 10000/10000 [00:00<00:00, 231149.71it/s]
ITE for individual 7: -0.15
100%|█████████████████████████████████| 10000/10000 [00:00<00:00, 232949.59it/s]
ITE for individual 8: -0.1474
100%|█████████████████████████████████| 10000/10000 [00:00<00:00, 236809.79it/s]
ITE for individual 9: 0.0462
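The ITE values cluster around two levels, one per socio-economic status. Under the per-status recovery probabilities inferred from the earlier estimates (a hypothetical reconstruction, since the generating code is defined earlier in the notebook), the analytic ITEs are:

```python
# Inferred per-status recovery probabilities (hypothetical reconstruction)
p_rec = {0: {'A': .40, 'B': .25}, 1: {'A': .40, 'B': .45}}

for z in (0, 1):
    ite = p_rec[z]['B'] - p_rec[z]['A']
    print('Analytic ITE for status {0}: {1:+.2f}'.format(z, ite))
```

This matches the two clusters in the loop above: individuals with a high status sit near $+0.05$, those with a low status near $-0.15$.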
Counterfactual analysis also gives access to further quantities (the effect of treatment on the treated, the probability of necessity and sufficiency) which provide deeper insights into our system.
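As a reference sketch, these two quantities have standard definitions in the counterfactual literature (stated here for orientation, not computed in this notebook):

$$ \text{ETT} = P(Y_{X=1}=1 \vert X=1) - P(Y_{X=0}=1 \vert X=1) $$

$$ \text{PNS} = P(Y_{X=1}=1,\, Y_{X=0}=0) $$

The first measures the effect of the treatment on those who actually took it; the second measures the probability that the treatment is both necessary and sufficient for the outcome.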
Our simulations showed that, in order to define a good policy, we may want to carry out several analyses of our system, so as to figure out the effects of our actions and the impact of the confounders.
What is the meaning of our studies in relation to policymaking? The different studies allowed us to better probe the contribution of an unobserved confounder: the interventional study estimated the average effect of each medicine (ATE); the proxy-based study estimated effects conditional on an observable correlated with the confounder (CATE); the counterfactual study estimated effects at the level of the single individual (ITE).
Let us consider again the original observational and interventional studies that we performed to evaluate the effectiveness of the new medicine. The discrepancy we noticed between the observational result (medicine B seems more effective than medicine A) and the interventional result (medicine A is more effective than medicine B) may be seen as a case of Simpson's paradox.
Simpson's paradox is a classical example of how confounders may lead to apparently paradoxical results. The paradox may be explained away in terms of modelling. In the previous discussion we always qualified our results in relation to an assumed graphical causal model underlying the data. A wrong choice of this assumed model may sometimes explain apparent paradoxes.
For example, in the observational study, when we believed we were directly assessing the causal effect of a medicine on recovery $P(Y\vert X)$, we were implicitly and mistakenly assuming the following causal model:
G = nx.DiGraph()
G.graph['dpi'] = 120
nodes = ['X', 'Y',]
edges = [('X','Y')]
nodes_pos = {'X':(-1, -1), 'Y':(0, -1)}
nodes_lbl = {'X':'X', 'Y':'Y'}
G.add_nodes_from(nodes)
G.add_edges_from(edges)
nx.draw(G,nodes_pos,node_size=800,node_color='white')
_ = nx.draw_networkx_labels(G,nodes_pos,nodes_lbl)
When we performed our interventional study, we correctly identified the underlying model as:
G = nx.DiGraph()
G.graph['dpi'] = 120
nodes = ['X', 'Y', 'Z']
edges = [('Z','X'), ('Z','Y'), ('X','Y')]
nodes_pos = {'Z':(-.7, 1), 'X':(-1, -1), 'Y':(0, -1)}
nodes_lbl = {'Z':'Z', 'X':'X', 'Y':'Y'}
G.add_nodes_from(nodes)
G.add_edges_from(edges)
nx.draw(G,nodes_pos,node_size=800,node_color='white')
_ = nx.draw_networkx_labels(G,nodes_pos,nodes_lbl)
and we correctly estimated $P(Y\vert do(X))$ by removing the confounder through an intervention or via do-calculus, thus ending up working on:
G = nx.DiGraph()
G.graph['dpi'] = 120
nodes = ['X', 'Y', 'Z']
edges = [('Z','Y'), ('X','Y')]
nodes_pos = {'Z':(-.7, 1), 'X':(-1, -1), 'Y':(0, -1)}
nodes_lbl = {'Z':'Z', 'X':'X', 'Y':'Y'}
G.add_nodes_from(nodes)
G.add_edges_from(edges)
nx.draw(G,nodes_pos,node_size=800,node_color='white')
_ = nx.draw_networkx_labels(G,nodes_pos,nodes_lbl)
The discrepancy in the effectiveness of the medicine may thus be ascribed to a misspecified model: the observational estimate is correct with respect to the no-confounder model, but wrong with respect to the causal model with the confounder; the interventional estimate is correct with respect to the causal model with the confounder.
Is our interventional result the final word on the effect of the medicine on survival? Of course not: it depends on the model we consider.
Let us consider a more elaborate model in which we have two confounders: a socio-economic status ($Z_{se}$) and a genetic factor ($Z_{g}$). The true underlying graph of our new model is:
G = nx.DiGraph()
G.graph['dpi'] = 120
nodes = ['X', 'Y', 'Zse', 'Zg']
edges = [('Zse','X'), ('Zse','Y'), ('X','Y'),('Zg','X'), ('Zg','Y')]
nodes_pos = {'Zse':(-.5, .5),'Zg':(-.5, 1), 'X':(-1, -1), 'Y':(0, -1)}
nodes_lbl = {'Zse':'$Z_{se}$','Zg':'$Z_{g}$', 'X':'X', 'Y':'Y'}
G.add_nodes_from(nodes)
G.add_edges_from(edges)
nx.draw(G,nodes_pos,node_size=800,node_color='white')
_ = nx.draw_networkx_labels(G,nodes_pos,nodes_lbl)
Let us construct the computational model for this new scenario.
def GStatus():
return np.random.binomial(1,p=.5)
def SEStatus():
return np.random.binomial(1,p=.5)
def recovery(medicine,SEstatus,Gstatus):
if Gstatus==0:
if SEstatus==0:
if medicine==0:
return np.random.binomial(1,.2)
else:
return np.random.binomial(1,.05)
else:
if medicine==0:
return np.random.binomial(1,.4)
else:
return np.random.binomial(1,.45)
else:
if SEstatus==0:
if medicine==0:
return np.random.binomial(1,.6)
else:
return np.random.binomial(1,.45)
else:
if medicine==0:
return np.random.binomial(1,.4)
else:
return np.random.binomial(1,.45)
Notice how the recovery rates are now different, as they are affected by two confounders.
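Since both confounders are Bernoulli with $p=0.5$, each of the four $(Z_g, Z_{se})$ cells has probability $0.25$, and the ground-truth interventional quantities can be read off directly from the probabilities encoded in recovery():

```python
import numpy as np

# Recovery probabilities per (Zg, Zse) cell, copied from recovery() above
p_A = [.2, .4, .6, .4]      # medicine A in cells (Zg,Zse) = (0,0), (0,1), (1,0), (1,1)
p_B = [.05, .45, .45, .45]  # medicine B in the same cells

print('P(Y=1 | do(X=0)) = {0}'.format(round(float(np.mean(p_A)), 3)))  # 0.4
print('P(Y=1 | do(X=1)) = {0}'.format(round(float(np.mean(p_B)), 3)))  # 0.35
```

These are the targets that the interventional study below should recover, up to sampling noise.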
We now run a standard observational study.
def observational_medicine(SEstatus,Gstatus):
if Gstatus==0:
if SEstatus==0:
return np.random.binomial(1,.1)
else:
return np.random.binomial(1,.95)
else:
if SEstatus==0:
return np.random.binomial(1,.1)
else:
return np.random.binomial(1,.85)
def run_observational_simulation(n_trials=n_trials):
results = {'x': np.zeros(n_trials),
'z_se': np.zeros(n_trials),
'z_g': np.zeros(n_trials),
'y': np.zeros(n_trials)}
for i in tqdm(range(n_trials)):
SEstatus = SEStatus()
Gstatus = GStatus()
medicine = observational_medicine(SEstatus,Gstatus)
outcome = recovery(medicine,SEstatus,Gstatus)
results['x'][i] = medicine
results['z_se'][i] = SEstatus
results['z_g'][i] = Gstatus
results['y'][i] = outcome
return results
obs_res = run_observational_simulation()
100%|█████████████████████████████████| 10000/10000 [00:00<00:00, 173415.09it/s]
We can then estimate the probabilities $P(Y=1 \vert X=0)$ and $P(Y=1 \vert X=1)$.
n_recovery_medicineA = np.sum(obs_res['y'][obs_res['x']==0])
n_medicineA = np.sum(obs_res['x']==0)
print('Probability of recovery given medicine A: {0}'.format(n_recovery_medicineA / n_medicineA))
Probability of recovery given medicine A: 0.39102173471460494
n_recovery_medicineB = np.sum(obs_res['y'][obs_res['x']==1])
n_medicineB = np.sum(obs_res['x']==1)
print('Probability of recovery given medicine B: {0}'.format(n_recovery_medicineB / n_medicineB))
Probability of recovery given medicine B: 0.42485719913334646
Apparently medicine B is more effective than A. Giving a causal meaning to this result would be equivalent to assuming that we are working with the following model:
G = nx.DiGraph()
G.graph['dpi'] = 120
nodes = ['X', 'Y',]
edges = [('X','Y')]
nodes_pos = {'X':(-1, -1), 'Y':(0, -1)}
nodes_lbl = {'X':'X', 'Y':'Y'}
G.add_nodes_from(nodes)
G.add_edges_from(edges)
nx.draw(G,nodes_pos,node_size=800,node_color='white')
_ = nx.draw_networkx_labels(G,nodes_pos,nodes_lbl)
However, we know this model is wrong, so we move on to an interventional study.
Let us define an intervention protocol.
def interventional_medicine():
return np.random.binomial(1,.5)
def run_interventional_simulation(n_trials=n_trials):
results = {'x': np.zeros(n_trials),
'z_se': np.zeros(n_trials),
'z_g': np.zeros(n_trials),
'y': np.zeros(n_trials)}
for i in tqdm(range(n_trials)):
SEstatus = SEStatus()
Gstatus = GStatus()
medicine = interventional_medicine()
outcome = recovery(medicine,SEstatus,Gstatus)
results['x'][i] = medicine
results['z_se'][i] = SEstatus
results['z_g'][i] = Gstatus
results['y'][i] = outcome
return results
int_res = run_interventional_simulation()
100%|█████████████████████████████████| 10000/10000 [00:00<00:00, 170878.74it/s]
This allows us to compute $P(Y=1 \vert do(X=0))$ and $P(Y=1 \vert do(X=1))$.
n_recovery_medicineA = np.sum(int_res['y'][int_res['x']==0])
n_medicineA = np.sum(int_res['x']==0)
print('Probability of recovery acting on medicine A: {0}'.format(n_recovery_medicineA / n_medicineA))
Probability of recovery acting on medicine A: 0.39489154672613014
n_recovery_medicineB = np.sum(int_res['y'][int_res['x']==1])
n_medicineB = np.sum(int_res['x']==1)
print('Probability of recovery acting on medicine B: {0}'.format(n_recovery_medicineB / n_medicineB))
Probability of recovery acting on medicine B: 0.3447799486875863
These results correctly estimate that medicine A is more effective than medicine B, as we would expect. Indeed, performing an intervention leads us to work with the following post-interventional model:
G = nx.DiGraph()
G.graph['dpi'] = 120
nodes = ['X', 'Y', 'Zse', 'Zg']
edges = [('Zse','Y'), ('X','Y'), ('Zg','Y')]
nodes_pos = {'Zse':(-.5, .5),'Zg':(-.5, 1), 'X':(-1, -1), 'Y':(0, -1)}
nodes_lbl = {'Zse':'$Z_{se}$','Zg':'$Z_{g}$', 'X':'X', 'Y':'Y'}
G.add_nodes_from(nodes)
G.add_edges_from(edges)
nx.draw(G,nodes_pos,node_size=800,node_color='white')
_ = nx.draw_networkx_labels(G,nodes_pos,nodes_lbl)
which correctly cuts out the influence of all the confounders and lets us estimate the correct causal effect of $X$ on $Y$.
Suppose we are in a situation in which interventions are not possible. We may then try to estimate causal quantities from the observed variables.
We might first be inclined to work with the model considering only the socio-economic status, because it has already proven quite successful. That is, we may assume that only socio-economic factors are relevant confounders. This is equivalent to working with the following graph:
G = nx.DiGraph()
G.graph['dpi'] = 120
nodes = ['X', 'Y', 'Zse', 'Zg']
edges = [('Zse','X'), ('Zse','Y'), ('X','Y')]
nodes_pos = {'Zse':(-.5, .5),'Zg':(-.5, 1), 'X':(-1, -1), 'Y':(0, -1)}
nodes_lbl = {'Zse':'$Z_{se}$','Zg':'$Z_{g}$', 'X':'X', 'Y':'Y'}
G.add_nodes_from(nodes)
G.add_edges_from(edges)
nx.draw(G,nodes_pos,node_size=800,node_color='white')
_ = nx.draw_networkx_labels(G,nodes_pos,nodes_lbl)
In such a graph, do-calculus instructs us to estimate: $$ P(Y=1 \vert do(X=x)) = \sum_{Z_{se}} P(Y=1 \vert X,{Z_{se}})P({Z_{se}}). $$
P_z0 = np.sum(obs_res['z_se']==0) / len(obs_res['z_se'])
P_z1 = np.sum(obs_res['z_se']==1) / len(obs_res['z_se'])
P_y1x0z0 = np.sum(np.logical_and(obs_res['y']==1, np.logical_and(obs_res['x']==0,obs_res['z_se']==0)))
P_y1x0z1 = np.sum(np.logical_and(obs_res['y']==1, np.logical_and(obs_res['x']==0,obs_res['z_se']==1)))
P_y1x1z0 = np.sum(np.logical_and(obs_res['y']==1, np.logical_and(obs_res['x']==1,obs_res['z_se']==0)))
P_y1x1z1 = np.sum(np.logical_and(obs_res['y']==1, np.logical_and(obs_res['x']==1,obs_res['z_se']==1)))
P_x0z0 = np.sum(np.logical_and(obs_res['x']==0,obs_res['z_se']==0))
P_x0z1 = np.sum(np.logical_and(obs_res['x']==0,obs_res['z_se']==1))
P_x1z0 = np.sum(np.logical_and(obs_res['x']==1,obs_res['z_se']==0))
P_x1z1 = np.sum(np.logical_and(obs_res['x']==1,obs_res['z_se']==1))
P_y1_givenx0z0 = P_y1x0z0 / P_x0z0
P_y1_givenx0z1 = P_y1x0z1 / P_x0z1
P_y1_givenx1z0 = P_y1x1z0 / P_x1z0
P_y1_givenx1z1 = P_y1x1z1 / P_x1z1
P_y1_dox0 = P_z0 * P_y1_givenx0z0 + P_z1 * P_y1_givenx0z1
P_y1_dox1 = P_z0 * P_y1_givenx1z0 + P_z1 * P_y1_givenx1z1
print('Probability of recovery acting on medicine A: {0}'.format(P_y1_dox0))
print('Probability of recovery acting on medicine B: {0}'.format(P_y1_dox1))
Probability of recovery acting on medicine A: 0.39487038390851537
Probability of recovery acting on medicine B: 0.3629170674423526
The result shows that medicine A is more effective than medicine B, but the estimate is further from the target than before.
Someone may then suggest that the only confounders we actually need to account for are genetic factors: socio-economic factors are not causally relevant, and we should not control for them. We may heed this advice and come up with the alternative causal graph:
G = nx.DiGraph()
G.graph['dpi'] = 120
nodes = ['X', 'Y', 'Zse', 'Zg']
edges = [('X','Y'),('Zg','X'), ('Zg','Y')]
nodes_pos = {'Zse':(-.5, .5),'Zg':(-.5, 1), 'X':(-1, -1), 'Y':(0, -1)}
nodes_lbl = {'Zse':'$Z_{se}$','Zg':'$Z_{g}$', 'X':'X', 'Y':'Y'}
G.add_nodes_from(nodes)
G.add_edges_from(edges)
nx.draw(G,nodes_pos,node_size=800,node_color='white')
_ = nx.draw_networkx_labels(G,nodes_pos,nodes_lbl)
In such a graph, do-calculus tells us that we can estimate: $$ P(Y=1 \vert do(X=x)) = \sum_{Z_{g}} P(Y=1 \vert X,{Z_{g}})P({Z_{g}}). $$
P_z0 = np.sum(obs_res['z_g']==0) / len(obs_res['z_g'])
P_z1 = np.sum(obs_res['z_g']==1) / len(obs_res['z_g'])
P_y1x0z0 = np.sum(np.logical_and(obs_res['y']==1, np.logical_and(obs_res['x']==0,obs_res['z_g']==0)))
P_y1x0z1 = np.sum(np.logical_and(obs_res['y']==1, np.logical_and(obs_res['x']==0,obs_res['z_g']==1)))
P_y1x1z0 = np.sum(np.logical_and(obs_res['y']==1, np.logical_and(obs_res['x']==1,obs_res['z_g']==0)))
P_y1x1z1 = np.sum(np.logical_and(obs_res['y']==1, np.logical_and(obs_res['x']==1,obs_res['z_g']==1)))
P_x0z0 = np.sum(np.logical_and(obs_res['x']==0,obs_res['z_g']==0))
P_x0z1 = np.sum(np.logical_and(obs_res['x']==0,obs_res['z_g']==1))
P_x1z0 = np.sum(np.logical_and(obs_res['x']==1,obs_res['z_g']==0))
P_x1z1 = np.sum(np.logical_and(obs_res['x']==1,obs_res['z_g']==1))
P_y1_givenx0z0 = P_y1x0z0 / P_x0z0
P_y1_givenx0z1 = P_y1x0z1 / P_x0z1
P_y1_givenx1z0 = P_y1x1z0 / P_x1z0
P_y1_givenx1z1 = P_y1x1z1 / P_x1z1
P_y1_dox0 = P_z0 * P_y1_givenx0z0 + P_z1 * P_y1_givenx0z1
P_y1_dox1 = P_z0 * P_y1_givenx1z0 + P_z1 * P_y1_givenx1z1
print('Probability of recovery acting on medicine A: {0}'.format(P_y1_dox0))
print('Probability of recovery acting on medicine B: {0}'.format(P_y1_dox1))
Probability of recovery acting on medicine A: 0.38348681762333375
Probability of recovery acting on medicine B: 0.425662020392382
Annoyingly, we observe a reversal of effectiveness: controlling for genetic factors makes medicine B look more effective than medicine A.
After further study, we come to develop the right causal graph:
G = nx.DiGraph()
G.graph['dpi'] = 120
nodes = ['X', 'Y', 'Zse', 'Zg']
edges = [('Zse','X'), ('Zse','Y'), ('X','Y'),('Zg','X'), ('Zg','Y')]
nodes_pos = {'Zse':(-.5, .5),'Zg':(-.5, 1), 'X':(-1, -1), 'Y':(0, -1)}
nodes_lbl = {'Zse':'$Z_{se}$','Zg':'$Z_{g}$', 'X':'X', 'Y':'Y'}
G.add_nodes_from(nodes)
G.add_edges_from(edges)
nx.draw(G,nodes_pos,node_size=800,node_color='white')
_ = nx.draw_networkx_labels(G,nodes_pos,nodes_lbl)
Now do-calculus states that: $$ P(Y=1 \vert do(X=x)) = \sum_{Z_{se},Z_{g}} P(Y=1 \vert X,Z_{se},{Z_{g}})P(Z_{se},{Z_{g}}). $$
P_s0g0 = np.sum(np.logical_and(obs_res['z_se']==0,obs_res['z_g']==0)) / len(obs_res['z_se'])
P_s0g1 = np.sum(np.logical_and(obs_res['z_se']==0,obs_res['z_g']==1)) / len(obs_res['z_se'])
P_s1g0 = np.sum(np.logical_and(obs_res['z_se']==1,obs_res['z_g']==0)) / len(obs_res['z_se'])
P_s1g1 = np.sum(np.logical_and(obs_res['z_se']==1,obs_res['z_g']==1)) / len(obs_res['z_se'])
P_y1x0s0g0 = np.sum(np.logical_and(obs_res['y']==1, np.logical_and(obs_res['x']==0, np.logical_and(obs_res['z_se']==0,obs_res['z_g']==0))))
P_y1x0s0g1 = np.sum(np.logical_and(obs_res['y']==1, np.logical_and(obs_res['x']==0, np.logical_and(obs_res['z_se']==0,obs_res['z_g']==1))))
P_y1x0s1g0 = np.sum(np.logical_and(obs_res['y']==1, np.logical_and(obs_res['x']==0, np.logical_and(obs_res['z_se']==1,obs_res['z_g']==0))))
P_y1x0s1g1 = np.sum(np.logical_and(obs_res['y']==1, np.logical_and(obs_res['x']==0, np.logical_and(obs_res['z_se']==1,obs_res['z_g']==1))))
P_y1x1s0g0 = np.sum(np.logical_and(obs_res['y']==1, np.logical_and(obs_res['x']==1, np.logical_and(obs_res['z_se']==0,obs_res['z_g']==0))))
P_y1x1s0g1 = np.sum(np.logical_and(obs_res['y']==1, np.logical_and(obs_res['x']==1, np.logical_and(obs_res['z_se']==0,obs_res['z_g']==1))))
P_y1x1s1g0 = np.sum(np.logical_and(obs_res['y']==1, np.logical_and(obs_res['x']==1, np.logical_and(obs_res['z_se']==1,obs_res['z_g']==0))))
P_y1x1s1g1 = np.sum(np.logical_and(obs_res['y']==1, np.logical_and(obs_res['x']==1, np.logical_and(obs_res['z_se']==1,obs_res['z_g']==1))))
P_x0s0g0 = np.sum(np.logical_and(obs_res['x']==0, np.logical_and(obs_res['z_se']==0,obs_res['z_g']==0)))
P_x0s0g1 = np.sum(np.logical_and(obs_res['x']==0, np.logical_and(obs_res['z_se']==0,obs_res['z_g']==1)))
P_x0s1g0 = np.sum(np.logical_and(obs_res['x']==0, np.logical_and(obs_res['z_se']==1,obs_res['z_g']==0)))
P_x0s1g1 = np.sum(np.logical_and(obs_res['x']==0, np.logical_and(obs_res['z_se']==1,obs_res['z_g']==1)))
P_x1s0g0 = np.sum(np.logical_and(obs_res['x']==1, np.logical_and(obs_res['z_se']==0,obs_res['z_g']==0)))
P_x1s0g1 = np.sum(np.logical_and(obs_res['x']==1, np.logical_and(obs_res['z_se']==0,obs_res['z_g']==1)))
P_x1s1g0 = np.sum(np.logical_and(obs_res['x']==1, np.logical_and(obs_res['z_se']==1,obs_res['z_g']==0)))
P_x1s1g1 = np.sum(np.logical_and(obs_res['x']==1, np.logical_and(obs_res['z_se']==1,obs_res['z_g']==1)))
P_y1_givenx0s0g0 = P_y1x0s0g0 / P_x0s0g0
P_y1_givenx0s0g1 = P_y1x0s0g1 / P_x0s0g1
P_y1_givenx0s1g0 = P_y1x0s1g0 / P_x0s1g0
P_y1_givenx0s1g1 = P_y1x0s1g1 / P_x0s1g1
P_y1_givenx1s0g0 = P_y1x1s0g0 / P_x1s0g0
P_y1_givenx1s0g1 = P_y1x1s0g1 / P_x1s0g1
P_y1_givenx1s1g0 = P_y1x1s1g0 / P_x1s1g0
P_y1_givenx1s1g1 = P_y1x1s1g1 / P_x1s1g1
P_y1_dox0 = P_s0g0 * P_y1_givenx0s0g0 + P_s0g1 * P_y1_givenx0s0g1 + P_s1g0 * P_y1_givenx0s1g0 + P_s1g1 * P_y1_givenx0s1g1
P_y1_dox1 = P_s0g0 * P_y1_givenx1s0g0 + P_s0g1 * P_y1_givenx1s0g1 + P_s1g0 * P_y1_givenx1s1g0 + P_s1g1 * P_y1_givenx1s1g1
print('Probability of recovery acting on medicine A: {0}'.format(P_y1_dox0))
print('Probability of recovery acting on medicine B: {0}'.format(P_y1_dox1))
Probability of recovery acting on medicine A: 0.392780826085904
Probability of recovery acting on medicine B: 0.3563854093116481
And indeed these results are closer to those of the interventional study and of our implementation.
Reading our scenario as a case of Simpson's paradox allowed us to highlight two points:
The discrepancy between observational and interventional results may be explained as a mismatch between a causal reading and the underlying causal graph. Giving observational data a causal interpretation means assuming a causal graph with no confounders; interpreting the interventional data as causal means working with a graph that has potential confounders.
Discrepancy is also possible between interventional quantities estimated via do-calculus when we assume different underlying causal graphs. The causal graphs tell us how to control for confounders, and if the graph does not agree with the system at hand, we may estimate inconsistent causal effects.
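All three estimations above instantiate the same back-door adjustment, differing only in the set of adjustment variables. As a minimal sketch (the helper name `backdoor_adjust` and its dict-of-arrays input, shaped like `obs_res` above, are our own), the pattern can be written once:

```python
import numpy as np
from itertools import product

def backdoor_adjust(data, x_key, y_key, z_keys, x_val):
    """Estimate P(Y=1 | do(X=x_val)) by the back-door formula,
    summing over all configurations of the binary variables in z_keys."""
    n = len(data[y_key])
    total = 0.0
    for z_vals in product([0, 1], repeat=len(z_keys)):
        # indicator of the stratum Z = z_vals
        in_stratum = np.ones(n, dtype=bool)
        for k, v in zip(z_keys, z_vals):
            in_stratum &= (data[k] == v)
        in_cell = in_stratum & (data[x_key] == x_val)
        if in_cell.sum() == 0:
            continue  # empty stratum: skip (a positivity violation in practice)
        p_z = in_stratum.mean()                         # P(Z=z)
        p_y_given = (data[y_key][in_cell] == 1).mean()  # P(Y=1 | X=x, Z=z)
        total += p_z * p_y_given
    return total
```

For instance, `backdoor_adjust(obs_res, 'x', 'y', ['z_se', 'z_g'], 0)` should reproduce the last estimate for medicine A.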
Students apply for university admission to one of the six available departments ($X$). Their choice is affected by their gender ($Z$); the outcome ($Y$) of their application depends both on the chosen department and on the gender of the candidate, representing the fact that a gender may be more or less inclined towards a specific discipline.
This problem takes the form of a 6-armed bandit with an unobserved confounder, since we will consider six possible departments a student may apply to.
We are then dealing with our usual graph:
G = nx.DiGraph()
G.graph['dpi'] = 120
nodes = ['X', 'Y', 'Z']
edges = [('Z','X'), ('Z','Y'), ('X','Y')]
nodes_pos = {'Z':(-.7, 1), 'X':(-1, -1), 'Y':(0, -1)}
nodes_lbl = {'Z':'Z', 'X':'X', 'Y':'Y'}
G.add_nodes_from(nodes)
G.add_edges_from(edges)
nx.draw(G,nodes_pos,node_size=800,node_color='white')
_ = nx.draw_networkx_labels(G,nodes_pos,nodes_lbl)
Notice that, alternatively, we may imagine gender as an observable proxy for a deeper confounder related to those social factors that may make an individual more or less apt at a given discipline.
Let us simulate an instance of Simpson's paradox using the same setup we used for the greedy casino.
def random_select_gender():
return np.random.binomial(1,p=.5)
def select_admissionrate(gender):
if gender==0:
return np.array([.62,.63,.37,.33,.28,.06])
else:
return np.array([.82,.68,.34,.35,.24,.07])
def apply(department,admissionrate):
return np.random.binomial(1,admissionrate[department])
def observational_select_department(gender):
if gender==0:
return np.random.choice(range(6),p=[.31,.21,.12,.15,.07,.14])
else:
return np.random.choice(range(6),p=[.06,.02,.3,.21,.22,.19])
The function for selecting the gender of the applicant is analogous to the functions random_select_blinking() or random_select_drunkedness(). The selection of the admission rate corresponds to the sneaky_select_reward() function (although the current function is not sneaky in itself). The apply() function, which returns the outcome of the application (0: rejection, 1: acceptance), takes the place of the earlier play() function. Finally, we have an observational simulation function; this assumes that the students have a natural distribution of department choices, similarly to what happened in the observational_select_lever() function.
Let us bring it all together to instantiate an observational simulation.
def run_observational_simulation(n_trials=n_trials):
results = {'gender': np.zeros(n_trials),
'department': np.zeros(n_trials),
'outcome': np.zeros(n_trials)}
for i in tqdm(range(n_trials)):
gender = random_select_gender()
admissionrate = select_admissionrate(gender)
department = observational_select_department(gender)
outcome = apply(department,admissionrate)
results['gender'][i] = gender
results['department'][i] = department
results['outcome'][i] = outcome
return results
Let us collect a set of samples.
obs_res = run_observational_simulation()
100%|██████████████████████████████████| 10000/10000 [00:00<00:00, 72033.17it/s]
Let us look then at the probability of admission $P(Y=1)$, as well as the probability of admission conditioned on gender, $P(Y=1 \vert Z=male)$ and $P(Y=1 \vert Z=female)$.
print('Probability of admission: {0}'.format(np.sum(obs_res['outcome']==1) / len(obs_res['outcome'])))
Probability of admission: 0.3813
print('Probability of male admission: {0}'.format(np.sum(np.logical_and(obs_res['outcome']==1,obs_res['gender']==0)) / np.sum(obs_res['gender']==0)))
Probability of male admission: 0.44402239104358254
print('Probability of female admission: {0}'.format(np.sum(np.logical_and(obs_res['outcome']==1,obs_res['gender']==1)) / np.sum(obs_res['gender']==1)))
Probability of female admission: 0.31852741096438575
The probability of male admission is greater than the probability of female admission.
Let us now look at the breakdown by department, that is, $P(Y=1 \vert Z=male, X=dept_i)$ and $P(Y=1 \vert Z=female, X=dept_i)$:
obs_male_admission_per_dept = []
for i in range(6):
n_success_dept = np.sum(obs_res['outcome'][np.logical_and(obs_res['department']==i,obs_res['gender']==0)])
n_dept = np.sum(np.logical_and(obs_res['department']==i,obs_res['gender']==0))
obs_male_admission_per_dept.append(n_success_dept / n_dept)
print('Probability of male admission given department {0}: {1}'.format(i, n_success_dept / n_dept))
Probability of male admission given department 0: 0.6064030131826742
Probability of male admission given department 1: 0.6447761194029851
Probability of male admission given department 2: 0.3547854785478548
Probability of male admission given department 3: 0.34805890227576974
Probability of male admission given department 4: 0.25210084033613445
Probability of male admission given department 5: 0.06051873198847262
obs_female_admission_per_dept = []
for i in range(6):
n_success_dept = np.sum(obs_res['outcome'][np.logical_and(obs_res['department']==i,obs_res['gender']==1)])
n_dept = np.sum(np.logical_and(obs_res['department']==i,obs_res['gender']==1))
obs_female_admission_per_dept.append(n_success_dept / n_dept)
print('Probability of female admission given department {0}: {1}'.format(i, n_success_dept / n_dept))
Probability of female admission given department 0: 0.8390092879256966
Probability of female admission given department 1: 0.6082474226804123
Probability of female admission given department 2: 0.3567251461988304
Probability of female admission given department 3: 0.34001910219675263
Probability of female admission given department 4: 0.25757575757575757
Probability of female admission given department 5: 0.09081196581196581
Let us plot these results for easier interpretation:
xaxis = np.arange(6)
xlabels = ['Dept 0','Dept 1','Dept 2','Dept 3','Dept 4','Dept 5']
plt.bar(xaxis - 0.2, obs_male_admission_per_dept, 0.4, label = 'Obs Male')
plt.bar(xaxis + 0.2, obs_female_admission_per_dept, 0.4, label = 'Obs Female')
plt.xticks(xaxis, xlabels)
plt.ylabel("Admission")
plt.legend()
plt.show()
In almost all departments, female applicants have a higher acceptance rate.
How do we deal with this apparent inconsistency? It is easy to explain with respect to the underlying causal model:
In our first analysis we just considered $P(Y \vert Z)$: this allowed us to study the causal relationship between gender and acceptance, accounting for both direct and mediated effects; however, it does not describe explicitly how the choice of department affects the result.
In the second analysis we studied $P(Y \vert Z,X)$: this allowed us to perform a stratified study and see how the causal relationship between gender and acceptance varies at different levels of the department value.
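The reversal can also be verified analytically from the simulator's own parameters, using the decomposition $P(Y=1 \vert Z) = \sum_x P(Y=1 \vert Z, X=x)\,P(X=x \vert Z)$. A sketch (the arrays below are copied from select_admissionrate() and observational_select_department()):

```python
import numpy as np

# department-specific admission rates, per gender (from select_admissionrate)
rates_male   = np.array([.62, .63, .37, .33, .28, .06])
rates_female = np.array([.82, .68, .34, .35, .24, .07])
# department-choice distributions (from observational_select_department)
choice_male   = np.array([.31, .21, .12, .15, .07, .14])
choice_female = np.array([.06, .02, .30, .21, .22, .19])

# P(Y=1|Z) = sum_x P(Y=1|Z,x) P(x|Z)
p_male   = rates_male @ choice_male      # ~0.446
p_female = rates_female @ choice_female  # ~0.304
print(p_male, p_female)
```

Female applicants fare at least as well in most individual departments, yet they apply mostly to the departments with the lowest admission rates, so their aggregate rate is lower: the reversal is driven entirely by $P(X \vert Z)$.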
Through all the scenarios above (casinos, medicines, university) we had the opportunity of seeing different aspects of confounding.
First of all, we have seen different types of confounders:
In between, we have dealt with unobservable confounders that are partially observable, at least by one party (like the level of blinking in the Greedy casino), or that can be estimated via a proxy (like income in Policymaking for a new medicine).
We have carried out different types of studies which provided us with different information:
Formally, we have analyzed different quantities:
Exploring all these aspects hopefully offered a wider view and perspective on confounders, their meaning and their role in causal inference.
[Bareinboim2015] Bareinboim, Elias, Andrew Forney, and Judea Pearl. "Bandits with unobserved confounders: A causal approach." Advances in Neural Information Processing Systems 28 (2015): 1342-1350.
[Pearl2009] Pearl, Judea. Causality. Cambridge university press, 2009.
Further examples are provided here in the appendix.
This example shows that we can have a situation similar to the greedy casino illustrated above with a single confounding variable ($Z$).
def Z():
return np.random.binomial(1,p=.8)
def R(z):
if z==0:
return np.array([.1,.5])
else:
return np.array([.5,.1])
def X(z):
return z
def doX():
return np.random.binomial(1,p=.5)
def Y(x,r):
return np.random.binomial(1,r[int(x)])
def run_observational_simulation(n_trials=1000):
results = {'x': np.zeros(n_trials),
'z': np.zeros(n_trials),
'y': np.zeros(n_trials)}
for i in tqdm(range(n_trials)):
z = Z()
r = R(z)
x = X(z)
y = Y(x,r)
results['x'][i] = x
results['z'][i] = z
results['y'][i] = y
return results
res = run_observational_simulation()
print('Probability of victory: {0}'.format(np.sum(res['y']==1) / len(res['y'])))
100%|███████████████████████████████████| 1000/1000 [00:00<00:00, 268246.61it/s]
Probability of victory: 0.102
def run_interventional_simulation(n_trials=1000):
results = {'x': np.zeros(n_trials),
'z': np.zeros(n_trials),
'y': np.zeros(n_trials)}
for i in tqdm(range(n_trials)):
z = Z()
r = R(z)
x = doX()
y = Y(x,r)
results['x'][i] = x
results['z'][i] = z
results['y'][i] = y
return results
res = run_interventional_simulation()
print('Probability of victory: {0}'.format(np.sum(res['y']==1) / len(res['y'])))
100%|███████████████████████████████████| 1000/1000 [00:00<00:00, 195775.95it/s]
Probability of victory: 0.268
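These values agree with what the model yields in closed form: observationally $X=Z$ deterministically, so the winning rate is $R(z)[z]=0.1$ for both values of $Z$; under the uniform intervention each arm is played with probability $\tfrac12$, giving $(0.1+0.5)/2=0.3$ regardless of $Z$. A quick sketch (restating R() for self-containment):

```python
import numpy as np

def R(z):
    # reward rates of the two arms, conditional on the confounder Z
    return np.array([.1, .5]) if z == 0 else np.array([.5, .1])

p_z1 = .8  # P(Z=1), as in Z()

# Observational: X deterministically equals Z, so Y's rate is R(z)[z]
p_obs = (1 - p_z1) * R(0)[0] + p_z1 * R(1)[1]

# Interventional: X ~ Bernoulli(.5), independent of Z
p_int = (1 - p_z1) * R(0).mean() + p_z1 * R(1).mean()

print(p_obs, p_int)  # ~0.1 and ~0.3, matching the simulations above
```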
This example shows that a maximally greedy casino may min-max the rewards, minimizing the observational return while still exhibiting a legitimate-looking interventional return.
def Z1():
return np.random.binomial(1,p=.5)
def Z2():
return np.random.binomial(1,p=.5)
def R(z1,z2):
if z2==0:
if z1==0:
return np.array([.01,.59])
else:
return np.array([.59,.01])
else:
if z1==0:
return np.array([.59,.01])
else:
return np.array([.01,.59])
def X(z1,z2):
return np.logical_xor(z1,z2)
def doX():
return np.random.binomial(1,p=.5)
def Y(x,r):
return np.random.binomial(1,r[int(x)])
def run_observational_simulation(n_trials=1000):
results = {'x': np.zeros(n_trials),
'y': np.zeros(n_trials)}
for i in tqdm(range(n_trials)):
z1 = Z1(); z2 = Z2()
r = R(z1,z2)
x = X(z1,z2)
y = Y(x,r)
results['x'][i] = x
results['y'][i] = y
return results
res = run_observational_simulation()
print('Probability of victory: {0}'.format(np.sum(res['y']==1) / len(res['y'])))
100%|███████████████████████████████████| 1000/1000 [00:00<00:00, 169412.07it/s]
Probability of victory: 0.013
def run_interventional_simulation(n_trials=1000):
results = {'x': np.zeros(n_trials),
'y': np.zeros(n_trials)}
for i in tqdm(range(n_trials)):
z1 = Z1(); z2 = Z2()
r = R(z1,z2)
x = doX()
y = Y(x,r)
results['x'][i] = x
results['y'][i] = y
return results
res = run_interventional_simulation()
print('Probability of victory: {0}'.format(np.sum(res['y']==1) / len(res['y'])))
100%|███████████████████████████████████| 1000/1000 [00:00<00:00, 156009.08it/s]
Probability of victory: 0.289
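The min-max structure can again be checked in closed form: for every configuration of $(z_1,z_2)$ the natural choice $X=z_1\oplus z_2$ lands on the $0.01$ arm, while a uniform $do(X)$ averages the two arms to $(0.01+0.59)/2=0.3$. A sketch enumerating the four equiprobable configurations (restating R() for self-containment):

```python
import numpy as np
from itertools import product

def R(z1, z2):
    # reward rates of the two arms, conditional on the confounders
    if z2 == 0:
        return np.array([.01, .59]) if z1 == 0 else np.array([.59, .01])
    return np.array([.59, .01]) if z1 == 0 else np.array([.01, .59])

obs_rates, int_rates = [], []
for z1, z2 in product([0, 1], repeat=2):  # each configuration has prob. 1/4
    r = R(z1, z2)
    obs_rates.append(r[int(np.logical_xor(z1, z2))])  # natural arm choice
    int_rates.append(r.mean())                        # uniform do(X)

print(np.mean(obs_rates), np.mean(int_rates))  # ~0.01 and ~0.3
```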
Let us analyze how the sensitivity of the CATE depends on how reliably our proxy represents the unobserved confounder. We will consider a parameter $p$ denoting the probability that the proxy (income) and the unobserved confounder (socio-economic status) agree, and consider a range of values for $p$.
p = np.arange(0.1,1.1,0.1)
We define a parametric income() function that depends on the value of $p$.
def income(status,p):
if status==0:
return np.random.binomial(1,1-p)
else:
return np.random.binomial(1,p)
def recovery(medicine,status):
if status==0:
if medicine==0:
return np.random.binomial(1,.4)
else:
return np.random.binomial(1,.25)
else:
if medicine==0:
return np.random.binomial(1,.4)
else:
return np.random.binomial(1,.45)
Then we instantiate a versatile CATE simulation that receives as input the value of $p$ and the value of an intervention.
def run_CATE_simulation(n_trials=n_trials,p=.9,intervention=0):
results = {'x': np.zeros(n_trials),
'z': np.zeros(n_trials),
'y': np.zeros(n_trials),
'w': np.zeros(n_trials)}
for i in tqdm(range(n_trials)):
status = SEstatus()
proxy = income(status,p)
medicine = intervention
outcome = recovery(medicine,status)
results['x'][i] = medicine
results['z'][i] = status
results['y'][i] = outcome
results['w'][i] = proxy
return results
We run our simulation for all the values of $p$ and for the two possible interventions, estimating the value of the CATE.
ps = np.arange(0.1,1.1,0.1)
cates0 = []
cates1 = []
for p in ps:
int0_results = run_CATE_simulation(n_trials,p,0)
int1_results = run_CATE_simulation(n_trials,p,1)
P_doX0_givenW0 = np.sum(np.logical_and(int0_results['y']==1,int0_results['w']==0)) / np.sum(int0_results['w']==0)
P_doX0_givenW1 = np.sum(np.logical_and(int0_results['y']==1,int0_results['w']==1)) / np.sum(int0_results['w']==1)
P_doX1_givenW0 = np.sum(np.logical_and(int1_results['y']==1,int1_results['w']==0)) / np.sum(int1_results['w']==0)
P_doX1_givenW1 = np.sum(np.logical_and(int1_results['y']==1,int1_results['w']==1)) / np.sum(int1_results['w']==1)
cate0 = P_doX1_givenW0 - P_doX0_givenW0
cate1 = P_doX1_givenW1 - P_doX0_givenW1
cates0.append(cate0)
cates1.append(cate1)
100%|█████████████████████████████████| 10000/10000 [00:00<00:00, 225835.33it/s]
(analogous progress output repeated for the remaining 19 simulation runs)
Finally, we plot our results, together with the true value of the causal effect at the two different levels of the socio-economic status ($Z$) variable.
true_P_givenZ0 = [.25 - .4]*len(ps)
true_P_givenZ1 = [.45 - .4]*len(ps)
plt.plot(ps,true_P_givenZ0,c='blue',linestyle='dashed',label='True causal effect for Z=0')
plt.plot(ps,true_P_givenZ1,c='red',linestyle='dashed',label='True causal effect for Z=1')
plt.plot(ps,cates0,c='blue',label='Estimated causal effect relying on W=0')
plt.plot(ps,cates1,c='red',label='Estimated causal effect relying on W=1')
plt.xlabel('p')
plt.legend()
As expected, the more faithfully the income ($W$) variable tracks the socio-economic status ($Z$), that is, the higher $p$ is, the closer our estimate is to the true causal effect.
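The linear trend of the estimates can also be derived analytically, under the assumption (not shown in this excerpt) that SEstatus() draws the status as Bernoulli(.5): Bayes' rule then gives $P(Z=1 \vert W=1) = p$ and $P(Z=1 \vert W=0) = 1-p$, so each expected CATE estimate is a $p$-weighted mixture of the two true effects. A sketch:

```python
import numpy as np

ps = np.arange(0.1, 1.1, 0.1)

# True causal effects per stratum, from the recovery() rates:
# Z=0: .25 - .4 = -.15, Z=1: .45 - .4 = .05
true_effect = np.array([.25 - .4, .45 - .4])

# With a Bernoulli(.5) prior on status (our assumption), Bayes gives
# P(Z=1 | W=1) = p and P(Z=1 | W=0) = 1 - p, so the expected
# estimates are mixtures of the two true effects:
cate_w1 = ps * true_effect[1] + (1 - ps) * true_effect[0]  # = .2p - .15
cate_w0 = (1 - ps) * true_effect[1] + ps * true_effect[0]  # = .05 - .2p

print(cate_w1[-1], cate_w0[-1])  # at p=1 they recover .05 and -.15
```

At $p=1$ the proxy is perfect and the two lines reach the true stratum effects; at $p=.5$ the proxy is uninformative and both mixtures collapse to the same value, matching the crossing of the curves in the plot above.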