In the previous notebooks we have discussed at length abstraction and (interventional) consistency, especially in the framework of [Rischel2020]. We have defined the notion of abstraction and abstraction error (first notebook), examined properties of this definition (second notebook), compared abstractions to transformations from [Rubenstein2017] (third notebook), implemented automated code to compute the abstraction error (fourth notebook), and then reviewed the compositionality of abstraction error from [Rischel2021] (fifth notebook).
An underlying idea behind these explorations was that the quality of an abstraction could be assessed through a quantitative evaluation of interventional consistency, that is, the requirement that: (i) using a low-level mechanism under intervention and then abstracting; or (ii) abstracting to the intervened high-level model and then using a mechanism; would produce the same result.
In this notebook we take a closer look at interventional consistency, and we explore other forms of consistency. We instantiate a series of models and abstractions to show how observational, interventional and counterfactual consistency are not strictly related to one another.
The notebook aims at showing in a succinct way the existence of models and abstractions that may guarantee different forms of consistency. We will go through the following steps:
DISCLAIMER 1: the notebook refers to ideas from causality and category theory for which only a quick definition is offered. Useful references for causality are [Pearl2009,Peters2017], while useful references for category theory are [Spivak2014,Fong2018].
DISCLAIMER 2: mistakes are in all likelihood due to misunderstandings by the notebook author in reading [Rischel2020]. Feedback very welcome! :)
We start by importing basic libraries.
import numpy as np
import networkx as nx
import scipy
from tqdm.notebook import tqdm
from pgmpy.models import BayesianNetwork as BN
from pgmpy.factors.discrete import TabularCPD as cpd
from pgmpy.inference import VariableElimination
For reproducibility, and for discussing our results in this notebook, we set a random seed to $0$.
np.random.seed(0)
We set a verbose parameter to control the display of probability distributions.
verbose = False
We also set the number of samples in our empirical simulations.
n_samples = 10**5
In this notebook we write implementations of our models directly in pgmpy, but we do not rely on our own Abstraction objects from src.SCMMappings. For the sake of simplicity and illustration we will mainly work with abstractions that are trivial identities.
We are generically interested in working with forms of consistency. Following [Rischel2020] we will express consistency as a (category-theoretical) commuting diagram:
$$ \begin{array}{ccc} A& \overset{\mu}{\rightarrow} & B \\ \alpha_{X}{\downarrow}& &{\downarrow}\alpha_{Y} \\ X& \overset{\nu}{\rightarrow} & Y \end{array} $$where $A,B,X,Y$ are finite sets, and $\mu,\nu,\alpha_X,\alpha_Y$ are Markov kernels (column-stochastic matrices). In our abstraction setting $A,B,X,Y$ are outcomes, $\mu,\nu$ mechanisms, and $\alpha_X,\alpha_Y$ are abstractions with the added requirement of being binary matrices.
Consistency tells us that, starting from an element $a \in A$, we arrive at the same result whether: (i) we first apply the mechanism $\mu$ and then abstract via $\alpha_Y$; or (ii) we first abstract via $\alpha_X$ and then apply the mechanism $\nu$.
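In code, commutativity can be checked directly by comparing the two composite kernels. A minimal sketch with NumPy (the matrices below are illustrative and not tied to any specific model; `check_commutes` is a hypothetical helper, not part of any library):

```python
import numpy as np

def check_commutes(alpha_Y, mu, nu, alpha_X, tol=1e-9):
    """Check whether alpha_Y . mu == nu . alpha_X, i.e. whether the
    square diagram commutes. All matrices are column-stochastic, with
    columns indexed by inputs and rows by outputs."""
    return np.allclose(alpha_Y @ mu, nu @ alpha_X, atol=tol)

# A trivial case: identity abstractions commute with any mechanism.
mu = np.array([[0.7, 0.2],
               [0.3, 0.8]])   # Markov kernel A -> B
alpha = np.eye(2)             # identity abstraction
print(check_commutes(alpha, mu, mu, alpha))  # True
```

With non-identity abstractions the same helper applies unchanged; only the matrix shapes need to be compatible.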
Meaning of consistency. Why are we so interested in consistency? Consistency tells us something about the alignment, or about the correspondence, of the mechanisms $\mu$ and $\nu$ with respect to the abstraction. It says something about the dynamics of the models, and how their inputs and outputs are related.
It is, however, a strongly syntactical property. It is concerned with the replaceability of one model with the other, in the sense of maintaining an alignment between them, but it does not concern itself with the quality of, or the amount of information carried at, the different levels.
Taxonomy of consistencies. Now, we can have different forms of consistency with respect to two different parameters used to generate the diagrams above:
Originating graph: whether the graph from which we derive the diagram is the observational model $\mathcal{M}$, a post-interventional model $\mathcal{M}_\iota$, or a counterfactual model $\mathcal{M}_{\iota\bar{\iota}}$.
Distributions: relating to the choice of the sets $A,B,X,Y$ and consequently $\mu,\nu$.
A note on the formalism. Since $\mu$ and $\nu$ are matrices encoding distributions, in the following section we will use an explicit probability notation, instead of Greek letters (normally used to denote mechanisms) or capital letters (sometimes used for matrices). We will then use symbols such as $p_\mathcal{M}(A)$ to refer to a matrix encoding this discrete distribution. Every probability distribution thus corresponds to a stochastic matrix. Moreover, the subscript will always make clear from which model the distribution is derived.
A note on computing distributions in the models. In the following diagrams we will refer to different distributions such as $p_{\mathcal{M'}}(X)$ or $p_{\mathcal{M}}(B\vert A)$. In the context of Markovian and semi-Markovian models, we assume that all these quantities are computable from the base SCMs $\mathcal{M}$ and ${\mathcal{M'}}$ and their joints.
As suggested above we want to consider different forms of consistencies, specifically observational, interventional and counterfactual.
It is well-known that these three types of quantities live on three layers of a strict hierarchy; each layer is furthermore associated with a mathematical object that allows for the treatment of these quantities. Thus, Bayes networks (BNs) deal only with observational quantities; causal Bayes networks (CBNs) enable us to evaluate not only observational quantities, but also interventional quantities; and, finally, structural causal models (SCMs) add the possibility of considering counterfactual quantities [Bareinboim2022].
Our discussion of causal models started from SCMs [Pearl2009] (see first notebook), but through the simplification of our models and their embedding in $\mathtt{FinStoch}$ following the approach of [Rischel2020] (see first notebook again) we dropped enough information that we are effectively left with a BN or a CBN. If we were working with a CBN we would not be able to discuss counterfactual queries; with a BN we could not even assess an interventional query.
The approach that we have implicitly followed, and which we will make explicit now, is that we are always given a completely defined SCM $\mathcal{M}^*$ (here we use the star to identify this ground-truth SCM and distinguish it from other CBNs or BNs). However, we do not work directly with the SCM $\mathcal{M}^*$. For us, the SCM is a generator of BNs (or CBNs). In a sense, an SCM is a more expressive object than a CBN or a BN because it is actually a set of CBNs or BNs; it is a set of rules for generating new models.
So, if we want to consider consistency in the observational domain, we take our SCM $\mathcal{M}^*$ and extract from it the base BN $\mathcal{M}$. If we want to consider consistency in the interventional domain, and thus work with an intervention $\iota$, we extract from $\mathcal{M}^*$ the model $\mathcal{M}_\iota$ which, after edge removal and structural function replacement, is again a BN. If we want to consider consistency in the counterfactual domain, assuming we have performed an intervention $\iota$ and now want to observe the effects had we performed $\bar{\iota}$, we extract from $\mathcal{M}^*$ the model $\mathcal{M}_{\iota\bar{\iota}}$ which, after factual intervention, abduction on the exogenous variables and counterfactual intervention [Pearl2009], provides us with a BN.
Every time we have to evaluate one form of consistency (marginal, joint, conditional) we will therefore just evaluate it in the relevant BN, assuming it can be represented in $\mathtt{FinStoch}$ and that commutativity is well-defined.
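The idea of an SCM as a generator of BNs can be made concrete in a few lines of plain Python. The following toy SCM (structural functions and noise priors are made up for illustration, and unrelated to the pgmpy models used later) produces both the observational joint and a post-interventional joint:

```python
import numpy as np

# A minimal two-variable SCM used as a *generator* of distributions:
# exogenous noises U_S, U_C with fixed priors, and structural functions
#   S := U_S,   C := S XOR U_C.
p_US = np.array([0.6, 0.4])   # P(U_S)
p_UC = np.array([0.9, 0.1])   # P(U_C)

def joint(intervention=None):
    """Return P(S, C) of the BN generated by the SCM, optionally
    under a hard intervention do(S = intervention)."""
    p = np.zeros((2, 2))
    for u_s in (0, 1):
        for u_c in (0, 1):
            s = u_s if intervention is None else intervention
            c = s ^ u_c
            p[s, c] += p_US[u_s] * p_UC[u_c]
    return p

print(joint())                # observational P(S, C)
print(joint(intervention=1))  # post-interventional P(S, C) under do(S=1)
```

Each call extracts a different BN-level distribution from the same underlying set of rules, which is exactly the sense in which the SCM is "a set of BNs".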
Observational consistencies are consistencies evaluated out of the observational model $\mathcal{M}$ on which no intervention has been applied. Both the base model $\mathcal{M}$ and the abstracted model $\mathcal{M'}$ are untouched.
In marginal observational consistency we are interested in considering the alignment of a single-variable marginal in the abstracted model with respect to the (possibly multivariate) marginal in the base model.
Here our focus is on: $P_\mathcal{M'}(X)$.
Consider, for instance, the following generic scenario in which two variables $A,B$ from $\mathcal{M}$ are mapped onto $X$ in $\mathcal{M'}$:
We now want to consider whether the abstraction $\alpha_X$ going from $A,B$ to $X$ guarantees consistency. We can then build the following diagram from $\mathcal{M}$ and $\mathcal{M'}$:
$$ \begin{array}{ccc} \{*\}& \overset{p_{\mathcal{M}}(A,B)}{\rightarrow} & A,B \\ {\downarrow}id& &\alpha_{X}{\downarrow} \\ \{*\}& \overset{p_{\mathcal{M'}}(X)}{\rightarrow} & X \end{array} $$where we derive $p_{\mathcal{M}}(A,B)$ and $p_{\mathcal{M'}}(X)$ as matrices corresponding to the desired distributions in $\mathcal{M}$ and $\mathcal{M'}$. Consistency reduces to verifying:
$$\alpha_X \cdot p_{\mathcal{M}}(A,B) = p_{\mathcal{M'}}(X).$$Notice that, in the scenario above, evaluating marginal observational consistency with respect to only $A$ or only $B$ is not immediate, as it would likely violate the requirement of a binary $\alpha$.
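Numerically, this check is a single matrix-vector product; the distribution out of the one-point set $\{*\}$ is just a column vector. A sketch with illustrative numbers (the joint and the abstraction below are assumptions for the example, not taken from any model in this notebook):

```python
import numpy as np

# p_M(A, B) as a column vector over the joint outcomes of (A, B),
# with |A| = |B| = 2 and outcomes ordered (0,0), (0,1), (1,0), (1,1).
p_AB = np.array([[0.4], [0.1], [0.2], [0.3]])

# A binary abstraction alpha_X collapsing the four joint outcomes of
# (A, B) onto the two outcomes of X (illustrative choice).
alpha_X = np.array([[1, 1, 0, 0],
                    [0, 0, 1, 1]])

# The only p_M'(X) that makes the diagram commute:
p_X = alpha_X @ p_AB
print(p_X.ravel())  # [0.5 0.5]
```

Consistency then amounts to checking that the abstracted model's marginal equals `p_X`.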
Meaning of marginal observational consistency. If commutativity holds, this means that the following two operations produce the same statistical result: (i) computing the joint distribution $p_{\mathcal{M}}(A,B)$ in the base model and then applying the abstraction $\alpha_X$; or (ii) computing the marginal $p_{\mathcal{M'}}(X)$ directly in the abstracted model.
Again, beware that consistency does not tell us anything about information. Consistency may hold, and, still, $p_{\mathcal{M}}(A,B)$ may be very informative and be defined on a domain with high resolution, while $p_{\mathcal{M'}}(X)$ may be very simple and trivial.
In joint observational consistency we consider the distribution over all the variables (or a subset) in the abstracted model. This may be considered as an extension of the case above.
Here our focus is on: $P_\mathcal{M'}(\mathbf{X})$.
The corresponding scenario is the one in which we consider all the abstracted variables (or a subset) $\mathbf{X}$ and all the relevant variables in the base model $\mathbf{A}$. We can then build the following diagram from $\mathcal{M}$ and $\mathcal{M'}$:
$$ \begin{array}{ccc} \{*\}& \overset{p_{\mathcal{M}}(\mathbf{A})}{\rightarrow} & \mathbf{A} \\ & &\alpha_{\mathbf{X}}{\downarrow} \\ \{*\}& \overset{p_{\mathcal{M'}}(\mathbf{X})}{\rightarrow} & \mathbf{X} \end{array} $$where $\alpha_{\mathbf{X}} = \bigotimes_{X \in \mathbf{X}} \alpha_X$, and $p_{\mathcal{M}}(\mathbf{A})$ and $p_{\mathcal{M'}}(\mathbf{X})$ stand for joint distributions over the variables in $\mathcal{M}$ and $\mathcal{M'}$. To assess consistency, we want the following identity to hold:
$$\alpha_\mathbf{X} \cdot p_{\mathcal{M}}(\mathbf{A}) = p_{\mathcal{M'}}(\mathbf{X}).$$Meaning of joint observational consistency. The meaning of this consistency is similar to the previous one. If commutativity holds, then the following two operations produce the same statistical result: (i) computing the joint distribution $p_{\mathcal{M}}(\mathbf{A})$ in the base model and then applying the abstraction $\alpha_\mathbf{X}$; or (ii) computing the joint distribution $p_{\mathcal{M'}}(\mathbf{X})$ directly in the abstracted model.
This is thus an extension from a marginal distribution over one variable to a joint distribution over all variables. As before, consistency does not tell us anything about information preservation.
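The tensor product $\alpha_{\mathbf{X}} = \bigotimes_{X \in \mathbf{X}} \alpha_X$ can be assembled with a Kronecker product, assuming the joint outcomes are ordered consistently with `np.kron`; the matrices below are illustrative:

```python
import numpy as np

# Two single-variable binary abstractions, each collapsing three
# low-level outcomes onto two high-level ones (illustrative choices).
alpha_X1 = np.array([[1, 1, 0],
                     [0, 0, 1]])
alpha_X2 = np.array([[1, 0, 0],
                     [0, 1, 1]])

# Joint abstraction on (X1, X2) as a tensor (Kronecker) product.
alpha_joint = np.kron(alpha_X1, alpha_X2)
print(alpha_joint.shape)        # (4, 9)

# Each column still sums to one: the tensor product of binary
# column-stochastic matrices is again a valid binary abstraction.
print(alpha_joint.sum(axis=0))
```

The joint consistency check is then the same matrix identity as in the marginal case, with `alpha_joint` in place of a single $\alpha_X$.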
In conditional observational consistency we focus on how the behaviour of one abstracted variable (or a set of them), conditioned on another abstracted variable (or a set of them), aligns with that of the corresponding variables in the base model.
Here our focus is on: $P_\mathcal{M'}(\mathbf{Y} \vert \mathbf{X})$.
We refer to the generic scenario in which we want to evaluate the behaviour of a set of variables $\mathbf{Y}$ with respect to $\mathbf{X}$ in $\mathcal{M'}$ and evaluate their alignment with the respective sets $\mathbf{A}=a^{-1}(\mathbf{X})$ and $\mathbf{B}=a^{-1}(\mathbf{Y})$.
To evaluate if the abstraction preserves observational conditional consistency, we generate the following diagram from $\mathcal{M}$ and $\mathcal{M'}$:
$$ \begin{array}{ccc} \mathbf{A} & \overset{p_{\mathcal{M}}(\mathbf{B} \vert \mathbf{A})}{\rightarrow} & \mathbf{B} \\ \alpha_{\mathbf{X}}{\downarrow} & &\alpha_{\mathbf{Y}}{\downarrow} \\ \mathbf{X} & \overset{p_{\mathcal{M'}}(\mathbf{Y} \vert \mathbf{X})}{\rightarrow} & \mathbf{Y} \end{array} $$where we derive $p_{\mathcal{M}}(\mathbf{B} \vert \mathbf{A})$ and $p_{\mathcal{M'}}(\mathbf{Y} \vert \mathbf{X})$ from $\mathcal{M}$ and $\mathcal{M'}$. We can now ask whether:
$$\alpha_\mathbf{Y} \cdot p_{\mathcal{M}}(\mathbf{B} \vert \mathbf{A}) = p_{\mathcal{M'}}(\mathbf{Y} \vert \mathbf{X}) \cdot \alpha_{\mathbf{X}}.$$Meaning of conditional observational consistency. In the case of conditioning, if commutativity holds, then the following two operations produce the same statistical result: (i) starting from $\mathbf{A}$ in the base model, applying the mechanism $p_{\mathcal{M}}(\mathbf{B} \vert \mathbf{A})$ and then abstracting via $\alpha_\mathbf{Y}$; or (ii) abstracting via $\alpha_\mathbf{X}$ and then applying the mechanism $p_{\mathcal{M'}}(\mathbf{Y} \vert \mathbf{X})$ in the abstracted model.
As always, the actual granularity of this prediction process is not accounted for by consistency.
Interventional consistencies are consistencies evaluated out of an interventional model $\mathcal{M}_\iota$, for an intervention $\iota$.
In marginal interventional consistency we are interested in considering the alignment of a single-variable marginal in the abstracted model with respect to the (possibly multivariate) marginal in the base model when we perform an intervention.
Here our focus is on: $P_\mathcal{M'}(Y \vert do(\mathbf{X}))$.
In this scenario we consider a single abstracted variable $Y$ in the post-interventional model generated by executing $do(\mathbf{X})$. Through the definition of abstraction, we can identify a corresponding set of variables $\mathbf{B}=a^{-1}(Y)$ of interest and a corresponding set of intervened variables $\mathbf{A}=a^{-1}(\mathbf{X})$.
We can then build a diagram from the post-interventional models $\mathcal{M}_\iota$ and $\mathcal{M'}_{\iota'}$ as follows:
$$ \begin{array}{ccc} \{*\}& \overset{p_{\mathcal{M}_{do(\mathbf{A})}}(\mathbf{B})}{\rightarrow} & \mathbf{B} \\ & &\alpha_{Y}{\downarrow} \\ \{*\}& \overset{p_{\mathcal{M'}_{do(\mathbf{X})}}(Y)}{\rightarrow} & Y \end{array} $$where $p_{\mathcal{M}_{do(\mathbf{A})}}(\mathbf{B})$ and $p_{\mathcal{M'}_{do(\mathbf{X})}}(Y)$ are matrices corresponding to the desired distributions computed from $\mathcal{M}_{\iota}$ and $\mathcal{M'}_{\iota'}$. Consistency reduces to verifying:
$$\alpha_{Y} \cdot p_{\mathcal{M}_{do(\mathbf{A})}}(\mathbf{B}) = p_{\mathcal{M'}_{do(\mathbf{X})}}(Y).$$Notice that the two interventional quantities that we estimate on the post-interventional model may be expressed as quantities on the pre-interventional model using the $do$ notation:
$$ p_{\mathcal{M}_{do(\mathbf{A})}}(\mathbf{B}) = p_{\mathcal{M}}(\mathbf{B} \vert do(\mathbf{A})) $$$$ p_{\mathcal{M'}_{do(\mathbf{X})}}(Y) = p_{\mathcal{M'}}(Y \vert do(\mathbf{X})).$$This may suggest that we could work with the following diagram:
$$ \begin{array}{ccc} \mathbf{A} & \overset{p_{\mathcal{M}}(\mathbf{B} \vert do(\mathbf{A}))}{\rightarrow} & \mathbf{B} \\ \alpha_{\mathbf{X}}{\downarrow} & &\alpha_{Y}{\downarrow} \\ \mathbf{X} & \overset{p_{\mathcal{M'}}(Y \vert do(\mathbf{X}))}{\rightarrow} & Y \end{array} $$and evaluate $\alpha_Y \cdot p_{\mathcal{M}}(\mathbf{B} \vert do(\mathbf{A})) = p_{\mathcal{M'}}(Y \vert do(\mathbf{X})) \cdot \alpha_{\mathbf{X}}$, as in [Rischel2020]. This notation has some advantages and disadvantages:
Meaning of marginal interventional consistency. If commutativity holds, this means that the following two operations produce the same statistical result: (i) computing $p_{\mathcal{M}_{do(\mathbf{A})}}(\mathbf{B})$ in the post-interventional base model and then applying the abstraction $\alpha_Y$; or (ii) computing $p_{\mathcal{M'}_{do(\mathbf{X})}}(Y)$ directly in the post-interventional abstracted model.
In other words, starting from $\mathcal{M}$ and $\mathcal{M'}$ and performing $do(\mathbf{A})$ and $do(\mathbf{X})$, then the two operations above are (statistically) equivalent.
Joint interventional consistency extends marginal interventional consistency by considering all (or a subset of) the variables in the abstracted model and partitioning them in intervened and non-intervened variables.
Here our focus is on: $P_\mathcal{M'}(\mathbf{Y} \vert do(\mathbf{X}))$.
Assume we partition all the variables in the abstracted model into two sets: a set of variables $\mathbf{X}$ on which we will intervene, and a set of variables $\mathbf{Y}$ on which we do not intervene. As an immediate consequence, we obtain through $a^{-1}$ a set of intervened variables $\mathbf{A} = a^{-1}(\mathbf{X})$ and a set of non-intervened variables $\mathbf{B} = a^{-1}(\mathbf{Y})$. (Remember that $\mathbf{A}$ and $\mathbf{B}$ do not necessarily form a partition of the variables in the base model, since there may be non-relevant variables outside the domain of $a$.)
We can then build a diagram similar to the previous one:
$$ \begin{array}{ccc} \{*\}& \overset{p_{\mathcal{M}_{do(\mathbf{A})}}(\mathbf{B})}{\rightarrow} & \mathbf{B} \\ & &\alpha_{\mathbf{Y}}{\downarrow} \\ \{*\}& \overset{p_{\mathcal{M'}_{do(\mathbf{X})}}(\mathbf{Y})}{\rightarrow} & \mathbf{Y} \end{array} $$where $p_{\mathcal{M}_{do(\mathbf{A})}}(\mathbf{B})$ and $p_{\mathcal{M'}_{do(\mathbf{X})}}(\mathbf{Y})$ are matrices corresponding to the desired distributions computed from $\mathcal{M}_{\iota}$ and $\mathcal{M'}_{\iota'}$. Consistency requires verifying:
$$\alpha_{\mathbf{Y}} \cdot p_{\mathcal{M}_{do(\mathbf{A})}}(\mathbf{B}) = p_{\mathcal{M'}_{do(\mathbf{X})}}(\mathbf{Y}).$$Similarly to what we have done before, we can express our post-interventional distribution on the pre-interventional model using the $do$ notation:
$$ p_{\mathcal{M}_{do(\mathbf{A})}}(\mathbf{B}) = p_{\mathcal{M}}(\mathbf{B} \vert do(\mathbf{A})) $$$$ p_{\mathcal{M'}_{do(\mathbf{X})}}(\mathbf{Y}) = p_{\mathcal{M'}}(\mathbf{Y} \vert do(\mathbf{X})),$$and consider an alternative diagram as we did before for marginal interventional consistency.
Meaning of joint interventional consistency. If commutativity holds, this means that the following two operations produce the same statistical result: (i) computing $p_{\mathcal{M}_{do(\mathbf{A})}}(\mathbf{B})$ in the post-interventional base model and then applying the abstraction $\alpha_\mathbf{Y}$; or (ii) computing $p_{\mathcal{M'}_{do(\mathbf{X})}}(\mathbf{Y})$ directly in the post-interventional abstracted model.
Joint interventional consistency is analogous to marginal interventional consistency, except that we consider multivariate distributions instead of univariate ones.
In conditional interventional consistency we are interested in considering the alignment of a set of variables conditioned on another set in the abstracted model with respect to the corresponding sets in the base model, under the execution of an intervention.
Here our focus is on: $P_\mathcal{M'}(\mathbf{Y} \vert do(\mathbf{X}),\mathbf{Z})$.
In this scenario we assume that we have a set $\mathbf{Y}$ of abstracted variables that we want to study, a set $\mathbf{X}$ on which we intervene and a set $\mathbf{Z}$ on which we condition. Via $a^{-1}$ we get corresponding sets on the base model, respectively $\mathbf{B}$, $\mathbf{A}$ and $\mathbf{C}$.
We then build the following diagram:
$$ \begin{array}{ccc} \mathbf{C}& \overset{p_{\mathcal{M}_{do(\mathbf{A})}}(\mathbf{B}\vert\mathbf{C})}{\rightarrow} & \mathbf{B} \\ \alpha_{\mathbf{Z}}{\downarrow}& &\alpha_{\mathbf{Y}}{\downarrow} \\ \mathbf{Z}& \overset{p_{\mathcal{M'}_{do(\mathbf{X})}}(\mathbf{Y}\vert\mathbf{Z})}{\rightarrow} & \mathbf{Y} \end{array} $$where $p_{\mathcal{M}_{do(\mathbf{A})}}(\mathbf{B}\vert\mathbf{C})$ and $p_{\mathcal{M'}_{do(\mathbf{X})}}(\mathbf{Y}\vert\mathbf{Z})$ are matrices corresponding to the desired distributions computed from $\mathcal{M}_{\iota}$ and $\mathcal{M'}_{\iota'}$. Consistency requires verifying:
$$\alpha_{\mathbf{Y}} \cdot p_{\mathcal{M}_{do(\mathbf{A})}}(\mathbf{B}\vert\mathbf{C}) = p_{\mathcal{M'}_{do(\mathbf{X})}}(\mathbf{Y}\vert\mathbf{Z})\cdot\alpha_{\mathbf{Z}}.$$Similarly to what we have done before, we can express our post-interventional distribution on the pre-interventional model using the $do$ notation:
$$ p_{\mathcal{M}_{do(\mathbf{A})}}(\mathbf{B}\vert\mathbf{C}) = p_{\mathcal{M}}(\mathbf{B} \vert do(\mathbf{A}),\mathbf{C}) $$$$ p_{\mathcal{M'}_{do(\mathbf{X})}}(\mathbf{Y}\vert\mathbf{Z}) = p_{\mathcal{M'}}(\mathbf{Y} \vert do(\mathbf{X}),\mathbf{Z});$$however, building up an alternative diagram is not straightforward anymore as we now have a conditioning and an intervening set of variables.
Meaning of conditional interventional consistency. When commutativity holds under intervention, then we can state that the following two operations produce the same statistical result: (i) starting from $\mathbf{C}$ in the post-interventional base model, applying the mechanism $p_{\mathcal{M}_{do(\mathbf{A})}}(\mathbf{B}\vert\mathbf{C})$ and then abstracting via $\alpha_\mathbf{Y}$; or (ii) abstracting via $\alpha_\mathbf{Z}$ and then applying the mechanism $p_{\mathcal{M'}_{do(\mathbf{X})}}(\mathbf{Y}\vert\mathbf{Z})$.
Counterfactual consistencies are consistencies evaluated out of a counterfactual model $\mathcal{M}_{\iota\bar{\iota}}$ for conflicting interventions $\iota$ and $\bar{\iota}$ that force us to reason about two distinct worlds, a factual one and a counterfactual one.
As briefly discussed above, we cannot reason about counterfactuals relying on our objects and morphisms in $\mathtt{FinStoch}$, as they represent BNs (or at best CBNs). We need instead to step back to our SCM and use it to generate counterfactual models $\mathcal{M}_{\iota\bar{\iota}}$.
In marginal counterfactual consistency we are interested in considering the alignment of a single-variable marginal in the abstracted model with respect to the (possibly multivariate) marginal in the base model when we perform a counterfactual study.
Here our focus is on: $P_{\mathcal{M}_{\iota\bar{\iota}}}(Y) = P_{\mathcal{M'}_{do(\mathbf{X})}}(Y \vert do(\bar{\mathbf{X}}))$.
In this scenario we consider a single abstracted variable $Y$; in the factual world we have performed the intervention $do(\mathbf{X})$ and observed $Y$; now we want to consider what the effect on $Y$ would have been had we performed $do(\bar{\mathbf{X}})$. Notice that, despite the notation, $do(\bar{\mathbf{X}})$ should not be read as a Boolean negation of the intervention $do(\mathbf{X})$; rather, the set $\bar{\mathbf{X}}$ simply denotes a set of variables on which we want to evaluate a counterfactual intervention, such that some of the manipulations contradict the factual intervention $do(\mathbf{X})$.
To do our counterfactual evaluation, we first take an abduction step in $\mathcal{M'}_{do(\mathbf{X})}$: from the observed $Y$ we infer the distribution over the exogenous variables (this abduction may be exact or produce a distribution). Next, we fix these distributions and we perform the desired intervention $\bar{\iota}$, thus producing $\mathcal{M'}_{\iota'\bar{\iota'}}$. Finally we evaluate $P_{\mathcal{M'}_{do(\mathbf{X})}}(Y \vert do(\bar{\mathbf{X}}))$.
Via abstraction $a$, we have a corresponding counterfactual quantity in the low-level model. Let $\mathbf{A}=a^{-1}(\mathbf{X})$, $\mathbf{\bar{A}}=a^{-1}(\mathbf{\bar{X}})$, and $\mathbf{B}=a^{-1}(Y)$. We start from an abduction in the factual base intervened model, by inferring the distribution over the exogenous variables; then we perform the intervention $do(\mathbf{\bar{A}})$; last we evaluate $P_{\mathcal{M}_{do(\mathbf{A})}}(\mathbf{B} \vert do(\bar{\mathbf{A}}))$.
The diagram we are interested in is built from the counterfactual models $\mathcal{M}_{\iota\bar{\iota}}$ and $\mathcal{M'}_{\iota'\bar{\iota'}}$ as follows:
$$ \begin{array}{ccc} \{*\}& \overset{p_{\mathcal{M}_{do(\mathbf{A}),do(\bar{\mathbf{A}})}}(\mathbf{B})}{\rightarrow} & \mathbf{B} \\ & &\alpha_{Y}{\downarrow} \\ \{*\}& \overset{p_{\mathcal{M'}_{do(\mathbf{X}),do(\bar{\mathbf{X}})}}(Y)}{\rightarrow} & Y \end{array} $$where $p_{\mathcal{M}_{do(\mathbf{A}),do(\bar{\mathbf{A}})}}(\mathbf{B})$ and $p_{\mathcal{M'}_{do(\mathbf{X}),do(\bar{\mathbf{X}})}}(Y)$ are matrices corresponding to the desired distributions computed from $\mathcal{M}_{\iota\bar{\iota}}$ and $\mathcal{M'}_{\iota'\bar{\iota'}}$. Consistency reduces to verifying:
$$\alpha_{Y} \cdot p_{\mathcal{M}_{do(\mathbf{A}),do(\bar{\mathbf{A}})}}(\mathbf{B}) = p_{\mathcal{M'}_{do(\mathbf{X}),do(\bar{\mathbf{X}})}}(Y).$$Counterfactual quantities, like interventional ones, may be expressed in a different form. The two counterfactual quantities that we estimate on the counterfactual models may be expressed as quantities on the pre-interventional model using the $do$ notation:
$$ p_{\mathcal{M}_{do(\mathbf{A}),do(\bar{\mathbf{A}})}}(\mathbf{B}) = p_{\mathcal{M}}(\mathbf{B}\vert do(\mathbf{A}),do(\bar{\mathbf{A}})) $$$$ p_{\mathcal{M'}_{do(\mathbf{X}),do(\bar{\mathbf{X}})}}(Y) = p_{\mathcal{M'}}(Y\vert do(\mathbf{X}),do(\bar{\mathbf{X}})).$$Representing the contradictory interventions in the conditioning part of a single diagram is however not very helpful.
Meaning of marginal counterfactual consistency. If commutativity holds, this means that the following two operations produce the same statistical result: (i) computing $p_{\mathcal{M}_{do(\mathbf{A}),do(\bar{\mathbf{A}})}}(\mathbf{B})$ in the counterfactual base model and then applying the abstraction $\alpha_Y$; or (ii) computing $p_{\mathcal{M'}_{do(\mathbf{X}),do(\bar{\mathbf{X}})}}(Y)$ directly in the counterfactual abstracted model.
Counterfactual consistency means that the model not only guarantees a generic statistical behaviour (as in the case of observational and interventional consistency), but, more strictly, that results on units (or subjects) are (statistically) equivalent.
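The abduction-action-prediction recipe behind these counterfactual distributions can be sketched end-to-end on a toy SCM (the structural functions, priors and observed evidence below are made up for illustration; this is not the pgmpy model used later):

```python
import numpy as np

# Toy SCM:  S := U_S,  C := S XOR U_C,  with prior P(U_C) given below.
p_UC = np.array([0.9, 0.1])

# Factual world: we performed do(S=1) and observed C=1.
s_fact, c_obs = 1, 1

# 1. Abduction: posterior over U_C given the factual observation.
lik = np.array([1.0 if (s_fact ^ u_c) == c_obs else 0.0 for u_c in (0, 1)])
post_UC = lik * p_UC
post_UC /= post_UC.sum()

# 2. Action: counterfactual intervention do(S=0).
s_cf = 0

# 3. Prediction: distribution of C in the counterfactual world.
p_C_cf = np.zeros(2)
for u_c in (0, 1):
    p_C_cf[s_cf ^ u_c] += post_UC[u_c]
print(p_C_cf)  # [1. 0.]: had we set S=0, C would certainly have been 0
```

The abstracted model would run the same three steps on its own variables; counterfactual consistency then asks that the two resulting distributions agree through $\alpha$.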
Joint counterfactual consistency simply extends marginal counterfactual consistency by taking into consideration subsets of variables in the abstracted model, partitioning them into target variables of interest, factually intervened variables and counterfactually intervened variables.
Here our focus is on: $P_{\mathcal{M}_{\iota\bar{\iota}}}(\mathbf{Y}) = P_{\mathcal{M'}_{do(\mathbf{X})}}(\mathbf{Y} \vert do(\bar{\mathbf{X}}))$.
Assume we divide all the variables in the abstracted model into three sets: a set of variables $\mathbf{Y}$ on which we do not intervene, a set of variables $\mathbf{X}$ on which we will factually intervene, and a set of variables $\bar{\mathbf{X}}$, which overlaps with ${\mathbf{X}}$, on which we will counterfactually intervene. As always, the abstraction allows us to compute the corresponding counterfactual quantities via $a^{-1}$ in the base model.
We can then build a diagram similar to the previous one:
$$ \begin{array}{ccc} \{*\}& \overset{p_{\mathcal{M}_{do(\mathbf{A}),do(\bar{\mathbf{A}})}}(\mathbf{B})}{\rightarrow} & \mathbf{B} \\ & &\alpha_{\mathbf{Y}}{\downarrow} \\ \{*\}& \overset{p_{\mathcal{M'}_{do(\mathbf{X}),do(\bar{\mathbf{X}})}}(\mathbf{Y})}{\rightarrow} & \mathbf{Y} \end{array} $$where $p_{\mathcal{M}_{do(\mathbf{A}),do(\bar{\mathbf{A}})}}(\mathbf{B})$ and $p_{\mathcal{M'}_{do(\mathbf{X}),do(\bar{\mathbf{X}})}}(\mathbf{Y})$ are matrices corresponding to the desired distributions computed from $\mathcal{M}_{\iota\bar{\iota}}$ and $\mathcal{M'}_{\iota'\bar{\iota'}}$. Consistency reduces to verifying:
$$\alpha_{\mathbf{Y}} \cdot p_{\mathcal{M}_{do(\mathbf{A}),do(\bar{\mathbf{A}})}}(\mathbf{B}) = p_{\mathcal{M'}_{do(\mathbf{X}),do(\bar{\mathbf{X}})}}(\mathbf{Y}).$$We can also try to express the counterfactual quantities via the following identities:
$$ p_{\mathcal{M}_{do(\mathbf{A}),do(\bar{\mathbf{A}})}}(\mathbf{B}) = p_{\mathcal{M}}(\mathbf{B}\vert do(\mathbf{A}),do(\bar{\mathbf{A}})) $$$$ p_{\mathcal{M'}_{do(\mathbf{X}),do(\bar{\mathbf{X}})}}(\mathbf{Y}) = p_{\mathcal{M'}}(\mathbf{Y}\vert do(\mathbf{X}),do(\bar{\mathbf{X}})).$$Meaning of joint counterfactual consistency. If commutativity holds, this means that the following two operations produce the same statistical result: (i) computing $p_{\mathcal{M}_{do(\mathbf{A}),do(\bar{\mathbf{A}})}}(\mathbf{B})$ in the counterfactual base model and then applying the abstraction $\alpha_\mathbf{Y}$; or (ii) computing $p_{\mathcal{M'}_{do(\mathbf{X}),do(\bar{\mathbf{X}})}}(\mathbf{Y})$ directly in the counterfactual abstracted model.
Joint counterfactual consistency is then again analogous to marginal counterfactual consistency once we account for a multivariate distribution.
In conditional counterfactual consistency we are interested in considering the alignment of a set of variables conditioned on another set in the abstracted model with respect to the corresponding sets in the base model, when performing a counterfactual study.
Here our focus is on: $P_{\mathcal{M}_{\iota\bar{\iota}}}(\mathbf{Y}\vert \mathbf{Z}) = P_{\mathcal{M'}_{do(\mathbf{X})}}(\mathbf{Y} \vert do(\bar{\mathbf{X}}),\mathbf{Z})$.
In this scenario we have a set of variables $\mathbf{Y}$ on which we do not intervene, a set of variables $\mathbf{X}$ on which we will factually intervene, a set of variables $\bar{\mathbf{X}}$, which overlaps with ${\mathbf{X}}$, on which we will counterfactually intervene, and a set of conditioning variables $\mathbf{Z}$.
Given the abstracted model, in the factual world we perform the intervention $do(\mathbf{X})$ and we study $\mathbf{Y}$ conditioned on $\mathbf{Z}$; in the counterfactual study we perform the usual abduction step, we perform $do(\bar{\mathbf{X}})$, and we observe the counterfactual distribution on $\mathbf{Y}$ conditioned on $\mathbf{Z}$. The same evaluation mediated by the abstraction term $a$ happens on the base model.
The diagram we are interested in is built from the counterfactual models $\mathcal{M}_{\iota\bar{\iota}}$ and $\mathcal{M'}_{\iota'\bar{\iota'}}$ as follows:
$$ \begin{array}{ccc} \mathbf{C}& \overset{p_{\mathcal{M}_{do(\mathbf{A}),do(\bar{\mathbf{A}})}}(\mathbf{B}\vert\mathbf{C})}{\rightarrow} & \mathbf{B} \\ \alpha_{\mathbf{Z}}{\downarrow}& &\alpha_{\mathbf{Y}}{\downarrow} \\ \mathbf{Z}& \overset{p_{\mathcal{M'}_{do(\mathbf{X}),do(\bar{\mathbf{X}})}}(\mathbf{Y}\vert\mathbf{Z})}{\rightarrow} & \mathbf{Y} \end{array} $$where $p_{\mathcal{M}_{do(\mathbf{A}),do(\bar{\mathbf{A}})}}(\mathbf{B}\vert\mathbf{C})$ and $p_{\mathcal{M'}_{do(\mathbf{X}),do(\bar{\mathbf{X}})}}(\mathbf{Y}\vert\mathbf{Z})$ are matrices corresponding to the desired distributions computed from $\mathcal{M}_{\iota\bar{\iota}}$ and $\mathcal{M'}_{\iota'\bar{\iota'}}$. Consistency reduces to verifying:
$$\alpha_{\mathbf{Y}} \cdot p_{\mathcal{M}_{do(\mathbf{A}),do(\bar{\mathbf{A}})}}(\mathbf{B}\vert\mathbf{C}) = p_{\mathcal{M'}_{do(\mathbf{X}),do(\bar{\mathbf{X}})}}(\mathbf{Y}\vert\mathbf{Z}) \cdot \alpha_{\mathbf{Z}}.$$We can express the counterfactual quantities via the following identities:
$$ p_{\mathcal{M}_{do(\mathbf{A}),do(\bar{\mathbf{A}})}}(\mathbf{B}\vert\mathbf{C}) = p_{\mathcal{M}}(\mathbf{B}\vert do(\mathbf{A}),do(\bar{\mathbf{A}}),\mathbf{C}) $$$$ p_{\mathcal{M'}_{do(\mathbf{X}),do(\bar{\mathbf{X}})}}(\mathbf{Y}\vert\mathbf{Z}) = p_{\mathcal{M'}}(\mathbf{Y}\vert do(\mathbf{X}),do(\bar{\mathbf{X}}),\mathbf{Z}).$$Meaning of conditional counterfactual consistency. If commutativity holds, this means that the following two operations produce the same statistical result: (i) starting from $\mathbf{C}$ in the counterfactual base model, applying $p_{\mathcal{M}_{do(\mathbf{A}),do(\bar{\mathbf{A}})}}(\mathbf{B}\vert\mathbf{C})$ and then abstracting via $\alpha_\mathbf{Y}$; or (ii) abstracting via $\alpha_\mathbf{Z}$ and then applying $p_{\mathcal{M'}_{do(\mathbf{X}),do(\bar{\mathbf{X}})}}(\mathbf{Y}\vert\mathbf{Z})$.
Conditional counterfactual consistency allows prediction and the evaluation of complex queries across contradictory worlds, with the guarantee that results in the base and in the abstracted model will agree.
Notice that while we have defined conditional counterfactual consistency in the most general way, we can have specific queries where conditioning may happen only on the factual or on the counterfactual world.
First, we have a look at the simpler forms of observational consistency and interventional consistency. We instantiate a basic model ($\mathcal{M}$) and several different abstracted models ($\mathcal{M'}$). In every instance, we compute observational and interventional distributions of interest and compare them.
We start by defining the base model. Unlike previous examples in other notebooks, where we usually considered a chain model, we now instantiate a model with a confounder. This will make it easier to highlight the difference between observational and interventional consistency.
M0 = BN([('Env','Smoke'),('Env','Cancer'),('Smoke','Cancer')])
cpdE = cpd(variable='Env',
variable_card=2,
values=[[.25],[.75]],
evidence=None,
evidence_card=None)
cpdS = cpd(variable='Smoke',
variable_card=2,
values=[[.8,.6],[.2,.4]],
evidence=['Env'],
evidence_card=[2])
cpdC = cpd(variable='Cancer',
variable_card=2,
values=[[.9,.8,.35,.3],[.1,.2,.65,.7]],
evidence=['Smoke','Env'],
evidence_card=[2,2])
M0.add_cpds(cpdE,cpdS,cpdC)
M0.check_model()
True
Notice that, by definition, pgmpy forces us to work with a BN (actually, a CBN, since we can perform interventions); we can then easily use pgmpy objects for dealing with observational and interventional consistency, but they will not allow us to deal with counterfactual consistency in an immediate way.
We then use pgmpy methods to evaluate distributions of interest.
inferM0 = VariableElimination(M0)
M0_P_ESC = inferM0.query(['Env','Smoke','Cancer'], show_progress=False)
M0_P_SC = inferM0.query(['Smoke','Cancer'], show_progress=False)
M0_P_S = inferM0.query(['Smoke'], show_progress=False)
M0_P_C = inferM0.query(['Cancer'], show_progress=False)
M0_P_C_givenS = M0_P_SC/M0_P_S
M0do = M0.do(['Smoke'])
infer0do = VariableElimination(M0do)
M0_P_C_doS0 = infer0do.query(['Cancer'], evidence={'Smoke':0}, show_progress=False)
M0_P_C_doS1 = infer0do.query(['Cancer'], evidence={'Smoke':1}, show_progress=False)
In the following we will define a series of abstracted models, each one guaranteeing different forms of consistency. For computational simplicity we will assume that all abstractions are identities.
We also define a simple helper function to compute abstracted mechanisms.
def solve_eq(p00,p01,p10,p11):
    # Factorize a joint P(S,C) into a marginal P(S) and a conditional P(C|S).
    # Arguments: p00 = P(S=0,C=0), p01 = P(S=0,C=1),
    #            p10 = P(S=1,C=0), p11 = P(S=1,C=1) (inputs must sum to one).
    lmbda = p00+p01      # P(S=0)
    x = p00/lmbda        # P(C=0|S=0)
    y = 1-x              # P(C=1|S=0)
    z = p10/(1-lmbda)    # P(C=0|S=1)
    w = 1-z              # P(C=1|S=1)
    return np.array([[lmbda],[1-lmbda]]), np.array([[x,z],[y,w]])
s,c = solve_eq(0.54,0.11,0.1075,0.2425)
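The four numbers passed to solve_eq are not arbitrary: they are the joint probabilities $P(S,C)$ of the base model, obtained by marginalizing out the confounder $Env$. A quick sketch of where they come from, with the CPD values copied from the definition of M0 above:

```python
import numpy as np

p_env = np.array([0.25, 0.75])           # P(Env)
p_s_given_e = np.array([[0.8, 0.6],      # P(Smoke=0 | Env)
                        [0.2, 0.4]])     # P(Smoke=1 | Env)
p_c0_given_se = np.array([[0.9, 0.8],    # P(Cancer=0 | Smoke=0, Env)
                          [0.35, 0.3]])  # P(Cancer=0 | Smoke=1, Env)

# P(S=s, C=c) = sum_e P(E=e) P(S=s|E=e) P(C=c|S=s,E=e)
p_s0_c0 = np.sum(p_env * p_s_given_e[0] * p_c0_given_se[0])        # 0.54
p_s0_c1 = np.sum(p_env * p_s_given_e[0] * (1 - p_c0_given_se[0]))  # 0.11
p_s1_c0 = np.sum(p_env * p_s_given_e[1] * p_c0_given_se[1])        # 0.1075
p_s1_c1 = np.sum(p_env * p_s_given_e[1] * (1 - p_c0_given_se[1]))  # 0.2425
```

By construction, the abstracted model defined next therefore reproduces the observational joint of M0 exactly.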
M1 = BN([('Smoke','Cancer')])
cpdS = cpd(variable='Smoke',
variable_card=2,
values=s,
evidence=None,
evidence_card=None)
cpdC = cpd(variable='Cancer',
variable_card=2,
values=c,
evidence=['Smoke'],
evidence_card=[2])
M1.add_cpds(cpdS,cpdC)
M1.check_model()
True
We evaluate distributions on the abstracted model using pgmpy methods, and compare them with the base model.
inferM1 = VariableElimination(M1)
M1_P_SC = inferM1.query(['Smoke','Cancer'], show_progress=False)
M1_P_S = inferM1.query(['Smoke'], show_progress=False)
M1_P_C = inferM1.query(['Cancer'], show_progress=False)
M1_P_C_givenS = M1_P_SC/M1_P_S
M1do = M1.do(['Smoke'])
infer1do = VariableElimination(M1do)
M1_P_C_doS0 = infer1do.query(['Cancer'], evidence={'Smoke':0}, show_progress=False)
M1_P_C_doS1 = infer1do.query(['Cancer'], evidence={'Smoke':1}, show_progress=False)
print('Joint observational consistency on P(S,C): {0}'.format(M0_P_SC == M1_P_SC))
if verbose: print('{0} \n {1}'.format(M0_P_SC,M1_P_SC))
Joint observational consistency on P(S,C): True
print('Joint observational consistency on P(S): {0}'.format(M0_P_S == M1_P_S))
if verbose: print('{0} \n {1}'.format(M0_P_S,M1_P_S))
Joint observational consistency on P(S): True
print('Joint observational consistency on P(C): {0}'.format(M0_P_C == M1_P_C))
if verbose: print('{0} \n {1}'.format(M0_P_C,M1_P_C))
Joint observational consistency on P(C): True
print('Conditional observational consistency on P(C|S): {0}'.format(M0_P_C_givenS == M1_P_C_givenS))
if verbose: print('{0} \n {1}'.format(M0_P_C_givenS,M1_P_C_givenS))
Conditional observational consistency on P(C|S): True
print('Joint interventional consistency on P(C|do(S=0)): {0}'.format(M0_P_C_doS0 == M1_P_C_doS0))
if verbose: print('{0} \n {1}'.format(M0_P_C_doS0,M1_P_C_doS0))
Joint interventional consistency on P(C|do(S=0)): False
print('Joint interventional consistency on P(C|do(S=1)): {0}'.format(M0_P_C_doS1 == M1_P_C_doS1))
if verbose: print('{0} \n {1}'.format(M0_P_C_doS1,M1_P_C_doS1))
Joint interventional consistency on P(C|do(S=1)): False
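This failure is expected. Since M1 was fitted to the observational joint, its mechanism for Cancer is the conditional $P(C \vert S)$ of the base model, while the true interventional distribution requires adjusting for the confounder Env, and the two differ. A minimal numeric comparison in plain numpy, with the CPD values copied from M0:

```python
import numpy as np

p_env = np.array([0.25, 0.75])                       # P(Env)
p_s_given_e = np.array([[0.8, 0.6], [0.2, 0.4]])     # P(Smoke | Env)
p_c0_given_se = np.array([[0.9, 0.8], [0.35, 0.3]])  # P(Cancer=0 | Smoke, Env)

# Observational conditional: P(C=0|S=s) = P(S=s,C=0) / P(S=s)
p_s = np.sum(p_env * p_s_given_e, axis=1)                     # [0.65, 0.35]
p_s_c0 = np.sum(p_env * p_s_given_e * p_c0_given_se, axis=1)  # [0.54, 0.1075]
p_c0_cond = p_s_c0 / p_s                                      # ~ [0.831, 0.307]

# Interventional: backdoor adjustment over Env
p_c0_do = p_c0_given_se @ p_env                               # ~ [0.825, 0.3125]
```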
M2 = BN([('Smoke','Cancer')])
cpdS = cpd(variable='Smoke',
variable_card=2,
values=[[.5],[.5]],
evidence=None,
evidence_card=None)
cpdC = cpd(variable='Cancer',
variable_card=2,
values=c,
evidence=['Smoke'],
evidence_card=[2])
M2.add_cpds(cpdS,cpdC)
M2.check_model()
True
We evaluate distributions on the abstracted model using pgmpy methods, and compare them with the base model.
inferM2 = VariableElimination(M2)
M2_P_SC = inferM2.query(['Smoke','Cancer'], show_progress=False)
M2_P_S = inferM2.query(['Smoke'], show_progress=False)
M2_P_C = inferM2.query(['Cancer'], show_progress=False)
M2_P_C_givenS = M2_P_SC/M2_P_S
M2do = M2.do(['Smoke'])
infer2do = VariableElimination(M2do)
M2_P_C_doS0 = infer2do.query(['Cancer'], evidence={'Smoke':0}, show_progress=False)
M2_P_C_doS1 = infer2do.query(['Cancer'], evidence={'Smoke':1}, show_progress=False)
print('Joint observational consistency on P(S,C): {0}'.format(M0_P_SC == M2_P_SC))
if verbose: print('{0} \n {1}'.format(M0_P_SC,M2_P_SC))
Joint observational consistency on P(S,C): False
print('Joint observational consistency on P(S): {0}'.format(M0_P_S == M2_P_S))
if verbose: print('{0} \n {1}'.format(M0_P_S,M2_P_S))
Joint observational consistency on P(S): False
print('Joint observational consistency on P(C): {0}'.format(M0_P_C == M2_P_C))
if verbose: print('{0} \n {1}'.format(M0_P_C,M2_P_C))
Joint observational consistency on P(C): False
print('Conditional observational consistency on P(C|S): {0}'.format(M0_P_C_givenS == M2_P_C_givenS))
if verbose: print('{0} \n {1}'.format(M0_P_C_givenS,M2_P_C_givenS))
Conditional observational consistency on P(C|S): True
print('Joint interventional consistency on P(C|do(S=0)): {0}'.format(M0_P_C_doS0 == M2_P_C_doS0))
if verbose: print('{0} \n {1}'.format(M0_P_C_doS0,M2_P_C_doS0))
Joint interventional consistency on P(C|do(S=0)): False
print('Joint interventional consistency on P(C|do(S=1)): {0}'.format(M0_P_C_doS1 == M2_P_C_doS1))
if verbose: print('{0} \n {1}'.format(M0_P_C_doS1,M2_P_C_doS1))
Joint interventional consistency on P(C|do(S=1)): False
M3 = BN([('Smoke','Cancer')])
cpdS = cpd(variable='Smoke',
variable_card=2,
values=s,
evidence=None,
evidence_card=None)
cpdC = cpd(variable='Cancer',
variable_card=2,
values=[[.8,.4],[.2,.6]],
evidence=['Smoke'],
evidence_card=[2])
M3.add_cpds(cpdS,cpdC)
M3.check_model()
True
We evaluate distributions on the abstracted model using pgmpy methods, and compare them with the base model.
inferM3 = VariableElimination(M3)
M3_P_SC = inferM3.query(['Smoke','Cancer'], show_progress=False)
M3_P_S = inferM3.query(['Smoke'], show_progress=False)
M3_P_C = inferM3.query(['Cancer'], show_progress=False)
M3_P_C_givenS = M3_P_SC/M3_P_S
M3do = M3.do(['Smoke'])
infer3do = VariableElimination(M3do)
M3_P_C_doS0 = infer3do.query(['Cancer'], evidence={'Smoke':0}, show_progress=False)
M3_P_C_doS1 = infer3do.query(['Cancer'], evidence={'Smoke':1}, show_progress=False)
print('Joint observational consistency on P(S,C): {0}'.format(M0_P_SC == M3_P_SC))
if verbose: print('{0} \n {1}'.format(M0_P_SC,M3_P_SC))
Joint observational consistency on P(S,C): False
print('Joint observational consistency on P(S): {0}'.format(M0_P_S == M3_P_S))
if verbose: print('{0} \n {1}'.format(M0_P_S,M3_P_S))
Joint observational consistency on P(S): True
print('Joint observational consistency on P(C): {0}'.format(M0_P_C == M3_P_C))
if verbose: print('{0} \n {1}'.format(M0_P_C,M3_P_C))
Joint observational consistency on P(C): False
print('Conditional observational consistency on P(C|S): {0}'.format(M0_P_C_givenS == M3_P_C_givenS))
if verbose: print('{0} \n {1}'.format(M0_P_C_givenS,M3_P_C_givenS))
Conditional observational consistency on P(C|S): False
print('Joint interventional consistency on P(C|do(S=0)): {0}'.format(M0_P_C_doS0 == M3_P_C_doS0))
if verbose: print('{0} \n {1}'.format(M0_P_C_doS0,M3_P_C_doS0))
Joint interventional consistency on P(C|do(S=0)): False
print('Joint interventional consistency on P(C|do(S=1)): {0}'.format(M0_P_C_doS1 == M3_P_C_doS1))
if verbose: print('{0} \n {1}'.format(M0_P_C_doS1,M3_P_C_doS1))
Joint interventional consistency on P(C|do(S=1)): False
M4 = BN([('Smoke','Cancer')])
cpdS = cpd(variable='Smoke',
variable_card=2,
values=s,
evidence=None,
evidence_card=None)
cpdC = cpd(variable='Cancer',
variable_card=2,
values=[[.825,0.3125],[.175,0.6875]],
evidence=['Smoke'],
evidence_card=[2])
M4.add_cpds(cpdS,cpdC)
M4.check_model()
True
We evaluate distributions on the abstracted model using pgmpy methods, and compare them with the base model.
inferM4 = VariableElimination(M4)
M4_P_SC = inferM4.query(['Smoke','Cancer'], show_progress=False)
M4_P_S = inferM4.query(['Smoke'], show_progress=False)
M4_P_C = inferM4.query(['Cancer'], show_progress=False)
M4_P_C_givenS = M4_P_SC/M4_P_S
M4do = M4.do(['Smoke'])
infer4do = VariableElimination(M4do)
M4_P_C_doS0 = infer4do.query(['Cancer'], evidence={'Smoke':0}, show_progress=False)
M4_P_C_doS1 = infer4do.query(['Cancer'], evidence={'Smoke':1}, show_progress=False)
print('Joint observational consistency on P(S,C): {0}'.format(M0_P_SC == M4_P_SC))
if verbose: print('{0} \n {1}'.format(M0_P_SC,M4_P_SC))
Joint observational consistency on P(S,C): False
print('Joint observational consistency on P(S): {0}'.format(M0_P_S == M4_P_S))
if verbose: print('{0} \n {1}'.format(M0_P_S,M4_P_S))
Joint observational consistency on P(S): True
print('Joint observational consistency on P(C): {0}'.format(M0_P_C == M4_P_C))
if verbose: print('{0} \n {1}'.format(M0_P_C,M4_P_C))
Joint observational consistency on P(C): False
print('Conditional observational consistency on P(C|S): {0}'.format(M0_P_C_givenS == M4_P_C_givenS))
if verbose: print('{0} \n {1}'.format(M0_P_C_givenS,M4_P_C_givenS))
Conditional observational consistency on P(C|S): False
print('Joint interventional consistency on P(C|do(S=0)): {0}'.format(M0_P_C_doS0 == M4_P_C_doS0))
if verbose: print('{0} \n {1}'.format(M0_P_C_doS0,M4_P_C_doS0))
Joint interventional consistency on P(C|do(S=0)): True
print('Joint interventional consistency on P(C|do(S=1)): {0}'.format(M0_P_C_doS1 == M4_P_C_doS1))
if verbose: print('{0} \n {1}'.format(M0_P_C_doS1,M4_P_C_doS1))
Joint interventional consistency on P(C|do(S=1)): True
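The values $[.825, .3125]$ in M4's mechanism for Cancer explain this result: they are exactly the backdoor-adjusted interventional probabilities of the base model, which is why interventional consistency holds even though the observational conditional does not match. A one-line check with numpy, using the CPD values of M0:

```python
import numpy as np

p_env = np.array([0.25, 0.75])                       # P(Env) in the base model
p_c0_given_se = np.array([[0.9, 0.8], [0.35, 0.3]])  # P(Cancer=0 | Smoke, Env)

# Backdoor adjustment reproduces the CPD row for Cancer=0 used in M4
p_c0_do = p_c0_given_se @ p_env  # ~ [0.825, 0.3125]
```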
Finally, let us run a few simulations to investigate counterfactual consistency. Instead of pgmpy, we will now use custom classes that will allow us to evaluate counterfactual distributions of interest.
We now revert to the old chain model $\mathcal{M}$ representing a simplified lung cancer model; for a full description, refer to the first notebook.
class M0():
    def __init__(self):
        self.scm = True
    def sample(self):
        # np.random replaces the deprecated scipy.random alias
        Us = np.random.binomial(1,.2)
        Ut = np.random.binomial(1,.8)
        Uc1 = np.random.binomial(1,.1)
        Uc2 = np.random.binomial(1,.4)
        S = Us
        T = S*Ut
        C = Uc1*(1-T)+Uc2*T
        return S,T,C,Us,Ut,Uc1,Uc2
    def counterfactual_sample(self):
        # Abduction: rejection-sample a factual world with S=1, C=1
        S,T,C,Us,Ut,Uc1,Uc2 = self.sample()
        while not(np.logical_and(S==1,C==1)):
            S,T,C,Us,Ut,Uc1,Uc2 = self.sample()
        # Action and prediction: do(S=0), then recompute downstream variables
        S = 0
        T = S*Ut
        C = Uc1*(1-T)+Uc2*T
        return S,T,C
Differently from the definition used in the previous notebooks, we have now implemented the model as a true SCM, complete with all the necessary exogenous variables. The form we have chosen for the distributions over the exogenous nodes and for the structural equations in the endogenous nodes must guarantee that the pushforward generates the conditional distributions we want, but it is otherwise arbitrary.
It is easy to show that the pushforward of the distributions over the exogenous nodes onto the endogenous nodes generates the distributions/mechanisms/stochastic matrices we want. For the variable $S$ we have:
$$ P(S=1) = P(U_S=1) = .2, $$
which entails the mechanism: $\mathcal{M}[\phi_S] = \left[\begin{array}{c} .8 \\ .2 \end{array}\right]$. For the variable $T$ we have:
$$ P(T=1 \vert S=0) = 0 $$
$$ P(T=1 \vert S=1) = P(U_T=1) = .8, $$
which entails the mechanism: $\mathcal{M}[\phi_T] = \left[\begin{array}{cc} 1 & .2\\ 0 & .8 \end{array}\right]$. Finally for $C$ we can compute:
$$ P(C=1 \vert T=0) = P(U_{c1}=1) = .1 $$
$$ P(C=1 \vert T=1) = P(U_{c2}=1) = .4, $$
which corresponds to the mechanism: $\mathcal{M}[\phi_C] = \left[\begin{array}{cc} .9 & .6\\ .1 & .4 \end{array}\right]$.
All the mechanisms correspond to the desired ones (as we used them in the previous notebooks).
Notice also that our class has an additional function: counterfactual_sample(). This function is meant to answer the question: what would be the probability of not developing cancer ($C=0$) if a patient had not smoked ($S=0$), when, in fact, we observed that the patient smoked ($S=1$) and developed cancer ($C=1$)? In practice, the function generates in a loop factual samples until it finds a smoking patient with cancer ($S=1,C=1$), records the values of the exogenous variables (equivalent to abduction), starts a new simulation with the exogenous variables set to the recorded values, intervenes on the smoking variable with $do(S=0)$, and observes the outcome ($C$). This is, of course, just one among the many counterfactual quantities we may want to study.
Following again the standard example, we consider a smaller abstracted model $\mathcal{M'}$ composed of just two variables; again, for a full description, refer to the first notebook.
class M1():
    def __init__(self):
        self.scm = True
    def sample(self):
        # np.random replaces the deprecated scipy.random alias
        Us = np.random.binomial(1,.2)
        Uc1 = np.random.binomial(1,.1)
        Uc2 = np.random.binomial(1,.34)
        S = Us
        C = Uc1*(1-S)+Uc2*S
        return S,C,Us,Uc1,Uc2
    def counterfactual_sample(self):
        # Abduction: rejection-sample a factual world with S'=1, C'=1
        S,C,Us,Uc1,Uc2 = self.sample()
        while not(np.logical_and(S==1,C==1)):
            S,C,Us,Uc1,Uc2 = self.sample()
        # Action and prediction: do(S'=0), then recompute C'
        S = 0
        C = Uc1*(1-S)+Uc2*S
        return S,C
Again, this is a full SCM implementation with exogenous variables and their distributions, as well as endogenous variables and their structural functions.
It is immediate to pushforward the distributions over the exogenous nodes and derive the distributions/mechanisms/stochastic matrices over the endogenous nodes. For the variable $S'$ we have:
$$ P(S'=1) = P(U_{S'}=1) = .2, $$
which entails the mechanism: $\mathcal{M'}[\phi_{S'}] = \left[\begin{array}{c} .8 \\ .2 \end{array}\right]$. For the variable $C'$ we get:
$$ P(C'=1 \vert S'=0) = P(U_{c1'}=1) = .1 $$
$$ P(C'=1 \vert S'=1) = P(U_{c2'}=1) = .34, $$
which corresponds to the mechanism: $\mathcal{M'}[\phi_{C'}] = \left[\begin{array}{cc} .9 & .66\\ .1 & .34 \end{array}\right]$.
All the mechanisms are equal to the desired ones (as seen in the previous notebooks).
Finally, we have the additional function counterfactual_sample(), which is used to answer the corresponding question for the abstracted model: what would be the probability of not developing cancer ($C'=0$) if a patient had not smoked ($S'=0$), when, in fact, we observed that the patient smoked ($S'=1$) and developed cancer ($C'=1$)? Following the same approach as before, this function generates in a loop factual samples of smoking patients with cancer ($S'=1,C'=1$), records the values of the exogenous variables (equivalent to abduction), starts a new simulation setting the exogenous variables to the recorded values, intervenes on the smoking variable with $do(S'=0)$, and observes the outcome ($C'$).
We relate the two models with the standard abstraction made up of two identity matrices relating the variables $S$ and $S'$, and $C$ and $C'$; the complete definition of the abstraction in terms of $R$, $a$ and $\alpha$ is available in the first notebook.
Having abstractions that are simple identities simplifies once again the verification of consistency; it will be sufficient to compare stochastic matrices (distributions) from the two models without the need to explicitly multiply them by the alpha matrix.
We already know that the two models $\mathcal{M}$ and $\mathcal{M'}$ are highly compatible; in particular, we have seen that both observational and interventional consistency hold between them.
A demonstration of these consistencies is available in the first notebook. In the following we just give a further empirical confirmation.
Let us sample from the base model:
m0 = M0()
n_samples = 100000  # number of Monte Carlo samples
data_m0 = np.zeros((n_samples,7))
for i in tqdm(range(n_samples)):
    data_m0[i,:] = m0.sample()
100%|███████████████████████████████| 100000/100000 [00:00<00:00, 166468.58it/s]
And estimate a few distributions of interest.
print('Empirical distributions:')
print('m0: P(S=0) = {0}'.format(np.sum(data_m0[:,0]==0)/n_samples))
print('m0: P(S=1) = {0}'.format(np.sum(data_m0[:,0])/n_samples))
print('\nm0: P(T=0|S=0) = {0}'.format(np.sum(np.logical_and(data_m0[:,1]==0,data_m0[:,0]==0))/np.sum(data_m0[:,0]==0)))
print('m0: P(T=1|S=0) = {0}'.format(np.sum(np.logical_and(data_m0[:,1]==1,data_m0[:,0]==0))/np.sum(data_m0[:,0]==0)))
print('m0: P(T=0|S=1) = {0}'.format(np.sum(np.logical_and(data_m0[:,1]==0,data_m0[:,0]==1))/np.sum(data_m0[:,0]==1)))
print('m0: P(T=1|S=1) = {0}'.format(np.sum(np.logical_and(data_m0[:,1]==1,data_m0[:,0]==1))/np.sum(data_m0[:,0]==1)))
print('\nm0: P(C=0|T=0) = {0}'.format(np.sum(np.logical_and(data_m0[:,2]==0,data_m0[:,1]==0))/np.sum(data_m0[:,1]==0)))
print('m0: P(C=1|T=0) = {0}'.format(np.sum(np.logical_and(data_m0[:,2]==1,data_m0[:,1]==0))/np.sum(data_m0[:,1]==0)))
print('m0: P(C=0|T=1) = {0}'.format(np.sum(np.logical_and(data_m0[:,2]==0,data_m0[:,1]==1))/np.sum(data_m0[:,1]==1)))
print('m0: P(C=1|T=1) = {0}'.format(np.sum(np.logical_and(data_m0[:,2]==1,data_m0[:,1]==1))/np.sum(data_m0[:,1]==1)))
print('\nm0: P(C=0|S=0) = {0}'.format(np.sum(np.logical_and(data_m0[:,2]==0,data_m0[:,0]==0))/np.sum(data_m0[:,0]==0)))
print('m0: P(C=1|S=0) = {0}'.format(np.sum(np.logical_and(data_m0[:,2]==1,data_m0[:,0]==0))/np.sum(data_m0[:,0]==0)))
print('m0: P(C=0|S=1) = {0}'.format(np.sum(np.logical_and(data_m0[:,2]==0,data_m0[:,0]==1))/np.sum(data_m0[:,0]==1)))
print('m0: P(C=1|S=1) = {0}'.format(np.sum(np.logical_and(data_m0[:,2]==1,data_m0[:,0]==1))/np.sum(data_m0[:,0]==1)))
Empirical distributions:
m0: P(S=0) = 0.80079
m0: P(S=1) = 0.19921
m0: P(T=0|S=0) = 1.0
m0: P(T=1|S=0) = 0.0
m0: P(T=0|S=1) = 0.2015461071231364
m0: P(T=1|S=1) = 0.7984538928768636
m0: P(C=0|T=0) = 0.899160463291079
m0: P(C=1|T=0) = 0.10083953670892097
m0: P(C=0|T=1) = 0.6042373946938262
m0: P(C=1|T=1) = 0.3957626053061738
m0: P(C=0|S=0) = 0.8993993431486407
m0: P(C=1|S=0) = 0.10060065685135929
m0: P(C=0|S=1) = 0.6627177350534612
m0: P(C=1|S=1) = 0.33728226494653885
Notice, in particular, that the conditional distributions correspond (within sampling approximation) to the desired mechanisms.
We do an analogous sampling for the abstracted model.
m1 = M1()
data_m1 = np.zeros((n_samples,5))
for i in tqdm(range(n_samples)):
    data_m1[i,:] = m1.sample()
100%|███████████████████████████████| 100000/100000 [00:00<00:00, 219992.94it/s]
And take a look at some reference distributions:
print('Empirical distributions:')
print("m1: P(S'=0) = {0}".format(np.sum(data_m1[:,0]==0)/n_samples))
print("m1: P(S'=1) = {0}".format(np.sum(data_m1[:,0])/n_samples))
print("\nm1: P(C'=0|S'=0) = {0}".format(np.sum(np.logical_and(data_m1[:,1]==0,data_m1[:,0]==0))/np.sum(data_m1[:,0]==0)))
print("m1: P(C'=1|S'=0) = {0}".format(np.sum(np.logical_and(data_m1[:,1]==1,data_m1[:,0]==0))/np.sum(data_m1[:,0]==0)))
print("m1: P(C'=0|S'=1) = {0}".format(np.sum(np.logical_and(data_m1[:,1]==0,data_m1[:,0]==1))/np.sum(data_m1[:,0]==1)))
print("m1: P(C'=1|S'=1) = {0}".format(np.sum(np.logical_and(data_m1[:,1]==1,data_m1[:,0]==1))/np.sum(data_m1[:,0]==1)))
Empirical distributions:
m1: P(S'=0) = 0.79915
m1: P(S'=1) = 0.20085
m1: P(C'=0|S'=0) = 0.9022961897015579
m1: P(C'=1|S'=0) = 0.09770381029844209
m1: P(C'=0|S'=1) = 0.6580532735872542
m1: P(C'=1|S'=1) = 0.3419467264127458
Notice that the conditional distributions match the desired mechanisms. Also, the interventional distributions on $\mathcal{M}$ and $\mathcal{M'}$ (which in this case coincide with the conditional distributions, since there is no confounder) agree, thus satisfying interventional consistency.
To evaluate whether the models satisfy (at least some form of) counterfactual consistency, we consider one specific query: what would be the probability of not developing cancer ($C=0$) if a patient had not smoked ($S=0$), when, in fact, we observed that the patient smoked ($S=1$) and developed cancer ($C=1$)? This corresponds to:
$$P_{\mathcal{M}_{S=1,C=1}}(C=0 \vert do(S=0))$$
and
$$P_{\mathcal{M'}_{S'=1,C'=1}}(C'=0 \vert do(S'=0))$$
Model $\mathcal{M}$. Let us start with the first SCM, and consider its state when we observe $S=1,C=1$. From this state we want to infer the probability distribution over the exogenous variables $U_S, U_T, U_{C1}, U_{C2}$ (abduction). To do so, we take the slightly long but illustrative approach from [Darwiche2022] of listing all possible worlds.
U_S | U_T | U_C1 | U_C2 | S | T | C | P |
---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0864 |
0 | 0 | 0 | 1 | 0 | 0 | 0 | 0.0576 |
0 | 0 | 1 | 0 | 0 | 0 | 1 | 0.0096 |
0 | 0 | 1 | 1 | 0 | 0 | 1 | 0.0064 |
0 | 1 | 0 | 0 | 0 | 0 | 0 | 0.3456 |
0 | 1 | 0 | 1 | 0 | 0 | 0 | 0.2304 |
0 | 1 | 1 | 0 | 0 | 0 | 1 | 0.0384 |
0 | 1 | 1 | 1 | 0 | 0 | 1 | 0.0256 |
1 | 0 | 0 | 0 | 1 | 0 | 0 | 0.0216 |
1 | 0 | 0 | 1 | 1 | 0 | 0 | 0.0144 |
1 | 0 | 1 | 0 | 1 | 0 | 1 | 0.0024 |
1 | 0 | 1 | 1 | 1 | 0 | 1 | 0.0016 |
1 | 1 | 0 | 0 | 1 | 1 | 0 | 0.0864 |
1 | 1 | 0 | 1 | 1 | 1 | 1 | 0.0576 |
1 | 1 | 1 | 0 | 1 | 1 | 0 | 0.0096 |
1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.0064 |
We have a table of 16 lines because of the sixteen possible combinations of the four exogenous variables $U_S, U_T, U_{C1}, U_{C2}$; endogenous variables $S,T,C$ are then deterministically set. The last column reports the probability of each setting, which, because of independence, is just the product of the probability of the outcome of each single exogenous variable.
Now, we want to consider only those worlds in which $S=1,C=1$, that is, the four lines:
U_S | U_T | U_C1 | U_C2 | S | T | C | P |
---|---|---|---|---|---|---|---|
1 | 0 | 1 | 0 | 1 | 0 | 1 | 0.0024 |
1 | 0 | 1 | 1 | 1 | 0 | 1 | 0.0016 |
1 | 1 | 0 | 1 | 1 | 1 | 1 | 0.0576 |
1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.0064 |
From this, we can easily perform our abduction step for estimating the probability of the exogenous variables. In particular we get:
$$P(U_S = 1) = 1$$
$$P(U_T = 1) = \frac{0.0576+0.0064}{Z}$$
$$P(U_{C1} = 1) = \frac{0.0024+0.0016+0.0064}{Z}$$
$$P(U_{C2} = 1) = \frac{0.0016+0.0576+0.0064}{Z}$$
where $Z$ is the normalization factor given by $0.0024+0.0016+0.0576+0.0064$.
Having completed the abduction step, let us now perform the intervention of interest, that is $do(S=0)$. This has several effects. First, it cuts out the effects of $U_S$, so that we can just ignore this exogenous variable now. Second, the structural equation in $T$ is virtually reduced to $0$, since the product of $U_T$ with the intervened $S$ is identically zero. Third, the structural equation in $C$ also virtually simplifies to $U_{C1}$ because $T$ is now always zero. Thus:
$$P_{\mathcal{M}_{S=1,C=1}}(C=0 \vert do(S=0)) = P(U_{C1}=0) = 1 - \frac{0.0024+0.0016+0.0064}{Z}.$$
Numerically:
Z = (0.0576 + 0.0064 + 0.0024 + 0.0016)
P_Uc1 = (0.0024 + 0.0064 + 0.0016) / Z
cfP = 1 - P_Uc1
cfP
0.8470588235294118
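The same value can also be obtained programmatically, enumerating the $2^4$ noise configurations instead of writing the table by hand; a small self-contained sketch:

```python
import itertools

# P(U=1) for each exogenous variable of the chain model
p_us, p_ut, p_uc1, p_uc2 = .2, .8, .1, .4

worlds = []
for Us, Ut, Uc1, Uc2 in itertools.product([0, 1], repeat=4):
    # Endogenous variables follow deterministically from the noise terms
    S = Us
    T = S * Ut
    C = Uc1 * (1 - T) + Uc2 * T
    p = ((p_us if Us else 1 - p_us) * (p_ut if Ut else 1 - p_ut)
         * (p_uc1 if Uc1 else 1 - p_uc1) * (p_uc2 if Uc2 else 1 - p_uc2))
    worlds.append(dict(Us=Us, Ut=Ut, Uc1=Uc1, Uc2=Uc2, S=S, T=T, C=C, p=p))

# Abduction: restrict to worlds compatible with the evidence S=1, C=1
evidence = [w for w in worlds if w['S'] == 1 and w['C'] == 1]
Z = sum(w['p'] for w in evidence)

# Action + prediction: under do(S=0) we have T=0, hence C = Uc1
cf = sum(w['p'] for w in evidence if w['Uc1'] == 0) / Z
print(cf)  # ~ 0.8471
```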
Empirically:
cf_data_m0 = np.zeros((n_samples,3))
for i in tqdm(range(n_samples)):
    cf_data_m0[i,:] = m0.counterfactual_sample()
100%|████████████████████████████████| 100000/100000 [00:08<00:00, 11958.49it/s]
print('Counterfactual distributions:')
print('cf_m0: P(C=0 | S=0) = {0}'.format(np.sum(cf_data_m0[:,2]==0)/n_samples))
Counterfactual distributions:
cf_m0: P(C=0 | S=0) = 0.84476
The numerical and empirical results for the counterfactual probability on $\mathcal{M}$ agree.
Model for $\mathcal{M}_{\iota\bar{\iota}}$. We derived the counterfactual quantity of interest for $\mathcal{M}$ using two methods: (i) via enumeration of possible worlds, and (ii) via simulation.
We could indeed set up a counterfactual model $\mathcal{M}_{\iota\bar{\iota}}$, similarly to the way in which we instantiate an interventional model $\mathcal{M}_{\iota}$ after an intervention $\iota$. The counterfactual model $\mathcal{M}_{\iota\bar{\iota}}$ would have the following form:
Performing a pushforward of the distributions, we can then derive our usual simplified model, which we can embed in $\mathtt{FinStoch}$:
Model $\mathcal{M'}$. Let us follow the same approach for the abstracted model. Let us evaluate all the possible worlds, which now depend on three exogenous variables $U_{S'},U_{C1'},U_{C2'}$ and two endogenous variables $S',C'$.
U_S' | U_C1' | U_C2' | S' | C' | P |
---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0.4752 |
0 | 0 | 1 | 0 | 0 | 0.2448 |
0 | 1 | 0 | 0 | 1 | 0.0528 |
0 | 1 | 1 | 0 | 1 | 0.0272 |
1 | 0 | 0 | 1 | 0 | 0.1188 |
1 | 0 | 1 | 1 | 1 | 0.0612 |
1 | 1 | 0 | 1 | 0 | 0.0132 |
1 | 1 | 1 | 1 | 1 | 0.0068 |
We now have 8 lines because of the possible combinations of the three exogenous variables $U_{S'}, U_{C1'}, U_{C2'}$. Endogenous variables $S',C'$ are deterministically computed. Thanks to independence, the probability of each world is simply the product of the probabilities of the outcomes of the single exogenous variables.
Let us restrict our attention to those worlds where $S'=1,C'=1$. We have only two worlds:
U_S' | U_C1' | U_C2' | S' | C' | P |
---|---|---|---|---|---|
1 | 0 | 1 | 1 | 1 | 0.0612 |
1 | 1 | 1 | 1 | 1 | 0.0068 |
From these two lines of the table we can perform our abduction step to estimate the probability of the exogenous variables:
$$P(U_{S'} = 1) = 1$$
$$P(U_{C1'} = 1) = \frac{0.0068}{Z}$$
$$P(U_{C2'} = 1) = 1$$
where $Z$ is the normalization factor given by $0.0068+0.0612$.
After the abduction step, we perform our counterfactual intervention $do(S'=0)$. First of all, the effects of $U_{S'}$ are excluded, and this exogenous variable ends up being irrelevant. Once $S'$ is identically zero, then the structural equation of $C'$ is reduced to $U_{C1'}$, further excluding the effect of the other exogenous variable $U_{C2'}$. Consequently we have that:
$$P_{\mathcal{M'}_{S'=1,C'=1}}(C'=0 \vert do(S'=0)) = P(U_{C1'}=0) = 1 - \frac{0.0068}{Z}.$$
Numerically:
Z_ = (0.0068+0.0612)
P_Uc1_ = (0.0068) / Z_
cfP_ = 1 - P_Uc1_
cfP_
0.9
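As for the base model, the result can be checked by enumerating all $2^3$ noise configurations of the abstracted model programmatically:

```python
import itertools

# P(U=1) for each exogenous variable of the abstracted model
p_us, p_uc1, p_uc2 = .2, .1, .34

worlds = []
for Us, Uc1, Uc2 in itertools.product([0, 1], repeat=3):
    S = Us
    C = Uc1 * (1 - S) + Uc2 * S
    p = ((p_us if Us else 1 - p_us) * (p_uc1 if Uc1 else 1 - p_uc1)
         * (p_uc2 if Uc2 else 1 - p_uc2))
    worlds.append(dict(Us=Us, Uc1=Uc1, Uc2=Uc2, S=S, C=C, p=p))

# Abduction on S'=1, C'=1; under do(S'=0) the outcome reduces to C' = Uc1'
evidence = [w for w in worlds if w['S'] == 1 and w['C'] == 1]
Z = sum(w['p'] for w in evidence)
cf = sum(w['p'] for w in evidence if w['Uc1'] == 0) / Z
print(cf)  # ~ 0.9
```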
Empirically:
cf_data_m1 = np.zeros((n_samples,2))
for i in tqdm(range(n_samples)):
    cf_data_m1[i,:] = m1.counterfactual_sample()
100%|████████████████████████████████| 100000/100000 [00:06<00:00, 15033.75it/s]
print('Counterfactual distributions:')
print("cf_m1: P(C'=0 | S'=0) = {0}".format(np.sum(cf_data_m1[:,1]==0)/n_samples))
Counterfactual distributions:
cf_m1: P(C'=0 | S'=0) = 0.90168
The numerical and empirical results for the counterfactual probability on $\mathcal{M'}$ agree.
Model for $\mathcal{M'}_{\iota'\bar{\iota'}}$. After having derived the counterfactual quantity of interest for $\mathcal{M'}$ in two ways, (i) via enumeration of possible worlds, and (ii) via simulation, we can set up the counterfactual model $\mathcal{M'}_{\iota'\bar{\iota'}}$:
Performing a pushforward of the distributions, we can then derive our usual simplified model, which we can embed in $\mathtt{FinStoch}$:
Consistency diagram. We can now use the models we have derived, $\mathcal{M}_{\iota\bar{\iota}}$ and $\mathcal{M'}_{\iota'\bar{\iota'}}$, and the usual identity abstractions $\alpha_{S'}$ and $\alpha_{C'}$ to set up our commuting diagram:
which, clearly, does not commute. Therefore the two models are not counterfactually consistent with respect to the query we considered.
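The size of the disagreement is easy to quantify from the two analytic results derived above:

```python
# Counterfactual P(C=0 | do(S=0)) given the factual observation S=1, C=1
Z_base = 0.0024 + 0.0016 + 0.0576 + 0.0064
cf_base = 1 - (0.0024 + 0.0016 + 0.0064) / Z_base  # ~ 0.8471 in M

Z_abst = 0.0068 + 0.0612
cf_abst = 1 - 0.0068 / Z_abst                      # = 0.9 in M'

print(abs(cf_base - cf_abst))  # ~ 0.0529: the two counterfactuals disagree
```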
A few observations on this result: (i) as expected, observational and interventional consistency do not guarantee counterfactual consistency, since all these quantities are defined on different models; (ii) there exists a large number of counterfactual statements that we may want to consider, while in our simulation we have examined a single, fairly straightforward counterfactual quantity.
In this notebook we have formally reviewed different forms of consistency that may hold between a base and an abstracted model. We have analyzed consistencies according to the model on which they are evaluated (observational, interventional, counterfactual) and the distributions we were interested in (marginal, joint, conditional).
Our approach has been to always reduce our SCM to a simple Bayesian network which we could embed in $\mathtt{FinStoch}$ and on which we could evaluate commutativity. This meant that interventions and counterfactual statements were used to generate the corresponding Bayesian network from the given SCM. This allowed us to work easily with very different (and possibly convoluted) interventional or counterfactual statements while, at the same time, always evaluating consistency on simple diagrams. The ensuing simulations showed how SCMs related via abstractions may guarantee different forms of consistency.
[Rischel2021] Rischel, Eigil F., and Sebastian Weichwald. "Compositional Abstraction Error and a Category of Causal Models." arXiv preprint arXiv:2103.15758 (2021).
[Rischel2020] Rischel, Eigil Fjeldgren. "The Category Theory of Causal Models." (2020).
[Rubenstein2017] Rubenstein, Paul K., et al. "Causal consistency of structural equation models." arXiv preprint arXiv:1707.00819 (2017).
[Pearl2009] Pearl, Judea. Causality. Cambridge university press, 2009.
[Peters2017] Peters, Jonas, Dominik Janzing, and Bernhard Schölkopf. Elements of causal inference: foundations and learning algorithms. The MIT Press, 2017.
[Spivak2014] Spivak, David I. Category theory for the sciences. MIT Press, 2014.
[Fong2018] Fong, Brendan, and David I. Spivak. "Seven sketches in compositionality: An invitation to applied category theory." arXiv preprint arXiv:1803.05316 (2018).
[Bareinboim2022] Bareinboim, Elias, et al. "On pearl’s hierarchy and the foundations of causal inference." Probabilistic and Causal Inference: The Works of Judea Pearl. 2022. 507-556.
[Darwiche2022] Darwiche, Adnan. Counterfactuals.