import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d.art3d as art3d
import numpy as np
In this chapter, we will deal exclusively with joint distributions, one of the most important topics of the whole course. Joint distributions are used for formulating all kinds of probability models.
The joint probability mass function of two discrete random variables is defined as
$$ P_{XY}(x, y) = P(X = x, Y=y) $$It is convenient to define the ranges of $X$ and $Y$, $R_X = \{x_1, x_2, ...\}$ and $R_Y = \{y_1, y_2, ...\}$; the range of the joint distribution is then a subset of their Cartesian product
$$ R_{XY}\subseteq R_X \times R_Y = \{(x_i, y_j)\mid x_i\in R_X, y_j \in R_Y\} $$
The most fundamental property of any joint PMF is that it sums to $1$ over the whole range:
$$ \sum_{(x_i,y_j)\in R_{XY}}P_{XY}(x_i,y_j)=1 $$Let's consider the probability mass function table below.
\begin{array}{|c|c|c|c|} \hline & Y = 0 & Y = 1 & Y= 2 \\ \hline X = 0 & 1/6 & 1/4 & 1/8 \\ \hline X = 1 & 1/8 & 1/6 & 1/6 \\ \hline \end{array}The code below computes its marginal PMFs by summing over rows and columns.
from fractions import Fraction as frac
# marginal PMF of Y: sum each column of the joint PMF table
pY_0 = frac(1,6) + frac(1,8)
pY_1 = frac(1,4) + frac(1,6)
pY_2 = frac(1,8) + frac(1,6)
# marginal PMF of X: sum each row of the joint PMF table
pX_0 = frac(1,6) + frac(1,4) + frac(1,8)
pX_1 = frac(1,8) + frac(1,6) + frac(1,6)
print('Marginal PMF of pY are {0}, {1}, {2}.'.format(pY_0,pY_1,pY_2))
print('Marginal PMF of pX are {0}, {1}.'.format(pX_0,pX_1))
Marginal PMF of pY are 7/24, 5/12, 7/24. Marginal PMF of pX are 13/24, 11/24.
The reason we call them marginal probabilities is that they are written in the margins of the table.
\begin{array}{|c|c|c|c|c|} \hline & Y = 0 & Y = 1 & Y= 2 & P_X(x)\\ \hline X = 0 & 1/6 & 1/4 & 1/8 & 13/24 \\ \hline X = 1 & 1/8 & 1/6 & 1/6 & 11/24\\ \hline P_Y(y) & 7/24 & 5/12 & 7/24 & \\ \hline \end{array}If $X$ and $Y$ were independent, every conditional probability would equal the corresponding marginal probability; for instance
$$ P(X=0| Y = 1)= \frac{1/4}{1/4+1/6} =3/5\\ P_X(X=0)=13/24 $$They are not equal, so $X$ and $Y$ are not independent.
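The same comparison can be made with exact arithmetic, reusing the `Fraction` objects from the cell above (a minimal sketch; `p_x0_given_y1` is just an illustrative name):
# exact check: conditional probability P(X=0 | Y=1) versus the marginal P_X(X=0)
p_x0_given_y1 = frac(1, 4) / (frac(1, 4) + frac(1, 6))
print(p_x0_given_y1, pX_0, p_x0_given_y1 == pX_0)   # 3/5 13/24 False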
The relationship between the joint, marginal, and conditional PMFs is
$$ P(X|Y) = \frac{P(X,Y)}{P_Y(Y)} $$i.e.
$$ \text{Conditional PMF} = \frac{\text{Joint PMF}}{\text{Marginal PMF}} $$The joint CDF of two random variables $X$ and $Y$ is defined as
$$ F_{XY}(x,y)=P(X\leq x, Y\leq y) $$where $0\leq F_{XY}(x,y) \leq 1$.
For instance, $F_{XY}(1, 2) = P(X\leq 1, Y\leq 2)$ is the probability of the shaded area in the plot below (the axes are limited to $(-5,\ 6)$).
x = np.linspace(-6, 1)
y = 2*np.ones(len(x))
fig, ax = plt.subplots(figsize = (8, 8))
ax.plot([1, -5], [2, 2], color = 'b')                # horizontal boundary y = 2
ax.scatter(1, 2, s = 80, zorder = 3, color = 'red')  # the point (1, 2)
ax.plot([1, 1], [2, -5], color = 'b')                # vertical boundary x = 1
ax.axis([-5, 6, -5, 6])
# scatter some random points to suggest realizations of (X, Y)
ax.scatter(np.random.uniform(low = -5, high = 6, size = 50),
np.random.uniform(low = -5, high = 6, size = 50))
ax.fill_between(x, y, -5, color = 'red', alpha =.2)  # shade the region {X <= 1, Y <= 2}
ax.text(1, 2.1, '$(1, 2)$', size = 15)
ax.grid()
Marginal CDFs $F_X(x)$ and $F_Y(y)$ are obtained from the joint CDF by letting the other variable go to infinity:
$$ F_X(x) = P(X\leq x, Y\leq \infty)\\ F_Y(y) = P(X\leq \infty, Y\leq y) $$If $A$ is a random event, the conditional PMF of $X$ given $A$ is defined as
$$ P_{X|A}(X = x_i) = \frac{P(X=x_i,A)}{P(A)} $$Consider the joint PMF below.
\begin{array}{|c|c|c|c|c|c|} \hline & X = -2 & X = -1 & X = 0 & X = 1 & X = 2 \\ \hline Y = 2 & 0 & 0 & 1/13 & 0 & 0 \\ \hline Y = 1 & 0 & 1/13 & 1/13 & 1/13 & 0 \\ \hline Y = 0 & 1/13 & 1/13 & 1/13 & 1/13 & 1/13 \\ \hline Y = -1 & 0 & 1/13 & 1/13 & 1/13 & 0 \\ \hline Y = -2 & 0 & 0 & 1/13 & 0 & 0 \\ \hline \end{array}Its support is the set $G=\{(x, y)\mid x, y \in \mathbb{Z},\ |x|+|y| \leq 2\}$, which contains $13$ points, each with probability $1/13$.
# marginal PMF of Y: sum each row of the joint PMF table
pY_2 = frac(1,13)
pY_1 = frac(1,13)*3
pY_0 = frac(1,13)*5
pY_m1 = frac(1,13)*3
pY_m2 = frac(1,13)
# marginal PMF of X: sum each column of the joint PMF table
pX_2 = frac(1,13)
pX_1 = frac(1,13)*3
pX_0 = frac(1,13)*5
pX_m1 = frac(1,13)*3
pX_m2 = frac(1,13)
print('Marginal PMF of pY are {0}, {1}, {2}, {3}, {4}.'.format(pY_2,pY_1,pY_0,pY_m1,pY_m2))
print('Marginal PMF of pX are {0}, {1}, {2}, {3}, {4}.'.format(pX_2,pX_1,pX_0,pX_m1,pX_m2))
Marginal PMF of pY are 1/13, 3/13, 5/13, 3/13, 1/13. Marginal PMF of pX are 1/13, 3/13, 5/13, 3/13, 1/13.
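As a sanity check, we can also enumerate the support $G$ in Python and recompute the same marginals by summing the joint PMF. This is a small sketch; the dictionary representation and the names `support`, `joint`, `pX`, `pY` are just one convenient choice:
# enumerate the support G = {(x, y) : |x| + |y| <= 2} and assign probability 1/13 to each point
support = [(x, y) for x in range(-2, 3) for y in range(-2, 3) if abs(x) + abs(y) <= 2]
joint = {point: frac(1, 13) for point in support}
# marginals: sum the joint PMF over the other variable
pX = {x: sum(p for (xi, yi), p in joint.items() if xi == x) for x in range(-2, 3)}
pY = {y: sum(p for (xi, yi), p in joint.items() if yi == y) for y in range(-2, 3)}
print(len(support))   # 13
print(pX)             # Fraction values 1/13, 3/13, 5/13, 3/13, 1/13
print(pY)             # the same values, by symmetry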
We add marginals to the table
\begin{array}{|c|c|c|c|c|c|c|} \hline & X = -2 & X = -1 & X = 0 & X = 1 & X = 2 & P_Y(y) \\ \hline Y = 2 & 0 & 0 & 1/13 & 0 & 0 & 1/13\\ \hline Y = 1 & 0 & 1/13 & 1/13 & 1/13 & 0 & 3/13 \\ \hline Y = 0 & 1/13 & 1/13 & 1/13 & 1/13 & 1/13 & 5/13 \\ \hline Y = -1 & 0 & 1/13 & 1/13 & 1/13 & 0 & 3/13 \\ \hline Y = -2 & 0 & 0 & 1/13 & 0 & 0 & 1/13\\ \hline P_X(x) &1/13 &3/13 & 5/13 & 3/13 & 1/13 & \\ \hline \end{array}It shows that given $Y=1$, $X$ is uniformly distributed over $\{-1,0,1\}$.
Are $X$ and $Y$ independent? No: for instance, $P(X=0| Y=1) = 1/3 \neq P_X(X = 0) = 5/13$.
If the random event $A$ is replaced by a discrete random variable $Y$, the conditional PMFs are defined as
$$ \begin{array}{l} P_{X | Y}\left(x_{i} | y_{j}\right)=\frac{P_{X Y}\left(x_{i}, y_{j}\right)}{P_{Y}\left(y_{j}\right)} \\ P_{Y | X}\left(y_{j} | x_{i}\right)=\frac{P_{X Y}\left(x_{i}, y_{j}\right)}{P_{X}\left(x_{i}\right)} \end{array} $$where $x_i$ and $y_j$ are realizations of $X$ and $Y$.
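For instance, reusing the hypothetical `joint` dictionary from the sketch above together with `pY_1` from the earlier cell, the conditional PMF of $X$ given $Y=1$ can be computed directly (again just a sketch):
# conditional PMF of X given Y = 1: divide the joint probabilities by the marginal P_Y(1) = 3/13
pX_given_Y1 = {x: joint.get((x, 1), frac(0)) / pY_1 for x in range(-2, 3)}
print(pX_given_Y1)   # 0, 1/3, 1/3, 1/3, 0 -- uniform over {-1, 0, 1}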
The expectation can be conditioned on a random event or on a realization of a random variable.
$$\begin{align} E[X | A]&=\sum_{x_{i}\in R_{X}}x_{i} P_{X | A}\left(x_{i}|A\right) \\ E[X | Y=y_{j}]&=\sum_{x_{i} \in R_{X}} x_{i} P_{X | Y}\left(x_{i} | Y=y_{j}\right) \end{align}$$Using the PMF example from the last section, let's compute $E[X \mid -1<Y<2]$.
To calculate the conditional expectation, we must use the conditional probabilities as weights.
First, note that $P(-1<Y<2)=P(Y=0)+P(Y=1)=\frac{8}{13}$, so the conditional PMF divides each relevant joint probability by $\frac{8}{13}$; weighting each $x_i$ accordingly gives
$$ E[X\mid -1<Y<2] = -2\cdot\frac{1/13}{8/13}-1\cdot\frac{2/13}{8/13}+0\cdot\frac{2/13}{8/13}+ 1\cdot\frac{2/13}{8/13} + 2\cdot\frac{1/13}{8/13}=0 $$If you paid attention to the conditional expectation expression
$$ E[X | Y=y_{j}]=\sum_{x_{i} \in R_{X}} x_{i} P_{X | Y}\left(x_{i} | Y=y_{j}\right) $$you would find that it is actually a function of $y_j$. Viewed as a function of the random variable $Y$, $Z = E[X|Y]$ is itself a random variable.
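Before moving on, here is a quick exact check of $E[X \mid -1<Y<2]=0$, once more reusing the hypothetical `joint` dictionary from the earlier sketch:
# restrict the joint PMF to the event -1 < Y < 2, then take the probability-weighted average of x
event = {(x, y): p for (x, y), p in joint.items() if -1 < y < 2}
pA = sum(event.values())                                     # P(-1 < Y < 2) = 8/13
print(pA, sum(x * p / pA for (x, y), p in event.items()))    # 8/13 0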
Consider the joint PMF below.
$$ \begin{array}{|c|c|c|c|} \hline & X = 0 & X = 1 & P_Y(y) \\ \hline Y = 0 & 1/5 & 2/5 & 3/5\\ \hline Y = 1 & 2/5 & 0 & 2/5 \\ \hline P_X(x) &3/5 & 2/5 & \\ \hline \end{array} $$Remember that $Z = E[X|Y]$ is a function of $Y$. To calculate the conditional expectations, we again use conditional probabilities as weights.
$$ E[X|Y = 0] = 0 \left(\frac{\frac{1}{5}}{\frac{1}{5}+\frac{2}{5}}\right)+1\left(\frac{\frac{2}{5}}{\frac{1}{5}+\frac{2}{5}}\right) =\frac{2}{3}\\ E[X|Y = 1] = 0 $$Because $E[X|Y]$ is itself a random variable, it must have an expectation as well
$$ E[Z] = E[E[X|Y]] = P_Y(Y = 0)E[X|Y = 0]+ P_Y(Y = 1)E[X|Y = 1] = \frac{3}{5}\cdot\frac{2}{3}+\frac{2}{5}\cdot0=\frac{2}{5} $$In fact, $E[Z] = E[E[X|Y]] = E[X]$ must always hold; this is the law of iterated expectations.
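We can verify the law of iterated expectations for this table with exact arithmetic (a minimal sketch; the dictionary `pXY` simply encodes the table above):
# law of iterated expectations check for the 2x2 table above
pXY = {(0, 0): frac(1, 5), (1, 0): frac(2, 5), (0, 1): frac(2, 5), (1, 1): frac(0)}
pY0 = pXY[(0, 0)] + pXY[(1, 0)]                              # P(Y=0) = 3/5
pY1 = pXY[(0, 1)] + pXY[(1, 1)]                              # P(Y=1) = 2/5
EX_given_Y0 = sum(x * pXY[(x, 0)] / pY0 for x in (0, 1))     # E[X|Y=0] = 2/3
EX_given_Y1 = sum(x * pXY[(x, 1)] / pY1 for x in (0, 1))     # E[X|Y=1] = 0
EZ = pY0 * EX_given_Y0 + pY1 * EX_given_Y1
EX = sum(x * p for (x, y), p in pXY.items())
print(EZ, EX, EZ == EX)   # 2/5 2/5 True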
If $X$ and $Y$ are independent, the rules of conditional expectation are fairly straightforward, because conditioning on $Y$ does not provide any extra information: for instance, $E[X|Y] = E[X]$ and $E[g(X)|Y] = E[g(X)]$.
The joint PDF of $X$ and $Y$ is a non-negative function $f_{XY}$, mapping $\mathbb{R}^2$ to $\mathbb{R}$, such that for any region $A\subseteq \mathbb{R}^2$
$$ P((X, Y) \in A)=\iint_{A} f_{X Y}(x, y) d x d y $$with the normalization $\iint_{\mathbb{R}^2} f_{XY}(x, y)\, dx\, dy = 1$.
However, we are particularly interested in the case where $A$ is a rectangle,
$$ P(a\leq X \leq b,\ c\leq Y \leq d) =\int_c^d\int_a^b f_{X Y}(x, y) d x d y $$For a small square of side $\delta$,
$$ P(a\leq X \leq a+\delta,\ c\leq Y \leq c+\delta )\approx f_{XY}(a,c)\delta^2 $$Let's consider an example other than the normal distribution.
$$ f_{X Y}(x, y)=\left\{\begin{array}{ll} x+c y^{2} & 0 \leq x \leq 1,\quad 0 \leq y \leq 1 \\ 0 & \text { otherwise } \end{array}\right. $$To find $c$, use the normalization property $\iint_{\mathbb{R}^2} f_{X Y}(x, y) d x d y =1$, which here reduces to an integral over the unit square:
\begin{align} \int^1_0\int^1_0(x+cy^2)dxdy &= 1\\ \int^1_0\left[\frac{x^2}{2}+cxy^2\right]^1_0dy &= 1\\ \int^1_0\left[\frac{1}{2}+cy^2\right]dy &= 1\\ \left[\frac{y}{2}+c\frac{y^3}{3}\right]^1_0&=1\\ \frac{1}{2}+\frac{c}{3}&=1\\ c&=\frac{3}{2}\\ \end{align}Plugging in $c$, we can compute, for example, $P\left(0\leq X\leq \frac{1}{2},\ 0\leq Y\leq \frac{1}{2}\right)$ by double integration
\begin{align} \int^{1/2}_{0}\int^{1/2}_0\left(x+\frac{3}{2}y^2\right)dxdy &= \int_0^{1/2}\left[\frac{x^2}{2}+\frac{3}{2}y^2x\right]_0^{1/2}dy \\ &=\int_0^{1/2}\left[\frac{1}{8}+\frac{3}{4}y^2\right]dy\\ &=\left[\frac{y}{8}+\frac{y^3}{4}\right]_0^{1/2}\\ &=\frac{1}{16}+\frac{1}{32}=\frac{3}{32} \end{align}The joint distribution is depicted below; the volume between the curved surface and the $xy$ plane is $1$.
x, y = np.linspace(0, 1), np.linspace(0, 1)
X, Y = np.meshgrid(x, y)
Z = X + 3/2*Y**2     # joint PDF f_XY(x, y) = x + 3/2 * y^2 on the unit square
fig = plt.figure(figsize = (8, 8))
ax = fig.add_subplot(projection='3d')    # replaces the deprecated fig.gca(projection='3d')
ax.plot_surface(X, Y, Z, cmap = 'coolwarm')
ax.contourf(X, Y, Z, zdir='z', offset=0, cmap='coolwarm')   # filled contours projected onto the xy plane
plt.show()
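The constant $c$ and the probability $3/32$ computed above can also be checked symbolically. This is a sketch assuming SymPy is available (it is not imported elsewhere in this chapter); `xs`, `ys`, `c_val`, and `prob` are illustrative names:
import sympy as sp
# symbolic check of the normalising constant c and of P(0 <= X <= 1/2, 0 <= Y <= 1/2)
xs, ys, c = sp.symbols('x y c')
c_val = sp.solve(sp.Eq(sp.integrate(xs + c*ys**2, (xs, 0, 1), (ys, 0, 1)), 1), c)[0]
prob = sp.integrate(xs + c_val*ys**2, (xs, 0, sp.Rational(1, 2)), (ys, 0, sp.Rational(1, 2)))
print(c_val, prob)   # 3/2 3/32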
The marginal PDFs of $X$ and $Y$ are
$$ f_{X}(x)=\int_{-\infty}^{\infty} f_{X Y}(x, y) d y,\quad \text { for all } x \\ f_{Y}(y)=\int_{-\infty}^{\infty} f_{X Y}(x, y) d x,\quad \text { for all } y $$Let's use the same example as in the last section to find $f_X(x)$ and $f_Y(y)$.
$$ f_{X}(x)=\int_{0}^{1}\left(x+\frac{3}{2}y^2\right) d y =x+\frac{1}{2}\\ f_{Y}(y)=\int_{0}^{1}\left(x+\frac{3}{2}y^2\right) d x =\frac{3}{2} y^{2}+\frac{1}{2} $$The joint CDF and the joint PDF are related as follows:
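Continuing the SymPy sketch above, the two marginals can be obtained by integrating out the other variable:
# symbolic marginals of f_XY(x, y) = x + 3/2 * y^2 on the unit square
f_xy = xs + sp.Rational(3, 2)*ys**2
print(sp.integrate(f_xy, (ys, 0, 1)))   # f_X(x) = x + 1/2
print(sp.integrate(f_xy, (xs, 0, 1)))   # f_Y(y) = 3*y**2/2 + 1/2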
$$ F_{X Y}(x, y)=\int_{-\infty}^{y} \int_{-\infty}^{x} f_{X Y}(u, v) d u d v \\ f_{X Y}(x, y)=\frac{\partial^{2}}{\partial x \partial y} F_{X Y}(x, y) $$For the same PDF as above, let's find the CDF.
$$ f_{X Y}(x, y)=\left\{\begin{array}{ll} x+\frac{3}{2} y^{2} & 0 \leq x \leq 1,\quad 0 \leq y \leq 1 \\ 0 & \text { otherwise } \end{array}\right. $$For $0\leq x\leq 1$ and $0\leq y\leq 1$,
$$ F_{XY}(x, y)=\int_{0}^{y}\int_{0}^{x}\left(u+\frac{3}{2}v^2\right) du\, dv = \int_{0}^{y}\left(\frac{x^2}{2}+\frac{3}{2}x v^2\right)dv = \frac{x^2 y}{2}+\frac{x y^3}{2} $$which satisfies $F_{XY}(1,1)=1$ and $\frac{\partial^{2}}{\partial x \partial y}F_{XY}(x,y)=x+\frac{3}{2}y^2$. Now consider the conditional PDF of $X$ given that $X\in A$
\begin{align} P(x\leq X \leq x+\delta|X \in A)\approx f_{X|X\in A}(x)\cdot \delta &= \frac{P(x\leq X \leq x+\delta,X \in A)}{P(A)}\\ &=\frac{P(x\leq X \leq x+\delta)}{P(A)}\\ &\approx\frac{f_X(x)\delta}{P(A)} \end{align}We have shown that
$$ f_{X|X\in A}(x) = \frac{f_X(x)}{P(A)},\qquad x\in A $$(and $f_{X|X\in A}(x)=0$ for $x\notin A$). You can imagine $P(A)$ as a scaling factor that normalizes the conditional PDF so that it integrates to $1$.
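As a concrete example with the marginal $f_X(x)=x+\frac{1}{2}$ found earlier, take $A=\{X\leq \frac{1}{2}\}$:
$$ P(A)=\int_0^{1/2}\left(x+\frac{1}{2}\right)dx=\frac{3}{8},\qquad f_{X|X\in A}(x)=\frac{x+\frac{1}{2}}{3/8}=\frac{8x+4}{3},\quad 0\leq x\leq \frac{1}{2} $$which indeed integrates to $1$ over $[0,\tfrac{1}{2}]$.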
For two jointly continuous random variables $X$ and $Y$, we can define the conditional PDF of $X$ given $Y=y$ (wherever $f_Y(y)>0$) as
$$ f_{X|Y}(x|y) = \frac{f_{XY}(x, y)}{f_Y(y)} $$with the conditional CDF and conditional expectation built from it in the usual way. The intuition behind this expression is
$$ P(x\leq X \leq x+\delta| y\leq Y\leq y+\epsilon)\approx \frac{f_{XY}(x, y)\delta\epsilon}{f_Y(y)\epsilon}=f_{X|Y}(x|y)\delta $$Conditional probability must satisfy the basic rules of probability as well,
$$ \int_{-\infty}^\infty f_{X|Y}(x|y)dx = 1 $$because
$$ \frac{\int_{-\infty}^\infty f_{XY}(x, y)dx}{f_Y(y)} = \frac{f_Y(y)}{f_Y(y)}= 1 $$Rearranging the definition of the conditional PDF, we obtain the multiplication rule
$$ f_{XY}(x, y)=f_{X|Y}(x|y)f_Y(y) $$If continuous variables $X$ and $Y$ are independent, then knowing either of them does not provide information about the other. That is
$$ f_{X|Y}(x|y) = f_X(x),\qquad \text{or} \qquad f_{Y|X}(y|x) = f_Y(y) $$Thus the multiplication rule for independent random variables becomes
$$ f_{XY}(x, y)=f_X(x)f_Y(y) $$Other rules derived from this are
\begin{align} E[XY]&= E[X]E[Y]\\ \text{Var}(X+Y)&=\text{Var}(X)+\text{Var}(Y)\\ E[g(X)h(Y)]&=E[g(X)]E[h(Y)] \end{align}
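These rules are easy to check by Monte Carlo with independently sampled $X$ and $Y$ (a sketch; the normal and uniform distributions used here are arbitrary choices):
# Monte Carlo sanity check of the independence rules
rng = np.random.default_rng(0)
X_s = rng.normal(loc=1.0, scale=2.0, size=1_000_000)
Y_s = rng.uniform(low=0.0, high=3.0, size=1_000_000)
print(np.mean(X_s*Y_s), np.mean(X_s)*np.mean(Y_s))           # E[XY] vs E[X]E[Y], approximately equal
print(np.var(X_s + Y_s), np.var(X_s) + np.var(Y_s))          # Var(X+Y) vs Var(X)+Var(Y)
print(np.mean(np.exp(X_s)*np.sin(Y_s)),
      np.mean(np.exp(X_s))*np.mean(np.sin(Y_s)))             # E[g(X)h(Y)] vs E[g(X)]E[h(Y)]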