This notebook summarizes some probability distributions and models related to them, and draws a distinction between a model and a response schedule.
Multinomial. The joint distribution of the number of outcomes in each of $k$ categories, for $n$ IID draws with probability $\pi_j$ of selecting value $j$ in each draw. Special cases: uniform distribution on $k$ outcomes ($n=1$, $\pi_j = 1/k$), binomial ($k=2$). A random vector $(X_1, \ldots, X_k)$ has a multinomial joint distribution with parameters $n$ and $\{\pi_j\}_{j=1}^k$ iff $$ \Pr \{X_j = x_j,\; j = 1, \ldots, k \} = \prod_{j=1}^k \pi_j^{x_j} \frac{n!}{x_1! x_2! \cdots x_k!}, \;\; x_j \ge 0,\;\; \sum_{j=1}^k x_j = n.$$
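As a quick numerical check (a sketch, assuming `numpy` and `scipy` are available), the formula can be evaluated directly and compared with `scipy.stats.multinomial`:

```python
import numpy as np
from math import factorial
from scipy import stats

n = 10
pi = np.array([0.2, 0.3, 0.5])   # category probabilities; sum to 1
x = np.array([2, 3, 5])          # counts in each category; sum to n

# direct evaluation of the pmf above
direct = factorial(n) / np.prod([factorial(int(xi)) for xi in x]) * np.prod(pi ** x)

print(direct, stats.multinomial.pmf(x, n=n, p=pi))  # the two values agree
```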
Multi-hypergeometric. The joint distribution of the number of items of each of $k$ types in a sample of size $n$ drawn without replacement from a population of $N$ items, of which $N_j$ are in category $j$. Special case: hypergeometric ($k = 2$). A random vector $(X_1, \ldots, X_k)$ has a multi-hypergeometric joint distribution with parameters $n$ and $\{N_j\}_{j=1}^k$ iff $$ \Pr \{ X_j = x_j,\; j = 1, \ldots, k \} = \frac{{{N_1}\choose{x_1}} \cdots {{N_k}\choose{x_k}}}{{{N}\choose{n}}}, \;\; x_j \ge 0;\;\; \sum_j x_j = n; \;\; \sum_j N_j = N.$$
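Similarly, a small sketch (assuming `scipy` is available; `scipy.stats.multivariate_hypergeom` requires scipy ≥ 1.6) evaluates the multi-hypergeometric pmf directly from binomial coefficients:

```python
import numpy as np
from scipy.special import comb
from scipy import stats

N_j = np.array([5, 10, 15])   # number of items of each type in the population
x = np.array([1, 2, 3])       # counts of each type in the sample
n, N = x.sum(), N_j.sum()

# direct evaluation of the pmf above
direct = np.prod(comb(N_j, x)) / comb(N, n)

print(direct, stats.multivariate_hypergeom.pmf(x=x, m=N_j, n=n))  # the two values agree
```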
(Here and below, $A$ needs to be a Lebesgue-measurable set; we will not worry about measurability.)
Distributions derived from the normal: Student's t, F, chi-square
Exponential. A random variable $X$ has an exponential distribution with rate $\lambda$
(mean $\lambda^{-1}$) iff $$ \Pr \{ X \in A \} = \int_{A \cap [0, \infty)} \lambda e^{-\lambda x} dx.$$
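A quick simulation check of the exponential tail probability (a sketch using `numpy`; note that `numpy` parametrizes the exponential by its mean $1/\lambda$ rather than the rate):

```python
import numpy as np

rng = np.random.default_rng(12345)
lam, t = 2.0, 0.7
x = rng.exponential(scale=1/lam, size=100_000)  # scale is the mean, 1/lambda

# empirical vs. theoretical Pr{X > t} = exp(-lambda * t)
print(np.mean(x > t), np.exp(-lam * t))
```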
Gamma. A random variable $X$ has a Gamma distribution with shape parameter $\alpha$ and rate parameter $\beta$ iff $$ \Pr \{ X \in A \} = \int_{A \cap [0, \infty)}\frac{\beta ^{\alpha }}{\Gamma (\alpha )}x^{\alpha -1}e^{-\beta x} dx.$$
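The density in the integrand can be checked against `scipy.stats.gamma` (a sketch; scipy parametrizes the Gamma by shape `a` $= \alpha$ and `scale` $= 1/\beta$):

```python
import numpy as np
from scipy.special import gamma as gamma_fn
from scipy import stats

alpha, beta = 3.0, 2.0
x = np.linspace(0.1, 5.0, 5)

# direct evaluation of the density above
direct = beta**alpha / gamma_fn(alpha) * x**(alpha - 1) * np.exp(-beta * x)

print(np.allclose(direct, stats.gamma.pdf(x, a=alpha, scale=1/beta)))  # True
```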
A model is an expression for the probability distribution of data $X$, usually "indexed" by a (possibly abstract, possibly infinite-dimensional) parameter, often relating some observables (independent variables, covariates, explanatory variables, predictors) to others (dependent variables, response variables, data, ...):
$$ X \sim \mathbb{P}_\theta, \;\; \theta \in \Theta. $$

Examples:

- coins and 0-1 boxes
- draws without replacement
- radioactive decay
- Hooke's Law, Ohm's Law, Boyle's Law
- conjoint analysis
- avian-turbine interactions

Commonly used models:

- linear regression
- linear probability model
- logit
- probit
- multinomial logit
- Poisson regression
A response schedule is an assertion about how Nature generated the data: it says how one variable would respond if you intervened and changed the value of other variables.
Regression is about conditional expectation: the expected value of the response variable for cases selected on the basis of the values of the predictor variables.
Causal inference is about intervention: what would happen if the values of the predictor variables were exogenously set to some values.
Response schedules connect selection to intervention.
For conditioning to give the same result as intervention, the model has to be a response schedule, and the response schedule has to be correct.
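To make the distinction concrete, here is a toy simulation (my own illustration, not from the text): a confounder drives both the predictor and the response, so selecting cases by the observed value of $X$ gives a different answer than setting $X$ exogenously.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
z = rng.normal(size=n)                         # unobserved confounder
x = z + rng.normal(size=n)                     # Nature picks X partly on the basis of Z
y = 1.0 * x + 2.0 * z + rng.normal(size=n)     # response schedule: Y responds to X with slope 1

# Conditioning: the regression slope of Y on X mixes the effect of X with that of Z
cond_slope = np.cov(x, y)[0, 1] / np.var(x)

# Intervention: set X to fixed values, leaving Z alone, and see how Y responds
y0 = 1.0 * 0.0 + 2.0 * z + rng.normal(size=n)
y1 = 1.0 * 1.0 + 2.0 * z + rng.normal(size=n)
int_effect = np.mean(y1 - y0)

print(cond_slope, int_effect)                  # roughly 2.0 versus 1.0
```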
Linear: a model for real-valued outcomes, $Y = X\beta + \epsilon$. Nature picks $X$, multiplies it by $\beta$, and adds $\epsilon$; $X$ and $\epsilon$ are independent.
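A minimal sketch of this response schedule and an ordinary least squares fit (using `numpy`; the names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta = 10_000, np.array([2.0, -1.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # design matrix with an intercept
eps = rng.normal(size=n)                               # errors independent of X
Y = X @ beta + eps                                     # Nature's response schedule
beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]

print(beta_hat)                                        # close to [2, -1]
```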
Linear probability model: a model for binary outcomes. $Y_j = X_j\beta + \epsilon_j$, where the components of $\epsilon$ are IID with mean zero; this amounts to $\Pr \{Y_j = 1 | X \} = X_j\beta$. Not guaranteed to give probabilities between 0 and 1 when fitted to data.
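A sketch of that last point (the true probabilities are taken to be logistic purely for illustration): an ordinary least squares fit of the linear probability model can return fitted "probabilities" outside $[0, 1]$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-3 * x))              # true probabilities (logistic, for illustration)
y = rng.binomial(1, p)

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS fit of the linear probability model
fitted = X @ b

print(fitted.min(), fitted.max())         # typically below 0 and above 1
```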
Logit: a model for binary outcomes. The logistic distribution function is $\Lambda(x) = e^x/(1+e^x)$. The logit function is $\mathrm{logit}\, p \equiv \log [p/(1-p)]$, also called the log odds. The logit model is that $\{Y_j\}$ are independent with $\Pr \{Y_j = 1 | X \} = \Lambda(X_j \beta)$. Equivalently, $\mathrm{logit} \Pr\{Y_j=1 | X\} = X_j \beta$. Also equivalently, the latent-variable formulation
$$ Y_j = \begin{cases} 1, & X_j \beta + U_j > 0 \\ 0, & \text{otherwise,} \end{cases} $$
where $\{U_j \}$ are IID random variables with the logistic distribution, and are independent of $X$.
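A simulation check of the latent-variable formulation (a sketch; the values of $x$ and $\beta$ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, beta, x = 200_000, 1.5, 0.4
u = rng.logistic(size=n)                  # IID logistic latent errors, independent of X
y = (x * beta + u > 0).astype(int)        # Y = 1 exactly when X*beta + U > 0

# empirical Pr{Y = 1} vs. Lambda(x * beta)
print(y.mean(), np.exp(x * beta) / (1 + np.exp(x * beta)))
```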
Probit: a model for binary outcomes. The probit model is that $\{Y_j\}$ are independent with $\Pr \{Y_j = 1 | X \} = \Phi(X_j \beta)$, where $\Phi$ is the standard normal distribution function. Equivalently, the latent-variable formulation
$$ Y_j = \begin{cases} 1, & X_j \beta + U_j > 0 \\ 0, & \text{otherwise,} \end{cases} $$
where $\{U_j \}$ are IID random variables with the standard normal distribution, and are independent of $X$.
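The same check for the probit model, with standard normal latent errors (sketch):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
n, beta, x = 200_000, 1.5, 0.4
u = rng.normal(size=n)                    # IID standard normal latent errors
y = (x * beta + u > 0).astype(int)

# empirical Pr{Y = 1} vs. Phi(x * beta)
print(y.mean(), norm.cdf(x * beta))
```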
Multinomial logit: a model for outcomes that take one of $K$ categorical values. The multinomial logit model is that $\{Y_j\}$ are independent with $$ \Pr \{Y_j = k | X \} = \begin{cases} \frac{e^{X_j \beta_k}}{1 + \sum_{\ell=1}^{K-1}e^{X_j \beta_\ell}}, & k=1, \ldots, K-1 \\ \frac{1}{1 + \sum_{\ell=1}^{K-1}e^{X_j \beta_\ell}}, & k=K. \end{cases} $$
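Finally, a small sketch evaluating the multinomial logit class probabilities for a single observation (the covariates and coefficients are made-up illustrative values):

```python
import numpy as np

X_j = np.array([1.0, 0.5])                               # covariates for unit j (with a constant)
betas = [np.array([0.2, -1.0]), np.array([0.8, 0.3])]    # beta_1, ..., beta_{K-1}; here K = 3
scores = np.array([X_j @ b for b in betas])

denom = 1 + np.exp(scores).sum()
probs = np.append(np.exp(scores) / denom, 1 / denom)     # categories 1, ..., K-1, then K

print(probs, probs.sum())                                # probabilities sum to 1
```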