This notebook summarizes some probability distributions and models related to them, and draws a distinction between a model and a response schedule.
Multinomial. The joint distribution of the number of outcomes in each of $k$ categories, for $n$ IID draws with probability $\pi_j$ of selecting value $j$ in each draw. Special cases: uniform distribution on $k$ outcomes ($n=1$, $\pi_j = 1/k$), binomial ($k=2$). A random vector $(X_1, \ldots, X_k)$ has a multinomial joint distribution with parameters $n$ and $\{\pi_j\}_{j=1}^k$ iff $$ \Pr \{X_j = x_j,\; j = 1, \ldots, k \} = \prod_{j=1}^k \pi_j^{x_j} \frac{n!}{x_1! x_2! \cdots x_k!}, \;\; x_j \ge 0,\;\; \sum_{j=1}^k x_j = n.$$
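As a quick numerical check (a sketch, assuming `numpy` and `scipy` are available), the formula can be evaluated directly and compared with `scipy.stats.multinomial`:

```python
import numpy as np
from math import factorial
from scipy import stats

n = 10
pi = np.array([0.2, 0.3, 0.5])   # category probabilities; sum to 1
x = np.array([2, 3, 5])          # counts in each category; sum to n

# direct evaluation of the pmf above
direct = factorial(n) / np.prod([factorial(int(xi)) for xi in x]) * np.prod(pi ** x)

print(direct, stats.multinomial.pmf(x, n=n, p=pi))  # the two values agree
```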
Multi-hypergeometric. The joint distribution of the number of items of each of $k$ types in a sample of size $n$ drawn without replacement from a population of $N$ items, of which $N_j$ are in category $j$. Special case: hypergeometric ($k = 2$). A random vector $(X_1, \ldots, X_k)$ has a multi-hypergeometric joint distribution with parameters $n$ and $\{N_j\}_{j=1}^k$ iff $$ \Pr \{ X_j = x_j,\; j = 1, \ldots, k \} = \frac{{{N_1}\choose{x_1}} \cdots {{N_k}\choose{x_k}}}{{{N}\choose{n}}}, \;\; x_j \ge 0;\;\; \sum_j x_j = n; \;\; \sum_j N_j = N.$$
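Similarly, a small sketch (assuming `scipy` is available; `scipy.stats.multivariate_hypergeom` requires scipy ≥ 1.6) evaluates the multi-hypergeometric pmf directly from binomial coefficients:

```python
import numpy as np
from scipy.special import comb
from scipy import stats

N_j = np.array([5, 10, 15])   # number of items of each type in the population
x = np.array([1, 2, 3])       # counts of each type in the sample
n, N = x.sum(), N_j.sum()

# direct evaluation of the pmf above
direct = np.prod(comb(N_j, x)) / comb(N, n)

print(direct, stats.multivariate_hypergeom.pmf(x=x, m=N_j, n=n))  # the two values agree
```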
(Here and below, $A$ needs to be a Lebesgue-measurable set; we will not worry about measurability.)
Distributions derived from the normal: Student's t, F, chi-square
Exponential. A random variable $X$ has an exponential distribution with rate $\lambda$
(mean $\lambda^{-1}$) iff $$ \Pr \{ X \in A \} = \int_{A \cap [0, \infty)} \lambda e^{-\lambda x} dx.$$
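A quick simulation check of the exponential tail probability (a sketch using `numpy`; note that `numpy` parametrizes the exponential by its mean $1/\lambda$ rather than the rate):

```python
import numpy as np

rng = np.random.default_rng(12345)
lam, t = 2.0, 0.7
x = rng.exponential(scale=1/lam, size=100_000)  # scale is the mean, 1/lambda

# empirical vs. theoretical Pr{X > t} = exp(-lambda * t)
print(np.mean(x > t), np.exp(-lam * t))
```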
Gamma. A random variable $X$ has a Gamma distribution with shape parameter $\alpha$ and rate parameter $\beta$ iff $$ \Pr \{ X \in A \} = \int_{A \cap [0, \infty)}\frac{\beta ^{\alpha }}{\Gamma (\alpha )}x^{\alpha -1}e^{-\beta x} dx.$$
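The density in the integrand can be checked against `scipy.stats.gamma` (a sketch; scipy parametrizes the Gamma by shape `a` $= \alpha$ and `scale` $= 1/\beta$):

```python
import numpy as np
from scipy.special import gamma as gamma_fn
from scipy import stats

alpha, beta = 3.0, 2.0
x = np.linspace(0.1, 5.0, 5)

# direct evaluation of the density above
direct = beta**alpha / gamma_fn(alpha) * x**(alpha - 1) * np.exp(-beta * x)

print(np.allclose(direct, stats.gamma.pdf(x, a=alpha, scale=1/beta)))  # True
```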
A model is an expression for the probability distribution of data $X$, usually "indexed" by a (possibly abstract, possibly infinite-dimensional) parameter, often relating some observables (independent variables, covariates, explanatory variables, predictors) to others (dependent variables, response variables, data, ...):
$$ X \sim \mathbb{P}_\theta, \;\; \theta \in \Theta. $$

Examples:

- coins and 0-1 boxes
- draws without replacement
- radioactive decay
- Hooke's Law, Ohm's Law, Boyle's Law
- conjoint analysis
- avian-turbine interactions

Commonly used models:

- linear regression
- linear probability model
- logit
- probit
- multinomial logit
- Poisson regression
A response schedule is an assertion about how Nature generated the data: it says how one variable would respond if you intervened and changed the value of other variables.
Regression is about conditional expectation: the expected value of the response variable for cases selected on the basis of the values of the predictor variables.
Causal inference is about intervention: what would happen if the values of the predictor variables were exogenously set to some values.
Response schedules connect selection to intervention.
For conditioning to give the same result as intervention, the model has to be a response schedule, and the response schedule has to be correct.
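To make the distinction concrete, here is a toy simulation (my own illustration, not from the text): a confounder drives both the predictor and the response, so selecting cases by the observed value of $X$ gives a different answer than setting $X$ exogenously.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
z = rng.normal(size=n)                         # unobserved confounder
x = z + rng.normal(size=n)                     # Nature picks X partly on the basis of Z
y = 1.0 * x + 2.0 * z + rng.normal(size=n)     # response schedule: Y responds to X with slope 1

# Conditioning: the regression slope of Y on X mixes the effect of X with that of Z
cond_slope = np.cov(x, y)[0, 1] / np.var(x)

# Intervention: set X to fixed values, leaving Z alone, and see how Y responds
y0 = 1.0 * 0.0 + 2.0 * z + rng.normal(size=n)
y1 = 1.0 * 1.0 + 2.0 * z + rng.normal(size=n)
int_effect = np.mean(y1 - y0)

print(cond_slope, int_effect)                  # roughly 2.0 versus 1.0
```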
Linear: a model for real-valued outcomes, $Y = X\beta + \epsilon$. Nature picks $X$, multiplies it by $\beta$, and adds $\epsilon$; $X$ and $\epsilon$ are independent.
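A minimal sketch of this response schedule and an ordinary least squares fit (using `numpy`; the names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta = 10_000, np.array([2.0, -1.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # design matrix with an intercept
eps = rng.normal(size=n)                               # errors independent of X
Y = X @ beta + eps                                     # Nature's response schedule
beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]

print(beta_hat)                                        # close to [2, -1]
```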
Linear probability model: a model for binary outcomes. $Y_j = X_j\beta + \epsilon_j$, where the components of $\epsilon$ are IID with mean zero; this amounts to $\Pr \{Y_j = 1 | X \} = X_j\beta$. Not guaranteed to give probabilities between 0 and 1 when fitted to data.
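A sketch of that last point (the true probabilities are taken to be logistic purely for illustration): an ordinary least squares fit of the linear probability model can return fitted "probabilities" outside $[0, 1]$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-3 * x))              # true probabilities (logistic, for illustration)
y = rng.binomial(1, p)

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS fit of the linear probability model
fitted = X @ b

print(fitted.min(), fitted.max())         # typically below 0 and above 1
```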
Logit: a model for binary outcomes. The logistic distribution function is $\Lambda(x) = e^x/(1+e^x)$. The logit function is $\mathrm{logit}\, p \equiv \log [p/(1-p)]$, also called the log odds. The logit model is that $\{Y_j\}$ are independent with $\Pr \{Y_j = 1 | X \} = \Lambda(X_j \beta)$. Equivalently, $\mathrm{logit} \Pr\{Y_j=1 | X\} = X_j \beta$. Also equivalently, the latent-variable formulation
$$ Y_j = \begin{cases} 1, & X_j \beta + U_j > 0 \\ 0, & \text{otherwise,} \end{cases} $$
where $\{U_j \}$ are IID random variables with the logistic distribution, and are independent of $X$.
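A simulation check of the latent-variable formulation (a sketch; the values of $x$ and $\beta$ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, beta, x = 200_000, 1.5, 0.4
u = rng.logistic(size=n)                  # IID logistic latent errors, independent of X
y = (x * beta + u > 0).astype(int)        # Y = 1 exactly when X*beta + U > 0

# empirical Pr{Y = 1} vs. Lambda(x * beta)
print(y.mean(), np.exp(x * beta) / (1 + np.exp(x * beta)))
```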
Probit: a model for binary outcomes. The probit model is that $\{Y_j\}$ are independent with $\Pr \{Y_j = 1 | X \} = \Phi(X_j \beta)$, where $\Phi$ is the standard normal distribution function. Equivalently, the latent-variable formulation
$$ Y_j = \begin{cases} 1, & X_j \beta + U_j > 0 \\ 0, & \text{otherwise,} \end{cases} $$
where $\{U_j \}$ are IID random variables with the standard normal distribution, and are independent of $X$.
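The same check for the probit model, with standard normal latent errors (sketch):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
n, beta, x = 200_000, 1.5, 0.4
u = rng.normal(size=n)                    # IID standard normal latent errors
y = (x * beta + u > 0).astype(int)

# empirical Pr{Y = 1} vs. Phi(x * beta)
print(y.mean(), norm.cdf(x * beta))
```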
Multinomial logit: a model for outcomes that take one of $K$ categorical values. The multinomial logit model is that $\{Y_j\}$ are independent with $$ \Pr \{Y_j = k | X \} = \begin{cases} \frac{e^{X_j \beta_k}}{1 + \sum_{\ell=1}^{K-1}e^{X_j \beta_\ell}}, & k=1, \ldots, K-1 \\ \frac{1}{1 + \sum_{\ell=1}^{K-1}e^{X_j \beta_\ell}}, & k=K. \end{cases} $$
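Finally, a small sketch evaluating the multinomial logit class probabilities for a single observation (the covariates and coefficients are made-up illustrative values):

```python
import numpy as np

X_j = np.array([1.0, 0.5])                               # covariates for unit j (with a constant)
betas = [np.array([0.2, -1.0]), np.array([0.8, 0.3])]    # beta_1, ..., beta_{K-1}; here K = 3
scores = np.array([X_j @ b for b in betas])

denom = 1 + np.exp(scores).sum()
probs = np.append(np.exp(scores) / denom, 1 / denom)     # categories 1, ..., K-1, then K

print(probs, probs.sum())                                # probabilities sum to 1
```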