Cheatsheet
You are not allowed to bring books or notes to the exam. Instead, you may make use of the following cheatsheet; we will provide this (or a similar) cheatsheet in an appendix of the exam papers.
- Some matrix calculus identities (see also Bishop, Appendix C)
$$\begin{align*}
|A^{-1}|&=|A|^{-1} \\
\nabla_A \log |A| &= (A^{T})^{-1} = (A^{-1})^T \\
\mathrm{Tr}[ABC]&= \mathrm{Tr}[CAB] = \mathrm{Tr}[BCA] \\
\nabla_A \mathrm{Tr}[AB] &=\nabla_A \mathrm{Tr}[BA]= B^T \\
\nabla_A \mathrm{Tr}[ABA^T] &= A(B+B^T)\\
\nabla_x x^TAx &= (A+A^T)x\\
\nabla_X a^TXb &= \nabla_X \mathrm{Tr}[ba^TX] = ab^T
\end{align*}$$
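These identities can be sanity-checked numerically. The sketch below (my own addition, assuming NumPy is available; not part of the official cheatsheet) verifies the determinant rule, the cyclic invariance of the trace, and the quadratic-form gradient $\nabla_x x^TAx = (A+A^T)x$ by central finite differences on random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
C = rng.standard_normal((3, 3))
x = rng.standard_normal(3)

# |A^{-1}| = |A|^{-1}
assert np.isclose(np.linalg.det(np.linalg.inv(A)), 1.0 / np.linalg.det(A))

# Tr[ABC] = Tr[CAB] = Tr[BCA]  (cyclic invariance)
assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))
assert np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))

# grad_x x^T A x = (A + A^T) x, checked by central finite differences
def f(v):
    return v @ A @ v

eps = 1e-6
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(3)])
assert np.allclose(grad_fd, (A + A.T) @ x, atol=1e-5)
```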
- Definition of the Multivariate Gaussian Distribution (MVG)
$$
\mathcal{N}(x|\,\mu,\Sigma) = |2 \pi \Sigma|^{-\frac{1}{2}} \exp\left\{-\frac{1}{2}(x-\mu)^T
\Sigma^{-1} (x-\mu) \right\}
$$
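A direct NumPy transcription of this density (a sketch; the function name `mvg_pdf` and the example numbers are my own). Note that $|2\pi\Sigma| = (2\pi)^d|\Sigma|$, so the normalizer above equals the perhaps more familiar $(2\pi)^{-d/2}|\Sigma|^{-1/2}$:

```python
import numpy as np

def mvg_pdf(x, mu, Sigma):
    """N(x | mu, Sigma) written exactly as in the formula above."""
    d = x - mu
    norm = np.linalg.det(2 * np.pi * Sigma) ** -0.5
    return norm * np.exp(-0.5 * d @ np.linalg.solve(Sigma, d))

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# |2*pi*Sigma| = (2*pi)^d |Sigma| with d = 2 here
assert np.isclose(np.linalg.det(2 * np.pi * Sigma),
                  (2 * np.pi) ** 2 * np.linalg.det(Sigma))

# At x = mu the exponent vanishes, so the density equals the normalizer
assert np.isclose(mvg_pdf(mu, mu, Sigma),
                  np.linalg.det(2 * np.pi * Sigma) ** -0.5)
```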
- A linear transformation $z=Ax+b$ of a Gaussian variable $\mathcal{N}(x|\mu,\Sigma)$ is Gaussian distributed as
$$
p(z) = \mathcal{N} \left(z \,|\, A\mu+b, A\Sigma A^T \right)
$$
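A Monte Carlo check of this transformation rule (a sketch with example values for $A$, $b$, $\mu$, $\Sigma$ that I made up; tolerances are loose because the moments are estimated from samples):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
A = np.array([[2.0, 0.0],
              [1.0, 1.0]])
b = np.array([0.5, -0.5])

# Sample x ~ N(mu, Sigma) and transform: z = A x + b
xs = rng.multivariate_normal(mu, Sigma, size=200_000)
zs = xs @ A.T + b

# Empirical moments should match A mu + b and A Sigma A^T
assert np.allclose(zs.mean(axis=0), A @ mu + b, atol=0.05)
assert np.allclose(np.cov(zs.T), A @ Sigma @ A.T, atol=0.1)
```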
- Multiplication of two Gaussian distributions of the same variable $x$
$$
\mathcal{N}(x|\mu_a,\Sigma_a) \cdot \mathcal{N}(x|\mu_b,\Sigma_b) = \alpha \cdot \mathcal{N}(x|\mu_c,\Sigma_c)
$$
with
$$\begin{align*}
\Sigma_c^{-1} &= \Sigma_a^{-1} + \Sigma_b^{-1} \\
\Sigma_c^{-1}\mu_c &= \Sigma_a^{-1}\mu_a + \Sigma_b^{-1}\mu_b \\
\alpha &= \mathcal{N}(\mu_a | \mu_b, \Sigma_a + \Sigma_b)
\end{align*}$$
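In one dimension these formulas reduce to precision-weighted averaging. The sketch below (example numbers are my own) computes $\mu_c$, $\Sigma_c$, and $\alpha$ from the formulas above and checks the product identity pointwise:

```python
import numpy as np

def npdf(x, mu, var):
    """Univariate Gaussian density N(x | mu, var)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

mu_a, var_a = 0.0, 1.0
mu_b, var_b = 2.0, 4.0

# Precisions add; the mean is the precision-weighted combination
prec_c = 1 / var_a + 1 / var_b
var_c = 1 / prec_c                       # -> 0.8
mu_c = var_c * (mu_a / var_a + mu_b / var_b)  # -> 0.4
alpha = npdf(mu_a, mu_b, var_a + var_b)

# Check N_a(x) * N_b(x) = alpha * N_c(x) at a few points
for x in [-1.0, 0.5, 3.0]:
    assert np.isclose(npdf(x, mu_a, var_a) * npdf(x, mu_b, var_b),
                      alpha * npdf(x, mu_c, var_c))
```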
- Conditioning and marginalization of Gaussians. Let $z = \begin{bmatrix} x \\ y \end{bmatrix}$ be jointly normally distributed as
$$\begin{align*}
p(z) &= \mathcal{N}(z | \mu, \Sigma)
=\mathcal{N} \left( \begin{bmatrix} x \\ y \end{bmatrix} \,\left|\, \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix},
\begin{bmatrix} \Sigma_x & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_y \end{bmatrix} \right. \right)\,,
\end{align*}$$
then $p(z) = p(y|x)\cdot p(x)$, with
$$\begin{align*}
p(y|x) &= \mathcal{N}\left(y\,|\,\mu_y + \Sigma_{yx}\Sigma_x^{-1}(x-\mu_x),\, \Sigma_y - \Sigma_{yx}\Sigma_x^{-1}\Sigma_{xy} \right) \\
p(x) &= \mathcal{N}\left( x\,|\,\mu_x, \Sigma_x \right)
\end{align*}$$
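These formulas can be verified by checking the factorization $p(z) = p(y|x)\,p(x)$ pointwise. A sketch for scalar $x$ and $y$ (example numbers are my own):

```python
import numpy as np

def npdf(x, mu, var):
    """Univariate Gaussian density N(x | mu, var)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def npdf2(z, mu, Sigma):
    """Bivariate Gaussian density N(z | mu, Sigma)."""
    d = z - mu
    return (np.exp(-0.5 * d @ np.linalg.solve(Sigma, d))
            / np.sqrt(np.linalg.det(2 * np.pi * Sigma)))

mu = np.array([1.0, 2.0])            # [mu_x, mu_y]
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])       # [[S_x, S_xy], [S_yx, S_y]]

x, y = 0.5, 3.0
# p(y|x) from the conditioning formula (scalar case)
cond_mu = mu[1] + Sigma[1, 0] / Sigma[0, 0] * (x - mu[0])
cond_var = Sigma[1, 1] - Sigma[1, 0] * Sigma[0, 1] / Sigma[0, 0]

# The joint factorizes: p(x, y) = p(y|x) * p(x)
assert np.isclose(npdf2(np.array([x, y]), mu, Sigma),
                  npdf(y, cond_mu, cond_var) * npdf(x, mu[0], Sigma[0, 0]))
```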
- For a binary variable $x \in \{0,1\}$, the Bernoulli distribution is given by
$$
p(x|\mu) = \mu^{x}(1-\mu)^{1-x}
$$
- The conjugate prior for $\mu$ is the Beta distribution, given by
$$
p(\mu) = \mathcal{B}(\mu|\alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \mu^{\alpha-1}(1-\mu)^{\beta-1}
$$
where $\alpha$ and $\beta$ are "hyperparameters" that you can set to reflect your prior beliefs about $\mu$.
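Conjugacy means that after observing $n$ Bernoulli draws with $k$ ones, the posterior for $\mu$ is again a Beta, namely $\mathcal{B}(\mu\,|\,\alpha+k,\,\beta+n-k)$ (a standard result, not stated explicitly above). A standard-library sketch with made-up data, checking the update by verifying that prior $\times$ likelihood is proportional to the updated Beta density:

```python
import math

def beta_pdf(mu, a, b):
    """Beta density with the Gamma-function normalizer from the cheatsheet."""
    log_B = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(mu) + (b - 1) * math.log(1 - mu) - log_B)

# Prior Beta(2, 2); observe Bernoulli data (made-up coin flips)
alpha, beta = 2.0, 2.0
data = [1, 1, 0, 1, 0, 1, 1]
k, n = sum(data), len(data)

# Conjugate update: posterior is Beta(alpha + k, beta + n - k)
alpha_post, beta_post = alpha + k, beta + n - k   # -> 7.0, 4.0
post_mean = alpha_post / (alpha_post + beta_post)

# prior * likelihood should be proportional to the posterior Beta density,
# i.e. their ratio is the same constant at every mu in (0, 1)
def unnorm_post(mu):
    return beta_pdf(mu, alpha, beta) * mu ** k * (1 - mu) ** (n - k)

r1 = unnorm_post(0.3) / beta_pdf(0.3, alpha_post, beta_post)
r2 = unnorm_post(0.7) / beta_pdf(0.7, alpha_post, beta_post)
assert math.isclose(r1, r2)
```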