To prove that the EM algorithm works, we will need Gibbs' inequality, a well-known theorem from information theory.
Definition: the Kullback-Leibler divergence (a.k.a. relative entropy) is an (asymmetric) distance measure between two distributions $q$ and $p$ and is defined as
$$ D_{\mathrm{KL}}[q\,\|\,p] := \sum_z q(z) \log \frac{q(z)}{p(z)} $$
Gibbs' inequality states that $$ D_{\mathrm{KL}}[q\,\|\,p] \geq 0 $$ with equality iff $p=q$.
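As a quick numerical illustration (a sketch, not part of the proof; the vectors q and p below are arbitrary example distributions), the cell below evaluates the KL divergence for two discrete distributions:

kl(q, p) = sum(q .* log.(q ./ p))   # KL divergence for discrete distributions

q = [0.1, 0.4, 0.5]   # example distribution (assumed values)
p = [0.2, 0.2, 0.6]   # example distribution (assumed values)

println(kl(q, p))   # positive, since q ≠ p (Gibbs' inequality)
println(kl(q, q))   # zero: equality holds iff the two distributions coincide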
and consequently, maximizing $\mathrm{LB}$ over $q$ amounts to minimizing the KL-divergence. Specifically, it follows from Gibbs' inequality that the KL-divergence is driven to zero by the choice $$ q^{(m+1)}(z) := p(z|x,\theta^{(m)}) $$
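To make this step explicit (assuming the standard decomposition of the log-evidence into the lower bound plus a KL term, as used above): since the left-hand side of $$ \log p(x|\theta^{(m)}) = \mathrm{LB}[q,\theta^{(m)}] + D_{\mathrm{KL}}\left[q(z)\,\|\,p(z|x,\theta^{(m)})\right] $$ does not depend on $q$, the choice $q^{(m+1)}(z) = p(z|x,\theta^{(m)})$ drives the KL term to zero and makes the bound tight, i.e. $\mathrm{LB}[q^{(m+1)},\theta^{(m)}] = \log p(x|\theta^{(m)})$.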
Maximize $\mathrm{E}_Z\left[ \log p(D_c|\theta) \right]$ w.r.t. $\theta=\{\pi,\mu,\Sigma\}$
Verify that your solution is the same as the 'intuitive' solution of the previous lesson.
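For reference, under the usual Gaussian mixture notation (with responsibilities $\gamma_{nk} := p(z_{nk}=1\,|\,x_n,\theta)$ and $N_k := \sum_{n} \gamma_{nk}$; this notation is an assumption about the previous lesson's conventions), the maximization should recover the familiar updates
$$ \pi_k = \frac{N_k}{N}\,,\qquad \mu_k = \frac{1}{N_k}\sum_{n=1}^{N} \gamma_{nk}\, x_n\,,\qquad \Sigma_k = \frac{1}{N_k}\sum_{n=1}^{N} \gamma_{nk}\,(x_n-\mu_k)(x_n-\mu_k)^{\mathrm{T}}\,. $$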
Scenario. Toss coin $0$. If Head comes up, toss three times with coin $1$; otherwise, toss three times with coin $2$.
The observed sequences after each toss with coin $0$ were $\langle \mathrm{HHH}\rangle$, $\langle \mathrm{HTH}\rangle$, $\langle \mathrm{HHT}\rangle$, and $\langle\mathrm{HTT}\rangle$
Task. Use EM to estimate the most likely values for $\lambda$, $\rho$ and $\theta$
Note that the outcome may depend on the initial conditions! EM converges to a local optimum.
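A minimal EM sketch for this exercise (assuming $\lambda$ is the head probability of coin $0$, and $\rho$, $\theta$ those of coins $1$ and $2$; this labelling, the initial values and the function name em_three_coins are illustrative assumptions, not part of the lesson):

function em_three_coins(heads; n_tosses=3, λ=0.6, ρ=0.7, θ=0.4, iters=100)
    for _ in 1:iters
        # E-step: posterior probability that each sequence was produced by coin 1
        γ = [λ*ρ^h*(1-ρ)^(n_tosses-h) /
             (λ*ρ^h*(1-ρ)^(n_tosses-h) + (1-λ)*θ^h*(1-θ)^(n_tosses-h)) for h in heads]
        # M-step: re-estimate the parameters from the expected counts
        λ = sum(γ) / length(γ)
        ρ = sum(γ .* heads) / (n_tosses * sum(γ))
        θ = sum((1 .- γ) .* heads) / (n_tosses * sum(1 .- γ))
    end
    return (λ=λ, ρ=ρ, θ=θ)
end

# head counts of the observed sequences HHH, HTH, HHT, HTT
em_three_coins([3, 2, 2, 1])

Rerunning from different starting values (e.g. with the initial $\rho$ and $\theta$ swapped) illustrates the dependence on the initial conditions noted above.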
The cell below loads the style file.
open("../../styles/aipstyle.html") do f
    # read the style sheet and render it as HTML in the notebook
    display("text/html", read(f, String))
end