不完全数据:观测随机变量$Y$。
完全数据:观测随机变量$Y$和隐随机变量$Z$。
$Q$函数:完全数据的对数似然函数$\log P \left( Y , Z | \theta \right)$关于在给定观测数据$Y$和当前参数$\theta_{\left( i \right)}$下对未观测数据$Z$的条件概率分布$P \left( Z | Y, \theta_{\left( i \right)} \right)$的期望
\begin{align*} & Q \left( \theta, \theta_{\left( i \right)} \right) = E_{Z} \left[ \log P \left( Y, Z | \theta \right) | Y , \theta_{\left( i \right)} \right] \end{align*}
含有隐变量$Z$的概率模型,目标是极大化观测变量$Y$关于参数$\theta$的对数似然函数,即 \begin{align*} & \max L \left( \theta \right) = \log P \left( Y | \theta \right) \\ & = \log \sum_{Z} P \left( Y,Z | \theta \right) \\ & = \log \left( \sum_{Z} P \left( Y|Z,\theta \right) P \left( Z| \theta \right) \right)\end{align*}
对数似然函数$L \left( \theta \right)$与第$i$次迭代后的对数似然函数估计值$L \left( \theta^{\left( i \right)} \right)$的差 \begin{align*} & L \left( \theta \right) - L \left( \theta^{\left( i \right)} \right) = \log \left( \sum_{Z} P \left( Y|Z,\theta \right) P \left( Z| \theta \right) \right) - \log P \left( Y| \theta^{ \left( i \right)} \right) \\ & = \log \left( \sum_{Z} P \left( Z | Y , \theta^{\left( i \right)} \right) \dfrac { P \left( Y|Z,\theta \right) P \left( Z| \theta \right)} {P \left( Z | Y , \theta^{\left( i \right)} \right)} \right) - \log P \left( Y| \theta^{ \left( i \right)} \right)\\ &\geq \sum_{Z} P \left( Z | Y , \theta^{\left( i \right)} \right) \log \dfrac {P \left( Y | Z, \theta \right) P \left(Z|\theta\right)}{P \left( Z | Y , \theta^{\left( i \right)} \right)} - \log P \left( Y| \theta^{ \left( i \right)} \right) \\ & = \sum_{Z} P \left( Z | Y , \theta^{\left( i \right)} \right) \log \dfrac {P \left( Y | Z, \theta \right) P \left(Z|\theta\right)} {P \left( Z | Y , \theta^{\left( i \right)} \right) P \left(Y|\theta^{\left( i \right)} \right)}\end{align*}
令\begin{align*} \\& B \left( \theta , \theta^{\left ( i \right)} \right) = L \left( \theta^{\left ( i \right)} \right) + \sum_{Z} P \left( Z | Y , \theta^{\left( i \right)} \right) \log \dfrac {P \left( Y | Z, \theta \right) P \left(Z|\theta\right)} {P \left( Z | Y , \theta^{\left( i \right)} \right) P \left(Y|\theta^{\left( i \right)} \right)} \end{align*}
则 \begin{align*} & L \left( \theta \right) \geq B \left( \theta, \theta^{\left( i \right)} \right) \end{align*}
即函$B \left( \theta, \theta^{\left( i \right)} \right)$ 是$L \left( \theta \right)$ 的一个下界。
选择$\theta^{\left( i \right)}$使$B \left( \theta, \theta^{\left( i \right)} \right) $达到极大,即 \begin{align*} & \theta^{\left( i+1 \right)}= \arg \max B \left( \theta, \theta^{\left( i \right)} \right) \\ & = \arg \max \left( L \left( \theta^{\left ( i \right)} \right) + \sum_{Z} P \left( Z | Y , \theta^{\left( i \right)} \right) \log \dfrac {P \left( Y | Z, \theta \right) P \left(Z|\theta\right)} {P \left( Z | Y , \theta^{\left( i \right)} \right) P \left(Y|\theta^{\left( i \right)} \right)} \right) \\ & = \arg \max \left( \sum_{Z} P \left( Z | Y, \theta^{\left( i \right)} \right) \log \left( P \left( Y | Z, \theta \right) \right) P \left( Z | \theta \right) \right) \\ & = \arg \max \left( \sum_{Z} P \left( Z | Y, \theta^{\left( i \right)} \right) \log P \left( Y, Z | \theta\right) \right) \\ & = \arg \max Q \left( \theta, \theta^{\left( i \right)} \right) \end{align*}
EM算法:
输入:观测随机变量数据$Y$,隐随机变量数据$Z$,联合分布$P\left(Y,Z|\theta\right) $,条件分布$P\left(Y|Z,\theta\right) $;
输出:模型参数$\theta$
$F$函数:隐变量$Z$的概率分布为$\tilde{P} \left( Z \right)$,关于分布$\tilde{P}$与参数$\theta$的函数
\begin{align*} \\ & F \left( \tilde{P}, \theta \right) = E_{\tilde{P}} \left[ \log P \left( Y, Z | \theta \right)\right] + H \left( \tilde{P} \right) \end{align*}
其中,$H \left( \tilde{P} \right) = - E_{\tilde{P}} \left[ \log \tilde{P} \left( Z\right)\right]$是分布$\tilde{P} \left( Z \right)$的熵。
对于固定的$\theta$,极大化$F$函数
\begin{align*} \\ & \max_{\tilde{P}} F \left( \tilde{P}, \theta \right)
\\ & s.t. \sum_{Z} \tilde{P}_{\theta} \left( Z \right) = 1 \end{align*}
引入拉格朗日乘子$\lambda$,构造拉格朗日函数 \begin{align*} \\ & L = E_{\tilde{P}} \left[ \log P \left( Y, Z | \theta \right)\right] - E_{\tilde{P}} \left[ \log \tilde{P} \left( Z\right)\right] + \lambda \left( 1 - \sum_{Z} \tilde{P} \left( Z \right) \right) \\ & = \sum_{Z} \log P \left( Y, Z | \theta \right) \tilde{P} \left( Z \right) - \sum_{Z} \log P \left( Z \right) \tilde{P} \left( Z \right) + \lambda - \lambda \sum_{Z} \tilde{P} \left( Z \right) \end{align*}
将其对$\tilde{P} \left( Z \right)$求偏导,得 \begin{align*} \\ & \dfrac {\partial L}{\partial \tilde{P} \left( Z \right) } = \log P \left( Y, Z | \theta \right) - 1 - \log P \left( Z \right) - \lambda \end{align*}
令其等于0,得
\begin{align*} \\ & \lambda = \log P \left( Y, Z | \theta \right) - 1 - \log P \left( Z \right)
\\ & \dfrac{P \left( Y, Z | \theta \right) }{\tilde{P}_{\theta} \left( Z \right) } = e^{1 + \lambda }
\\ & \sum_{Z} P \left( Y, Z | \theta \right) = e^{1 + \lambda } \sum_{Z} \tilde{P}_{\theta} \left( Z \right) \end{align*}
由于$\sum_{Z} \tilde{P}_{\theta} \left( Z \right) = 1$,得
\begin{align*} \\ & P \left( Y \right) = e^{1 + \lambda } \end{align*}
代回,得
\begin{align*} \\ & \tilde{P}_{\theta} \left( Z \right) = P \left( Z | Y, \theta \right) \end{align*}
则 \begin{align*} \\ & F \left( \tilde{P}, \theta \right) = E_{\tilde{P}} \left[ \log P \left( Y, Z | \theta \right)\right] + H \left( \tilde{P} \right) \\ & = \sum_{Z} \log P \left( Y, Z | \theta \right) \tilde{P} \left( Z \right) - \sum_{Z} \log P \left( Z \right) \tilde{P} \left( Z \right) \\ & = \sum_{Z} \tilde{P} \left( Z \right) \log \dfrac{P \left( Y, Z | \theta \right) }{\tilde{P} \left( Z \right) } \\ & = \sum_{Z} \tilde{P} \left( Z \right) \log \dfrac{P \left( Z | Y, \theta \right) P \left(Y | \theta \right) }{\tilde{P} \left( Z \right) } \\ & = \log P \left(Y | \theta \right) \sum_{Z} \tilde{P} \left( Z \right) \\ & = \log P \left(Y | \theta \right) \\ & = L \left( \theta \right) \end{align*}
对于使$F \left( \tilde{P}, \theta \right)$达到最大值的参数$\theta^{*}$,有
\begin{align*} L \left( \theta^{*} \right) = F \left( \tilde{P}_{\theta^{*}}, \theta^{*} \right) = F \left( \tilde{P}^{*}, \theta^{*} \right)\end{align*}
即,如果$F \left( \tilde{P}, \theta \right)$在$\tilde{P}^{*}, \theta^{*}$达到局部极大值(全局最大值),则$L \left( \theta^{*} \right)$在$\tilde{P}^{*}, \theta^{*}$也达到局部极大值(全局最大值)。
由$\tilde{P}_{\theta} \left( Z \right) = P \left( Z | Y, \theta \right)$,对固定的$\theta^{\left( i \right) }$,
\begin{align*} \tilde{P}^{\left( i + 1 \right)} \left( Z \right) = \tilde{P}_{\theta^{\left( i \right)}} \left( Z \right) = P \left( Z | Y, \theta^{\left( i \right) } \right)\end{align*}
使$F \left( \tilde{P}, \theta^{\left( i \right)} \right)$极大化,
则
\begin{align*} \\ & F \left( \tilde{P}^{\left( i + 1 \right)}, \theta \right) = E_{\tilde{P}^{\left( i + 1 \right)}} \left[ \log P \left( Y, Z | \theta \right)\right] + H \left( \tilde{P}^{\left( i + 1 \right)} \right)
\\ & = \sum_{Z} log P \left(Y , Z | \theta \right) P \left( Z | Y, \theta^{\left( i \right)} \right) + H \left( \tilde{P}^{\left( i + 1 \right)} \right)
\\ & =Q \left( \theta, \theta^{\left( i \right)} \right) + H \left( \tilde{P}^{\left( i + 1 \right)} \right)\end{align*}
固定$\tilde{P}^{\left( i + 1 \right)}$,求$\theta^{\left( i \right)}$使$F \left( \tilde{P}^{\left( i + 1 \right)}, \theta \right)$极大化,得
\begin{align*} \theta^{\left( i + 1 \right)} = \arg \max_{\theta} F \left( \tilde{P}^{\left( i + 1 \right)}, \theta \right) = \arg \max_{\theta} Q \left( \theta, \theta^{\left( i \right)} \right) \end{align*}
即,由$EM$算法与$F$函数的极大-极大算法的到的参数估计序列$\theta^{\left( i \right)},i = 1, 2, \cdots,$是一致的。
$GEM$算法:
输入:观测数据$Y$,$F$函数;
输出:模型参数$\theta$