In a system identification setting, we are interested in computing the outgoing message $\overleftarrow{\mu}_\Theta(\theta)$.
Let's compute the sum-product message:
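As a sketch of this computation, assume the multiplier factor $\delta(y - \theta x)$ and incoming Gaussian messages $\overrightarrow{\mu}_X(x) = \mathcal{N}(x \mid m_x, v_x)$ and $\overleftarrow{\mu}_Y(y) = \mathcal{N}(y \mid m_y, v_y)$ (this parametrization is for illustration):
$$\begin{align*}
\overleftarrow{\mu}_\Theta(\theta) &= \iint \overrightarrow{\mu}_X(x)\, \overleftarrow{\mu}_Y(y)\, \delta(y - \theta x)\, \mathrm{d}x \,\mathrm{d}y \\
&= \int \mathcal{N}(x \mid m_x, v_x)\, \mathcal{N}(\theta x \mid m_y, v_y)\, \mathrm{d}x \\
&\propto \frac{1}{\sqrt{v_y + \theta^2 v_x}} \exp\left( -\frac{(m_y - \theta m_x)^2}{2\,(v_y + \theta^2 v_x)}\right)\,.
\end{align*}$$
Note that $\theta$ ends up inside the variance term.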
This is not a Gaussian message for $\Theta$! Passing this message into the graph leads to very serious problems when trying to compute sum-product messages for other factors in the graph.
The same problem occurs in a forward message passing schedule when we try to compute a message for $Y$ from incoming Gaussian messages for both $X$ and $\Theta$.
The foregoing example shows that the sum-product (SP) message update rule will sometimes not do the job. For example, the update integral may have no analytical solution, or, as above, the resulting message may not belong to the same (e.g., Gaussian) family as the other messages in the graph.
There are various ways to cope with 'intractable' SP update rules. In this lesson, we discuss how the EM-algorithm can be written as a message passing algorithm on factor graphs. Then, we will solve the 'multiplier node problem' with EM messages (rather than with SP messages).
It turns out that, for factorized functions $f(x,\theta)$, the EM-algorithm can be executed as a message passing algorithm on the factor graph.
As a simple example, we consider the factorization
where $p_b(x|\hat{\theta}^{(k)}) \triangleq \frac{f_b(x,\hat{\theta}^{(k)})}{\int f_b(x^\prime,\hat{\theta}^{(k)}) \,\mathrm{d}x^\prime}$.
Proof:
The quantity $\eta(\theta)$ (a.k.a. the E-log message) may be interpreted as a log-domain summary of $f_b$. The message $e^{\eta(\theta)}$ is the corresponding 'probability domain' message that is consistent with the semantics of messages as summaries of factors. In a software implementation, you can use either domain, as long as a consistent method is chosen.
Note that the denominator $\int f_b(x^\prime,\hat{\theta}^{(k)}) \,\mathrm{d}x^\prime$ in $p_b$ is just a scaling factor that can usually be ignored, leading to a simpler E-log message $$\eta(\theta) = \int f_b(x,\hat{\theta}^{(k)}) \log f_b(x,\theta) \,\mathrm{d}x \,.$$
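As a quick worked instance of this recipe (the choice of node function here is ours, for illustration), let $f_b(x,\theta) = \mathcal{N}(x \mid \theta, v)$ with known variance $v$. Then
$$\eta(\theta) = \mathbb{E}\left[ \log \mathcal{N}(X \mid \theta, v)\right] = -\frac{1}{2v}\left(\theta^2 - 2\theta\, \mathbb{E}\left[X\right]\right) + \text{const}\,,$$
where the expectation is taken w.r.t. $p_b(x|\hat{\theta}^{(k)})$. Hence $e^{\eta(\theta)} \propto \mathcal{N}(\theta \mid \mathbb{E}\left[X\right], v)$: the E-log message is a proper Gaussian message for $\Theta$, and the M-step $\hat{\theta}^{(k+1)} = \mathbb{E}\left[X\right]$ is available in closed form.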
leads to easier expressions than the marginalization (which is what we really want) $$ \bar f(\theta) = \int f(x,\theta) \mathrm{d}x . $$
leads to easier expressions than the marginalization (represented by the sum-product message, which is also what we really want) $$ \mu(\theta) = \int f_b(x,\theta) \mathrm{d}x . $$
where $p(x_1,\ldots,x_M | \hat{y}^{(k)}) \triangleq \frac{\overrightarrow{\mu}_{X_1}(x_1) \cdots \overrightarrow{\mu}_{X_M}(x_M)\, f(x_1,\ldots,x_M,\hat{y}^{(k)})}{\int \overrightarrow{\mu}_{X_1}(x_1) \cdots \overrightarrow{\mu}_{X_M}(x_M)\, f(x_1,\ldots,x_M,\hat{y}^{(k)}) \, \mathrm{d}x_1 \cdots \mathrm{d}x_M}$.
The factors for deterministic nodes are (Dirac) delta functions, e.g., $\delta(y-\theta x)$ for the multiplier.
Note that the outgoing E-log message for a deterministic node degenerates to a delta function, since the expectation of $\log \delta(\cdot)$ again yields a delta-like (degenerate) function. For details, consult Dauwels et al. (2009), pg. 5, section F.
This would stall the iterative estimation process at the current estimate since the outgoing E-log message would express complete certainty about the estimate.
This issue can be resolved by closing a box around a subgraph that includes the deterministic node $f$ and at least one non-deterministic factor. EM message passing can then proceed with the newly created composite node. For the multiplier, e.g., boxing $\delta(y-\theta x)$ together with an adjacent Gaussian noise factor yields a composite factor of the form $\mathcal{N}(y^\prime \mid \theta x, v)$, which does have a well-behaved E-log message.
where we used the 'canonical' parametrization of the Gaussian $\mathcal{N}_{\xi}(\theta \mid\xi,w) \propto \exp \left( \xi \theta- \frac{1}{2} w \theta^2\right)$.
where $w_g = \frac{1}{v_x} + \frac{\hat{\theta}^2}{v_y}$ and $\xi_g \triangleq w_g m_g = \frac{m_x}{v_x}+\frac{\hat{\theta} m_y}{v_y}$. It follows that
$$\begin{align*}
\mathbb{E}\left[X\right] &= m_g \\
\mathbb{E}\left[X^2\right] &= m_g^2 + w_g^{-1}
\end{align*}$$
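These updates are straightforward to put into code. Below is a minimal sketch in Julia of the resulting EM loop (not the toolbox API; all names and numerical message parameters are illustrative assumptions). The M-step follows from the canonical form of $e^{\eta(\theta)}$, which under this model has $\xi = m_y \mathbb{E}\left[X\right]/v_y$ and $w = \mathbb{E}\left[X^2\right]/v_y$, so that $\hat{\theta}^{(k+1)} = \xi/w$:

```julia
# Minimal sketch of EM for θ in the boxed multiplier node N(y | θx, v_y),
# with a forward Gaussian message N(m_x, v_x) for X and a backward Gaussian
# message N(m_y, v_y) for Y. Names and numbers are illustrative assumptions.
function em_multiplier(m_x, v_x, m_y, v_y; θ̂ = 1.0, iters = 20)
    for k in 1:iters
        # E-step: Gaussian posterior over X for the current θ̂ (canonical form)
        w_g = 1/v_x + θ̂^2/v_y           # posterior precision
        ξ_g = m_x/v_x + θ̂*m_y/v_y       # posterior precision-weighted mean
        m_g = ξ_g / w_g
        EX, EX2 = m_g, m_g^2 + 1/w_g     # E[X] and E[X²]
        # M-step: e^η(θ) is Gaussian with ξ = m_y⋅E[X]/v_y and w = E[X²]/v_y,
        # so the new estimate is its mean ξ/w:
        θ̂ = m_y * EX / EX2
    end
    return θ̂
end

θ̂ = em_multiplier(1.0, 0.5, 3.0, 0.1)   # illustrative message parameters
```

Each iteration can only increase the marginal likelihood $\bar f(\theta)$, so $\hat{\theta}$ converges to a stationary point of it.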
It follows that, for a dynamical system with unknown coefficients, both state estimation and parameter learning can be achieved through Gaussian message passing based on SP and EM message update rules.
These (SP and EM) message update rules can be tabulated and implemented in software for a large set of factors that are common in probabilistic models (see the tables in Loeliger et al. (2007) and Dauwels et al. (2009)).
Tabulated SP and EM messages for frequently occurring factors facilitate the automated derivation of nontrivial inference algorithms.
This makes it possible to automate inference for state and parameter estimation in very complex probabilistic models. Here (in the SPS group at TU/e), we are developing such a factor graph toolbox in Julia.
There is lots more to say about factor graphs. This is a very exciting and active area of research.
As before, let us consider the linear dynamical system (LDS)
$$\begin{align*} z_n &= A z_{n-1} + w_n \\ x_n &= C z_n + v_n \\ w_n &\sim \mathcal{N}(0,\Sigma_w) \\ v_n &\sim \mathcal{N}(0,\Sigma_v) \end{align*}$$

Again, we will consider the case where $x_n$ is observed and $z_n$ is a hidden state. $C$, $\Sigma_w$ and $\Sigma_v$ are given parameters, but in contrast to the previous section, we will assume that the value of the parameter $A$ is unknown.
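Before turning to a toolbox solution, the sketch below shows the structure of the resulting algorithm for a scalar stand-in of this model ($A$, $C$, $\Sigma_w$, $\Sigma_v$ become scalars $a$, $c$, $q$, $r$; all names, priors, and numbers are illustrative assumptions, not the toolbox implementation). The E-step runs Gaussian (Kalman filter plus RTS smoother) message passing for the states under the current estimate of $a$; the M-step re-estimates $a$ from the smoothed moments.

```julia
# Hand-rolled EM for the scalar LDS  z_n = a z_{n-1} + w_n,  x_n = c z_n + v_n,
# with w_n ~ N(0,q), v_n ~ N(0,r); c, q, r known, a unknown.
# A scalar illustration only; defaults and the z_0 prior are assumptions.
function em_lds_scalar(x::Vector{Float64}; c=1.0, q=0.1, r=1.0, a=0.5, iters=50)
    N = length(x)
    for it in 1:iters
        # E-step (forward): Kalman filter under the current estimate of a
        m_p, v_p = zeros(N), zeros(N)        # predictive moments of z_n
        m_f, v_f = zeros(N), zeros(N)        # filtered moments of z_n
        m, v = 0.0, 1.0                      # prior on z_0
        for n in 1:N
            m_p[n], v_p[n] = a*m, a^2*v + q
            k = v_p[n]*c / (c^2*v_p[n] + r)  # Kalman gain
            m_f[n] = m_p[n] + k*(x[n] - c*m_p[n])
            v_f[n] = (1 - k*c) * v_p[n]
            m, v = m_f[n], v_f[n]
        end
        # E-step (backward): RTS smoother
        m_s, v_s = copy(m_f), copy(v_f)
        J = zeros(N-1)                       # smoother gains
        for n in N-1:-1:1
            J[n]   = v_f[n]*a / v_p[n+1]
            m_s[n] = m_f[n] + J[n]*(m_s[n+1] - m_p[n+1])
            v_s[n] = v_f[n] + J[n]^2*(v_s[n+1] - v_p[n+1])
        end
        # M-step: a = Σ E[z_n z_{n-1}] / Σ E[z_{n-1}²] under the smoothed
        # posterior, using the lag-one covariance Cov(z_n, z_{n-1}) = J[n-1]⋅v_s[n]
        num = sum(m_s[n]*m_s[n-1] + J[n-1]*v_s[n] for n in 2:N)
        den = sum(m_s[n-1]^2 + v_s[n-1] for n in 2:N)
        a = num / den
    end
    return a
end

# Example: recover a ≈ 0.9 from simulated data
a_true, c, q, r = 0.9, 1.0, 0.1, 1.0
z, x = zeros(500), zeros(500)
for n in 2:500
    z[n] = a_true*z[n-1] + sqrt(q)*randn()
end
x .= c .* z .+ sqrt(r) .* randn(500)
println("estimated a = ", em_lds_scalar(x; c=c, q=q, r=r))
```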
The cell below loads the style file.
open("../../styles/aipstyle.html") do f
display("text/html", readstring(f))
end