where $k=1,2,\ldots,n$ is the time step counter; $z_k$ is an unobserved state sequence; $x_k$ is an observed sequence; $w_k \sim \mathcal{N}(0,\Sigma_w)$ and $v_k \sim \mathcal{N}(0,\Sigma_v)$ are (unobserved) state and observation noise sequences respectively; $z_0 \sim \mathcal{N}(0,\Sigma_0)$ is the initial state and $A$, $C$, $\Sigma_v$,$\Sigma_w$ and $\Sigma_0$ are known parameters. The Forney-style factor graph (FFG) for one time step is depicted here:
(a) Rewrite the state-space equations as a set of conditional probability distributions.
$$\begin{align*}
p(z_k|z_{k-1},A,\Sigma_w) &= \ldots \\
p(x_k|z_k,C,\Sigma_v) &= \ldots \\
p(z_0|\Sigma_0) &= \ldots
\end{align*}$$
$$\begin{align*}
p(z_k|z_{k-1},A,\Sigma_w) &= \mathcal{N}(z_k|A z_{k-1},\Sigma_w) \ p(x_k|z_k,C,\Sigma_v) &= \mathcal{N}(x_k|C z_k,\Sigma_v) \ p(z_0|\Sigma_0) &= \mathcal{N}(z_0|0,\Sigma_0) \end{align*}$$
(b) Define $z^n \triangleq (z_0,z_1,\ldots,z_n)$, $x^n \triangleq (x_1,\ldots,x_n)$ and $\theta=\{A,C,\Sigma_w,\Sigma_v\}$. Now write out the generative model $p(x^n,z^n|\theta)$ as a product of factors.
$$\begin{align*}
p(x^n,z^n|\theta) &= p(z_0|\Sigma_0) \prod_{k=1}^n p(x_k|z_k,C,\Sigma_v) ,p(z_k|z_{k-1},A,\Sigma_w) \ &= \mathcal{N}(z_0|0,\Sigma_0) \prod_{k=1}^n \mathcal{N}(x_k|C z_k,\Sigma_v) ,\mathcal{N}(z_k|A z_{k-1},\Sigma_w) \end{align*}$$
(c) We are interested in estimating $z_k$ from a given estimate for $z_{k-1}$ and the current observation $x_k$, i.e., we are interested in computing $p(z_k|z_{k-1},x_k,\theta)$. Can $p(z_k|z_{k-1},x_k,\theta)$ be expressed as a Gaussian distribution? Explain why or why not in one sentence.
Yes, since the generative model $p(x^n,z^n|\theta)$ is (one big) Gaussian.
(d) Copy the graph onto your exam paper and draw the message passing schedule for computing $p(z_k|z_{k-1},x_k,\theta)$ by drawing arrows in the factor graph. Indicate the order of the messages by assigning numbers to the arrows.
Some permutations of this order are also possible. The most important thing here is that you recognize the tree with $Z_k$ as a root of the tree and pass messages from the terminals (e.g., $Z_{k-1}$, $X_k$, etc.) towards the root.
(e) Now assume that our belief about parameter $\Sigma_v$ is instead given by a distribution $p(\Sigma_v)$ (rather than a known value). Adapt the factor graph drawing of the previous answer to reflects our belief about $\Sigma_v$.
See drawing in previous answer.