# Dynamic Models¶

### Example Problem¶

• We consider a one-dimensional cart position-tracking problem; see Faragher (2012).

• The hidden states are the position $z_t$ and velocity $\dot z_t$. We can apply an external acceleration/braking force $u_t$. (Noisy) observations are represented by $x_t$.

• The equations of motions are given by

\begin{align} \begin{bmatrix} z_t \\ \dot{z_t}\end{bmatrix} &= \begin{bmatrix} 1 & \Delta t \\ 0 & 1\end{bmatrix} \begin{bmatrix} z_{t-1} \\ \dot z_{t-1}\end{bmatrix} + \begin{bmatrix} (\Delta t)^2/2 \\ \Delta t\end{bmatrix} u_t + \mathcal{N}(0,\Sigma_z) \\ x_t &= \begin{bmatrix} z_t \\ \dot{z_t}\end{bmatrix} + \mathcal{N}(0,\Sigma_x) \end{align}
• Task: Infer the position $z_t$ after 10 time steps. (Solution later in this lesson.)

### Dynamical Models¶

• In this lesson, we consider models where the sequence order of observations matters.

• Consider the ordered observation sequence $x^T \triangleq \left(x_1,x_2,\ldots,x_T\right)$.

• (For brevity, in this lesson we use the notation $x_t^T$ to denote $(x_t,x_{t+1},\ldots,x_T)$ and drop the subscript if $t=1$, so $x^T = x_1^T = \left(x_1,x_2,\ldots,x_T\right)$).
• We wish to develop a generative model $$p( x^T )$$ that 'explains' the time series $x^T$.
• We cannot use the IID assumption $p( x^T ) = \prod_t p(x_t )$. In general, we can use the chain rule (a.k.a. the general product rule)
\begin{align*} p(x^T) &= p(x_T|x^{T-1}) \,p(x^{T-1}) \\ &= p(x_T|x^{T-1}) \,p(x_{T-1}|x^{T-2}) \cdots p(x_2|x_1)\,p(x_1) \\ &= p(x_1)\prod_{t=2}^T p(x_t\,|\,x^{t-1}) \end{align*}
• Generally, we will want to limit the depth of dependencies on previous observations. For example, a $K$th-order linear Auto-Regressive (AR) model, given by \begin{align*} p(x_t\,|\,x^{t-1}) = \mathcal{N}\left(x_t \,\middle|\, \sum_{k=1}^K a_k x_{t-k}\,,\sigma^2\,\right) \end{align*} limits the dependencies to the past $K$ samples.
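As an illustration, an AR model can be simulated directly from this definition: each new sample is drawn from a Gaussian whose mean is a linear combination of the past $K$ samples. A minimal sketch (the coefficients and noise level in the demo are illustrative choices, not taken from the lesson):

```julia
using Random

# Simulate a Kth-order AR model: x_t ~ N(sum_k a_k * x_{t-k}, σ²).
# The first K samples use an all-zero initial history.
function simulate_ar(a::Vector{Float64}, σ::Float64, T::Int; rng=Random.default_rng())
    K = length(a)
    x = zeros(T + K)                              # first K entries act as zero initial history
    for t in (K+1):(T+K)
        μ = sum(a[k] * x[t-k] for k in 1:K)       # mean depends only on the past K samples
        x[t] = μ + σ * randn(rng)
    end
    return x[K+1:end]
end

x = simulate_ar([0.5, 0.3], 0.1, 100)             # illustrative K=2 model
```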

### State-space Models¶

• A limitation of AR models is that they need a lot of parameters in order to create a flexible model. E.g., if $x_t$ is an $M$-dimensional discrete variable, then a $K$th-order AR model will have about $M^{K}$ parameters.
• Similar to our work on Gaussian Mixture models, we can create a flexible dynamic system by introducing latent (unobserved) variables $z^T \triangleq \left(z_1,z_2,\dots,z_T\right)$ (one $z_t$ for each observation $x_t$). In dynamic systems, $z_t$ are called state variables.
• A state space model is a particular latent variable dynamical model defined by \begin{align*} p(x^T,z^T) &= \underbrace{p(z_1)}_{\text{initial state}} \prod_{t=2}^T \underbrace{p(z_t\,|\,z_{t-1})}_{\text{state transitions}}\,\prod_{t=1}^T \underbrace{p(x_t\,|\,z_t)}_{\text{observations}} \end{align*}
• The condition $p(z_t\,|\,z^{t-1}) = p(z_t\,|\,z_{t-1})$ is called a $1$st-order Markov condition.
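The state-space factorization above can be sampled ancestrally: draw $z_1$, then alternate state transitions and observations. A minimal sketch, where the three samplers are passed in as functions (the concrete choices in the demo, a Gaussian random walk with noisy observations, are illustrative assumptions):

```julia
# Ancestral sampling from p(x^T, z^T) = p(z₁) ∏ₜ p(zₜ|zₜ₋₁) ∏ₜ p(xₜ|zₜ)
function sample_ssm(sample_z1, sample_transition, sample_observation, T::Int)
    z = Vector{Float64}(undef, T)
    x = Vector{Float64}(undef, T)
    z[1] = sample_z1()
    x[1] = sample_observation(z[1])
    for t in 2:T
        z[t] = sample_transition(z[t-1])   # 1st-order Markov: depends on z[t-1] only
        x[t] = sample_observation(z[t])
    end
    return x, z
end

# Illustrative choice: scalar random walk observed with Gaussian noise
x, z = sample_ssm(() -> randn(), zp -> zp + 0.1*randn(), zt -> zt + 0.5*randn(), 20)
```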
• The Forney-style factor graph for a state-space model:

### Hidden Markov Models and Linear Dynamical Systems¶

• A Hidden Markov Model (HMM) is a specific state-space model with discrete-valued state variables $z_t$.
• Typically, $z_t$ is a $K$-dimensional one-hot coded latent 'class indicator' with transition probabilities $a_{jk} \triangleq p(z_{tk}=1\,|\,z_{t-1,j}=1)$, or equivalently, $$p(z_t|z_{t-1}) = \prod_{k=1}^K \prod_{j=1}^K a_{jk}^{z_{t-1,j}\cdot z_{tk}}$$ which is usually accompanied by an initial state distribution $p(z_{1k}=1) = \pi_k$.
• The classical HMM also has discrete-valued observations, but in practice any (probabilistic) observation model $p(x_t|z_t)$ may be coupled to the hidden Markov chain.
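As a small illustration of the transition model $p(z_t|z_{t-1})$, the sketch below samples a hidden Markov chain from a transition matrix $A$ with $a_{jk}$ in row $j$, column $k$ (the 2-state matrix and initial distribution in the demo are illustrative choices):

```julia
using Random

# Sample a hidden Markov chain. A[j, k] = p(z_t = k | z_{t-1} = j); rows sum to one.
function sample_hmm_states(A::Matrix{Float64}, π0::Vector{Float64}, T::Int; rng=Random.default_rng())
    K = size(A, 1)
    draw(p) = min(K, searchsortedfirst(cumsum(p), rand(rng)))  # categorical draw
    z = Vector{Int}(undef, T)
    z[1] = draw(π0)                 # z₁ ~ Categorical(π0)
    for t in 2:T
        z[t] = draw(A[z[t-1], :])   # next state ~ row z[t-1] of A
    end
    return z
end

A = [0.9 0.1;
     0.2 0.8]
z = sample_hmm_states(A, [1.0, 0.0], 50)
```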
• Another well-known state-space model with continuous-valued state variables $z_t$ is the (Linear) Gaussian Dynamical System (LGDS), which is defined as
\begin{align*} p(z_t\,|\,z_{t-1}) &= \mathcal{N}\left(z_t \,\middle|\, A z_{t-1}\,,\,\Sigma_z\,\right) \\ p(x_t\,|\,z_t) &= \mathcal{N}\left(x_t \,\middle|\, C z_t\,,\,\Sigma_x\,\right) \\ p(z_1) &= \mathcal{N}\left(z_1 \,\middle|\, \mu_1\,,\,\Sigma_1\,\right) \end{align*}
• Note that the joint distribution over all states and observations $\{(x_1,z_1),\ldots,(x_T,z_T)\}$ is a (large-dimensional) Gaussian distribution. This means that, in principle, every inference problem on the LGDS model also leads to a Gaussian distribution.
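To make the LGDS definition concrete, here is a minimal simulation sketch (all matrices in the demo are illustrative choices); Cholesky factors are used to draw the correlated Gaussian noise terms:

```julia
using LinearAlgebra, Random

# Simulate z_t = A z_{t-1} + N(0, Σz),  x_t = C z_t + N(0, Σx),  z_1 ~ N(μ1, Σ1)
function simulate_lgds(A, C, Σz, Σx, μ1, Σ1, T; rng=Random.default_rng())
    n, m = size(A, 1), size(C, 1)
    z = zeros(n, T); x = zeros(m, T)
    Lz, Lx, L1 = cholesky(Σz).L, cholesky(Σx).L, cholesky(Σ1).L  # noise factors
    z[:, 1] = μ1 + L1 * randn(rng, n)
    x[:, 1] = C * z[:, 1] + Lx * randn(rng, m)
    for t in 2:T
        z[:, t] = A * z[:, t-1] + Lz * randn(rng, n)   # state transition
        x[:, t] = C * z[:, t]   + Lx * randn(rng, m)   # observation
    end
    return x, z
end

A = [1.0 1.0; 0.0 1.0]           # illustrative constant-velocity dynamics
C = Matrix(1.0I, 2, 2)
x, z = simulate_lgds(A, C, 0.1*Matrix(1.0I, 2, 2), 0.5*Matrix(1.0I, 2, 2),
                     [0.0, 1.0], Matrix(1.0I, 2, 2), 30)
```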
• HMMs and LGDSs (and variants thereof) form the basis of a wide range of complex information processing systems, such as speech and language recognition, robotics, automatic car navigation, and even the processing of DNA sequences.

### Common Signal Processing Tasks as Message Passing-based Inference¶

• As we have seen, inference tasks in linear Gaussian state space models can be analytically solved.

• However, these derivations quickly become cumbersome and prone to errors.

• Alternatively, we could specify the generative model in a (Forney-style) factor graph and use automated message passing to infer the posterior over the hidden variables. Some examples follow.

• Filtering, a.k.a. state estimation: estimation of a state (at time step $t$), based on past and current (at $t$) observations.

• Smoothing: estimation of a state based on both past and future observations. This needs backward messages from the future.

• Prediction: estimation of a future state or observation, based only on past observations.

### Kalman Filtering¶

• Technically, the Kalman filter is the solution to the recursive estimation (inference) of the hidden state $z_t$ from past observations in an LGDS: it computes $p(z_t\,|\,x^t)$ from the previous estimate $p(z_{t-1}\,|\,x^{t-1})$ and a new observation $x_t$ (in the context of the given model specification, of course).
• Let's infer the Kalman filter for a scalar linear Gaussian dynamical system: \begin{align*} p(z_t\,|\,z_{t-1}) &= \mathcal{N}(z_t\,|\,a z_{t-1},\sigma_z^2) \tag{state transition} \\ p(x_t\,|\,z_t) &= \mathcal{N}(x_t\,|\,c z_t,\sigma_x^2) \tag{observation} \end{align*}
• Kalman filtering comprises inferring $p(z_t\,|\,x^t)$ from a given prior estimate $p(z_{t-1}\,|\,x^{t-1})$ (available after the previous time step) and a new observation $x_t$. Let us assume that \begin{align} p(z_{t-1}\,|\,x^{t-1}) = \mathcal{N}(z_{t-1} \,|\, \mu_{t-1}, \sigma_{t-1}^2) \tag{prior} \end{align}
• Note that everything is Gaussian, so it is in principle possible to carry out these inference steps analytically, and the result will be a Gaussian posterior:
• (In the following derivation we make use of the renormalization equality $\mathcal{N}(x\,|\,cz,\sigma^2) = \frac{1}{c}\mathcal{N}\left(z \,\big|\,\frac{x}{c},\left(\frac{\sigma}{c}\right)^2\right)$).
\begin{align*} \underbrace{p(z_t\,|\,x^t)}_{\text{posterior}} &= p(z_t\,|\,x_t,x^{t-1}) \propto p(x_t,z_t\,|\,x^{t-1}) \\ &\propto p(x_t\,|\,z_t) \,p(z_t\,|\,x^{t-1}) \\ &= p(x_t\,|\,z_t) \, \sum_{z_{t-1}} p(z_t,z_{t-1}\,|\,x^{t-1}) \\ &= \underbrace{p(x_t\,|\,z_t)}_{\text{observation}} \, \sum_{z_{t-1}} \underbrace{p(z_t\,|\,z_{t-1})}_{\text{state transition}} \, \underbrace{p(z_{t-1}\,|\,x^{t-1})}_{\text{prior}} \\ &= \mathcal{N}(x_t\,|\,c z_t,\sigma_x^2) \sum_{z_{t-1}} \mathcal{N}(z_t\,|\,a z_{t-1},\sigma_z^2) \, \mathcal{N}(z_{t-1} \,|\, \mu_{t-1}, \sigma_{t-1}^2) \\ &= \frac{1}{c}\mathcal{N}\left(z_t\bigm| \frac{x_t}{c} ,\left(\frac{\sigma_x}{c}\right)^2\right) \sum_{z_{t-1}} \frac{1}{a}\underbrace{\mathcal{N}\left(z_{t-1}\bigm| \frac{z_t}{a},\left(\frac{\sigma_z}{a}\right)^2 \right) \, \mathcal{N}(z_{t-1} \,|\, \mu_{t-1}, \sigma_{t-1}^2)}_{\text{use Gaussian multiplication formula SRG-6}} \\ &\propto \underbrace{\mathcal{N}\left(z_t\,\bigm| \,\frac{x_t}{c} ,\left(\frac{\sigma_x}{c}\right)^2\right) \cdot \mathcal{N}\left(z_t\, \bigm|\,a \mu_{t-1},\sigma_z^2 + \left(a \sigma_{t-1}\right)^2 \right)}_{\text{use SRG-6 again}} \\ &\propto \mathcal{N}\left( z_t \,|\, \mu_t, \sigma_t^2\right) \end{align*}

with \begin{align*} \rho_t^2 &= a^2 \sigma_{t-1}^2 + \sigma_z^2 \tag{predicted variance}\\ K_t &= \frac{c \rho_t^2}{c^2 \rho_t^2 + \sigma_x^2} \tag{Kalman gain} \\ \mu_t &= \underbrace{a \mu_{t-1}}_{\text{prior prediction}} + K_t \cdot \underbrace{\left( x_t - c a \mu_{t-1}\right)}_{\text{prediction error}} \tag{posterior mean}\\ \sigma_t^2 &= \left( 1 - c\cdot K_t \right) \rho_t^2 \tag{posterior variance} \end{align*}

• Kalman filtering consists of computing/updating these four equations for each new observation $x_t$. This is a very efficient recursive algorithm to estimate the state $z_t$ from all observations up to time $t$.
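The four update equations translate directly into code. Below is a minimal scalar sketch (the parameter values and toy observations in the demo are illustrative; the `global` declaration mirrors the script-level loops used later in this notebook):

```julia
# One recursive Kalman update for the scalar model
#   p(z_t|z_{t-1}) = N(a z_{t-1}, σz²),  p(x_t|z_t) = N(c z_t, σx²)
function kalman_step(μ_prev, σ²_prev, x_t; a=1.0, c=1.0, σz²=1.0, σx²=1.0)
    ρ² = a^2 * σ²_prev + σz²                        # predicted variance
    K  = c * ρ² / (c^2 * ρ² + σx²)                  # Kalman gain
    μ  = a * μ_prev + K * (x_t - c * a * μ_prev)    # posterior mean
    σ² = (1 - c * K) * ρ²                           # posterior variance
    return μ, σ²
end

μ, σ² = 0.0, 1e3                                    # vague prior
for x_t in [1.1, 0.9, 1.05, 0.98]                   # toy observations
    global μ, σ²
    μ, σ² = kalman_step(μ, σ², x_t; a=1.0, c=1.0, σz²=0.01, σx²=0.1)
end
```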

• It turns out that it's also possible to get an analytical result for $p(x_t|x^{t-1})$, which is the model evidence in a filtering context. See optional slides for details.

### Multi-dimensional Kalman Filtering¶

• The Kalman filter equations can also be derived for multidimensional state-space models. In particular, for the model \begin{align*} z_t &= A z_{t-1} + \mathcal{N}(0,\Gamma) \\ x_t &= C z_t + \mathcal{N}(0,\Sigma) \end{align*} the Kalman filter update equations for the posterior $p(z_t |x^t) = \mathcal{N}\left(z_t \bigm| \mu_t, V_t \right)$ are given by (see Bishop, pg.639) \begin{align*} P_t &= A V_{t-1} A^T + \Gamma \tag{predicted variance}\\ K_t &= P_t C^T \cdot \left(C P_t C^T + \Sigma \right)^{-1} \tag{Kalman gain} \\ \mu_t &= A \mu_{t-1} + K_t\cdot\left(x_t - C A \mu_{t-1} \right) \tag{posterior state mean}\\ V_t &= \left(I-K_t C \right) P_{t} \tag{posterior state variance} \end{align*}
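These matrix updates can be collected into a single function. The sketch below follows the four equations verbatim (the right division `/` takes the role of the explicit matrix inverse in the gain):

```julia
using LinearAlgebra

# One multidimensional Kalman update for the model
#   z_t = A z_{t-1} + N(0, Γ),  x_t = C z_t + N(0, Σ)
function kalman_update(μ_prev, V_prev, x, A, C, Γ, Σ)
    P = A * V_prev * A' + Γ                       # predicted variance
    K = (P * C') / (C * P * C' + Σ)               # Kalman gain
    μ = A * μ_prev + K * (x - C * A * μ_prev)     # posterior state mean
    V = (I - K * C) * P                           # posterior state variance
    return μ, V
end
```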

### Code Example: Kalman Filtering and the Cart Position Tracking Example Revisited¶

• We can now solve the cart tracking problem of the introductory example by implementing the Kalman filter.
In :
using Pkg;Pkg.activate("probprog/workspace/");Pkg.instantiate()
IJulia.clear_output();

In :
using LinearAlgebra, PyPlot
include("scripts/cart_tracking_helpers.jl")

# Specify the model parameters
Δt = 1.0                     # assume the time steps to be equal in size
A = [1.0 Δt;
     0.0 1.0]
b = [0.5*Δt^2; Δt]
Σz = convert(Matrix,Diagonal([0.2*Δt; 0.1*Δt])) # process noise covariance
Σx = convert(Matrix,Diagonal([1.0; 2.0]))     # observation noise covariance;

# Generate noisy observations
n = 10                # perform 10 timesteps
z_start = [10.0; 2.0] # initial state
u = 0.2 * ones(n)     # constant input u
noisy_x = generateNoisyMeasurements(z_start, u, A, b, Σz, Σx);

m_z = noisy_x[1]                                 # initial predictive mean
V_z = A * (1e8*Diagonal(I,2) * A') + Σz             # initial predictive covariance

for t = 2:n
    global m_z, V_z, m_pred_z, V_pred_z
    # predict
    m_pred_z = A * m_z + b * u[t]                   # predictive mean
    V_pred_z = A * V_z * A' + Σz                    # predictive covariance
    # update
    gain = V_pred_z * inv(V_pred_z + Σx)            # Kalman gain (C = I here)
    m_z = m_pred_z + gain * (noisy_x[t] - m_pred_z) # posterior mean update
    V_z = (Diagonal(I,2) - gain) * V_pred_z         # posterior covariance update
end
println("Prediction: ",ProbabilityDistribution(Multivariate,GaussianMeanVariance,m=m_pred_z,v=V_pred_z))
println("Measurement: ", ProbabilityDistribution(Multivariate,GaussianMeanVariance,m=noisy_x[n],v=Σx))
println("Posterior: ", ProbabilityDistribution(Multivariate,GaussianMeanVariance,m=m_z,v=V_z))
plotCartPrediction2(m_pred_z, V_pred_z, m_z, V_z, noisy_x[n], Σx);

Prediction: 𝒩(m=[42.21, 4.51], v=[[1.30, 0.39][0.39, 0.34]])

Measurement: 𝒩(m=[40.88, 5.41], v=[[1.00, 0.00][0.00, 2.00]])

Posterior: 𝒩(m=[41.55, 4.42], v=[[0.55, 0.15][0.15, 0.24]])



### The Cart Tracking Problem Revisited: Inference by Message Passing¶

• Let's solve the cart tracking problem by sum-product message passing in a factor graph like the one depicted above. All we have to do is create factor nodes for the state-transition model $p(z_t|z_{t-1})$ and the observation model $p(x_t|z_t)$. Then we just build the factor graph and let ForneyLab execute the message passing schedule.
• Since the factor graph is just a concatenation of $n$ identical "sections" (one for each time step), we only have to specify a single section. When running the message passing algorithm, we will explicitly use the posterior of the previous timestep as the prior in the next one. Let's build a section of the factor graph:
In :
fg = FactorGraph()
z_prev_m = Variable(id=:z_prev_m); placeholder(z_prev_m, :z_prev_m, dims=(2,))
z_prev_v = Variable(id=:z_prev_v); placeholder(z_prev_v, :z_prev_v, dims=(2,2))
bu = Variable(id=:bu); placeholder(bu, :bu, dims=(2,))

@RV z_prev ~ GaussianMeanVariance(z_prev_m, z_prev_v, id=:z_prev) # p(z_prev)
@RV noise_z ~ GaussianMeanVariance(constant(zeros(2), id=:noise_z_m), constant(Σz, id=:noise_z_v)) # process noise
@RV z = constant(A, id=:A) * z_prev + bu + noise_z; z.id = :z # p(z|z_prev) (state transition model)
@RV x ~ GaussianMeanVariance(z, constant(Σx, id=:Σx)) # p(x|z) (observation model)
placeholder(x, :x, dims=(2,));
ForneyLab.draw(fg)


Now that we've built the factor graph, we can perform Kalman filtering by inserting measurement data into the factor graph and performing message passing.

In :
include("scripts/cart_tracking_helpers.jl")
algo = messagePassingAlgorithm(z)
source_code = algorithmSourceCode(algo)
eval(Meta.parse(source_code))
marginals = Dict()
messages = Array{Message}(undef,6)
z_prev_m_0 = noisy_x[1]
z_prev_v_0 = A * (1e8*Diagonal(I,2) * A') + Σz
for t=2:n
    global z_prev_m_0, z_prev_v_0
    data = Dict(:x => noisy_x[t], :bu => b*u[t], :z_prev_m => z_prev_m_0, :z_prev_v => z_prev_v_0)
    step!(data, marginals, messages) # perform message passing (single timestep)
    # Posterior of z becomes prior of z in the next timestep:
    z_prev_m_0 = ForneyLab.unsafeMean(marginals[:z])
    z_prev_v_0 = ForneyLab.unsafeCov(marginals[:z])
end
# Collect prediction p(z[n]|z[n-1]), measurement p(z[n]|x[n]), corrected prediction p(z[n]|z[n-1],x[n])
prediction      = messages[5].dist # the message index is found by manual inspection of the schedule
measurement     = messages[6].dist # (adjust these indices if the generated schedule changes)
corr_prediction = convert(ProbabilityDistribution{Multivariate, GaussianMeanVariance}, marginals[:z])
println("Prediction: ",prediction)
println("Measurement: ",measurement)
println("Posterior: ", corr_prediction)

# Make a fancy plot of the prediction, noisy measurement, and corrected prediction after n timesteps
plotCartPrediction(prediction, measurement, corr_prediction);

Prediction: 𝒩(m=[42.21, 4.51], v=[[1.30, 0.39][0.39, 0.34]])

Measurement: 𝒩(m=[40.88, 5.41], v=[[1.00, 0.00][0.00, 2.00]])

Posterior: 𝒩(m=[41.55, 4.42], v=[[0.55, 0.15][0.15, 0.24]])


• Note that both the analytical Kalman filtering solution and the message passing solution lead to the same results. The advantage of message passing-based inference with ForneyLab is that we did not need to derive any inference equations. ForneyLab took care of all that.

### Recap Dynamical Models¶

• Dynamical systems do not obey the sample-by-sample independence assumption, yet they can still be specified, and their state and parameter estimation equations solved, with tools similar to those for static models.
• Two of the more famous and powerful models with latent states include the hidden Markov model (with discrete states) and the Linear Gaussian dynamical system (with continuous states).
• For the LGDS, the Kalman filter is a well-known recursive state estimation procedure. The Kalman filter can be derived through Bayesian update rules on Gaussian distributions.
• If anything changes in the model, e.g., if the state noise is not Gaussian, then the inference equations have to be re-derived from scratch, and this may not lead to an analytically pleasing answer.
• $\Rightarrow$ Generally, we will want to automate the inference processes. As we discussed, message passing in a factor graph is a visually appealing method to automate inference processes. We showed how Kalman filtering emerged naturally by automated message passing.

## OPTIONAL SLIDES¶

### Proof of Kalman filtering equations including evidence updating¶

• Now let's prove the Kalman filtering equations, including evidence updating, by probabilistic calculus: \begin{align*} \overbrace{p(z_t\,|\,x^t)}^{\text{posterior}} &= p(z_t\,|\,x_t,x^{t-1}) \\ &= \frac{p(x_t,z_t\,|\,x^{t-1})}{p(x_t\,|\,x^{t-1})} \\ &= \frac{p(x_t\,|\,z_t) \,p(z_t\,|\,x^{t-1})}{p(x_t\,|\,x^{t-1})} \\ &= \frac{p(x_t\,|\,z_t) \, \sum_{z_{t-1}} p(z_t,z_{t-1}\,|\,x^{t-1}) }{p(x_t\,|\,x^{t-1})} \\ &= \frac{p(x_t\,|\,z_t) \, \sum_{z_{t-1}} p(z_t\,|\,z_{t-1}) \, p(z_{t-1}\,|\,x^{t-1})}{p(x_t\,|\,x^{t-1})} \\ &= \frac{\overbrace{\mathcal{N}(x_t|c z_t, \sigma_x^2)}^{\text{likelihood}} \, \sum_{z_{t-1}} \overbrace{\mathcal{N}(z_t\,|\,a z_{t-1},\sigma_z^2)}^{\text{state transition}} \, \overbrace{\mathcal{N}(z_{t-1} \,|\, \mu_{t-1}, \sigma_{t-1}^2) }^{\text{prior}} }{\underbrace{p(x_t\,|\,x^{t-1})}_{\text{evidence}}} \end{align*}

• The posterior $p(z_t\,|\,x^t)$ is proportional to the numerator, which by making use of the renormalization equality \begin{align} \mathcal{N}(x\,|\,cz,\sigma^2) = \frac{1}{c}\cdot \mathcal{N}\left(z \,\big|\,\frac{x}{c},\left(\frac{\sigma}{c}\right)^2\right) \tag{renormalization}\,, \end{align} can be evaluated with Gaussian multiplication rules:

\begin{align*} \mathcal{N}(x_t|c z_t, &\sigma_x^2) \, \sum_{z_{t-1}} \underbrace{\mathcal{N}(z_t\,|\,a z_{t-1},\sigma_z^2)}_{\text{use renormalization}} \mathcal{N}(z_{t-1} \,|\, \mu_{t-1}, \sigma_{t-1}^2) \\ &= \mathcal{N}(x_t|c z_t, \sigma_x^2) \, \sum_{z_{t-1}} \frac{1}{a}\underbrace{\mathcal{N}\left(z_{t-1}\bigm| \frac{z_t}{a},\left(\frac{\sigma_z}{a}\right)^2 \right) \mathcal{N}(z_{t-1} \,|\, \mu_{t-1}, \sigma_{t-1}^2)}_{\text{use Gaussian multiplication formula SRG-6}} \\ &= \frac{1}{a} \mathcal{N}(x_t|c z_t, \sigma_x^2) \, \sum_{z_{t-1}} \underbrace{\mathcal{N}\left(\mu_{t-1}\bigm| \frac{z_t}{a},\left(\frac{\sigma_z}{a}\right)^2 + \sigma_{t-1}^2 \right)}_{\text{not a function of }z_{t-1}} \underbrace{\mathcal{N}(z_{t-1} \,|\, \cdot, \cdot)}_{\text{sums to }1} \\ &= \frac{1}{a} \underbrace{\mathcal{N}(x_t|c z_t, \sigma_x^2)}_{\text{use renormalization rule}} \, \underbrace{\mathcal{N}\left(\mu_{t-1}\bigm| \frac{z_t}{a},\left(\frac{\sigma_z}{a}\right)^2 + \sigma_{t-1}^2 \right)}_{\text{use renormalization rule}} \\ &= \frac{1}{c} \underbrace{\mathcal{N}\left(z_t \bigm| \frac{x_t}{c}, \left( \frac{\sigma_x}{c}\right)^2 \right) \mathcal{N}\left(z_t\, \bigm|\,a \mu_{t-1},\sigma_z^2 + \left(a \sigma_{t-1}\right)^2 \right)}_{\text{use SRG-6 again}} \\ &= \underbrace{\frac{1}{c} \mathcal{N}\left( \frac{x_t}{c} \bigm| a \mu_{t-1}, \left( \frac{\sigma_x}{c}\right)^2+ \sigma_z^2 + \left(a \sigma_{t-1}\right)^2\right)}_{\text{use renormalization}} \, \mathcal{N}\left( z_t \,|\, \mu_t, \sigma_t^2\right)\\ &= \underbrace{\mathcal{N}\left(x_t \,|\, ca \mu_{t-1}, \sigma_x^2 + c^2(\sigma_z^2+a^2\sigma_{t-1}^2) \right)}_{\text{evidence } p(x_t|x^{t-1})} \underbrace{\mathcal{N}\left( z_t \,|\, \mu_t, \sigma_t^2\right)}_{\text{posterior }p(z_t|x^t) } \end{align*}

with \begin{align*} \rho_t^2 &= a^2 \sigma_{t-1}^2 + \sigma_z^2 \tag{predicted variance}\\ K_t &= \frac{c \rho_t^2}{c^2 \rho_t^2 + \sigma_x^2} \tag{Kalman gain} \\ \mu_t &= \underbrace{a \mu_{t-1}}_{\text{prior prediction}} + K_t \cdot \underbrace{\left( x_t - c a \mu_{t-1}\right)}_{\text{prediction error}} \tag{posterior mean}\\ \sigma_t^2 &= \left( 1 - c\cdot K_t \right) \rho_t^2 \tag{posterior variance} \end{align*}
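The evidence factor derived above, $p(x_t|x^{t-1}) = \mathcal{N}\left(x_t \,|\, ca\mu_{t-1},\, \sigma_x^2 + c^2(\sigma_z^2 + a^2\sigma_{t-1}^2)\right)$, can be evaluated alongside the posterior update, so the log-evidence of the whole series accumulates as a sum over time steps. A minimal sketch (parameter values are left to the caller):

```julia
# Gaussian density, parameterized by variance
normpdf(x, μ, σ²) = exp(-(x - μ)^2 / (2σ²)) / sqrt(2π * σ²)

# Scalar Kalman update that also returns the evidence p(x_t | x^{t-1})
function kalman_step_with_evidence(μ_prev, σ²_prev, x_t; a, c, σz², σx²)
    ρ² = a^2 * σ²_prev + σz²                               # predicted variance
    evidence = normpdf(x_t, c*a*μ_prev, σx² + c^2 * ρ²)    # p(x_t | x^{t-1})
    K  = c * ρ² / (c^2 * ρ² + σx²)                         # Kalman gain
    μ  = a * μ_prev + K * (x_t - c*a*μ_prev)               # posterior mean
    σ² = (1 - c*K) * ρ²                                    # posterior variance
    return μ, σ², evidence
end
```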

### Extensions of Generative Gaussian Models¶

• Using the methods of the previous lessons, it is possible to create your own new models based on stacking Gaussian and categorical distributions in new ways.
