Provided are two examples of nonlinear state-space models on which one can perform Bayesian filtering and smoothing in order to obtain a posterior distribution over a latent state trajectory based on noisy observations. In order to understand the theory behind these methods in detail we refer to [1] and [2].
We provide examples for two different types of state-space model:
In order to perform Bayesian Filtering and Smoothing on non-linear models, we show how to use functionalities provided by ProbNum to either linearise the respective model (Extended Kalman Filter, EKF) or to use an Unscented Transform (Unscented Kalman Filter, UKF) which does not require a Jacobian of the nonlinear model.
References:
[1] Särkkä, Simo, and Solin, Arno. Applied Stochastic Differential Equations. Cambridge University Press, 2019.
[2] Särkkä, Simo. Bayesian Filtering and Smoothing. Cambridge University Press, 2013.
import numpy as np
import probnum as pn
from probnum import filtsmooth, randvars, statespace
from probnum.problems import RegressionProblem
np.random.seed(12345)
# Make inline plots vector graphics instead of raster graphics
%matplotlib inline
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('pdf', 'svg')
# Plotting
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
plt.style.use('../../probnum.mplstyle')
For nonlinear models, we assume that at least one of the components of our state-space model (i.e. either the dynamics, the measurement model, or both) is a nonlinear function of the latent state.
Consider nonlinear functions $f: \mathbb{R}^d \rightarrow \mathbb{R}^d$ and $h: \mathbb{R}^d \rightarrow \mathbb{R}^m$ where $d$ is the dimension of the latent state and $m$ is the dimension of the measurements.
For $k = 1, \dots, K$ and $x_0 \sim \mathcal{N}(\mu_0, \Sigma_0)$:
$$ \begin{align} \boldsymbol{x}_k &\sim \mathcal{N}(f(\boldsymbol{x}_{k-1}), \boldsymbol{Q}) \\ \boldsymbol{y}_k &\sim \mathcal{N}(h(\boldsymbol{x}_k), \boldsymbol{R}) \end{align} $$This defines a dynamics model that assumes a state $x_k$ in a discrete sequence of states arising from a nonlinear map of the previous state $x_{k-1}$ corrupted with additive Gaussian noise under a process noise covariance matrix $Q$.
Similarly, the measurements $y_k$ are assumed to be nonlinear transformations of the latent state under additive Gaussian noise according to a measurement noise covariance $R$.
In the described case, we cannot use the Kalman Filtering equations since these assume linear relationships. One method is to linearise the nonlinear parts of the model by a first-order Taylor Expansion (Extended Kalman Filter, EKF) or employing the Unscented Kalman Filter (UKF) which is based on the unscented transform and does not require the computation of a Jacobian.
The probnum
packages provides both methods by wrapping linearizing "Components" around nonlinear dynamcis and/or measurement models, as we will see below.
Note that this can be generalized to a nonlinear time-varying state-space model, as well. Then we would have $f: \mathbb{T} \times \mathbb{R}^d \rightarrow \mathbb{R}^d$ and $h: \mathbb{T} \times \mathbb{R}^d \rightarrow \mathbb{R}^m$ where $\mathbb{T}$ is the "time dimension".
state_dim = 2
observation_dim = 1
# approx. gravitational constant
g = 9.81
# dt
delta_t = 0.0075
def pendulum_rhs(state):
"""Right-hand side of an ODE that defines pendulum dynamics"""
x1, x2 = state
y1 = x1 + x2 * delta_t
y2 = x2 - g * np.sin(x1) * delta_t
return np.array([y1, y2])
def pendulum_jacobian(state):
"""Jacobian of the pendulum ODE"""
x1, x2 = state
dy1_dx = [1.0, delta_t]
dy2_dx = [-g * np.cos(x1) * delta_t, 1.0]
return np.array([dy1_dx, dy2_dx])
dynamics_transition_function = lambda t, state: pendulum_rhs(state)
dynamics_transition_jacobian_function = lambda t, state: pendulum_jacobian(state)
dynamics_diffusion_matrix = 1.0 * (
np.diag(np.array([delta_t ** 3 / 3, delta_t]))
+ np.diag(np.array([delta_t ** 2 / 2]), 1)
+ np.diag(np.array([delta_t ** 2 / 2]), -1)
)
We here define the non-linear dynamics of a pendulum. Note (in the cell directly above) that probnum
expects dynamicsfun
to be a function of time and state. Even though we here consider a nonlinear but time-invariant model, we still have to fulfill the contract that the interface requires.
dynamics_transition_function = lambda t, state: pendulum_rhs(state)
In other words, we simply ignore the time variable t
in case of time-invariant models.
To create a discrete, (non-linear) Gaussian dynamics model, probnum
provides the DiscreteGaussian
class that takes
input_dim
and output_dim
state_trans_fun
: the function that returns the next state given the current oneproc_noise_cov_mat_fun
: the function that returns the covariance matrix of the Gaussian process noisejacob_state_trans_fun
: the function that returns the Jacobian matrix of the state transition function# Create discrete, non-linear Gaussian dynamics model
dynamics_model = statespace.DiscreteGaussian(
input_dim=state_dim,
output_dim=state_dim,
state_trans_fun=dynamics_transition_function,
proc_noise_cov_mat_fun=lambda t: dynamics_diffusion_matrix,
jacob_state_trans_fun=dynamics_transition_jacobian_function,
)
def pendulum_measurement(state):
x1, x2 = state
return np.array([np.sin(x1)])
def pendulum_measurement_jacobian(state):
x1, x2 = state
return np.array([[np.cos(x1), 0.0]])
measurement_function = lambda t, state: pendulum_measurement(state)
measurement_jacobian_function = lambda t, state: pendulum_measurement_jacobian(state)
measurement_variance = 0.32 ** 2
measurement_covariance = measurement_variance * np.eye(observation_dim)
# Create discrete, non-linear Gaussian measurement model
measurement_model = statespace.DiscreteGaussian(
input_dim=state_dim,
output_dim=observation_dim,
state_trans_fun=measurement_function,
proc_noise_cov_mat_fun=lambda t: measurement_covariance,
jacob_state_trans_fun=measurement_jacobian_function,
)
mu_0 = np.ones(state_dim)
sigma_0 = measurement_variance * np.eye(state_dim)
initial_state_rv = randvars.Normal(mean=mu_0, cov=sigma_0)
statespace.generate_samples()
is used to sample both latent states and noisy observations from the specified state space model.
time_grid = np.arange(0., 5., step=delta_t)
latent_states, observations = statespace.generate_samples(
dynmod=dynamics_model,
measmod=measurement_model,
initrv=initial_state_rv,
times=time_grid
)
regression_problem = RegressionProblem(
observations=observations,
locations=time_grid,
)
Since we could easily derive the Jacobian of our nonlinear model components above, we can use the Extended Kalman Filter (EKF) to obtain a filtering posterior over the states. Therefore, in probnum
we just wrap the dynamics model and the measurement model in a DiscreteEKFComponent
object.
In cases in which the Jacobian is not as easy to write down, probnum
also provides the DiscreteUKFComponent
class that implements the linearization by means of an
unscented Kalman filter that does not require a Jacobian.
Alternatively, the Jacobian can be calculated via automatic differentiation.
linearised_dynamics_model = filtsmooth.DiscreteEKFComponent(dynamics_model)
linearised_measurement_model = filtsmooth.DiscreteEKFComponent(measurement_model)
Then, we simply pass these linearized models to the Kalman Filter (exactly the same interface as in the linear case) and proceed.
kalman_filter = filtsmooth.Kalman(
dynamics_model=linearised_dynamics_model,
measurement_model=linearised_measurement_model,
initrv=initial_state_rv
)
state_posterior, _ = kalman_filter.filtsmooth(regression_problem)
The method filtsmooth
returns a KalmanPosterior
object which provides convenience functions for e.g. sampling and interpolation.
We can also extract the just computed posterior smoothing state variables by querying the .state_rvs
property.
This yields a list of Gaussian Random Variables from which we can extract the statistics in order to visualize them.
posterior_state_rvs = state_posterior.states # List of <num_time_points> Normal Random Variables
posterior_state_means = posterior_state_rvs.mean # Shape: (num_time_points, state_dim)
posterior_state_covs = posterior_state_rvs.cov # Shape: (num_time_points, state_dim, state_dim)
state_fig = plt.figure()
state_fig_gs = gridspec.GridSpec(ncols=2, nrows=1, figure=state_fig)
ax_00 = state_fig.add_subplot(state_fig_gs[0, 0])
ax_01 = state_fig.add_subplot(state_fig_gs[0, 1])
# Plot means
mu_x_1, mu_x_2 = [posterior_state_means[:, i] for i in range(state_dim)]
ax_00.plot(time_grid, mu_x_1, label="posterior mean")
ax_01.plot(time_grid, mu_x_2)
# Plot marginal standard deviations
std_x_1, std_x_2 = [np.sqrt(posterior_state_covs[:, i, i]) for i in range(state_dim)]
ax_00.fill_between(
time_grid,
mu_x_1 - 1.96 * std_x_1,
mu_x_1 + 1.96 * std_x_1,
alpha=0.2,
label="1.96 marginal stddev",
)
ax_01.fill_between(
time_grid, mu_x_2 - 1.96 * std_x_2, mu_x_2 + 1.96 * std_x_2, alpha=0.2
)
# Plot 5 samples on a subsampled grid
samples = state_posterior.sample(t=state_posterior.locations[::5], size=5)
for smp in samples:
ax_00.plot(state_posterior.locations[::5], smp[:, 0], color="gray", alpha=0.75, linewidth=1, linestyle="dashed", label="sample")
ax_01.plot(state_posterior.locations[::5], smp[:, 1], color="gray",alpha=0.75, linewidth=1, linestyle="dashed", label="sample")
# Plot groundtruth
ax_00.scatter(time_grid, observations, marker=".", label="measurements")
# Add labels etc.
ax_00.set_xlabel("t")
ax_01.set_xlabel("t")
ax_00.set_title(r"$x_1$")
ax_01.set_title(r"$x_2$")
# The following two lines are just to remove duplicate labels (caused by samples) from the legend
handles, labels = ax_00.get_legend_handles_labels()
by_label = dict(zip(labels, handles))
state_fig.legend(
by_label.values(), by_label.keys(), loc="center left", bbox_to_anchor=(1, 0.5)
)
state_fig.tight_layout()
Now, consider we assume continuous dynamics. We assume that there is a continuous process that defines the nonlinear dynamics of our latent space from which we collect discrete nonlinear-Gaussian measurements (as above). Only the dynamics model changes. In particular, we formulate the dynamics as a stochastic process in terms of a Nonlinear Time-Invariant Stochastic Differential Equation. We refer to [1] for more details. Again, consider non-linear functions $f: \mathbb{R}^d \rightarrow \mathbb{R}^d$ and $h: \mathbb{R}^d \rightarrow \mathbb{R}^m$ where $d$ is the dimension of the latent state and $m$ is the dimension of the measurements. We define the following nonlinear continuous-discrete state-space model:
Let $x(t_0) \sim \mathcal{N}(\mu_0, \Sigma_0)$.
$$ \begin{align} d\boldsymbol{x} &= f(\boldsymbol{x}) \, dt + \boldsymbol{L} \, d \boldsymbol{\omega} \\ \boldsymbol{y}_k &\sim \mathcal{N}(h(\boldsymbol{x}(t_k)), \boldsymbol{R}), \qquad k = 1, \dots, K \end{align} $$where $\boldsymbol{\omega}$ denotes a vector of driving forces (often Brownian Motion).
Note that this can be generalized to a nonlinear time-varying state-space model, as well. Then we would have $f: \mathbb{T} \times \mathbb{R}^d \rightarrow \mathbb{R}^d$ and $h: \mathbb{T} \times \mathbb{R}^d \rightarrow \mathbb{R}^m$ where $\mathbb{T}$ is the "time dimension".
state_dim = 1
observation_dim = 1
drift_function = lambda t, state: np.tanh(state)
drift_function_jacobian = lambda t, state: 1.0 - np.tanh(state) ** 2
dispersion_matrix_function = lambda t: np.ones(state_dim).reshape(1, 1)
For continous, non-linear dynamics, probnum
provides a general stochastic differential equation (SDE) interface that takes
driftfun
: a (possibly non-linear) drift function,dispmatfun
: a (possibly non-linear) dispersion function,jacobfun
: the Jacobian of the drift function.# Create continuous, non-linear SDE dynamics model
dynamics_model = statespace.SDE(
dimension=state_dim,
driftfun=drift_function,
dispmatfun=dispersion_matrix_function,
jacobfun=drift_function_jacobian,
)
measurement_marginal_variance = 1.0
measurement_matrix = np.eye(observation_dim, state_dim)
measurement_noise_matrix = measurement_marginal_variance * np.eye(observation_dim)
measurement_model = statespace.DiscreteLTIGaussian(
state_trans_mat=measurement_matrix,
shift_vec=np.zeros(observation_dim),
proc_noise_cov_mat=measurement_noise_matrix,
)
mu_0 = np.zeros(state_dim)
sigma_0 = 3.0 * np.eye(state_dim)
initial_state_rv = randvars.Normal(mean=mu_0, cov=sigma_0)
statespace.generate_samples()
is used to sample both latent states and noisy observations from the specified state space model.
time_grid = np.arange(0.0, 10.0, step=10*delta_t)
linearised_dynamics_model = filtsmooth.ContinuousEKFComponent(dynamics_model)
latent_states, observations = statespace.generate_samples(
dynmod=linearised_dynamics_model,
measmod=measurement_model,
initrv=initial_state_rv,
times=time_grid,
)
regression_problem = RegressionProblem(
observations=observations,
locations=time_grid,
)
kalman_filter = filtsmooth.Kalman(
dynamics_model=linearised_dynamics_model,
measurement_model=measurement_model,
initrv=initial_state_rv,
)
state_posterior, _ = kalman_filter.filter(regression_problem)
# smoothing
smoothed_state_posterior = kalman_filter.smooth(
filter_posterior=state_posterior
)
The method filtsmooth
returns a KalmanPosterior
object which provides convenience functions for e.g. sampling and prediction.
We can also extract the just computed posterior smoothing state variables by querying the .state_rvs
property.
This yields a list of Gaussian Random Variables from which we can extract the statistics in order to visualize them.
grid = np.linspace(0, 11, 500)
posterior_state_rvs = state_posterior(grid) # List of <num_time_points> Normal Random Variables
posterior_state_means = posterior_state_rvs.mean.squeeze() # Shape: (num_time_points, state_dim)
posterior_state_covs = posterior_state_rvs.cov # Shape: (num_time_points, state_dim, state_dim)
smoothed_posterior_state_rvs = smoothed_state_posterior.states # List of <num_time_points> Normal Random Variables
smoothed_posterior_state_means = smoothed_posterior_state_rvs.mean.squeeze() # Shape: (num_time_points, state_dim)
smoothed_posterior_state_covs = smoothed_posterior_state_rvs.cov # Shape: (num_time_points, state_dim, state_dim)
state_fig = plt.figure()
ax = state_fig.add_subplot()
# Plot means
ax.plot(grid, posterior_state_means, label="posterior mean")
# Plot marginal standard deviations
std_x = np.sqrt(np.abs(posterior_state_covs)).squeeze()
ax.fill_between(
grid,
posterior_state_means - 1.96 * std_x,
posterior_state_means + 1.96 * std_x,
alpha=0.2,
label="1.96 marginal stddev for filter",
)
# Plot smoothed means
ax.plot(time_grid, smoothed_posterior_state_means, label="smoothed posterior mean")
# Plot marginal standard deviations
std_x = np.sqrt(smoothed_posterior_state_covs).squeeze()
ax.fill_between(
time_grid,
smoothed_posterior_state_means - 1.96 * std_x,
smoothed_posterior_state_means + 1.96 * std_x,
alpha=0.2,
label="1.96 marginal stddev for smoother",
)
ax.scatter(time_grid, observations, marker=".", label="measurements")
# Add labels etc.
ax.set_xlabel("t")
ax.set_title(r"$x$")
ax.legend()
state_fig.tight_layout()
To get a closer look at filter and smoother, we inspect the interval $[0, 1]$ in greater detail.
idx = time_grid < 1.0
ax.set_xlim([0, 1])
ax.set_ylim(np.min(observations[idx]), np.max(observations[idx]))
state_fig.tight_layout()
state_fig