Filtering Multiple Random Variables
%matplotlib inline
#format the book
import book_format
book_format.set_style()
We are now ready to study and implement the full, multivariate form of the Kalman filter. In the last chapter we learned how multivariate Gaussians express the correlation between multiple random variables, such as the position and velocity of an aircraft. We also learned how correlation between variables drastically improves the posterior. If we only roughly know position and velocity, but they are correlated, then our new estimate can be very accurate.
I prefer that you develop an intuition for how these filters work through several worked examples. I'm going to gloss over many issues. Some things I show you will only work for special cases, others will be 'magical' - it will not be clear how I derived a certain result. If I started with rigorous, generalized equations you would be left scratching your head about what all these terms mean and how you might apply them to your problem. In later chapters I will provide a more rigorous mathematical foundation, and at that time I will have to either correct approximations that I made in this chapter or provide additional information that I did not cover here.
To make this possible we will restrict ourselves to a subset of problems which we can describe with Newton's equations of motion. These filters are called discretized continuous-time kinematic filters. In the Kalman Filter Math chapter we will develop the math for non-Newtonian systems.
Newton's equations of motion tell us that given a constant velocity $v$ of a system we can compute its position $x$ after time $t$ with:
$$x = vt + x_0$$For example, if we start at position 13, our velocity is 10 m/s, and we travel for 12 seconds our final position is 133 ($10\times 12 + 13$).
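As a quick sanity check, here is that arithmetic in Python. This is just the worked example above, nothing we will reuse later:

x0, v, t = 13., 10., 12.  # initial position (m), velocity (m/s), elapsed time (s)
x = v*t + x0              # constant velocity equation of motion
print(x)                  # 133.0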
We can incorporate constant acceleration with this equation
$$x = \frac{1}{2}at^2 + v_0t + x_0$$And if we assume constant jerk we get
$$x = \frac{1}{6}jt^3 + \frac{1}{2}a_0 t^2 + v_0 t + x_0$$These equations were generated by integrating a differential equation. Given a constant velocity v we can compute the distance traveled over time with the equation
$$x = vt + x_0$$which we can derive with
$$\begin{aligned} v &= \frac{dx}{dt}\\ dx &= v\, dt \\ \int_{x_0}^x\, dx &= \int_0^t v\, dt\\ x - x_0 &= vt - 0\\ x &= vt + x_0\end{aligned}$$When you design a Kalman filter you start with a system of differential equations that describe the dynamics of the system. Most systems of differential equations do not easily integrate in this way. We start with Newton's equation because we can integrate and get a closed form solution, which makes the Kalman filter easier to design. An added benefit is that Newton's equations are the right equations to use to track moving objects, one of the main uses of Kalman filters.
The algorithm is the same Bayesian filter algorithm that we have used in every chapter. The update step is slightly more complicated, but I will explain why when we get to it.
Initialization
1. Initialize the state of the filter
2. Initialize our belief in the state
Predict
1. Use process model to predict state at the next time step
2. Adjust belief to account for the uncertainty in prediction
Update
1. Get a measurement and associated belief about its accuracy
2. Compute residual between estimated state and measurement
3. Compute scaling factor based on whether the measurement or prediction is more accurate
4. Set state between the prediction and measurement based on scaling factor
5. Update belief in the state based on how certain we are in the measurement
As a reminder, here is a graphical depiction of the algorithm:
import kf_book.book_plots as book_plots
book_plots.show_residual_chart()
The univariate Kalman filter represented the state with a univariate Gaussian. Naturally the multivariate Kalman filter will use a multivariate Gaussian for the state. We learned in the last chapter that multivariate Gaussians use a vector for the mean and a matrix for the covariances. That means that the Kalman filter needs to use linear algebra to perform the estimations.
I don't want you to memorize these equations, but I have listed the univariate and multivariate equations below. They are quite similar.
Predict
$\begin{array}{|l|l|l|} \hline \text{Univariate} & \text{Univariate} & \text{Multivariate}\\ & \text{(Kalman form)} & \\ \hline \bar \mu = \mu + \mu_{f_x} & \bar x = x + dx & \bar{\mathbf x} = \mathbf{Fx} + \mathbf{Bu}\\ \bar\sigma^2 = \sigma_x^2 + \sigma_{f_x}^2 & \bar P = P + Q & \bar{\mathbf P} = \mathbf{FPF}^\mathsf T + \mathbf Q \\ \hline \end{array}$
Without worrying about the specifics of the linear algebra, we can see that:
$\mathbf x,\, \mathbf P$ are the state mean and covariance. They correspond to $x$ and $\sigma^2$.
$\mathbf F$ is the state transition function. When multiplied by $\bf x$ it computes the prior.
$\mathbf Q$ is the process covariance. It corresponds to $\sigma^2_{f_x}$.
$\mathbf B$ and $\mathbf u$ are new to us. They let us model control inputs to the system.
Update
$\begin{array}{|l|l|l|} \hline \text{Univariate} & \text{Univariate} & \text{Multivariate}\\ & \text{(Kalman form)} & \\ \hline & y = z - \bar x & \mathbf y = \mathbf z - \mathbf{H\bar x} \\ & K = \frac{\bar P}{\bar P+R}& \mathbf K = \mathbf{\bar{P}H}^\mathsf T (\mathbf{H\bar{P}H}^\mathsf T + \mathbf R)^{-1} \\ \mu=\frac{\bar\sigma^2\, \mu_z + \sigma_z^2 \, \bar\mu} {\bar\sigma^2 + \sigma_z^2} & x = \bar x + Ky & \mathbf x = \bar{\mathbf x} + \mathbf{Ky} \\ \sigma^2 = \frac{\sigma_1^2\sigma_2^2}{\sigma_1^2+\sigma_2^2} & P = (1-K)\bar P & \mathbf P = (\mathbf I - \mathbf{KH})\mathbf{\bar{P}} \\ \hline \end{array}$
$\mathbf H$ is the measurement function. We haven't seen this yet in this book and I'll explain it later. If you mentally remove $\mathbf H$ from the equations, you should be able to see these equations are similar as well.
$\mathbf z,\, \mathbf R$ are the measurement mean and noise covariance. They correspond to $z$ and $\sigma_z^2$ in the univariate filter (I've substituted $\mu$ with $x$ for the univariate equations to make the notation as similar as possible).
$\mathbf y$ and $\mathbf K$ are the residual and Kalman gain.
The details will be different than the univariate filter because these are vectors and matrices, but the concepts are exactly the same:
Your job as a designer will be to design the state $\left(\mathbf x, \mathbf P\right)$, the process $\left(\mathbf F, \mathbf Q\right)$, the measurement $\left(\mathbf z, \mathbf R\right)$, and the measurement function $\mathbf H$. If the system has control inputs, such as a robot, you will also design $\mathbf B$ and $\mathbf u$.
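To make the correspondence concrete, here is a bare-bones NumPy sketch of those predict and update equations. This is only to show that the tables above boil down to a few lines of linear algebra; it is not the implementation we will use (FilterPy provides that), and the function names here are my own:

import numpy as np
from numpy.linalg import inv

def kf_predict(x, P, F, Q):
    # prior: x = Fx,  P = FPF' + Q  (ignoring Bu for now)
    return F @ x, F @ P @ F.T + Q

def kf_update(x, P, z, R, H):
    y = z - H @ x                     # residual
    S = H @ P @ H.T + R               # system uncertainty
    K = P @ H.T @ inv(S)              # Kalman gain
    x = x + K @ y                     # posterior mean
    P = (np.eye(len(x)) - K @ H) @ P  # posterior covariance
    return x, P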
I have programmed the equations of the Kalman filter into the predict and update functions in FilterPy. You will import them with:
from filterpy.kalman import predict, update
Let's go back to our tried and true problem of tracking a dog. This time we will include the fundamental insight of the previous chapter and use hidden variables to improve our estimates. I could start with the math, but instead let's implement a filter, learning as we go. On the surface the math is different and perhaps more complicated than the previous chapters, but the ideas are all the same - we are just multiplying and adding Gaussians.
We start by writing a simulation for the dog. The simulation will run for count steps, moving the dog forward approximately 1 meter for each step. At each step the velocity will vary according to the process variance process_var. After updating the position we compute a measurement with an assumed sensor variance of z_var. The function returns a NumPy array of the positions and another of the measurements.
import math
import numpy as np
from numpy.random import randn
def compute_dog_data(z_var, process_var, count=1, dt=1.):
    "returns track, measurements 1D ndarrays"
    x, vel = 0., 1.
    z_std = math.sqrt(z_var)
    p_std = math.sqrt(process_var)
    xs, zs = [], []
    for _ in range(count):
        v = vel + (randn() * p_std)
        x += v*dt
        xs.append(x)
        zs.append(x + randn() * z_std)
    return np.array(xs), np.array(zs)
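Here is a quick, throwaway example of calling it, with a fixed seed so the output is repeatable (the specific values don't matter):

np.random.seed(9)
track, zs = compute_dog_data(z_var=10., process_var=1., count=5)
print('track:', track)  # true positions
print('zs:   ', zs)     # noisy measurements of those positions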
For the prediction we need to design the state and covariance, the process model and the process noise, and optionally the control input. We'll take them in order.
We previously tracked a dog in one dimension by using a Gaussian. The mean $(\mu)$ represented the most likely position, and the variance ($\sigma^2$) represented the probability distribution of the position. The position is the state of the system, and we call $\mu$ the state variable.
In this problem we will be tracking both the position and velocity of the dog. This requires us to use a multivariate Gaussian represented with the state vector $\mathbf x$ and its corresponding covariance matrix $\mathbf P$.
State variables can either be observed variables - directly measured by a sensor, or hidden variables - inferred from the observed variables. For our dog tracking problem the sensor only reads position, so position is observed and velocity is hidden. We will learn how to track hidden variables soon.
It is important to understand that tracking position and velocity is a design choice with implications and assumptions that we are not yet prepared to explore. For example, we could also track acceleration, or even jerk. For now, recall that in the last chapter we showed that including velocity in the covariance matrix resulted in much smaller variances in position. We will learn how the Kalman filter computes estimates for hidden variables later in this chapter.
In the univariate chapter we represented the dog's position with a scalar value (e.g. $\mu=3.27$). In the last chapter we learned to use a multivariate Gaussian for multiple variables. For example, if we wanted to specify a position of 10.0 m and a velocity of 4.5 m/s, we would write:
$$\mu = \begin{bmatrix}10.0\\4.5\end{bmatrix}$$The Kalman filter is implemented using linear algebra. We use an $n\times 1$ matrix (called a vector) to store $n$ state variables. For the dog tracking problem, we use $x$ to denote position, and the first derivative of $x$, $\dot x$, for velocity. I use Newton's dot notation for derivatives; $\dot x$ represents the first derivative of x with respect to t: $\dot x = \frac{dx}{dt}$. Kalman filter equations use $\mathbf x$ for the state, so we define $\mathbf x$ as:
$$\mathbf x =\begin{bmatrix}x \\ \dot x\end{bmatrix}$$We use $\mathbf x$ instead of $\mu$, but recognize this is the mean of the multivariate Gaussian.
Another way to write this is $\mathbf x =\begin{bmatrix}x & \dot x\end{bmatrix}^\mathsf T$ because the transpose of a row vector is a column vector. This notation is easier to use in text because it takes less vertical space.
$\mathbf x$ and the position $x$ coincidentally have the same name. If we were tracking the dog in the y-axis we would write $\mathbf x =\begin{bmatrix}y & \dot y\end{bmatrix}^\mathsf T$, not $\mathbf y =\begin{bmatrix}y & \dot y\end{bmatrix}^\mathsf T$. $\mathbf x$ is the standard name for the state variable used in the Kalman filter literature and we will not vary it to give it a more meaningful name. This consistency in naming allows us to communicate with our peers.
Let's code this. Initialization of x is as simple as
x = np.array([[10.0],
[4.5]])
x
array([[10. ], [ 4.5]])
I often use the transpose in my code to turn a row matrix into a column vector, as I find it easier to type and read:
x = np.array([[10., 4.5]]).T
x
array([[10. ], [ 4.5]])
However, NumPy recognizes 1D arrays as vectors, so I can simplify this line to use a 1D array.
x = np.array([10.0, 4.5])
x
array([10. , 4.5])
All of the array elements have the same type, typically either float or int. If the list contains all ints then the created array will also have a data type of int, otherwise it will be float. I will often take advantage of this by only specifying one number as a floating point:
np.array([1., 0, 0, 0, 0, 0])
array([1., 0., 0., 0., 0., 0.])
Here are some examples.
A = np.array([[1, 2], [3, 4]])
x = np.array([[10.0], [4.5]])
# matrix multiply
print(np.dot(A, x))
print()
[[19.]
 [48.]]
In Python 3.5+ we have the matrix multiplication operator @, where np.dot(A, B) == A @ B. It is somewhat less useful than you might realize because it requires both A and B to be arrays. It is entirely valid in the math in this book for some of these variables to be scalars, therefore the utility of @ is often lost.
# alternative matrix multiply
print(A @ x)
print()
x = np.array([[10.0, 4.5]]).T
print(A @ x)
print()
x = np.array([10.0, 4.5])
print(A @ x)
[[19.]
 [48.]]

[[19.]
 [48.]]

[19. 48.]
The last returns a 1D array, but I have written the Kalman filter class to be able to handle this. In retrospect that might lead to confusion, but it does work.
The other half of the state Gaussian is the covariance matrix $\mathbf P$. In the univariate Kalman filter we specified an initial value for $\sigma^2$, and then the filter took care of updating its value as measurements were added to the filter. The same thing happens in the multidimensional Kalman filter. We specify an initial value for $\mathbf P$ and the filter updates it during each epoch.
We need to set the variances to reasonable values. For example, we may choose $\sigma_\mathtt{pos}^2=500 m^2$ if we are quite uncertain about the initial position. Top speed for a dog is around 21 m/s, so in the absence of any other information about the velocity we can set $3\sigma_\mathtt{vel}=21$, or $\sigma_\mathtt{vel}^2=7^2=49$.
In the last chapter we showed that the position and velocities are correlated. But how correlated are they for a dog? I have no idea. As we will see the filter computes this for us, so I initialize the covariances to zero. Of course, if you know the covariances you should use them.
Recall that the diagonal of the covariance matrix contains the variance of each variable, and the off-diagonal elements contain the covariances. Thus we have:
$$ \mathbf P = \begin{bmatrix}500 & 0 \\ 0&49\end{bmatrix} $$We can use numpy.diag, which creates a diagonal matrix from the values given for the diagonal. Recall from linear algebra that a diagonal matrix is one with zeros in the off-diagonal elements.
P = np.diag([500., 49.])
P
array([[500., 0.], [ 0., 49.]])
I could have written:
P = np.array([[500., 0.],
[0., 49.]])
P
array([[500., 0.], [ 0., 49.]])
We are done. We've expressed the state of the filter as a multivariate Gaussian and implemented it in code.
The next step is designing the process model. It is a mathematical model which describes the behavior of the system. The filter uses it to predict the state after a discrete time step. We do this with a set of equations that describe the dynamics of the system.
In the univariate chapter we modeled the dog's motion with
$$ x = v \Delta t + x_0$$We implemented this as follows:
def predict(pos, movement):
    return gaussian(pos.mean + movement.mean, 
                    pos.var + movement.var)
We will do the same thing in this chapter, using multivariate Gaussians instead of univariate Gaussians. You might imagine this sort of implementation:
$$ \mathbf x = \begin{bmatrix}5.4\\4.2\end{bmatrix}, \, \, \dot{\mathbf x} = \begin{bmatrix}1.1\\0.\end{bmatrix} \\ \mathbf x = \dot{\mathbf x}t + \mathbf x$$But we need to generalize this. The Kalman filter equations work with any linear system, not just Newtonian ones. Maybe the system you are filtering is the plumbing system in a chemical plant, and the flow in a given pipe is determined by a linear combination of the settings of different valves.
$$\mathtt{pipe_1} = 0.134(\mathtt{valve}_1) + 0.41(\mathtt{valve}_2 - \mathtt{valve}_3) + 1.34$$$$\mathtt{pipe_2} = 0.210(\mathtt{valve}_2) - 0.62(\mathtt{valve}_1 - \mathtt{valve}_5) + 1.86$$Linear algebra has a powerful way to express systems of equations. Take this system
$$\begin{cases} 2x+3y=8\\4x-y=2 \end{cases}$$We can put this in matrix form by writing:
$$\begin{bmatrix}2& 3 \\ 4&-1\end{bmatrix} \begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}8\\2\end{bmatrix}$$If you perform the matrix multiplication in this equation the result will be the two equations above. In linear algebra we would write this as $\mathbf{Ax}=\mathbf B$, where
$$\mathbf{A} = \begin{bmatrix}2& 3 \\ 4&-1\end{bmatrix},\, \mathbf x = \begin{bmatrix}x\\y\end{bmatrix}, \mathbf B=\begin{bmatrix}8\\2\end{bmatrix}$$And then we can use SciPy's linalg package to solve for $\mathbf x$:
from scipy.linalg import solve
A = np.array([[2, 3],[4, -1]])
b = np.array([[8], [2]])
x = solve(A, b)
x
array([[1.], [2.]])
We use the process model to perform the innovation, because the equations tell us what the next state will be given the current state. Kalman filters implement this using this linear equation, where $\mathbf{\bar x}$ is the prior, or predicted state:
$$\mathbf{\bar x} = \mathbf{Fx}$$which we can make explicit as
$$\begin{bmatrix} \bar x \\ \dot{\bar x}\end{bmatrix} = \begin{bmatrix}? & ? \\? & ?\end{bmatrix}\begin{bmatrix}x\\\dot x\end{bmatrix}$$Our job as Kalman filter designers is to specify $\mathbf F$ such that $\bar{\mathbf x} = \mathbf{Fx}$ performs the innovation (prediction) for our system. To do this we need one equation for each state variable. In our problem $\mathbf x = \begin{bmatrix}x & \dot x\end{bmatrix}^\mathsf T$, so we need one equation to compute the position $x$ and another to compute the velocity $\dot x$. We already know the equation for the position innovation:
$$\bar x = x + \dot x \Delta t$$What is our equation for velocity? We have no predictive model for how our dog's velocity will change over time. In this case we assume that it remains constant between innovations. Of course this is not exactly true, but so long as the velocity doesn't change too much over each innovation you will see that the filter performs very well. So we say
$$\bar{\dot x} = \dot x$$This gives us the process model for our system
$$\begin{cases} \begin{aligned} \bar x &= x + \dot x \Delta t \\ \bar{\dot x} &= \dot x \end{aligned} \end{cases}$$This correctly has one equation for each variable in the state, isolated on the left hand side. We need to express this set of equations in the form $\bar{\mathbf x} = \mathbf{Fx}$. Rearranging terms makes it easier to see what to do.
$$\begin{cases} \begin{aligned} \bar x &= 1x + &\Delta t\, \dot x \\ \bar{\dot x} &=0x + &1\, \dot x \end{aligned} \end{cases}$$We can rewrite this in matrix form as
$$\begin{aligned} \begin{bmatrix}\bar x \\ \bar{\dot x}\end{bmatrix} &= \begin{bmatrix}1&\Delta t \\ 0&1\end{bmatrix} \begin{bmatrix}x \\ \dot x\end{bmatrix}\\ \mathbf{\bar x} &= \mathbf{Fx} \end{aligned}$$$\mathbf F$ is called the state transition function or the state transition matrix. In later chapters it will be a true function, not a matrix, so calling it a function is a bit more general.
dt = 0.1
F = np.array([[1, dt],
[0, 1]])
F
array([[1. , 0.1], [0. , 1. ]])
Let's test this! FilterPy has a predict method that performs the prediction by computing $\mathbf{\bar x} = \mathbf{Fx}$. Let's call it and see what happens. We've set the position to 10.0 and the velocity to 4.5 meters/sec. We've defined dt = 0.1, which means the time step is 0.1 seconds, so we expect the new position to be 10.45 meters after the innovation. The velocity should be unchanged.
from filterpy.kalman import predict
x = np.array([10.0, 4.5])
P = np.diag([500, 49])
F = np.array([[1, dt], [0, 1]])
# Q is the process noise
x, P = predict(x=x, P=P, F=F, Q=0)
print('x =', x)
x = [10.45 4.5 ]
This worked. If we call predict() several times in a row the value will be updated each time.
for _ in range(4):
    x, P = predict(x=x, P=P, F=F, Q=0)
    print('x =', x)
x = [10.9 4.5]
x = [11.35 4.5 ]
x = [11.8 4.5]
x = [12.25 4.5 ]
predict() computes both the mean and covariance of the innovation. This is the value of $\mathbf P$ after five innovations (predictions), which we denote $\mathbf{\bar P}$ in the Kalman filter equations.
print(P)
[[512.25  24.5 ]
 [ 24.5   49.  ]]
Inspecting the diagonals shows us that the position variance got larger. We've performed five prediction steps with no measurements, and our uncertainty grew. The off-diagonal elements became non-zero - the Kalman filter detected a correlation between position and velocity! The variance of the velocity did not change.
Here I plot the covariance before and after the prediction. The initial value is in solid red, and the prior (prediction) is in dashed black. I've altered the covariance and time step to better illustrate the change.
from filterpy.stats import plot_covariance_ellipse
dt = 0.3
F = np.array([[1, dt], [0, 1]])
x = np.array([10.0, 4.5])
P = np.diag([500, 500])
plot_covariance_ellipse(x, P, edgecolor='r')
x, P = predict(x, P, F, Q=0)
plot_covariance_ellipse(x, P, edgecolor='k', ls='dashed')
You can see that the center of the ellipse shifted by a small amount (from 10 to 11.35) because the position changed. The ellipse also elongated, showing the correlation between position and velocity. How does the filter compute new values for $\mathbf{\bar P}$, and what is it based on? Note that I set the process noise Q to zero each time, so it is not due to me adding noise. It's a little too early to discuss this, but recall that in every filter so far the predict step entailed a loss of information. The same is true here. I will give you the details once we have covered a bit more ground.
A quick review on process noise. A car is driving along the road with the cruise control on; it should travel at a constant speed. We model this with $\bar x_k=\dot x_k\Delta t + x_{k-1}$. However, it is affected by a number of unknown factors. The cruise control cannot perfectly maintain a constant velocity. Winds affect the car, as do hills and potholes. Passengers roll down windows, changing the drag profile of the car.
We can model this system with the differential equation
$$\dot{\mathbf x} = f(\mathbf x) + w$$where $f(\mathbf x)$ models the state transition and $w$ is white process noise.
We will learn how to go from a set of differential equations to the Kalman filter matrices in the Kalman Filter Math chapter. In this chapter we take advantage of the fact that Newton already derived the equations of motion for us. For now you just need to know that we account for the noise in the system by adding a process noise covariance matrix $\mathbf Q$ to the covariance $\mathbf P$. We do not add anything to $\mathbf x$ because the noise is white - which means that the mean of the noise will be 0. If the mean is 0, $\mathbf x$ will not change.
The univariate Kalman filter used variance = variance + process_noise to compute the variance of the prediction. The multivariate Kalman filter does the same, essentially P = P + Q. I say 'essentially' because there are other terms unrelated to noise in the covariance equation that we will see later.
Deriving the process noise matrix can be quite demanding, and we will put it off until the Kalman math chapter. For now know that $\mathbf Q$ equals the expected value of the white noise $w$, computed as $\mathbf Q = \mathbb E[\mathbf{ww}^\mathsf T]$. In this chapter we will focus on building an intuitive understanding on how modifying this matrix alters the behavior of the filter.
FilterPy provides functions which compute $\mathbf Q$ for the kinematic problems of this chapter. Q_discrete_white_noise takes 3 parameters: dim, which specifies the dimension of the matrix; dt, the time step in seconds; and var, the variance of the noise. Briefly, it discretizes the noise over the given time period under assumptions that we will discuss later. This code computes $\mathbf Q$ for white noise with a variance of 2.35 and a time step of 1 second:
from filterpy.common import Q_discrete_white_noise
Q = Q_discrete_white_noise(dim=2, dt=1., var=2.35)
print(Q)
[[0.588 1.175] [1.175 2.35 ]]
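To see the effect of $\mathbf Q$, here is a small check of my own: run the predict step once with and once without process noise and compare the resulting covariances. Since $\mathbf{\bar P} = \mathbf{FPF}^\mathsf T + \mathbf Q$, the difference should be exactly the $\mathbf Q$ we just computed:

x0 = np.array([10.0, 4.5])
P0 = np.diag([500., 49.])
F0 = np.array([[1., 1.], [0., 1.]])   # dt = 1 to match the Q above

_, P_without = predict(x=x0, P=P0, F=F0, Q=0)
_, P_with    = predict(x=x0, P=P0, F=F0, Q=Q)
print(P_with - P_without)             # equals Q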
The Kalman filter does not just filter data, it allows us to incorporate the control inputs of systems like robots and airplanes. Suppose we are controlling a robot. At each time step we would send steering and velocity signals to the robot based on its current position vs desired position. Kalman filter equations incorporate that knowledge into the filter equations, creating a predicted position based both on current velocity and control inputs to the drive motors. Remember, we never throw information away.
For a linear system the effect of control inputs can be described as a set of linear equations, which we can express with linear algebra as
$$\Delta\mathbf x = \mathbf{Bu}$$Here $\mathbf u$ is the control input, and $\mathbf B$ is the control input model or control function. For example, $\mathbf u$ might be a voltage controlling how fast the wheel's motor turns, and multiplying by $\mathbf B$ yields $\Delta[\begin{smallmatrix}x\\\dot x\end{smallmatrix}]$. In other words, it must compute how much $\mathbf x$ changes due to the control input.
Therefore the complete Kalman filter equation for the prior mean is
$$\mathbf{\bar x} = \mathbf{Fx} + \mathbf{Bu}$$and this is the equation that is computed when you call KalmanFilter.predict().
Your dog may be trained to respond to voice commands. All available evidence suggests that my dog has no control inputs whatsoever, so I set $\mathbf B$ to zero. In Python we write:
B = 0. # my dog doesn't listen to me!
u = 0
x, P = predict(x, P, F, Q, B, u)
print('x =', x)
print('P =', P)
x = [12.7 4.5]
P = [[680.587 301.175]
 [301.175 502.35 ]]
Setting $\mathbf B$ and $\mathbf u$ to zero is not necessary since predict uses 0 for their default value:
predict(x, P, F, Q)[0] == predict(x, P, F, Q, B, u)[0]
array([ True, True])
predict(x, P, F, Q)[1] == predict(x, P, F, Q, B, u)[1]
array([[ True, True], [ True, True]])
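Purely as an illustration (this is not part of the dog problem), suppose $\mathbf u$ held a commanded acceleration $a$ applied over the time step. A common choice is then $\mathbf B = \begin{bmatrix}\frac{1}{2}\Delta t^2 & \Delta t\end{bmatrix}^\mathsf T$, so that $\mathbf{Bu}$ adds $\frac{1}{2}a\Delta t^2$ to the position and $a\Delta t$ to the velocity. The _c names below are my own throwaway variables:

dt_c = 0.1
F_c = np.array([[1, dt_c], [0, 1]])
B_c = np.array([[0.5*dt_c**2], [dt_c]])  # maps commanded acceleration into the state
u_c = np.array([2.])                     # hypothetical command: accelerate at 2 m/s^2

x_c, P_c = predict(np.array([10.0, 4.5]), np.diag([500., 49.]), F_c, Q=0, B=B_c, u=u_c)
print('x =', x_c)   # both position and velocity increase because of the control input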
Your job as a designer is to specify the matrices for the state and covariance ($\mathbf x$, $\mathbf P$), the process model and noise covariance ($\mathbf F$, $\mathbf Q$), and, optionally, the control input and function ($\mathbf B$, $\mathbf u$).
Now we can implement the update step of the filter. You only have to supply two more matrices, and they are easy to understand.
The Kalman filter computes the update step in what we call measurement space. We mostly ignored this issue in the univariate chapter because of the complication it adds. We tracked our dog's position using a sensor that reported his position. Computing the residual was easy - subtract the filter's predicted position from the measurement:
$$ \mathtt{residual} = \mathtt{measured\, \, position} - \mathtt{predicted\, \, position}$$We need to compute the residual because we scale it by the Kalman gain to get the new estimate.
What would happen if we were trying to track temperature using a thermometer that outputs a voltage corresponding to the temperature reading? The equation for the residual computation would be meaningless; you can't subtract a temperature from a voltage.
$$ \mathtt{residual} = \mathtt{voltage} - \mathtt{temperature}\;\;\;(NONSENSE!)$$We need to convert the temperature into a voltage so we can perform the subtraction. For the thermometer we might write:
CELSIUS_TO_VOLTS = 0.21475
residual = voltage - (CELSIUS_TO_VOLTS * predicted_temperature)
The Kalman filter generalizes this problem by having you supply a measurement function that converts a state into a measurement.
Why are we working in measurement space? Why not work in state space by converting the voltage into a temperature, allowing the residual to be a difference in temperature?
We cannot do that because most measurements are not invertible. The state for the tracking problem contains the hidden variable $\dot x$. There is no way to convert a measurement of position into a state containing velocity. On the other hand, it is trivial to convert a state containing position and velocity into an equivalent "measurement" containing only position. We have to work in measurement space to make the computation of the residual possible.
Both the measurement $\mathbf z$ and state $\mathbf x$ are vectors so we need to use a matrix to perform the conversion. The Kalman filter equation that performs this step is:
$$\mathbf y = \mathbf z - \mathbf{H \bar x}$$where $\mathbf y$ is the residual, $\mathbf{\bar x}$ is the prior, $\mathbf z$ is the measurement, and $\mathbf H$ is the measurement function. So we take the prior, convert it to a measurement by multiplying it with $\mathbf H$, and subtract that from the measurement. This gives us the difference between our prediction and measurement in measurement space!
We need to design $\mathbf H$ so that $\mathbf{H\bar x}$ yields a measurement. For this problem we have a sensor that measures position, so $\mathbf z$ will be a one variable vector:
$$\mathbf z = \begin{bmatrix}z\end{bmatrix}$$The residual equation will have the form
$$ \begin{aligned} \textbf{y} &= \mathbf z - \mathbf{H\bar x} \\ \begin{bmatrix}y \end{bmatrix} &= \begin{bmatrix}z\end{bmatrix} - \begin{bmatrix}?&?\end{bmatrix} \begin{bmatrix}x \\ \dot x\end{bmatrix} \end{aligned} $$$\mathbf H$ has to be a 1x2 matrix for $\mathbf{Hx}$ to be 1x1. Recall that multiplying an $m\times n$ matrix by an $n\times p$ matrix yields an $m\times p$ matrix.
We will want to multiply the position $x$ by 1 to get the corresponding measurement of the position. We do not need to use velocity to find the corresponding measurement so we multiply $\dot x$ by 0.
$$\begin{aligned} \textbf{y} &= \mathbf z - \begin{bmatrix}1&0\end{bmatrix} \begin{bmatrix}x \\ \dot x\end{bmatrix} \\ &= [z] - [x] \end{aligned}$$And so, for our Kalman filter we set
$$\mathbf H=\begin{bmatrix}1&0\end{bmatrix}$$
H = np.array([[1., 0.]])
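Just to put numbers to it, here is the residual computed with a hypothetical prior and measurement (throwaway values of my own, purely for illustration):

x_prior = np.array([[12.7], [4.5]])   # hypothetical prior: position 12.7 m, velocity 4.5 m/s
z = np.array([[13.1]])                # hypothetical position measurement

y = z - H @ x_prior                   # residual, computed in measurement space
print(y)                              # about 0.4 m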
We have designed the majority of our Kalman filter. All that is left is to model the noise in the sensors.
The measurement is implemented with $\mathbf z$, the measurement mean, and $\mathbf R$, the measurement covariance.
$\mathbf z$ is easy. It contains the measurement(s) as a vector. We have only one measurement, so we have:
$$\mathbf z = \begin{bmatrix}z\end{bmatrix}$$If we have two sensors or measurements we'd have:
$$\mathbf z = \begin{bmatrix}z_1 \\ z_2\end{bmatrix}$$The measurement noise matrix models the noise in our sensors as a covariance matrix. In practice this can be difficult. A complicated system may have many sensors, the correlation between them might not be clear, and usually their noise is not a pure Gaussian. For example, a sensor might be biased to read high if the temperature is high, and so the noise is not distributed equally on both sides of the mean. We will learn to deal with these problems later.
The Kalman filter equations use a covariance matrix $\mathbf R$ for the measurement noise. The matrix will have dimension $m{\times}m$, where $m$ is the number of sensors. It is a covariance matrix to account for correlations between the sensors. We have only one sensor so $\mathbf R$ is:
$$\mathbf R = \begin{bmatrix}\sigma^2_z\end{bmatrix}$$If $\sigma^2_z$ is 5 meters squared we'd have $\mathbf R = \begin{bmatrix}5\end{bmatrix}$.
If we had two position sensors, the first with a variance of 5 m$^2$, the second with a variance of 3 m$^2$, we would write
$$\mathbf R = \begin{bmatrix}5&0\\0&3\end{bmatrix}$$We put the variances on the diagonal because this is a covariance matrix, where the variances lie on the diagonal, and the covariances, if any, lie in the off-diagonal elements. Here we assume there is no correlation in the noise between the two sensors, so the covariances are 0.
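If we did have those two independent position sensors, the corresponding code would be just as short (hypothetical, since our problem has a single sensor):

R_two_sensors = np.diag([5., 3.])  # variances on the diagonal, zero covariance off-diagonal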
For our problem we only have one sensor, so we can implement this as
R = np.array([[5.]])
We perform the update by calling update.
from filterpy.kalman import update
z = 1.
x, P = update(x, P, z, R, H)
print('x =', x)
x = [ 1.085 -0.64 ]
Keeping track of all of these variables is burdensome, so FilterPy also implements the filter with the class KalmanFilter. I will use the class in the rest of this book, but I wanted you to see the procedural form of these functions since I know some of you are not fans of object oriented programming.
I've given you all of the code for the filter, but now let's collect it in one place. First we construct a KalmanFilter object. We have to specify the number of variables in the state with the dim_x parameter, and the number of measurements with dim_z. We have two random variables in the state and one measurement, so we write:
from filterpy.kalman import KalmanFilter
dog_filter = KalmanFilter(dim_x=2, dim_z=1)
This creates an object with default values for all the Kalman filter matrices:
from filterpy.kalman import KalmanFilter
dog_filter = KalmanFilter(dim_x=2, dim_z=1)
print('x = ', dog_filter.x.T)
print('R = ', dog_filter.R)
print('Q = \n', dog_filter.Q)
# etc...
x =  [[0. 0.]]
R =  [[1.]]
Q = 
 [[1. 0.]
 [0. 1.]]
Now we initialize the filter's matrices and vectors with values valid for our problem. I've put this in a helper function to allow you to specify different initial values for R, P, and Q. We will be creating and running many of these filters, and this saves us a lot of headaches.
from filterpy.kalman import KalmanFilter
from filterpy.common import Q_discrete_white_noise
def pos_vel_filter(x, P, R, Q=0., dt=1.0):
    """ Returns a KalmanFilter which implements a
    constant velocity model for a state [x dx].T
    """
    kf = KalmanFilter(dim_x=2, dim_z=1)
    kf.x = np.array([x[0], x[1]])  # location and velocity
    kf.F = np.array([[1., dt],
                     [0., 1.]])    # state transition matrix
    kf.H = np.array([[1., 0]])     # Measurement function
    kf.R *= R                      # measurement uncertainty
    if np.isscalar(P):
        kf.P *= P                  # covariance matrix
    else:
        kf.P[:] = P                # [:] makes deep copy
    if np.isscalar(Q):
        kf.Q = Q_discrete_white_noise(dim=2, dt=dt, var=Q)
    else:
        kf.Q[:] = Q
    return kf
KalmanFilter initializes R, P, and Q to the identity matrix, so kf.P *= P is one way to quickly assign all of the diagonal elements to the same scalar value. Now we create the filter:
dt = .1
x = np.array([0., 0.])
kf = pos_vel_filter(x, P=500, R=5, Q=0.1, dt=dt)
You can inspect the current values of all attributes of the filter by entering the variable on the command line.
kf
KalmanFilter object
dim_x = 2
dim_z = 1
dim_u = 0
x = [0. 0.]
P = [[500.   0.]
     [  0. 500.]]
x_prior = [[0. 0.]].T
P_prior = [[1. 0.]
           [0. 1.]]
x_post = [[0. 0.]].T
P_post = [[1. 0.]
          [0. 1.]]
F = [[1.  0.1]
     [0.  1. ]]
Q = [[0.    0.   ]
     [0.    0.001]]
R = [[5.]]
H = [[1. 0.]]
K = [[0. 0.]].T
y = [[0.]]
S = [[0.]]
SI = [[0.]]
M = [[0.]]
B = None
z = [[None]]
log-likelihood = -708.3964185322641
likelihood = 2.2250738585072014e-308
mahalanobis = 0.0
alpha = 1.0
inv = <function inv at 0x00000268CA22DB80>
All that is left is to write the code to run the Kalman filter.
from kf_book.mkf_internal import plot_track
def run(x0=(0.,0.), P=500, R=0, Q=0, dt=1.0,
        track=None, zs=None,
        count=0, do_plot=True, **kwargs):
    """
    track is the actual position of the dog, zs are the
    corresponding measurements.
    """

    # Simulate dog if no data provided.
    if zs is None:
        track, zs = compute_dog_data(R, Q, count)

    # create the Kalman filter
    kf = pos_vel_filter(x0, R=R, P=P, Q=Q, dt=dt)

    # run the kalman filter and store the results
    xs, cov = [], []
    for z in zs:
        kf.predict()
        kf.update(z)
        xs.append(kf.x)
        cov.append(kf.P)

    xs, cov = np.array(xs), np.array(cov)
    if do_plot:
        plot_track(xs[:, 0], track, zs, cov, **kwargs)
    return xs, cov
This is the complete code for the filter, and most of it is boilerplate. I've made it flexible enough to support several uses in this chapter, so it is a bit verbose. Let's work through it line by line.
The first lines check to see if you provided measurement data in zs. If not, it creates the data using the compute_dog_data function we wrote earlier.
The next lines use our helper function to create a Kalman filter.
# create the Kalman filter
kf = pos_vel_filter(x0, R=R, P=P, Q=Q, dt=dt)
All we need to do is perform the update and predict steps of the Kalman filter for each measurement. The KalmanFilter class provides the two methods update() and predict() for this purpose. update() performs the measurement update step of the Kalman filter, and so it takes a variable containing the sensor measurement.
Absent the work of storing the results, the loop reads:
for z in zs:
    kf.predict()
    kf.update(z)
Each call to predict and update modifies the state variables x and P. Therefore, after the call to predict, kf.x contains the prior. After the call to update, kf.x contains the posterior. The states and covariances are saved in xs and cov.
It really cannot get much simpler than that. As we tackle more complicated problems this code will remain largely the same; all of the work goes into setting up the KalmanFilter matrices; executing the filter is trivial.
The rest of the code optionally plots the results and then returns the saved states and covariances.
Let's run it. We have 50 measurements with a noise variance of 10 and a process variance of 0.01.
P = np.diag([500., 49.])
Ms, Ps = run(count=50, R=10, Q=0.01, P=P)
There is still a lot to learn, but we have implemented a Kalman filter using the same theory and equations as published by Rudolf Kalman! Code very much like this runs inside of your GPS, airliners, robots, and so on.
The first plot shows the output of the Kalman filter against the measurements and the actual position of our dog (labelled Track). After the initial settling-in period the filter should track the dog's position very closely. The yellow shaded portion between the black dotted lines shows one standard deviation of the filter's variance, which I explain in the next paragraph.
The next two plots show the variance of $x$ and of $\dot x$. I have plotted the diagonals of $\mathbf P$ over time. Recall that the diagonal of a covariance matrix contains the variance of each state variable. So $\mathbf P[0,0]$ is the variance of $x$, and $\mathbf P[1,1]$ is the variance of $\dot x$. You can see that we quickly converge to small variances for both.
The covariance matrix $\mathbf P$ tells us the theoretical performance of the filter assuming everything we tell it is true. Recall that the standard deviation is the square root of the variance, and that approximately 68% of a Gaussian distribution occurs within one standard deviation. If at least 68% of the filter output is within one standard deviation the filter may be performing well. In the top chart I have displayed the one standard deviation as the yellow shaded area between the two dotted lines. To my eye it looks like the filter is perhaps slightly exceeding those bounds, so the filter probably needs some tuning.
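If you'd rather check this numerically than by eye, here is one rough way to do it. This is my own sketch, using the compute_dog_data and pos_vel_filter functions defined above and an arbitrary seed; it simply counts how often the position error stays within one standard deviation:

np.random.seed(13)
track, zs = compute_dog_data(z_var=10., process_var=0.01, count=50)

kf = pos_vel_filter(x=(0., 0.), P=500., R=10., Q=0.01)
within = 0
for z, true_pos in zip(zs, track):
    kf.predict()
    kf.update(z)
    std = np.sqrt(kf.P[0, 0])            # one standard deviation in position
    if abs(kf.x[0] - true_pos) <= std:
        within += 1
print(f'{within / len(zs):.0%} of estimates within one standard deviation')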
In the univariate chapter we filtered very noisy signals with much simpler code than the code above. However, realize that right now we are working with a very simple example - an object moving through 1-D space and one sensor. That is about the limit of what we can compute with the code in the last chapter. In contrast, we can implement very complicated, multidimensional filters with this code merely by altering our assignments to the filter's variables. Perhaps we want to track 100 dimensions in financial models. Or we have an aircraft with a GPS, INS, TACAN, radar altimeter, baro altimeter, and airspeed indicator, and we want to integrate all those sensors into a model that predicts position, velocity, and acceleration in 3D space. We can do that with the code in this chapter.
I want you to get a better feel for how the Gaussians change over time, so here is a 3D plot showing the Gaussians every 7th epoch (time step). Plotting every 7th separates them enough so you can see each one independently. The first Gaussian, at $t=0$, is on the left.
from kf_book.book_plots import set_figsize, figsize
from kf_book.nonlinear_plots import plot_gaussians
P = np.diag([3., 1.])
np.random.seed(3)
Ms, Ps = run(count=25, R=10, Q=0.01, P=P, do_plot=False)
with figsize(x=9, y=5):
    plot_gaussians(Ms[::7], Ps[::7], (-5,25), (-5, 5), 75)