Probabilistic derivation of the Kalman filter

This derivation of the filter is based on the explanation by Pradu.

Our goal is to express the mean and covariance of the state vector $\mathbf{x}_t\sim \mathcal{N}(\mathbf{\hat{x}}_t,\mathbf{P}_t)$ to be estimated, conditioned on the measurement vector $\mathbf{y}_t=\mathbf{m}$, i.e. the mean $\mathbf{\hat{x}}_{t|\mathbf{y}_t=\mathbf{m}}$ and covariance $\mathbf{P}_{t|\mathbf{y}_t=\mathbf{m}}$.

System description

The state transition equation is

$$ \mathbf{x}_{t+1} = \mathbf{A}_t\mathbf{x}_t + \mathbf{B}_t\mathbf{u}_t + \mathbf{q}_t, $$

where $\mathbf{A}_t$ is the state transition matrix, $\mathbf{B}_t$ is the input matrix, $\mathbf{u}_t \sim \mathcal{N}(\mathbf{\hat{u}}_t,\mathbf{U}_t)$ is the input vector, and $\mathbf{q}_t \sim \mathcal{N}(\mathbf{0},\mathbf{Q}_t)$ is process noise. The measurement equation is

$$ \mathbf{y}_{t} = \mathbf{C}_t\mathbf{x}_t + \mathbf{D}_t \mathbf{u}_t + \mathbf{r}_t, $$

where $\mathbf{y}_{t}$ is the measurement, $\mathbf{C}_t$ is the output matrix, $\mathbf{D}_t$ is the feed-forward matrix, and $\mathbf{r}_t \sim \mathcal{N}(\mathbf{0},\mathbf{R}_t)$ is measurement noise. It is assumed that $\mathbf{q}_t$, $\mathbf{u}_t$, $\mathbf{x}_t$, and $\mathbf{r}_t$ are mutually uncorrelated.

State propagation update

The mean and covariance of the a priori state estimate (the state estimate before the current measurement is taken into account) follow directly from the mean and covariance of a linear combination of random variables (explained below),

$$ \mathbf{\hat{x}}_{t} = \mathbf{A}_{t-1} \mathbf{\hat{x}}_{t-1} + \mathbf{B}_{t-1} \mathbf{\hat{u}}_{t-1} \\ \mathbf{P}_t = \mathbf{A}_{t-1} \mathbf{P}_{t-1} { \mathbf{A}_{t-1} }^T + \mathbf{B}_{t-1} \mathbf{U}_{t-1} {\mathbf{B}_{t-1}}^T + \mathbf{Q}_{t-1} $$
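As a concrete illustration, here is a minimal sketch of this propagation step in Python with NumPy (the function name and argument conventions are my own, not part of the derivation):

```python
import numpy as np

def predict(x_hat, P, A, B, u_hat, U, Q):
    """A priori update: mean and covariance of A x + B u + q."""
    x_hat_new = A @ x_hat + B @ u_hat
    P_new = A @ P @ A.T + B @ U @ B.T + Q
    return x_hat_new, P_new
```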

Mean and Covariance of a Linear Combination of Random Variables

Let $\mathbf{y} = \mathbf{A}\mathbf{x}_1 + \mathbf{B}\mathbf{x}_2$ where $\mathbf{x}_1$ and $\mathbf{x}_2$ are multivariate random variables. The expected value of $\mathbf{y}$ is

$$ \mathbb{E}\left[{\mathbf{y}}\right] = \mathbb{E}\left[{ \mathbf{A}\mathbf{x}_1 + \mathbf{B}\mathbf{x}_2 }\right] \\ = \mathbf{A}\mathbb{E}\left[{\mathbf{x}_1}\right] + \mathbf{B}\mathbb{E}\left[{\mathbf{x}_2}\right]. $$

The covariance matrix for $\mathbf{y}$ is

$$ \sigma(\mathbf{y},\mathbf{y}) = \mathbb{E}\left[{ ( \mathbf{y} - \mathbb{E}\left[{\mathbf{y}}\right] ) {( \mathbf{y} - \mathbb{E}\left[{\mathbf{y}}\right] )}^T }\right] \\ = \mathbb{E}\left[{ ( \mathbf{A}(\mathbf{x}_1 -\mathbb{E}\left[{ \mathbf{x}_1}\right]) + \mathbf{B}(\mathbf{x}_2 -\mathbb{E}\left[{ \mathbf{x}_2}\right]) ) {( \mathbf{A}(\mathbf{x}_1 -\mathbb{E}\left[{ \mathbf{x}_1}\right]) + \mathbf{B}(\mathbf{x}_2 -\mathbb{E}\left[{ \mathbf{x}_2}\right]) )}^T }\right] \\ = \mathbf{A} \sigma(\mathbf{x}_1,\mathbf{x}_1){\mathbf{A}}^T + \mathbf{B} \sigma(\mathbf{x}_2,\mathbf{x}_2){\mathbf{B}}^T +\mathbf{A} \sigma(\mathbf{x}_1,\mathbf{x}_2){\mathbf{B}}^T + \mathbf{B} \sigma(\mathbf{x}_2,\mathbf{x}_1){\mathbf{A}}^T. $$

If $\mathbf{x}_1$ and $\mathbf{x}_2$ are independent, then

$$ \sigma(\mathbf{y},\mathbf{y}) = \mathbf{A} \sigma(\mathbf{x}_1,\mathbf{x}_1){\mathbf{A}}^T + \mathbf{B} \sigma(\mathbf{x}_2,\mathbf{x}_2){\mathbf{B}}^T. $$

The cross-covariance matrix of two vector-valued random variables is defined as: $$ \sigma(\mathbf{x},\mathbf{y}) = \mathbb{E}\left[{ ( \mathbf{x} - \mathbb{E}\left[{\mathbf{x}}\right] ) {( \mathbf{y} - \mathbb{E}\left[{\mathbf{y}}\right] )}^T }\right] \\ = \mathbb{E}\left[{\mathbf{x}{\mathbf{y}}^T}\right] - \mathbb{E}\left[{\mathbf{x}}\right]{\mathbb{E}\left[{\mathbf{y}}\right]}^T. $$
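To build some confidence in these formulas, here is a quick Monte Carlo check (a sketch assuming NumPy; the matrices and covariances below are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((2, 2))
S1 = np.diag([1.0, 2.0, 0.5])             # covariance of x1
S2 = np.array([[1.0, 0.3], [0.3, 2.0]])   # covariance of x2

# Independent samples of x1 and x2, combined as y = A x1 + B x2.
x1 = rng.multivariate_normal(np.zeros(3), S1, size=200_000)
x2 = rng.multivariate_normal(np.zeros(2), S2, size=200_000)
y = x1 @ A.T + x2 @ B.T

empirical = np.cov(y, rowvar=False)
analytic = A @ S1 @ A.T + B @ S2 @ B.T
print(np.max(np.abs(empirical - analytic)))  # small; shrinks with sample size
```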

Measurement update

Let the joint distribution between $\mathbf{x}_t$ and $\mathbf{y}_t$ be

$$ \left[ \begin{array}{c} \mathbf{x}_t \\ \mathbf{y}_t \end{array} \right] \sim \mathcal{N} \left( \left[ \begin{array}{c} \mathbf{\hat{x}}_t \\ \mathbf{\hat{y}}_t \end{array} \right], \left[ \begin{array}{cc} \mathbf{\Sigma}_{xx,t} & \mathbf{\Sigma}_{xy,t} \\ \mathbf{\Sigma}_{yx,t} & \mathbf{\Sigma}_{yy,t} \end{array} \right] \right) $$

The expected measurement given the state $\mathbf{x}_t$ and input $\mathbf{u}_t$ is

$$ \mathbf{\hat{y}}_{t} = \mathbf{C}_t\mathbf{\hat{x}}_t + \mathbf{D}_t \mathbf{\hat{u}}_t. $$

The covariance sub-matrices are $$ \mathbf{\Sigma}_{xx,t} = \mathbf{P}_t \\ \mathbf{\Sigma}_{xy,t} = \mathbb{E}\left[{\mathbf{x}_t {\mathbf{y}_t}^T}\right] - \mathbb{E}\left[{\mathbf{x}_t}\right] {\mathbb{E}\left[{\mathbf{y}_t}\right]}^T \\ = \mathbb{E}\left[{\mathbf{x}_t {\left( \mathbf{C}_t\mathbf{x}_t + \mathbf{D}_t \mathbf{u}_t + \mathbf{r}_t \right)}^T}\right] - \mathbb{E}\left[{\mathbf{x}_t}\right] {\mathbb{E}\left[{\mathbf{C}_t\mathbf{x}_t + \mathbf{D}_t \mathbf{u}_t + \mathbf{r}_t}\right]}^T \\ = \mathbf{P}_t { \mathbf{C}_t }^T \\ \mathbf{\Sigma}_{yx,t} = {\mathbf{\Sigma}_{xy,t}}^T = \mathbf{C}_t \mathbf{P}_t \\ \mathbf{\Sigma}_{yy,t} = \mathbf{C}_t\mathbf{P}_t{\mathbf{C}_t}^T + \mathbf{D}_t \mathbf{U}_t {\mathbf{D}_t}^T + \mathbf{R}_t $$
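In code, these blocks can be assembled directly from the quantities above; a minimal sketch assuming NumPy, with names mirroring the symbols:

```python
import numpy as np

def joint_moments(x_hat, P, u_hat, U, C, D, R):
    """Blocks of the joint Gaussian over (x_t, y_t), as derived above."""
    y_hat = C @ x_hat + D @ u_hat            # expected measurement
    S_xy = P @ C.T                           # Sigma_xy
    S_yy = C @ P @ C.T + D @ U @ D.T + R     # Sigma_yy
    return y_hat, S_xy, S_yy
```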

where $\mathbb{E}\left[{\mathbf{x}}\right]$ denotes the expectation of $\mathbf{x}$.

We can show that indeed,

$$ \mathbb{E}\left[{\mathbf{x}_t {\left( \mathbf{C}_t\mathbf{x}_t + \mathbf{D}_t \mathbf{u}_t + \mathbf{r}_t \right)}^T}\right] - \mathbb{E}\left[{\mathbf{x}_t}\right] {\mathbb{E}\left[{\mathbf{C}_t\mathbf{x}_t + \mathbf{D}_t \mathbf{u}_t + \mathbf{r}_t}\right]}^T = \mathbf{P}_t { \mathbf{C}_t }^T $$

The expectations of $\mathbf{x}_t$, $\mathbf{u}_t$ and $\mathbf{r}_t$ are $\mathbb{E}\left[{\mathbf{x}_t}\right] = \mathbf{\hat{x}}_t$, $\mathbb{E}\left[{\mathbf{u}_t}\right] = \mathbf{\hat{u}}_t$ and $\mathbb{E}\left[{\mathbf{r}_t}\right] = 0$.

Multiplying $\mathbf{x}_t$ into the first term, we get

$$ \mathbb{E}\left[{\mathbf{x}_t {\left( \mathbf{C}_t\mathbf{x}_t + \mathbf{D}_t \mathbf{u}_t + \mathbf{r}_t \right)}^T}\right] = \mathbb{E}\left[\mathbf{x}_t {\mathbf{x}_t}^T { \mathbf{C}_t}^T + \mathbf{x}_t {\mathbf{u}_t}^T {\mathbf{D}_t}^T + \mathbf{x}_t {\mathbf{r}_t}^T\right] = \\ \mathbb{E}\left[\mathbf{x}_t {\mathbf{x}_t}^T\right]{ \mathbf{C}_t}^T + \mathbb{E}\left[\mathbf{x}_t {\mathbf{u}_t}^T\right]{\mathbf{D}_t}^T + \mathbb{E}\left[\mathbf{x}_t {\mathbf{r}_t}^T\right] $$

Assuming that $\mathbf{x}_t$ and $\mathbf{u}_t$, as well as $\mathbf{x}_t$ and $\mathbf{r}_t$, are independent random variables,

$$ \mathbb{E}\left[\mathbf{x}_t {\mathbf{x}_t}^T\right] = \sigma(\mathbf{x}_t,\mathbf{x}_t) + \mathbb{E}\left[\mathbf{x}_t\right] \mathbb{E}\left[\mathbf{x}_t\right]^T = \mathbf{P}_t + \mathbf{\hat{x}}_t {\mathbf{\hat{x}}_t}^T \\ \mathbb{E}\left[\mathbf{x}_t {\mathbf{u}_t}^T\right] = \mathbb{E}\left[\mathbf{x}_t\right] \mathbb{E}\left[\mathbf{u}_t\right]^T = \mathbf{\hat{x}}_t {\mathbf{\hat{u}}_t}^T \\ \mathbb{E}\left[\mathbf{x}_t {\mathbf{r}_t}^T\right] = \mathbb{E}\left[\mathbf{x}_t\right] \mathbb{E}\left[\mathbf{r}_t\right]^T = \mathbf{\hat{x}}_t {\mathbf{0}}^T = \mathbf{0} $$

hence the first term is

$$ \mathbf{P}_t {\mathbf{C}_t}^T + \mathbf{\hat{x}}_t {\mathbf{\hat{x}}_t}^T {\mathbf{C}_t}^T + \mathbf{\hat{x}}_t {\mathbf{\hat{u}}_t}^T {\mathbf{D}_t}^T .$$

The second factor of the second term is

$$ {\mathbb{E}\left[{\mathbf{C}_t\mathbf{x}_t + \mathbf{D}_t \mathbf{u}_t + \mathbf{r}_t}\right]}^T = \\ \mathbb{E}\left[\mathbf{x}_t\right]^T {\mathbf{C}_t}^T + \mathbb{E}\left[\mathbf{u}_t\right]^T {\mathbf{D}_t}^T + \mathbb{E}\left[\mathbf{r}_t\right]^T = \\ {\mathbf{\hat{x}}_t}^T {\mathbf{C}_t}^T + {\mathbf{\hat{u}}_t}^T {\mathbf{D}_t}^T ,$$

therefore the second term evaluates to

$$ \mathbf{\hat{x}}_t {\mathbf{\hat{x}}_t}^T {\mathbf{C}_t}^T + \mathbf{\hat{x}}_t {\mathbf{\hat{u}}_t}^T {\mathbf{D}_t}^T . $$

If we subtract this term from the first, we get $\mathbf{P}_t {\mathbf{C}_t}^T$ as expected.
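The identity is also easy to sanity-check by sampling; a sketch assuming NumPy, with arbitrary example matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
P = np.diag([1.0, 0.5, 2.0])          # state covariance P_t
Uc = 0.2 * np.eye(2)                  # input covariance U_t
R = 0.1 * np.eye(2)                   # measurement noise covariance R_t
C = rng.standard_normal((2, 3))
D = rng.standard_normal((2, 2))
x_hat, u_hat = rng.standard_normal(3), rng.standard_normal(2)

N = 400_000
x = rng.multivariate_normal(x_hat, P, size=N)
u = rng.multivariate_normal(u_hat, Uc, size=N)
r = rng.multivariate_normal(np.zeros(2), R, size=N)
y = x @ C.T + u @ D.T + r

# Empirical cross-covariance E[x y^T] - E[x] E[y]^T versus P C^T.
S_xy = (x - x.mean(0)).T @ (y - y.mean(0)) / (N - 1)
print(np.max(np.abs(S_xy - P @ C.T)))  # -> close to 0
```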

Given a measurement $\mathbf{m}$, the conditional distribution of $\mathbf{x}_t$ given $\mathbf{y}_t=\mathbf{m}$ is again a normal distribution, with the following mean and covariance (derived in the last section)

$$ \mathbf{\hat{x}}_{t|\mathbf{y}_t=\mathbf{m}} = \mathbb{E}\left[{\mathbf{x}_t|\mathbf{y}_t=\mathbf{m}}\right] = \mathbf{\hat{x}}_t + \mathbf{\Sigma}_{xy,t} \mathbf{\Sigma}_{yy,t}^{-1}( \mathbf{m} - {\mathbf{\hat{y}}_t} ) \\ \mathbf{P}_{t|\mathbf{y}_t=\mathbf{m}} = \text{Var}(\mathbf{x}_t|\mathbf{y}_t=\mathbf{m}) = \mathbf{\Sigma}_{xx,t} - \mathbf{\Sigma}_{xy,t} \mathbf{\Sigma}_{yy,t}^{-1} \mathbf{\Sigma}_{yx,t} $$

Substituting, we get the final equations of the Kalman filter $$ \mathbf{\hat{x}}_{t|\mathbf{y}_t=\mathbf{m}} = \mathbf{\hat{x}}_t + \mathbf{P}_t { \mathbf{C}_t }^T \left( \mathbf{C}_t\mathbf{P}_t{\mathbf{C}_t}^T + \mathbf{D}_t \mathbf{U}_t {\mathbf{D}_t}^T + \mathbf{R}_t\right)^{-1}( \mathbf{m} - \mathbf{C}_t\mathbf{\hat{x}}_t - \mathbf{D}_t \mathbf{\hat{u}}_t ) \\ \mathbf{P}_{t|\mathbf{y}_t=\mathbf{m}} = \mathbf{P}_t - \mathbf{P}_t { \mathbf{C}_t }^T \left( \mathbf{C}_t\mathbf{P}_t{\mathbf{C}_t}^T + \mathbf{D}_t \mathbf{U}_t {\mathbf{D}_t}^T + \mathbf{R}_t\right)^{-1} \mathbf{C}_t \mathbf{P}_t $$
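Putting it together, here is a minimal sketch of the measurement update in Python/NumPy; the factor $\mathbf{\Sigma}_{xy,t}\mathbf{\Sigma}_{yy,t}^{-1}$ is the familiar Kalman gain:

```python
import numpy as np

def update(x_hat, P, m, u_hat, U, C, D, R):
    """Condition the a priori state estimate on the measurement y_t = m."""
    y_hat = C @ x_hat + D @ u_hat
    S = C @ P @ C.T + D @ U @ D.T + R       # Sigma_yy
    K = P @ C.T @ np.linalg.inv(S)          # Sigma_xy Sigma_yy^{-1}, the Kalman gain
    x_post = x_hat + K @ (m - y_hat)        # conditional mean
    P_post = P - K @ C @ P                  # conditional covariance
    return x_post, P_post
```

In a real implementation one would prefer np.linalg.solve over forming the inverse explicitly; the sketch above simply stays closest to the formulas.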

It only remains to show the conditional expectation and covariance of multivariate normal random variables, as used in the derivation above.

Conditional Density of Multivariate Normal Random Variables

Let $\mathbf{x}$, $\mathbf{y}$ be jointly normal with means $\mathbb{E}\left[{\mathbf{x}}\right]=\mathbf{\mu}_{\mathbf{x}}$, $\mathbb{E}\left[{\mathbf{y}}\right] = \mathbf{\mu}_{\mathbf{y}}$ and covariance $$ \left[ \begin{array}{cc} \mathbf{\Sigma}_{xx} & \mathbf{\Sigma}_{xy} \\ \mathbf{\Sigma}_{yx} & \mathbf{\Sigma}_{yy} \end{array} \right]. $$

Let us introduce a new variable $\mathbf{z}$, which is a linear combination of variables $\mathbf{x}$ and $\mathbf{y}$. It is Gaussian, since it is a linear combination of Gaussian random variables. $$ \mathbf{z} = \mathbf{x} - \mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1} \mathbf{y}. $$

$\mathbf{z}$ and $\mathbf{y}$ are independent because they are jointly normal and uncorrelated:

$$ \sigma( \mathbf{z}, \mathbf{y} ) = \sigma( \mathbf{x} - \mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1} \mathbf{y}, \mathbf{y} ) \\ \sigma( \mathbf{z}, \mathbf{y} ) = \mathbb{E}\left[{ \left(\mathbf{x} - \mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1} \mathbf{y} \right) {\mathbf{y}}^T }\right] - \mathbb{E}\left[{\mathbf{x} - \mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1} \mathbf{y} }\right] {\mathbb{E}\left[{\mathbf{y}}\right]}^T \\ \sigma( \mathbf{z}, \mathbf{y} ) = \mathbb{E}\left[{ \mathbf{x}{\mathbf{y}}^T }\right] - \mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1} \mathbb{E}\left[{ \mathbf{y} {\mathbf{y}}^T}\right] - \mathbb{E}\left[{\mathbf{x}}\right]{\mathbb{E}\left[{\mathbf{y}}\right]}^T + \mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1} \mathbb{E}\left[{\mathbf{y}}\right] {\mathbb{E}\left[{\mathbf{y}}\right]}^T \\ \sigma( \mathbf{z}, \mathbf{y} ) = \mathbf{\Sigma}_{xy} - \mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1} \mathbf{\Sigma}_{yy} = \mathbf{0} $$
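A quick numeric check of this zero cross-covariance (a sketch assuming NumPy; the joint covariance below is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([0.5, -1.0, 2.0])
Sig = np.array([[2.0, 0.6, 0.3],
                [0.6, 1.5, 0.2],
                [0.3, 0.2, 1.0]])
Sxy = Sig[:1, 1:]                      # x = first component
Syy = Sig[1:, 1:]                      # y = remaining two components

s = rng.multivariate_normal(mu, Sig, size=500_000)
x, y = s[:, :1], s[:, 1:]
z = x - y @ (Sxy @ np.linalg.inv(Syy)).T

# Empirical cross-covariance of z and y should (nearly) vanish.
zc, yc = z - z.mean(0), y - y.mean(0)
print(np.max(np.abs(zc.T @ yc / (len(s) - 1))))  # -> ~0
```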

Let the conditioning value itself be a Gaussian random variable, $\mathbf{t}\sim \mathcal{N}(\mathbf{\mu}_{\mathbf{t}},\mathbf{T})$; a deterministic value corresponds to $\mathbf{T}=\mathbf{0}$ and $\mathbf{\mu}_{\mathbf{t}}=\mathbf{t}$. Note that because $\mathbf{z}$ and $\mathbf{y}$ are independent, $\mathbb{E}\left[{\mathbf{z}|\mathbf{y}=\mathbf{t}}\right] = \mathbb{E}\left[{\mathbf{z}}\right] = \mathbf{\mu}_{\mathbf{x}} - \mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1}\mathbf{\mu}_{\mathbf{y}}$. The conditional expectation of $\mathbf{x}$ given $\mathbf{y}$ is

$$ \mathbb{E}\left[{\mathbf{x}|\mathbf{y}=\mathbf{t}}\right] = \mathbb{E}\left[{\mathbf{z}+\mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1} \mathbf{y}|\mathbf{y}=\mathbf{t}}\right] \\ \mathbb{E}\left[{\mathbf{x}|\mathbf{y}=\mathbf{t}}\right] = \mathbb{E}\left[{\mathbf{z}|\mathbf{y}=\mathbf{t}}\right]+\mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1} \mathbb{E}\left[{\mathbf{y}|\mathbf{y}=\mathbf{t}}\right] \\ \mathbb{E}\left[{\mathbf{x}|\mathbf{y}=\mathbf{t}}\right] = \mathbf{\mu}_{\mathbf{x}} - \mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1}\mathbf{\mu}_{\mathbf{y}}+\mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1} \mathbf{\mu}_{\mathbf{t}} \\ \mathbb{E}\left[{\mathbf{x}|\mathbf{y}=\mathbf{t}}\right] = \mathbf{\mu}_{\mathbf{x}} + \mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1}( \mathbf{\mu}_{\mathbf{t}} - \mathbf{\mu}_{\mathbf{y}} ) \\ $$

The conditional covariance is

$$ \text{Var}(\mathbf{x}|\mathbf{y}=\mathbf{t}) = \text{Var}(\mathbf{z}+\mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1} \mathbf{y}|\mathbf{y}=\mathbf{t}) \\ \text{Var}(\mathbf{x}|\mathbf{y}=\mathbf{t}) = \text{Var}(\mathbf{z}|\mathbf{y}=\mathbf{t}) + \text{Var}(\mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1} \mathbf{y}|\mathbf{y}=\mathbf{t}) \\ \text{Var}(\mathbf{x}|\mathbf{y}=\mathbf{t}) = \text{Var}(\mathbf{z}) + \mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1} \mathbf{T} \mathbf{\Sigma}_{yy}^{-1} \mathbf{\Sigma}_{yx} \\ \text{Var}(\mathbf{x}|\mathbf{y}=\mathbf{t}) = \text{Var}(\mathbf{x} - \mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1} \mathbf{y}) + \mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1} \mathbf{T} \mathbf{\Sigma}_{yy}^{-1} \mathbf{\Sigma}_{yx} \\ \text{Var}(\mathbf{x}|\mathbf{y}=\mathbf{t}) = \mathbf{\Sigma}_{xx} + \mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1} \mathbf{\Sigma}_{yy} {(\mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1})}^T - \mathbf{\Sigma}_{xy} {(\mathbf{\Sigma}_{xy}\mathbf{\Sigma}_{yy}^{-1})}^T - \mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1} \mathbf{\Sigma}_{yx} + \mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1} \mathbf{T} \mathbf{\Sigma}_{yy}^{-1} \mathbf{\Sigma}_{yx} \\ \text{Var}(\mathbf{x}|\mathbf{y}=\mathbf{t}) = \mathbf{\Sigma}_{xx} + \mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1} \mathbf{\Sigma}_{yx} - \mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1}\mathbf{\Sigma}_{yx} - \mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1} \mathbf{\Sigma}_{yx} + \mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1} \mathbf{T} \mathbf{\Sigma}_{yy}^{-1} \mathbf{\Sigma}_{yx} \\ \text{Var}(\mathbf{x}|\mathbf{y}=\mathbf{t}) = \mathbf{\Sigma}_{xx} - \mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1} \mathbf{\Sigma}_{yx} + \mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1} \mathbf{T} \mathbf{\Sigma}_{yy}^{-1} \mathbf{\Sigma}_{yx} \\ \text{Var}(\mathbf{x}|\mathbf{y}=\mathbf{t}) = \mathbf{\Sigma}_{xx} + \mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1} \left(\mathbf{T} - \mathbf{\Sigma}_{yy} \right) \mathbf{\Sigma}_{yy}^{-1} \mathbf{\Sigma}_{yx} $$

In summary,

$$ \mathbb{E}\left[{\mathbf{x}|\mathbf{y}=\mathbf{t}}\right] = \mathbf{\mu}_{\mathbf{x}} + \mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1}( \mathbf{\mu}_{\mathbf{t}} - \mathbf{\mu}_{\mathbf{y}} ) \\ \text{Var}(\mathbf{x}|\mathbf{y}=\mathbf{t}) = \mathbf{\Sigma}_{xx} + \mathbf{\Sigma}_{xy} \mathbf{\Sigma}_{yy}^{-1}\left(\mathbf{T} - \mathbf{\Sigma}_{yy}\right) \mathbf{\Sigma}_{yy}^{-1} \mathbf{\Sigma}_{yx} \\ \mathbb{E}\left[{\mathbf{y}|\mathbf{x}=\mathbf{s}}\right] = \mathbf{\mu}_{\mathbf{y}} + \mathbf{\Sigma}_{yx} \mathbf{\Sigma}_{xx}^{-1}( \mathbf{\mu}_{\mathbf{s}} - \mathbf{\mu}_{\mathbf{x}} ) \\ \text{Var}(\mathbf{y}|\mathbf{x}=\mathbf{s}) = \mathbf{\Sigma}_{yy} + \mathbf{\Sigma}_{yx} \mathbf{\Sigma}_{xx}^{-1}\left(\mathbf{S} - \mathbf{\Sigma}_{xx}\right) \mathbf{\Sigma}_{xx}^{-1} \mathbf{\Sigma}_{xy}, $$ where, by symmetry, $\mathbf{s}\sim \mathcal{N}(\mathbf{\mu}_{\mathbf{s}},\mathbf{S})$ plays the same role when conditioning on $\mathbf{x}$.
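With $\mathbf{T}=\mathbf{0}$ and $\mathbf{\mu}_{\mathbf{t}}=\mathbf{m}$, these reduce to the expressions used in the measurement update above. A crude rejection-sampling check of the scalar case (a sketch assuming NumPy; the joint distribution and conditioning value are arbitrary examples):

```python
import numpy as np

# Joint scalar Gaussian (x, y); condition on y ~= t and compare with the
# closed-form conditional moments (deterministic t, i.e. T = 0).
rng = np.random.default_rng(2)
mu = np.array([1.0, -2.0])
Sig = np.array([[2.0, 0.8],
                [0.8, 1.0]])
t = -1.5

xy = rng.multivariate_normal(mu, Sig, size=2_000_000)
near = np.abs(xy[:, 1] - t) < 0.01      # crude conditioning by rejection
x_cond = xy[near, 0]

mean_formula = mu[0] + Sig[0, 1] / Sig[1, 1] * (t - mu[1])
var_formula = Sig[0, 0] - Sig[0, 1] ** 2 / Sig[1, 1]
print(x_cond.mean(), mean_formula)   # approximately equal
print(x_cond.var(), var_formula)     # approximately equal
```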