The Gaussian distribution for a variable $x$ is completely specified by its mean $\mu$ and (co)variance $\Sigma$.
Given $x \sim \mathcal{N}(\mu_x,\sigma_x^2)$ and $y \sim \mathcal{N}(\mu_y,\sigma_y^2)$, what is the PDF for $z = A\cdot(x - y) + b$ ?
[Answer]
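Assuming $x$ and $y$ are independent (we make this assumption explicit here), $z$ is a linear combination of Gaussian variables and is therefore Gaussian itself, with $$\begin{align*} \mu_z &= A\cdot(\mu_x - \mu_y) + b \\ \sigma_z^2 &= A^2 \cdot \left( \sigma_x^2 + \sigma_y^2 \right) \end{align*}$$ so $p(z) = \mathcal{N}(z\,|\,\mu_z, \sigma_z^2)$.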
Consider a measurement $x = \theta + \epsilon$ of an unknown quantity $\theta$, with Gaussian noise $\epsilon \sim \mathcal{N}(0,\sigma_\epsilon^2)$ and a Gaussian prior on $\theta$. Note that you can rewrite these specifications in probabilistic notation as follows: $$\begin{align} p(x|\theta) &=\mathcal{N}(x|\theta,\sigma^2_{\epsilon}) \tag{likelihood}\\ p(\theta) &=\mathcal{N}(\theta|\mu_\theta,\sigma_\theta^2) \tag{prior} \end{align}$$
By Bayes rule, the posterior is proportional to the product of likelihood and prior, $p(\theta|x) \propto p(x|\theta)\,p(\theta)$, which we recognize as a Gaussian distribution $p(\theta|x) = \mathcal{N}(\theta\,|\,\mu_{\theta|x},\sigma_{\theta|x}^2)$, where
$$\begin{align*} \frac{1}{\sigma_{\theta|x}^2} &= \frac{\sigma^2_{\epsilon} + \sigma_\theta^2}{\sigma^2_{\epsilon}\sigma_\theta^2} = \frac{1}{\sigma_\theta^2} + \frac{1}{\sigma^2_{\epsilon}}\\ \mu_{\theta|x} &= \sigma_{\theta|x}^2 \, \left( \frac{1}{\sigma^2_{\epsilon}}x + \frac{1}{\sigma_\theta^2} \mu_\theta \right) \end{align*}$$
Let's plot the exact product of two Gaussian PDFs as well as the normalized product according to the above derivation.
using PyPlot, Distributions
d1 = Normal(0, 1) # μ=0, σ^2=1
d2 = Normal(3, 2) # μ=3, σ^2=4
# Calculate the parameters of the product d1*d2
s2_prod = (d1.σ^-2 + d2.σ^-2)^-1
m_prod = s2_prod * ((d1.σ^-2)*d1.μ + (d2.σ^-2)*d2.μ)
d_prod = Normal(m_prod, sqrt(s2_prod)) # Note that we neglect the normalization constant.
# Plot stuff
x = range(-4, stop=8, length=100)
plot(x, pdf.(d1,x), "k")
plot(x, pdf.(d2,x), "b")
plot(x, pdf.(d1,x) .* pdf.(d2,x), "r-") # Plot the exact product
plot(x, pdf.(d_prod,x), "r:") # Plot the normalized Gaussian product
legend([L"\mathcal{N}(0,1)",
        L"\mathcal{N}(3,4)",
        L"\mathcal{N}(0,1) \mathcal{N}(3,4)",
        L"Z^{-1} \mathcal{N}(0,1) \mathcal{N}(3,4)"]);
The solid and dotted red curves are identical up to a scaling factor $Z$.
In general, the multiplication of two multivariate Gaussians over the same variable yields an (unnormalized) Gaussian: $$\mathcal{N}(x|\mu_a,\Sigma_a) \cdot \mathcal{N}(x|\mu_b,\Sigma_b) = \alpha \cdot \mathcal{N}(x|\mu_c,\Sigma_c)$$ where $$\begin{align*} \Sigma_c^{-1} &= \Sigma_a^{-1} + \Sigma_b^{-1} \\ \Sigma_c^{-1} \mu_c &= \Sigma_a^{-1}\mu_a + \Sigma_b^{-1}\mu_b \end{align*}$$ and normalization constant $\alpha = \mathcal{N}(\mu_a|\, \mu_b, \Sigma_a + \Sigma_b)$.
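As a quick numerical sanity check of this product rule, we can evaluate both sides at an arbitrary point (the means, covariances, and test point below are arbitrary example choices; the identity itself is exact):
using Distributions, LinearAlgebra
μa, Σa = [1.0, 0.0], [1.0 0.0; 0.0 1.0]
μb, Σb = [0.0, 2.0], [2.0 0.5; 0.5 1.0]
Λc = inv(Σa) + inv(Σb)              # Σc^{-1} = Σa^{-1} + Σb^{-1}
Σc = inv(Λc); Σc = (Σc + Σc')/2     # symmetrize against floating-point round-off
μc = Σc * (inv(Σa)*μa + inv(Σb)*μb) # solves Σc^{-1}μc = Σa^{-1}μa + Σb^{-1}μb
α = pdf(MvNormal(μb, Σa + Σb), μa)  # α = N(μa | μb, Σa+Σb)
x_test = [0.3, -0.7]                # arbitrary test point
@assert pdf(MvNormal(μa, Σa), x_test) * pdf(MvNormal(μb, Σb), x_test) ≈ α * pdf(MvNormal(μc, Σc), x_test)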
Let's plot the joint, marginal, and conditional distributions for an example bivariate Gaussian.
using PyPlot, Distributions
μ = [1.0; 2.0]
Σ = [0.3 0.7;
0.7 2.0]
joint = MvNormal(μ,Σ)
marginal_x = Normal(μ[1], sqrt(Σ[1,1]))
#Plot p(x,y)
subplot(221)
x_range = y_range = range(-2,stop=5,length=1000)
joint_pdf = [ pdf(joint, [x_range[i];y_range[j]]) for j=1:length(y_range), i=1:length(x_range)]
imshow(joint_pdf, origin="lower", extent=[x_range[1], x_range[end], y_range[1], y_range[end]])
grid(); xlabel("x"); ylabel("y"); title("p(x,y)"); tight_layout()
# Plot p(x)
subplot(222)
plot(x_range, pdf.(marginal_x, x_range))
grid(); xlabel("x"); ylabel("p(x)"); title("Marginal distribution p(x)"); tight_layout()
# Plot p(y|x)
x = 0.1 # condition on the observed value x = 0.1
conditional_y_m = μ[2] + Σ[2,1]*inv(Σ[1,1])*(x - μ[1]) # conditional mean μ_{y|x}
conditional_y_s2 = Σ[2,2] - Σ[2,1]*inv(Σ[1,1])*Σ[1,2] # conditional variance σ²_{y|x}
conditional_y = Normal(conditional_y_m, sqrt(conditional_y_s2))
subplot(223)
plot(y_range, pdf.(conditional_y, y_range))
grid(); xlabel("y"); ylabel("p(y|x)"); title("Conditional distribution p(y|x)"); tight_layout()
As is clear from the plots, the conditional distribution is a renormalized slice from the joint distribution.
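We can verify the "renormalized slice" interpretation numerically: dividing the joint PDF at the conditioned value of $x$ by the marginal $p(x)$ should reproduce the conditional PDF (a quick check reusing joint, marginal_x, and conditional_y from the code above; y_test is an arbitrary point):
y_test = 1.5 # arbitrary test point
@assert pdf(joint, [x; y_test]) / pdf(marginal_x, x) ≈ pdf(conditional_y, y_test)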
$\longrightarrow$ Moral: for jointly Gaussian systems, we can do inference in a single step by using the formulas for conditioning and marginalization.
Now consider the signal $x_t=\theta+\epsilon_t$ with i.i.d. Gaussian noise $\epsilon_t \sim \mathcal{N}(0,\sigma_\epsilon^2)$, where $D_t= \left\{x_1,\ldots,x_t\right\}$ is observed sequentially (over time).
[Question] Derive a recursive update rule for the posterior, i.e., express $p(\theta|D_t)$ in terms of $p(\theta|D_{t-1})$ and the new observation $x_t$.
[Answer] The posterior after observing $x_t$ is again Gaussian, $p(\theta|D_t) = \mathcal{N}(\theta\,|\,\mu_t,\sigma_t^2)$,
with $$\begin{align*} K_t &= \frac{\sigma_{t-1}^2}{\sigma_{t-1}^2+\sigma_{\epsilon}^2} \qquad \text{(Kalman gain)}\\ \mu_t &= \mu_{t-1} + K_t \cdot (x_t-\mu_{t-1}) \tag{1}\\ \sigma_t^2 &= \left( 1-K_t \right) \sigma_{t-1}^2 \tag{2} \end{align*}$$ (as before, we used the formulas for conditioning in a multivariate Gaussian system).
Let's implement the Kalman filter described above. We'll use it to recursively estimate the value of $\theta$ based on noisy observations.
using PyPlot
N = 50 # Number of observations
θ = 2.0 # True value of the variable we want to estimate
σ_ϵ2 = 0.25 # Observation noise variance
x = sqrt(σ_ϵ2) * randn(N) .+ θ # Generate N noisy observations of θ
t = 0
μ = fill(NaN, N) # Means of p(θ|D) over time
σ_μ2 = fill(NaN, N) # Variances of p(θ|D) over time
function performKalmanStep()
    # Perform a single Kalman filter step: updates t, μ, and σ_μ2
    global t += 1
    if t > 1 # Use posterior from the previous step as prior
        K = σ_μ2[t-1] / (σ_ϵ2 + σ_μ2[t-1]) # Kalman gain
        μ[t] = μ[t-1] + K*(x[t] - μ[t-1])  # Update mean using (1)
        σ_μ2[t] = σ_μ2[t-1] * (1.0-K)      # Update variance using (2)
    elseif t == 1 # Use the prior p(θ) = N(0, 1000)
        K = 1000.0 / (σ_ϵ2 + 1000.0)       # Kalman gain
        μ[t] = 0 + K*(x[t] - 0)            # Update mean using (1)
        σ_μ2[t] = 1000 * (1.0-K)           # Update variance using (2)
    end
end
while t < N
    performKalmanStep()
end
# Plot the 'true' value of θ, noisy observations x, and the recursively updated posterior p(θ|D)
t = collect(1:N)
plot(t, θ*ones(N), "k--")
plot(t, x, "rx")
plot(t, μ, "b-")
fill_between(t, μ-sqrt.(σ_μ2), μ+sqrt.(σ_μ2), color="b", alpha=0.3)
legend([L"\theta", L"x[t]", L"\mu[t]"])
xlim((1, N)); xlabel(L"t"); grid()
The shaded area spans one standard deviation on either side of the posterior mean of $p(\theta|D)$. For this model, the posterior variance is guaranteed to decrease monotonically, since $\sigma_t^2 = (1-K_t)\,\sigma_{t-1}^2$ with $0 < K_t < 1$.
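We can check this monotonic decrease directly on the simulation above (a quick check reusing σ_μ2 from the code block):
@assert all(diff(σ_μ2) .< 0) # posterior variance strictly decreases at every step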
In general, the product of two Gaussian-distributed variables is not Gaussian-distributed. For independent $X \sim \mathcal{N}(0,1)$ and $Y \sim \mathcal{N}(0,1)$, the PDF of the product $Z = XY$ is given by $$p_Z(z) = \frac{1}{\pi} \mathrm{K}_0(|z|)\,,$$ where $\mathrm{K}_n(z)$ is a modified Bessel function of the second kind.
We plot $p(Z)$ to give an idea of what this distribution looks like.
using PyPlot, Distributions, SpecialFunctions
X = Normal(0,1)
pdf_product_std_normals(z::Vector) = besselk.(0, abs.(z)) ./ π # p_Z(z) = K_0(|z|)/π
range1 = collect(range(-4,stop=4,length=100))
plot(range1, pdf.(X, range1))
plot(range1, pdf_product_std_normals(range1))
legend([L"p(X)=p(Y)=\mathcal{N}(0,1)", L"p(Z)"]); grid()
The success of Gaussian distributions in probabilistic modeling is largely due to the following properties:
The Gaussian PDF has higher entropy than any other PDF with the same variance. (Not discussed in this course).
Any smooth function with a single rounded maximum, if raised to higher and higher powers, goes into a Gaussian function. (Not discussed).
Aside from working with Gaussians, it will be helpful for the next lessons to be familiar with some matrix calculus. We briefly recapitulate the formulas that we will use here.
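For example, identities such as the following (a standard selection, listed here as a reminder) will come up: $$\begin{align*} \frac{\partial a^T x}{\partial x} &= a \\ \frac{\partial x^T A x}{\partial x} &= (A + A^T)\,x \\ \frac{\partial \log |A|}{\partial A} &= \left(A^{-1}\right)^T \end{align*}$$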
open("../../styles/aipstyle.html") do f
display("text/html", read(f, String))
end