Consider a set of observations $D=\{x_1,…,x_N\}$ in the 2-dimensional plane (see Figure). All observations were generated by the same process. We now draw an extra observation $x_\bullet = (a,b)$ from the same data generating process. What is the probability that $x_\bullet$ lies within the shaded rectangle $S$?
using Pkg; Pkg.activate("probprog/workspace/"); Pkg.instantiate();
IJulia.clear_output();
using Distributions, PyPlot
N = 100
generative_dist = MvNormal([0,1.], [0.8 0.5; 0.5 1.0])
function plotObservations(obs::Matrix)
    plot(obs[1,:], obs[2,:], "kx", zorder=3)
    fill_between([0., 2.], 1., 2., color="k", alpha=0.4, zorder=2) # Shaded area
    text(2.05, 1.8, "S", fontsize=12)
    xlim([-3,3]); ylim([-2,4]); xlabel("a"); ylabel("b")
end
D = rand(generative_dist, N) # Generate observations from generative_dist
plotObservations(D)
x_dot = rand(generative_dist) # Generate x∙
plot(x_dot[1], x_dot[2], "ro");
The sum of two independent Gaussian variables is also Gaussian distributed. Specifically, if $x \sim \mathcal{N} \left(\mu_x, \Sigma_x \right)$ and $y \sim \mathcal{N} \left(\mu_y, \Sigma_y \right)$, then the PDF for $z=x+y$ is given by $$\begin{align} p(z) &= \mathcal{N}(x\,|\,\mu_x,\Sigma_x) \ast \mathcal{N}(y\,|\,\mu_y,\Sigma_y) \notag\\ &= \mathcal{N} \left(z\,|\,\mu_x+\mu_y, \Sigma_x +\Sigma_y \right) \tag{SRG-8} \end{align}$$
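As a quick sanity check of (SRG-8) in the scalar case, we can compare the sample mean and variance of $z = x + y$ with $\mu_x + \mu_y$ and $\sigma_x^2 + \sigma_y^2$. A minimal sketch (the parameter values below are arbitrary example choices):
using Distributions, Statistics
px = Normal(1.0, sqrt(0.5)) # μ_x = 1, σ_x² = 0.5 (arbitrary example values)
py = Normal(2.0, sqrt(1.5)) # μ_y = 2, σ_y² = 1.5
z = rand(px, 100_000) .+ rand(py, 100_000) # Samples of z = x + y
println("sample mean ≈ $(mean(z)), SRG-8 predicts $(px.μ + py.μ)")
println("sample variance ≈ $(var(z)), SRG-8 predicts $(px.σ^2 + py.σ^2)")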
Let's estimate a constant $\theta$ from one 'noisy' measurement $x$ about that constant.
We assume the following measurement equations (the tilde $\sim$ means: 'is distributed as'): $$\begin{align*} x &= \theta + \epsilon \\ \epsilon &\sim \mathcal{N}(0,\sigma^2) \end{align*}$$
Also, let's assume a Gaussian prior for $\theta$ $$\begin{align*} \theta &\sim \mathcal{N}(\mu_0,\sigma_0^2) \\ \end{align*}$$
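The posterior $p(\theta|x) \propto p(x|\theta)\,p(\theta)$ is then a product of two Gaussians in $\theta$. By the standard Gaussian multiplication rule, this product is again proportional to a Gaussian whose precision is the sum of the precisions and whose mean is the precision-weighted average of the means; this is exactly the formula used in the code below: $$\begin{align*} \mathcal{N}(\theta\,|\,\mu_a,\sigma_a^2)\, \mathcal{N}(\theta\,|\,\mu_b,\sigma_b^2) \propto \mathcal{N}(\theta\,|\,\mu_{ab},\sigma_{ab}^2)\,, \qquad \text{with } \frac{1}{\sigma_{ab}^2} = \frac{1}{\sigma_a^2} + \frac{1}{\sigma_b^2}\,, \quad \mu_{ab} = \sigma_{ab}^2\left(\frac{\mu_a}{\sigma_a^2} + \frac{\mu_b}{\sigma_b^2}\right) \end{align*}$$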
using PyPlot, Distributions
d1 = Normal(0, 1) # μ=0, σ^2=1
d2 = Normal(3, 2) # μ=3, σ^2=4
# Calculate the parameters of the product d1*d2
s2_prod = (d1.σ^-2 + d2.σ^-2)^-1
m_prod = s2_prod * ((d1.σ^-2)*d1.μ + (d2.σ^-2)*d2.μ)
d_prod = Normal(m_prod, sqrt(s2_prod)) # Normalized Gaussian with the product's parameters (the unnormalized product d1*d2 differs by a constant factor Z)
# Plot stuff
x = range(-4, stop=8, length=100)
plot(x, pdf.(d1,x), "k")
plot(x, pdf.(d2,x), "b")
plot(x, pdf.(d1,x) .* pdf.(d2,x), "r-") # Plot the exact product
plot(x, pdf.(d_prod,x), "r:") # Plot the normalized Gaussian product
legend([L"\mathcal{N}(0,1)",
L"\mathcal{N}(3,4)",
L"\mathcal{N}(0,1) \mathcal{N}(3,4)",
L"Z^{-1} \mathcal{N}(0,1) \mathcal{N}(3,4)"]);
The solid and dotted red curves are identical up to a scaling factor $Z$.
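The scaling factor itself has a closed form: for two Gaussian pdfs, $Z = \mathcal{N}(\mu_a\,|\,\mu_b, \sigma_a^2 + \sigma_b^2)$. A minimal numerical check, reusing d1, d2 and d_prod from the code above:
# Z = N(μ1 | μ2, σ1² + σ2²) is the normalization constant of the product of two Gaussian pdfs
Z = pdf(Normal(d2.μ, sqrt(d1.σ^2 + d2.σ^2)), d1.μ)
x0 = 1.0 # Arbitrary evaluation point
println("pointwise ratio ≈ $(pdf(d1,x0)*pdf(d2,x0) / pdf(d_prod,x0)), Z ≈ $Z")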
Now consider that we measure a data set $D = \{x_1, x_2, \ldots, x_N\}$, with measurements $$\begin{align*} x_n &= \theta + \epsilon_n \\ \epsilon_n &\sim \mathcal{N}(0,\sigma^2) \end{align*}$$ and the same prior for $\theta$: $$\begin{align*} \theta &\sim \mathcal{N}(\mu_0,\sigma_0^2) \\ \end{align*}$$
Let's derive a distribution for the next sample $x_{N+1}$.
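A sketch of the standard result (not derived in detail here): combining the $N$ Gaussian likelihood terms with the Gaussian prior again yields a Gaussian posterior for $\theta$, and since $x_{N+1} = \theta + \epsilon_{N+1}$ with independent noise, (SRG-8) turns that posterior into a Gaussian predictive distribution: $$\begin{align*} p(\theta|D) &= \mathcal{N}\left(\theta\,|\,\mu_N,\sigma_N^2\right), \quad \frac{1}{\sigma_N^2} = \frac{1}{\sigma_0^2} + \frac{N}{\sigma^2}, \quad \mu_N = \sigma_N^2\left(\frac{\mu_0}{\sigma_0^2} + \frac{1}{\sigma^2}\sum_{n=1}^N x_n\right) \\ p(x_{N+1}|D) &= \mathcal{N}\left(x_{N+1}\,|\,\mu_N,\, \sigma_N^2 + \sigma^2\right) \end{align*}$$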
using PyPlot, Distributions
μ = [1.0; 2.0]
Σ = [0.3 0.7;
0.7 2.0]
joint = MvNormal(μ,Σ)
marginal_x = Normal(μ[1], sqrt(Σ[1,1]))
# Plot p(x,y)
subplot(221)
x_range = y_range = range(-2,stop=5,length=1000)
joint_pdf = [ pdf(joint, [x_range[i];y_range[j]]) for j=1:length(y_range), i=1:length(x_range)]
imshow(joint_pdf, origin="lower", extent=[x_range[1], x_range[end], y_range[1], y_range[end]])
grid(); xlabel("x"); ylabel("y"); title("p(x,y)"); tight_layout()
# Plot p(x)
subplot(222)
plot(x_range, pdf.(marginal_x, x_range))
grid(); xlabel("x"); ylabel("p(x)"); title("Marginal distribution p(x)"); tight_layout()
# Plot p(y|x = 0.1)
x = 0.1 # The value of x on which we condition
conditional_y_m = μ[2] + Σ[2,1]*inv(Σ[1,1])*(x-μ[1]) # Conditional mean
conditional_y_s2 = Σ[2,2] - Σ[2,1]*inv(Σ[1,1])*Σ[1,2] # Conditional variance
conditional_y = Normal(conditional_y_m, sqrt(conditional_y_s2))
subplot(223)
plot(y_range, pdf.(conditional_y, y_range))
grid(); xlabel("y"); ylabel("p(y|x)"); title("Conditional distribution p(y|x)"); tight_layout()
As is clear from the plots, the conditional distribution is a renormalized slice from the joint distribution.
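For reference, the code above uses the standard marginalization and conditioning rules for a partitioned Gaussian: $$\begin{align*} \text{if } \begin{bmatrix} x \\ y \end{bmatrix} &\sim \mathcal{N}\left(\begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix}, \begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix}\right), \text{ then} \\ p(x) &= \mathcal{N}\left(x\,|\,\mu_x, \Sigma_{xx}\right) \\ p(y|x) &= \mathcal{N}\left(y\,|\,\mu_y + \Sigma_{yx}\Sigma_{xx}^{-1}(x-\mu_x),\; \Sigma_{yy} - \Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy}\right) \end{align*}$$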
Consider the signal $x_t = \theta + \epsilon_t$ with $\epsilon_t \sim \mathcal{N}(0,\sigma^2)$, where the data $D_t= \left\{x_1,\ldots,x_t\right\}$ are observed sequentially (over time).
Problem: Derive a recursive algorithm for $p(\theta|D_t)$, i.e., an update rule for (posterior) $p(\theta|D_t)$ based on (prior) $p(\theta|D_{t-1})$ and (new observation) $x_t$.
Let's define the estimate after $t$ observations (i.e., our solution) as $p(\theta|D_t) = \mathcal{N}(\theta\,|\,\mu_t,\sigma_t^2)$.
We define the joint distribution for $\theta$ and $x_t$, given background $D_{t-1}$, by $$\begin{align*} p(\theta, x_t \,|\, D_{t-1}) &= p(x_t\,|\,\theta)\, p(\theta\,|\,D_{t-1}) \\ &= \mathcal{N}(x_t\,|\,\theta, \sigma^2)\, \mathcal{N}(\theta\,|\,\mu_{t-1}, \sigma_{t-1}^2) \end{align*}$$
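Conditioning this joint distribution on the observed $x_t$ (by the Gaussian conditioning rule above) gives the posterior $p(\theta|D_t) = \mathcal{N}(\theta\,|\,\mu_t, \sigma_t^2)$ with the update equations that the code below refers to as (1) and (2): $$\begin{align*} K_t &= \frac{\sigma_{t-1}^2}{\sigma_{t-1}^2 + \sigma^2} \qquad \text{(Kalman gain)} \\ \mu_t &= \mu_{t-1} + K_t\,(x_t - \mu_{t-1}) \tag{1} \\ \sigma_t^2 &= (1 - K_t)\, \sigma_{t-1}^2 \tag{2} \end{align*}$$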
using PyPlot
N = 50 # Number of observations
θ = 2.0 # True value of the variable we want to estimate
σ_ϵ2 = 0.25 # Observation noise variance
x = sqrt(σ_ϵ2) * randn(N) .+ θ # Generate N noisy observations of θ
t = 0
μ = fill!(Vector{Float64}(undef,N), NaN) # Means of p(θ|D) over time
σ_μ2 = fill!(Vector{Float64}(undef,N), NaN) # Variances of p(θ|D) over time
function performKalmanStep()
    # Perform a Kalman filter step, update t, μ, σ_μ2
    global t += 1
    if t>1 # Use posterior from prev. step as prior
        K = σ_μ2[t-1] / (σ_ϵ2 + σ_μ2[t-1]) # Kalman gain
        μ[t] = μ[t-1] + K*(x[t] - μ[t-1])  # Update mean using (1)
        σ_μ2[t] = σ_μ2[t-1] * (1.0-K)      # Update variance using (2)
    elseif t==1 # Use prior
        # Prior p(θ) = N(0,1000)
        K = 1000.0 / (σ_ϵ2 + 1000.0)       # Kalman gain
        μ[t] = 0 + K*(x[t] - 0)            # Update mean using (1)
        σ_μ2[t] = 1000 * (1.0-K)           # Update variance using (2)
    end
end
while t<N
    performKalmanStep()
end
# Plot the 'true' value of θ, noisy observations x, and the recursively updated posterior p(θ|D)
t = collect(1:N)
plot(t, θ*ones(N), "k--")
plot(t, x, "rx")
plot(t, μ, "b-")
fill_between(t, μ-sqrt.(σ_μ2), μ+sqrt.(σ_μ2), color="b", alpha=0.3)
legend([L"\theta", L"x[t]", L"\mu[t]"])
xlim((1, N)); xlabel(L"t"); grid()
The shaded area spans one standard deviation of the posterior $p(\theta|D)$ on either side of its mean. For this static-parameter model the posterior variance decreases monotonically: by update equation (2), $\sigma_t^2 = (1-K_t)\,\sigma_{t-1}^2$ with $0 < K_t < 1$. (For a general Kalman filter with process noise, a monotone decrease is not guaranteed.)
using PyPlot, Distributions, SpecialFunctions
# In contrast to the sum, the product Z = X*Y of two Gaussian variables is not Gaussian distributed.
# For independent X, Y ~ N(0,1), the PDF of Z = X*Y is K₀(|z|)/π (modified Bessel function of the second kind).
X = Normal(0,1)
Y = Normal(0,1)
pdf_product_std_normals(z::Vector) = besselk.(0, abs.(z)) ./ π
range1 = collect(range(-4,stop=4,length=100))
plot(range1, pdf.(X, range1))
plot(range1, pdf.(X,range1).*pdf.(Y,range1))
plot(range1, pdf_product_std_normals(range1))
legend([L"p(X)=p(Y)=\mathcal{N}(0,1)", L"p(X)*p(Y)",L"p(Z=X*Y)"]); grid()
We apply maximum likelihood estimation to fit a 2-dimensional Gaussian model ($m$) to data set $D$. Next, we evaluate $p(x_\bullet \in S | m)$ by (numerical) integration of the Gaussian pdf over $S$: $p(x_\bullet \in S | m) = \int_S p(x|m) \mathrm{d}x$.
using HCubature, LinearAlgebra # Numerical integration package
# Maximum likelihood estimation of 2D Gaussian
N = size(D, 2) # Number of observations (columns of D)
μ = 1/N * sum(D,dims=2)[:,1]
D_min_μ = D - repeat(μ, 1, N)
Σ = Hermitian(1/N * D_min_μ*D_min_μ')
m = MvNormal(μ, convert(Matrix, Σ));
# Contour plot of estimated Gaussian density
A = Matrix{Float64}(undef,100,100); B = Matrix{Float64}(undef,100,100)
density = Matrix{Float64}(undef,100,100)
for i=1:100
    for j=1:100
        A[i,j] = a = (i-1)*6/100 - 3 # Grid for a (x-axis), covering [-3,3)
        B[i,j] = b = (j-1)*6/100 - 2 # Grid for b (y-axis), covering [-2,4)
        density[i,j] = pdf(m, [a,b])
    end
end
c = contour(A, B, density, 6, zorder=1)
PyPlot.set_cmap("cool")
clabel(c, inline=1, fontsize=10)
# Plot observations, x∙, and the contours of the estimated Gaussian density
plotObservations(D)
plot(x_dot[1], x_dot[2], "ro")
# Numerical integration of p(x|m) over S:
(val,err) = hcubature((x)->pdf(m,x), [0., 1.], [2., 2.])
println("p(x⋅∈S|m) ≈ $(val)")
A linear transformation of a Gaussian-distributed (potentially multivariate) variable remains Gaussian.
Bayesian inference with Gaussian prior and likelihood leads to an analytically computable Gaussian posterior.
Here's a nice summary of Gaussian calculations by Sam Roweis.
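As a quick illustration of the linear-transformation property above, here is a minimal sketch (with arbitrary example values for $A$, $b$, $\mu$ and $\Sigma$) that checks $z = Ax + b \sim \mathcal{N}(A\mu + b,\, A\Sigma A^T)$ against sample statistics:
using Distributions, Statistics
m0 = [1.0, 2.0]; S0 = [2.0 0.5; 0.5 1.0] # Arbitrary example mean and covariance
A = [1.0 0.5; -0.3 2.0]; b = [0.5, -1.0] # Arbitrary example transformation
samples = rand(MvNormal(m0, S0), 100_000) # 2×100000 matrix; each column is one sample x
z = A*samples .+ b # Apply z = A*x + b to every sample
println("sample mean of z ≈ $(vec(mean(z, dims=2))), predicted: $(A*m0 + b)")
println("sample cov of z ≈ $(cov(z, dims=2)), predicted: $(A*S0*A')")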
In short, if we do density estimation with a Gaussian distribution $\mathcal{N}\left(x_n\,|\,\mu,\sigma^2 \right)$ for an observed data set $D = \{x_1, x_2, \ldots, x_N\}$, the maximum likelihood estimates for $\mu$ and $\sigma^2$ are given by $$\begin{align*} \mu_{\text{ML}} &= \frac{1}{N} \sum_{n=1}^N x_n \tag{B-2.121} \\ \sigma^2_{\text{ML}} &= \frac{1}{N} \sum_{n=1}^N \left(x_n - \mu_{\text{ML}} \right)^2 \tag{B-2.122} \end{align*}$$
These estimates are also known as the sample mean and sample variance respectively.
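As a quick numerical check of (B-2.121) and (B-2.122), a minimal sketch on simulated data; note that the ML variance corresponds to the uncorrected (biased) sample variance:
using Distributions, Statistics
data = rand(Normal(5.0, 2.0), 10_000) # Example data; the true parameters are arbitrary
μ_ML = sum(data) / length(data) # B-2.121
σ2_ML = sum((data .- μ_ML).^2) / length(data) # B-2.122
println("μ_ML = $(μ_ML), mean(data) = $(mean(data))")
println("σ2_ML = $(σ2_ML), var(data; corrected=false) = $(var(data; corrected=false))")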
open("../../styles/aipstyle.html") do f
display("text/html", read(f, String))
end