- Make predictions: $$p(\,\text{future-observations}\,|\,\text{past-observations}\,)$$
- Classify a received data point: $$p(\,x\text{-belongs-to-class-}k \,|\,x\,)$$
and hence that $$ p(\theta|D) = \frac{p(D|\theta) p(\theta)}{p(D)}\,.$$
This formula is called Bayes rule (or Bayes' theorem). While Bayes rule is always true, a particularly useful application occurs when $D$ refers to an observed data set and $\theta$ is a set of model parameters that relate to the data. In that case,
$\Rightarrow$ Bayes rule tells us how to update our knowledge about model parameters when facing new data. Hence,
$\Longrightarrow$ All that we can learn from the observed data is contained in the likelihood function $p(D|\theta)$. This is called the likelihood principle.
Consider the following simple model for the outcome (head or tail) of a biased coin toss with parameter $\theta \in [0,1]$:
$$\begin{align*} y &\in \{0,1\} \\ p(y|\theta) &\triangleq \theta^y (1-\theta)^{1-y}\\ \end{align*}$$We can plot both the sampling distribution (i.e. $p(y|\theta=0.8)$) and the likelihood function (i.e. $L(\theta) = p(y=0|\theta)$).
using Reactive, Interact, PyPlot
p(y,θ) = θ.^y .* (1 .- θ).^(1 .- y)
f = figure()
@manipulate for y=false, θ=0:0.1:1; withfig(f) do
        # Plot the sampling distribution
        subplot(221); stem([0,1], p([0,1],θ));
        title("Sampling distribution");
        xlim([-0.5,1.5]); ylim([0,1]); xlabel("y"); ylabel("p(y|θ=$(θ))");
        # Plot the likelihood function
        _θ = range(0.0, stop=1.0, length=100)
        subplot(222); plot(_θ, p(convert(Float64,y), _θ));
        title("Likelihood function");
        xlabel("θ");
        ylabel("L(θ) = p(y=$(convert(Float64,y))|θ)");
    end
end
The (discrete) sampling distribution is a valid probability distribution. However, the likelihood function $L(\theta)$ clearly isn't, since $\int_0^1 L(\theta) \mathrm{d}\theta \neq 1$.
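As a quick check, evaluate the integral for the observation $y=0$:

$$\int_0^1 L(\theta) \,\mathrm{d}\theta = \int_0^1 p(y=0|\theta) \,\mathrm{d}\theta = \int_0^1 (1-\theta) \,\mathrm{d}\theta = \frac{1}{2} \neq 1\,.$$

(The same value $1/2$ results for $y=1$.)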
$$\begin{align*} p(\,\text{Mr.S.-killed-Mrs.S.} \;&|\; \text{he-has-her-blood-on-his-shirt}\,) \\ p(\,\text{transmitted-codeword} \;&|\;\text{received-codeword}\,) \end{align*}$$
where the 's' and 'p' above the equality sign indicate whether the sum or product rule was used.
and a ball is drawn out, which proves to be white. What is now the chance of drawing a white ball?
[Answer] - Again, use Bayes and marginalization to arrive at $p(\text{white}|\text{data})=2/3$, see homework exercise
$\Rightarrow$ Note that probabilities describe a person's state of knowledge rather than a 'property of nature'.
[Answer] - (a) $5/12$. (b) $5/11$, see homework.
$\Rightarrow$ Again, we conclude that conditional probabilities reflect implications for a state of knowledge rather than temporal causality.
$X$ and $Y$, with PDFs $p_x(x)$ and $p_y(y)$. What is the PDF of $$Z = X + Y\;?$$
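Assuming $X$ and $Y$ are independent, the PDF of the sum is the convolution of the individual PDFs:

$$p_z(z) = \int_{-\infty}^{\infty} p_x(x)\, p_y(z-x)\, \mathrm{d}x\,.$$

In particular, for independent Gaussians $X \sim \mathcal{N}(\mu_x,\sigma_x^2)$ and $Y \sim \mathcal{N}(\mu_y,\sigma_y^2)$, the sum is again Gaussian: $Z \sim \mathcal{N}(\mu_x+\mu_y,\, \sigma_x^2+\sigma_y^2)$.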
using Reactive, Interact, PyPlot, Distributions
f = figure()
@manipulate for μx=0:0.1:4, σx=0.1:0.1:1.9, μy=0:0.1:4, σy=0.1:0.1:0.9; withfig(f) do
        # Parameters of the sum Z = X + Y for independent Gaussians X and Y
        μz = μx + μy; σz = sqrt(σx^2 + σy^2)
        x = Normal(μx, σx)
        y = Normal(μy, σy)
        z = Normal(μz, σz)
        # Plot all three PDFs on a common grid covering ±2 standard deviations
        range_min = minimum([μx-2*σx, μy-2*σy, μz-2*σz])
        range_max = maximum([μx+2*σx, μy+2*σy, μz+2*σz])
        range_grid = range(range_min, stop=range_max, length=100)
        plot(range_grid, pdf.(x,range_grid), "k-")
        plot(range_grid, pdf.(y,range_grid), "b-")
        plot(range_grid, pdf.(z,range_grid), "r-")
        legend([L"p_X", L"p_Y", L"p_Z"])
        grid()
    end
end
$X$ and $Y$, with PDFs $p_x(x)$ and $p_y(y)$. What is the PDF of $$Z = X Y \,\text{?}$$
- For proof, see [https://en.wikipedia.org/wiki/Product_distribution](https://en.wikipedia.org/wiki/Product_distribution)
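For independent $X$ and $Y$, the linked result states

$$p_z(z) = \int_{-\infty}^{\infty} p_x(x)\, p_y\!\left(\frac{z}{x}\right) \frac{1}{|x|}\, \mathrm{d}x\,.$$

Note that, unlike the sum, the product of two Gaussian random variables is not Gaussian.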
No matter how $x$ is distributed, we can easily derive that (do as exercise)
$$\begin{align} \mathrm{E}[Ax +b] &= A\mathrm{E}[x] + b \tag{SRG-3a}\\ \mathrm{cov}[Ax +b] &= A\,\mathrm{cov}[x]\,A^T \tag{SRG-3b} \end{align}$$
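For reference, a sketch of the derivation. (SRG-3a) follows from linearity of expectation:

$$\mathrm{E}[Ax+b] = \int (Ax+b)\,p(x)\,\mathrm{d}x = A\int x\,p(x)\,\mathrm{d}x + b\int p(x)\,\mathrm{d}x = A\mathrm{E}[x] + b\,.$$

For (SRG-3b), substitute (SRG-3a) into the definition of covariance; the $b$ terms cancel:

$$\begin{align*} \mathrm{cov}[Ax+b] &= \mathrm{E}\left[\big(Ax+b-\mathrm{E}[Ax+b]\big)\big(Ax+b-\mathrm{E}[Ax+b]\big)^T\right] \\ &= \mathrm{E}\left[A\big(x-\mathrm{E}[x]\big)\big(x-\mathrm{E}[x]\big)^T A^T\right] \\ &= A\,\mathrm{cov}[x]\,A^T\,. \end{align*}$$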