Goal
Materials
Mandatory
Optional
The pre-recorded video guide and the live class of 2020
Ariel Caticha - 2012 - Entropic Inference and the Foundations of Physics, pp. 7-56 (ch. 2: Probability)
Edwin Jaynes - 2003 - Probability Theory: The Logic of Science
D.S. Sivia, with J. Skilling - 2006 - Data Analysis: A Bayesian Tutorial
Bishop pp. 12-24
Does this fragment resonate with your own experience?
In this lesson we introduce probability theory (PT) again. As we will see in the next lessons, PT is all you need to make sense of machine learning, artificial intelligence, statistics, and more.
where $p(a|b)$ means the probability that $a$ is true, given that $b$ is true.
The Bayesian interpretation contrasts with the frequentist interpretation of a probability as the relative frequency that an event would occur under repeated execution of an experiment.
and hence that $$ p(\theta|D) = \frac{p(D|\theta) }{p(D)}p(\theta)\,.$$
This formula is called Bayes rule (or Bayes theorem). While Bayes rule is always true, a particularly useful application occurs when $D$ refers to an observed data set and $\theta$ is a set of model parameters. In that case,
$\Rightarrow$ Bayes rule tells us how to update our knowledge about model parameters when facing new data. Hence, Bayes rule makes learning from data possible.
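As a minimal sketch of such an update (our own illustration: it assumes a conjugate Beta(1,1) prior and a few made-up tosses of the coin model that is introduced next):

using Distributions

# Minimal Bayesian update sketch (assumed prior and data, for illustration):
# with a Beta(1,1) (uniform) prior over θ and Bernoulli coin-toss data D,
# the posterior p(θ|D) is again a Beta distribution (conjugacy).
prior = Beta(1, 1)                 # p(θ): uniform over [0,1]
D = [1, 1, 0, 1]                   # made-up observed tosses (1 = heads)
α = 1 + sum(D)                     # prior α plus number of heads
β = 1 + length(D) - sum(D)         # prior β plus number of tails
posterior = Beta(α, β)             # p(θ|D)
println("posterior mean of θ: $(round(mean(posterior), digits=3))")  # 0.667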
Consider the following simple model for the outcome (head or tail) $y \in \{0,1\}$ of a biased coin toss with parameter $\theta \in [0,1]$:
$$ p(y|\theta) \triangleq \theta^y (1-\theta)^{1-y} $$
We can plot both the sampling distribution $p(y|\theta=0.5)$ and the likelihood function $L(\theta) = p(y=1|\theta)$.
using PyPlot
p(y,θ) = θ.^y .* (1 .- θ).^(1 .- y)   # Bernoulli model p(y|θ); broadcasts over y or θ
f = figure()
θ = 0.5 # Set parameter
# Plot the sampling distribution
subplot(221); stem([0,1], p([0,1],θ));
title("Sampling distribution");
xlim([-0.5,1.5]); ylim([0,1]); xlabel("y"); ylabel("p(y|θ=$(θ))");
subplot(222);
_θ = 0:0.01:1
y = 1.0 # Plot p(y=1 | θ)
plot(_θ,p(y,_θ))
title("Likelihood function");
xlabel("θ");
ylabel("L(θ) = p(y=$y|θ)");
The (discrete) sampling distribution is a valid probability distribution. However, the likelihood function $L(\theta)$ clearly isn't, since $\int_0^1 L(\theta) \mathrm{d}\theta \neq 1$.
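A quick numerical check (reusing `p(y,θ)` from the code above; for $y=1$ we have $L(\theta)=\theta$, which integrates to $1/2$):

# The likelihood L(θ) = p(y=1|θ) = θ does not integrate to 1 over θ ∈ [0,1]
_θ = 0:0.001:1
Lvals = p(1.0, _θ)                          # likelihood values L(θ)
println("∫₀¹ L(θ) dθ ≈ $(sum(Lvals) * 0.001)")   # ≈ 0.5, so L(θ) is not a PDF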
$$\begin{align*} p(\,\text{Mr.S.-killed-Mrs.S.} \;&|\; \text{he-has-her-blood-on-his-shirt}\,) \\ p(\,\text{transmitted-codeword} \;&|\;\text{received-codeword}\,) \end{align*}$$
where the 's' and 'p' above the equality sign indicate whether the sum or product rule was used.
Note that $p(\text{sick}|\text{positive test}) = 0.06$ while $p(\text{positive test} | \text{sick}) = 0.95$. This is a huge difference that is sometimes called the "medical test paradox" or the base rate fallacy.
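The quoted 0.06 follows from Bayes rule once a prior (prevalence) and a false-positive rate are specified. In the sketch below, only the sensitivity of 0.95 is given in the text; the prevalence and false-positive rate are assumed values, chosen to be consistent with the numbers quoted above:

# Bayes rule for the medical test. Sensitivity p(positive|sick) = 0.95 is
# quoted above; the prevalence and false-positive rate are assumptions.
p_sick              = 0.01    # assumed prior (prevalence)
p_pos_given_sick    = 0.95    # sensitivity (given in the text)
p_pos_given_healthy = 0.15    # assumed false-positive rate

# Sum rule for the evidence, then product rule (Bayes) for the posterior
p_pos = p_pos_given_sick * p_sick + p_pos_given_healthy * (1 - p_sick)
p_sick_given_pos = p_pos_given_sick * p_sick / p_pos
println("p(sick|positive test) ≈ $(round(p_sick_given_pos, digits=3))")  # ≈ 0.06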
Many people have trouble distinguishing $p(A|B)$ from $p(B|A)$ in their heads. This confusion has had serious consequences, from unfounded convictions in courtrooms to a large number of unfounded conclusions in published scientific research; see Ioannidis (2005) and Clayton (2021).
Problem: A bag contains one ball, known to be either white or black. A white ball is put in, the bag is shaken, and a ball is drawn out, which proves to be white. What is now the chance of drawing a white ball?
Solution: Again, use Bayes and marginalization to arrive at $p(\text{white}|\text{data})=2/3$, see the Exercises notebook.
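A minimal numeric rendering of that solution (a sketch; the two hypotheses and their priors follow from the problem statement above):

# Hypotheses about the bag's contents after adding the white ball:
#   H1: {white, white} (original ball was white), prior 1/2
#   H2: {white, black} (original ball was black), prior 1/2
prior      = [0.5, 0.5]
likelihood = [1.0, 0.5]          # p(draw white | H1), p(draw white | H2)

posterior = prior .* likelihood
posterior ./= sum(posterior)     # Bayes rule: normalize over hypotheses
println("p(H1|white drawn) = $(posterior[1])")        # 2/3

# The drawn ball is not replaced; the remaining ball is white only under H1
p_next_white = posterior[1] * 1.0 + posterior[2] * 0.0
println("p(next draw white|data) = $(p_next_white)")  # 2/3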
$\Rightarrow$ Note that probabilities describe a person's state of knowledge rather than a 'property of nature'.
Solution: (a) $5/12$. (b) $5/11$, see the Exercises notebook.
$\Rightarrow$ Again, we conclude that conditional probabilities reflect implications for a state of knowledge rather than temporal causality.
Consider an arbitrary distribution $p(X)$ with mean $\mu_x$ and variance $\Sigma_x$ and the linear transformation $$Z = A X + b \,.$$
No matter the specification of $p(X)$, we can derive that (see Exercises notebook) $$\mu_z = A\mu_x + b \qquad \text{and} \qquad \Sigma_z = A \Sigma_x A^{\mathrm{T}}\,,$$ i.e., the mean and variance of $Z$ follow directly from $\mu_x$, $\Sigma_x$, $A$ and $b$.
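A quick Monte-Carlo sanity check of these identities (a sketch; the matrix $A$, offset $b$ and the Gamma components of $p(X)$ are arbitrary example choices):

using Distributions, Statistics, LinearAlgebra

# Monte-Carlo check of μz = A μx + b and Σz = A Σx Aᵀ (example values are ours)
A = [1.0 0.5; 0.0 2.0]
b = [1.0, -1.0]

# A non-Gaussian p(X): two independent Gamma components
d1, d2 = Gamma(2.0, 1.0), Gamma(3.0, 0.5)
N = 1_000_000
X = vcat(rand(d1, N)', rand(d2, N)')       # 2 × N matrix of samples
μx = [mean(d1), mean(d2)]
Σx = Diagonal([var(d1), var(d2)])

Z = A * X .+ b
println("mean: $(vec(mean(Z, dims=2)))  vs  $(A*μx + b)")
println("cov:  $(cov(Z, dims=2))  vs  $(Matrix(A*Σx*A'))")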
For two independent random variables $X$ and $Y$, with PDFs $p_x(x)$ and $p_y(y)$, the PDF of $Z=X+Y$ is given by the convolution
$$ p_z (z) = \int_{-\infty}^{\infty} p_x(x)\, p_y(z - x)\,\mathrm{d}x \,. $$
In general this integral does not evaluate to a standard distribution. However, for Gaussian $X$ and $Y$, the sum $Z$ is again Gaussian with $\mu_z = \mu_x + \mu_y$ and $\sigma_z^2 = \sigma_x^2 + \sigma_y^2$, as illustrated by the following plot.
using PyPlot, Distributions

# Parameters for two independent Gaussians X and Y
μx = 2.
σx = 1.
μy = 2.
σy = 0.5

# For the sum Z = X + Y, the means add and the variances add
μz = μx + μy; σz = sqrt(σx^2 + σy^2)

x = Normal(μx, σx)
y = Normal(μy, σy)
z = Normal(μz, σz)

# Plot all three PDFs on a common grid
range_min = minimum([μx-2*σx, μy-2*σy, μz-2*σz])
range_max = maximum([μx+2*σx, μy+2*σy, μz+2*σz])
range_grid = range(range_min, stop=range_max, length=100)
plot(range_grid, pdf.(x,range_grid), "k-")
plot(range_grid, pdf.(y,range_grid), "b-")
plot(range_grid, pdf.(z,range_grid), "r-")
legend([L"p_X", L"p_Y", L"p_Z"])
grid()
For two independent random variables $X$ and $Y$, with PDFs $p_x(x)$ and $p_y(y)$, the PDF of $Z = X Y$ is given by $$ p_z(z) = \int_{-\infty}^{\infty} p_x(x) \,p_y(z/x)\, \frac{1}{|x|}\,\mathrm{d}x \,. $$
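A quick numerical check of the product formula (a sketch; it assumes the QuadGK package for the integral and uses log-normal factors, for which the product is again log-normal with added parameters):

using Distributions, QuadGK

# Check p_z(z) = ∫ p_x(x) p_y(z/x) / |x| dx against the known exact result
px = LogNormal(0.0, 0.5)
py = LogNormal(1.0, 0.3)
pz_exact = LogNormal(0.0 + 1.0, sqrt(0.5^2 + 0.3^2))   # Z = X*Y is log-normal

z = 3.0   # an arbitrary test point
pz_numeric, _ = quadgk(x -> pdf(px, x) * pdf(py, z / x) / abs(x), 0.0, Inf)
println("numeric: $(pz_numeric)  exact: $(pdf(pz_exact, z))")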
More generally, if $z = h(x)$ is an invertible (monotonic) transformation with inverse $x = g(z)$, then $$ p_z(z) = p_x\big(g(z)\big) \left| \frac{\mathrm{d}g(z)}{\mathrm{d}z} \right| \,, $$ which is known as the change-of-variable theorem.
Let $p_x(x) = \mathcal{N}(x|\mu,\sigma^2)$ and $y = h(x) = \frac{x-\mu}{\sigma}$.
Problem: What is $p_y(y)$?
Solution: Note that $h(x)$ is invertible with $x = g(y) = \sigma y + \mu$. The change-of-variable formula leads to $$ p_y(y) = p_x\big(g(y)\big) \left| \frac{\mathrm{d}g(y)}{\mathrm{d}y} \right| = \mathcal{N}(\sigma y + \mu \,|\, \mu, \sigma^2) \cdot \sigma = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{y^2}{2}\right) = \mathcal{N}(y\,|\,0,1)\,,$$ i.e., $y$ follows a standard normal distribution.
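A Monte-Carlo check of this result (a sketch; $\mu$ and $\sigma$ are arbitrary example values):

using Distributions, Statistics

# If X ~ N(μ, σ²), then y = (x-μ)/σ should have mean 0 and variance 1
μ, σ = 3.0, 2.0                      # arbitrary example values
x = rand(Normal(μ, σ), 1_000_000)
y = (x .- μ) ./ σ
println("mean(y) ≈ $(round(mean(y), digits=3)), var(y) ≈ $(round(var(y), digits=3))")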