Copyright 2021 Allen B. Downey
License: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
There are two ways to think about Bayes's Theorem:
It is a divide-and-conquer strategy for computing conditional probabilities. If it's hard to compute $P(A|B)$ directly, sometimes it is easier to compute the terms on the other side of the equation: $P(A)$, $P(B|A)$, and $P(B)$.
It is also a recipe for updating beliefs in the light of new data.
When we are working with the second interpretation, we often write Bayes's Theorem with different variables. Instead of $A$ and $B$, we use $H$ and $D$, where
$H$ stands for "hypothesis", and
$D$ stands for "data".
So we write Bayes's Theorem like this:
$$P(H|D) = \frac{P(H) ~ P(D|H)}{P(D)}$$

In this context, each term has a name:
$P(H)$ is the prior probability of the hypothesis, which represents how confident you are that $H$ is true prior to seeing the data,
$P(D|H)$ is the likelihood of the data, which is the probability of seeing $D$ if the hypothesis is true,
$P(D)$ is the total probability of the data, that is, the chance of seeing $D$ regardless of whether $H$ is true or not.
$P(H|D)$ is the posterior probability of the hypothesis, which indicates how confident you should be that $H$ is true after taking the data into account.
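As a minimal sketch (not part of the original text), Bayes's Theorem translates directly into a one-line Python function; the name `bayes_theorem` is just for illustration:
def bayes_theorem(prior, likelihood, prob_data):
    """Compute the posterior, P(H|D) = P(H) P(D|H) / P(D)."""
    return prior * likelihood / prob_data

# for example, with P(H) = 0.1, P(D|H) = 0.9, and P(D) = 0.2
bayes_theorem(0.1, 0.9, 0.2)    # approximately 0.45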
An example will make all of this clearer.
Here's a problem I got from Wikipedia a long time ago, but now it's been edited away.
Suppose you have two bowls of cookies.
Bowl 1 contains 30 vanilla and 10 chocolate cookies.
Bowl 2 contains 20 vanilla and 20 chocolate cookies.
You choose one of the bowls at random and, without looking into the bowl, choose one of the cookies at random. It turns out to be a vanilla cookie.
What is the chance that you chose Bowl 1?
We'll assume that there was an equal chance of choosing either bowl and an equal chance of choosing any cookie in the bowl.
We can solve this problem using Bayes's Theorem. First, I'll define $H$ and $D$:
$H_1$ is the hypothesis that the bowl you chose is Bowl 1.
$D$ is the datum that the cookie is vanilla ("datum" is the rarely-used singular form of "data").
What we want is the posterior probability of $H_1$, which is $P(H_1|D)$. It is not obvious how to compute it directly, but if we can figure out the terms on the right-hand side of Bayes's Theorem, we can get to it indirectly.
$P(H_1)$ is the prior probability of $H_1$, which is the probability of choosing Bowl 1 before we see the data. If there was an equal chance of choosing either bowl, $P(H_1)$ is $1/2$.
$P(D|H_1)$ is the likelihood of the data, which is the chance of getting a vanilla cookie if $H_1$ is true, in other words, the chance of getting a vanilla cookie from Bowl 1, which is $30/40$ or $3/4$.
$P(D)$ is the total probability of the data, which is the chance of getting a vanilla cookie whether $H_1$ is true or not.
The prior and likelihood are relatively easy to compute.
# Solution
prior = 1/2
prior
0.5
# Solution
likelihood = 3/4
likelihood
0.75
The probability of the data is more difficult. To compute $P(D)$, I'll use the law of total probability.
Let's define $H_2$ to be the hypothesis that the bowl you chose is Bowl 2.
We know that either $H_1$ or $H_2$ is true (and not both), so we can write:
$$P(D) = P(H_1) ~ P(D|H_1) + P(H_2) ~ P(D|H_2)$$

Based on the statement of the problem, we have:
$P(H_1) = 1/2$
$P(D|H_1) = 3/4$
$P(H_2) = 1/2$
$P(D|H_2) = 1/2$
# Solution
prob_data = (1/2) * (3/4) + (1/2) * (1/2)
prob_data
0.625
Now that we have the terms on the right-hand side, we can use Bayes's Theorem to combine them.
# Solution
posterior = prior * likelihood / prob_data
posterior
0.6
The posterior probability is $0.6$, a little higher than the prior, which was $0.5$.
So the vanilla cookie makes us a little more certain that we chose Bowl 1.
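As an aside (not from the original text), here is a minimal simulation that checks this result empirically; the seed and the number of trials are arbitrary choices:
# Simulate choosing a bowl and then a cookie many times, and estimate
# P(Bowl 1 | vanilla) as the fraction of vanilla draws that came from Bowl 1.
rng = np.random.default_rng(17)
n = 100_000
bowl = rng.integers(1, 3, size=n)             # Bowl 1 or Bowl 2, equally likely
p_vanilla = np.where(bowl == 1, 3/4, 1/2)     # chance of vanilla given the bowl
is_vanilla = rng.random(n) < p_vanilla        # draw a cookie
(bowl[is_vanilla] == 1).mean()                # should be close to 0.6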
Computing the total probability of the data is often the hardest part of the problem. Fortunately, there is another way to solve problems like this that makes it easier: the Bayes table. You can write a Bayes table on paper or use a spreadsheet, but in this notebook I'll use a Pandas DataFrame.
As an example, I'll use a Bayes table to solve the cookie problem. Here's an empty DataFrame with one row for each hypothesis:
table = pd.DataFrame(index=['Bowl 1', 'Bowl 2'])
Now I'll add a column to represent the priors:
table['prior'] = 1/2, 1/2
table
|  | prior |
|---|---|
| Bowl 1 | 0.5 |
| Bowl 2 | 0.5 |
And a column for the likelihoods:
The chance of getting a vanilla cookie from Bowl 1 is 3/4.
The chance of getting a vanilla cookie from Bowl 2 is 1/2.
table['likelihood'] = 3/4, 1/2
table
|  | prior | likelihood |
|---|---|---|
| Bowl 1 | 0.5 | 0.75 |
| Bowl 2 | 0.5 | 0.50 |
The next step is similar to what we did with Bayes's Theorem; we multiply the priors by the likelihoods:
table['unnorm'] = table['prior'] * table['likelihood']
table
|  | prior | likelihood | unnorm |
|---|---|---|---|
| Bowl 1 | 0.5 | 0.75 | 0.375 |
| Bowl 2 | 0.5 | 0.50 | 0.250 |
Each value in `unnorm` is the product of a prior and a likelihood.
The first element is $P(H_1) ~ P(D|H_1)$.
The second element is $P(H_2) ~ P(D|H_2)$.
According to the law of total probability, the sum of those terms is the probability of the data, $P(D)$:
$$P(D) = P(H_1) ~ P(D|H_1) + P(H_2) ~ P(D|H_2)$$

So we can compute $P(D)$ by adding up the elements of `unnorm`:
prob_data = table['unnorm'].sum()
prob_data
0.625
Notice that we get 5/8, which is what we got by computing $P(D)$ explicitly.
Now we divide by $P(D)$ to get the posterior probabilities:
table['posterior'] = table['unnorm'] / prob_data
table
|  | prior | likelihood | unnorm | posterior |
|---|---|---|---|---|
| Bowl 1 | 0.5 | 0.75 | 0.375 | 0.6 |
| Bowl 2 | 0.5 | 0.50 | 0.250 | 0.4 |
The posterior probability for Bowl 1 is 0.6, which is what we got using Bayes's Theorem. As a bonus, we also get the posterior probability of Bowl 2, which is 0.4.
When we add up the unnormalized posteriors and divide through, we force the posteriors to add up to 1. This process is called "normalization", which is why the total probability of the data is also called the "normalizing constant".
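As a quick check (not in the original text), the posterior column of the table we just built should now sum to 1:
# after normalization, the posteriors sum to 1
table['posterior'].sum()    # 1.0

We can wrap all of these steps in a function that takes a sequence of hypotheses, a sequence of priors, and a sequence of likelihoods, and returns the completed Bayes table: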
import pandas as pd
def make_bayes_table(hypos, prior, likelihood):
    """Make a Bayes table.

    hypos: sequence of hypotheses
    prior: prior probabilities
    likelihood: sequence of likelihoods

    returns: DataFrame
    """
    table = pd.DataFrame(index=hypos)
    table['prior'] = prior
    table['likelihood'] = likelihood
    table['unnorm'] = table['prior'] * table['likelihood']
    prob_data = table['unnorm'].sum()
    table['posterior'] = table['unnorm'] / prob_data
    return table
Here's how we can use this function to solve the cookie problem.
hypos = 'Bowl 1', 'Bowl 2'
prior = 1/2, 1/2
likelihood = 3/4, 1/2
make_bayes_table(hypos, prior, likelihood)
|  | prior | likelihood | unnorm | posterior |
|---|---|---|---|---|
| Bowl 1 | 0.5 | 0.75 | 0.375 | 0.6 |
| Bowl 2 | 0.5 | 0.50 | 0.250 | 0.4 |
What if we had chosen a chocolate cookie instead? Use a Bayes table to compute the posterior probability of Bowl 1.
# Solution
likelihood = 1/4, 1/2
make_bayes_table(hypos, prior, likelihood)
|  | prior | likelihood | unnorm | posterior |
|---|---|---|---|---|
| Bowl 1 | 0.5 | 0.25 | 0.125 | 0.333333 |
| Bowl 2 | 0.5 | 0.50 | 0.250 | 0.666667 |
In the previous example and exercise, notice a pattern:
A vanilla cookie is more likely if we chose Bowl 1, so getting a vanilla cookie makes Bowl 1 more likely.
A chocolate cookie is less likely if we chose Bowl 1, so getting a chocolate cookie makes Bowl 1 less likely.
If data makes the probability of a hypothesis go up, we say that it is "evidence in favor" of the hypothesis.
If data makes the probability of a hypothesis go down, it is "evidence against" the hypothesis.
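Here is a small sketch (not from the text) that makes the pattern explicit, using make_bayes_table to compare the posterior probability of Bowl 1 after each kind of cookie:
# compare the posterior of Bowl 1 after a vanilla cookie and after a chocolate cookie
hypos = 'Bowl 1', 'Bowl 2'
prior = 1/2, 1/2

vanilla = make_bayes_table(hypos, prior, (3/4, 1/2))
chocolate = make_bayes_table(hypos, prior, (1/4, 1/2))

print(vanilla.loc['Bowl 1', 'posterior'])      # 0.6, up from 0.5: evidence in favor
print(chocolate.loc['Bowl 1', 'posterior'])    # about 0.33, down from 0.5: evidence against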
One nice thing about the table method is that it works with more than two hypotheses. As an example, let's do another version of the cookie problem.
Suppose you have five bowls:
Bowl 0 contains no vanilla cookies.
Bowl 1 contains 25% vanilla cookies.
Bowl 2 contains 50% vanilla cookies.
Bowl 3 contains 75% vanilla cookies.
Bowl 4 contains 100% vanilla cookies.
Now suppose we choose a bowl at random and then choose a cookie, and we get a vanilla cookie. What is the posterior probability that we chose each bowl?
# Solution
hypos = [0, 1, 2, 3, 4]
prior = 1/5, 1/5, 1/5, 1/5, 1/5
likelihood = 0, 0.25, 0.5, 0.75, 1
make_bayes_table(hypos, prior, likelihood)
|  | prior | likelihood | unnorm | posterior |
|---|---|---|---|---|
| 0 | 0.2 | 0.00 | 0.00 | 0.0 |
| 1 | 0.2 | 0.25 | 0.05 | 0.1 |
| 2 | 0.2 | 0.50 | 0.10 | 0.2 |
| 3 | 0.2 | 0.75 | 0.15 | 0.3 |
| 4 | 0.2 | 1.00 | 0.20 | 0.4 |
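One observation (an aside, not from the text): because the priors are all equal here, the posteriors are just the likelihoods divided by their sum, which we can confirm directly:
# with equal priors, normalizing the likelihoods gives the posteriors
likelihood = np.array([0, 0.25, 0.5, 0.75, 1])
likelihood / likelihood.sum()    # array([0. , 0.1, 0.2, 0.3, 0.4])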
Suppose you have two coins in a box. One is a normal coin with heads on one side and tails on the other, and one is a trick coin with heads on both sides.
You choose a coin at random and see that one of the sides is heads. Is this data evidence in favor of, or against, the hypothesis that you chose the trick coin?
See if you can figure out the answer before you read my solution. I suggest these steps:
First, state clearly what is the hypothesis and what is the data.
Then think about the prior, the likelihood of the data, and the total probability of the data.
Apply Bayes's Theorem or use a Bayes table to compute the posterior probability of the hypothesis.
Use the result to answer the question as posed.
# Solution
# * $H$ is the hypothesis that you chose the trick coin with two heads.
# * $D$ is the observation that one side of the coin is heads.
# Now let's think about the right-hand terms:
# * The prior is 1/2 because we were equally likely to choose either coin.
# * The likelihood is 1 because if we chose the trick coin, we would necessarily see heads.
# * The total probability of the data is 3/4 because 3 of the 4 sides are heads, and we were equally likely to see any of them.
# Here's what we get when we apply Bayes's theorem:
# Solution
prior = 1/2
likelihood = 1
prob_data = 3/4
posterior = prior * likelihood / prob_data
posterior
0.6666666666666666
# Solution
# The posterior is greater than the prior, so this data is evidence
# *in favor of* the hypothesis that you chose the trick coin.
# And that makes sense, because getting heads is more likely if you
# choose the trick coin rather than the normal coin.
# Solution
# Here's a solution using a Bayes table.
# Solution
table = pd.DataFrame(index=['Trick', 'Normal'])
table['prior'] = 1/2, 1/2
table['likelihood'] = 1, 1/2
table['unnorm'] = table['prior'] * table['likelihood']
prob_data = table['unnorm'].sum()
table['posterior'] = table['unnorm'] / prob_data
table
|  | prior | likelihood | unnorm | posterior |
|---|---|---|---|---|
| Trick | 0.5 | 1.0 | 0.50 | 0.666667 |
| Normal | 0.5 | 0.5 | 0.25 | 0.333333 |