Copyright 2021 Allen B. Downey
License: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
In the previous lesson, we used a Bayes table to solve the cookie problem.
In this notebook we'll solve an expanded version of the cookie problem with 101 Bowls. It might seem silly, but it's not: the solution demonstrates a Bayesian way to estimate a proportion, and it applies to lots of real problems that don't involve cookies.
Then I'll introduce an alternative to the Bayes table, a probability mass function (PMF), which is a useful way to represent and do computations with distributions.
Here's the function from the previous notebook that we'll use to make Bayes tables:
def make_bayes_table(hypos, prior, likelihood):
    """Make a Bayes table.

    hypos: sequence of hypotheses
    prior: prior probabilities
    likelihood: sequence of likelihoods

    returns: DataFrame
    """
    table = pd.DataFrame(index=hypos)
    table['prior'] = prior
    table['likelihood'] = likelihood
    table['unnorm'] = table['prior'] * table['likelihood']
    prob_data = table['unnorm'].sum()
    table['posterior'] = table['unnorm'] / prob_data
    return table
The Bayes table works with more than two hypotheses. As an example, we solved a cookie problem with five bowls.
Now we'll take it even further and solve a cookie problem with 101 bowls:
Bowl 0 contains no vanilla cookies,
Bowl 1 contains 1% vanilla cookies,
Bowl 2 contains 2% vanilla cookies,
and so on, up to
Bowl 99 contains 99% vanilla cookies, and
Bowl 100 contains all vanilla cookies.
As in the previous problems, there are only two kinds of cookies, vanilla and chocolate. So Bowl 0 is all chocolate cookies, Bowl 1 is 99% chocolate, and so on.
Suppose we choose a bowl at random, choose a cookie at random, and it turns out to be vanilla. What is the probability that the cookie came from Bowl $x$, for each value of $x$?
To solve this problem, I'll use np.arange to represent 101 hypotheses, numbered from 0 to 100.
The prior probability for each bowl is $1/101$. I could create a sequence with 101 identical values, but if all of the priors are equal, we only have to provide one value:
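Sketched in code, that setup might look like this:

```python
import numpy as np

# 101 hypotheses: Bowl 0 through Bowl 100
xs = np.arange(101)

# with equal priors, a single scalar is broadcast across all hypotheses
prior = 1/101
```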
Because of the way I numbered the bowls, the probability of a vanilla cookie from Bowl $x$ is $x/100$. So we can compute the likelihoods like this:
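As a sketch (repeating the definition of xs so the cell stands alone):

```python
import numpy as np

xs = np.arange(101)

# Bowl x contains x% vanilla cookies, so P(vanilla | Bowl x) = x/100
likelihood = xs / 100
```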
And that's all we need; the Bayes table does the rest:
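A sketch of that step, repeating make_bayes_table so the cell runs on its own:

```python
import numpy as np
import pandas as pd

def make_bayes_table(hypos, prior, likelihood):
    """Make a Bayes table (same as the function defined earlier)."""
    table = pd.DataFrame(index=hypos)
    table['prior'] = prior
    table['likelihood'] = likelihood
    table['unnorm'] = table['prior'] * table['likelihood']
    table['posterior'] = table['unnorm'] / table['unnorm'].sum()
    return table

xs = np.arange(101)
table = make_bayes_table(xs, 1/101, xs/100)
```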
Here's a feature we have not seen before: we can give the index of the Bayes table a name, which will appear when we display the table.
table.index.name = 'Bowl'
Here are the first few rows:
table.head()
Because Bowl 0 contains no vanilla cookies, its likelihood is 0, so its posterior probability is 0. That is, the cookie cannot have come from Bowl 0.
Here are the last few rows of the table.
table.tail()
The posterior probabilities are substantially higher for the high-numbered bowls.
There is a pattern here that will be clearer if we plot the results.
def plot_table(table):
    """Plot results from the 101 Bowls problem.

    table: DataFrame representing a Bayes table
    """
    table['prior'].plot()
    table['posterior'].plot()
    plt.xlabel('Bowl #')
    plt.ylabel('Probability')
    plt.legend()
plot_table(table)
plt.title('One cookie');
The prior probabilities are uniform; that is, they are the same for every bowl.
The posterior probabilities increase linearly; Bowl 0 is the least likely (actually impossible), and Bowl 100 is the most likely.
Suppose we put the first cookie back, stir the bowl thoroughly, and draw another cookie from the same bowl. And suppose it turns out to be another vanilla cookie. Now what is the probability that we are drawing from Bowl $x$?
To answer this question, we can use the posterior probabilities from the previous problem as prior probabilities for a new Bayes table, and then update with the new data.
prior2 = table['posterior']
likelihood2 = likelihood
table2 = make_bayes_table(xs, prior2, likelihood2)
plot_table(table2)
plt.title('Two cookies');
The blue line shows the posterior after one cookie, which is the prior before the second cookie.
The orange line shows the posterior after two cookies, which curves upward. Having seen two vanilla cookies, the high-numbered bowls are more likely and the low-numbered bowls are less likely.
I bet you can guess what's coming next.
Suppose we put the cookie back, stir, draw another cookie from the same bowl, and get a chocolate cookie. What do you think the posterior distribution looks like after these three cookies?
Hint: what's the probability that the chocolate cookie came from Bowl 100?
Compute the posterior distribution after two vanilla cookies and one chocolate cookie.
The posterior distribution has a peak near 60%. We can use idxmax to find it:
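Here's one way to carry out the three updates and locate the peak; the names table2 and table3 are my own, and the cell repeats make_bayes_table so it runs standalone:

```python
import numpy as np
import pandas as pd

def make_bayes_table(hypos, prior, likelihood):
    """Make a Bayes table (same as the function defined earlier)."""
    table = pd.DataFrame(index=hypos)
    table['prior'] = prior
    table['likelihood'] = likelihood
    table['unnorm'] = table['prior'] * table['likelihood']
    table['posterior'] = table['unnorm'] / table['unnorm'].sum()
    return table

xs = np.arange(101)
likelihood_vanilla = xs / 100
likelihood_chocolate = 1 - xs / 100

# one update per cookie, using each posterior as the next prior
table = make_bayes_table(xs, 1/101, likelihood_vanilla)                   # vanilla
table2 = make_bayes_table(xs, table['posterior'], likelihood_vanilla)     # vanilla
table3 = make_bayes_table(xs, table2['posterior'], likelihood_chocolate)  # chocolate

table3['posterior'].idxmax()  # → 67
```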
The peak in the posterior distribution is at 67%.
This value has a name; it is the MAP, which stands for "maximum a posteriori probability" ("a posteriori" is Latin for "from what comes after").
In this example, the MAP is close to the proportion of vanilla cookies in the dataset: 2/3.
When we do more than one update, we don't always want to keep the whole Bayes table. In this section we'll replace the Bayes table with a more compact representation, a probability mass function, or PMF.
A PMF is a set of possible outcomes and their corresponding probabilities. There are many ways to represent a PMF; in this notebook I'll use a Pandas Series.
Here's a function that takes a sequence of quantities, qs, and a sequence of probabilities, ps, and returns a Pandas Series that represents a PMF.
def make_pmf(qs, ps, **options):
    """Make a Series that represents a PMF.

    qs: sequence of quantities
    ps: sequence of probabilities
    options: keyword arguments passed to Series constructor

    returns: Pandas Series
    """
    pmf = pd.Series(ps, index=qs, **options)
    return pmf
And here's a PMF that represents the prior from the 101 Bowls problem.
xs = np.arange(101)
prior = 1/101
pmf = make_pmf(xs, prior)
pmf.head()
Now that we have a prior, we need to compute likelihoods.
Here are the likelihoods for a vanilla cookie:
likelihood_vanilla = xs / 100
And for a chocolate cookie.
likelihood_chocolate = 1 - xs / 100
To compute posterior probabilities, I'll use the following function, which takes a PMF and a sequence of likelihoods, and updates the PMF:
def bayes_update(pmf, likelihood):
    """Do a Bayesian update.

    pmf: Series that represents the prior
    likelihood: sequence of likelihoods
    """
    pmf *= likelihood
    pmf /= pmf.sum()
The steps here are the same as in the Bayes table:
Multiply the prior by the likelihoods.
Add up the products to get the total probability of the data.
Divide through to normalize the posteriors.
Now we can do the update for a vanilla cookie.
bayes_update(pmf, likelihood_vanilla)
Here's what the PMF looks like after the update.
pmf.plot()
plt.xlabel('Bowl #')
plt.ylabel('Probability')
plt.title('One cookie');
That's consistent with what we got with the Bayes table.
The advantage of using a PMF is that it is easier to do multiple updates. The following cell starts again with the uniform prior and does updates with two vanilla cookies and one chocolate cookie:
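A sketch of that cell, repeating bayes_update so it stands alone:

```python
import numpy as np
import pandas as pd

def bayes_update(pmf, likelihood):
    """Multiply the prior by the likelihoods and renormalize, in place."""
    pmf *= likelihood
    pmf /= pmf.sum()

xs = np.arange(101)
pmf = pd.Series(1/101, index=xs)   # uniform prior, as from make_pmf(xs, 1/101)

likelihood_vanilla = xs / 100
likelihood_chocolate = 1 - xs / 100

# one update per cookie
bayes_update(pmf, likelihood_vanilla)
bayes_update(pmf, likelihood_vanilla)
bayes_update(pmf, likelihood_chocolate)
```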
Here's what the results look like:
pmf.plot()
plt.xlabel('Bowl #')
plt.ylabel('Probability')
plt.title('Three cookies');
Again, that's consistent with what we got with the Bayes table.
In this notebook, we extended the cookie problem with more bowls.
I defined the MAP, which is the quantity in a posterior distribution with the highest probability.
Although the cookie problem is not particularly realistic or useful, the method we used to solve it applies to many problems in the real world where we want to estimate a proportion.
In the next lesson we'll use the same method on a related problem, and take another step toward doing Bayesian statistics.
Suppose we take away Bowls 51 through 100, leaving only the first 51 bowls.
You choose one of the remaining bowls, draw three times (replacing and stirring after each draw), and get a vanilla cookie twice.
What is the probability you drew from each bowl?
Make a Series that represents the prior probabilities of the bowls. Update it based on the data and plot the posterior distribution. Compute the MAP.
In this example, does the MAP equal the observed proportion?