from __future__ import print_function, division
%matplotlib inline
import numpy as np
import pymc3 as pm
import scipy
import seaborn as sns
import matplotlib.pyplot as plt
import thinkbayes2
import thinkplot
/home/downey/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`. from ._conv import register_converters as _register_converters
I want to predict the number of goals scored in the next game, where
goals ~ Poisson(mu)
mu ~ Gamma(alpha, beta)
Suppose my posterior distribution for mu
has alpha=10
, beta=5
.
alpha = 10
beta = 5
I can draw a sample from the posterior, and it has the mean I expect, alpha/beta
iters = 100000
sample_mu = np.random.gamma(shape=alpha, scale=1/beta, size=iters)
np.mean(sample_mu)
2.0014370180768606
mu = alpha / beta
mu
2.0
I can sample from the predictive distribution by drawing one Poisson sample for each sampled value of mu
, and it has the mean I expect.
sample_pred = np.random.poisson(sample_mu)
np.mean(sample_pred)
2.00996
Now I'll try to do the same thing with pymc3.
Pretending that mu
is a known constant, I can sample from Poisson(mu)
and I get the mean I expect.
model = pm.Model()
with model:
goals = pm.Poisson('goals', mu)
sample_pred_wrong_pm = goals.random(size=iters)
np.mean(sample_pred_wrong_pm)
2.00449
And sampling from the posterior disrtribution of mu
, I get the mean I expect.
model = pm.Model()
with model:
mu = pm.Gamma('mu', alpha, beta)
sample_post_pm = mu.random(size=iters)
np.mean(sample_post_pm)
1.9981993583520818
But if I try to sample from the posterior predictive distribution (at least in the way I expected it to work), I don't get the mean I expect.
model = pm.Model()
with model:
mu = pm.Gamma('mu', alpha, beta)
goals = pm.Poisson('goals', mu)
sample_pred_pm = goals.random(size=iters)
np.mean(sample_pred_pm)
1.37646
It looks like it might be taking one sample from the Gamma distribution and using it to generate the entire sample of goals.
I suspect something is wrong with my mental model of how to specify the model in pymc3.