In [1]:

from __future__ import print_function, division

%matplotlib inline

import numpy as np
import pymc3 as pm
import scipy
import seaborn as sns

import matplotlib.pyplot as plt

import thinkbayes2
import thinkplot

/home/downey/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters

I want to predict the number of goals scored in the next game, where

goals ~ Poisson(mu)

mu ~ Gamma(alpha, beta)

Suppose my posterior distribution for mu has alpha=10, beta=5.

In [2]:

alpha = 10
beta = 5

I can draw a sample from the posterior, and it has the mean I expect, alpha/beta

In [3]:

iters = 100000
sample_mu = np.random.gamma(shape=alpha, scale=1/beta, size=iters)
np.mean(sample_mu)

Out[3]:

2.0014370180768606

In [4]:

mu = alpha / beta
mu

Out[4]:

2.0

I can sample from the predictive distribution by drawing one Poisson sample for each sampled value of mu, and it has the mean I expect.

In [5]:

sample_pred = np.random.poisson(sample_mu)
np.mean(sample_pred)

Out[5]:

2.00996

Now I'll try to do the same thing with pymc3.

Pretending that mu is a known constant, I can sample from Poisson(mu) and I get the mean I expect.

In [6]:

model = pm.Model()

with model:
    goals = pm.Poisson('goals', mu)
    sample_pred_wrong_pm = goals.random(size=iters)

np.mean(sample_pred_wrong_pm)

Out[6]:

2.00449

And sampling from the posterior disrtribution of mu, I get the mean I expect.

In [7]:

model = pm.Model()

with model:
    mu = pm.Gamma('mu', alpha, beta)
    sample_post_pm = mu.random(size=iters)

np.mean(sample_post_pm)

Out[7]:

1.9981993583520818

But if I try to sample from the posterior predictive distribution (at least in the way I expected it to work), I don't get the mean I expect.

In [8]:

model = pm.Model()

with model:
    mu = pm.Gamma('mu', alpha, beta)
    goals = pm.Poisson('goals', mu)
    sample_pred_pm = goals.random(size=iters)

np.mean(sample_pred_pm)

Out[8]:

1.37646

It looks like it might be taking one sample from the Gamma distribution and using it to generate the entire sample of goals.

I suspect something is wrong with my mental model of how to specify the model in pymc3.

In [ ]: