We will now introduce Bayesian parameter estimation and the sequential Monte Carlo algorithm as implemented by QInfer. The intent is not to leave you with a deep understanding of the general parameter estimation problem, but to give you a flavor of how to use what you have learned in the previous lectures with a mature but specialized Python package. Part of the goal is for you to see analogies between these examples and the statistical problems you face in your own research, and to come away with a new Pythonic solution. (Also, by the end you should be excited and eager to use Bayesian inference for all your statistical needs!)
In this Lecture you will learn how to:
- perform Bayesian inference for a simple coin-flipping example;
- approximate a posterior with the sequential Monte Carlo (particle filter) algorithm;
- use QInfer to simulate experiments and estimate a quantum state (tomography).
Here we import all the necessary modules as well as make the code compatible with Python 2.7:
from __future__ import division, print_function
import qinfer as qi
import numpy as np
import scipy as sp
# scipy submodules must be imported explicitly before use.
import scipy.stats
import scipy.special
import matplotlib.pyplot as plt
%matplotlib inline
# Use nice plotting defaults if available.
try: plt.style.use('ggplot')
except: pass
Suppose we have a coin whose bias $p$ we do not know. We will estimate $p$ by flipping the coin many times.
If we knew the bias, we could determine the exact distribution of the number of "Heads" we would see. If we label the number of flips $N$ and the number of heads $n$, then the distribution of $n$ is a Binomial distribution: $$ \operatorname{Pr}(n|p) = \binom{N}{n} p^n (1-p)^{N-n}. $$
Now suppose we have flipped the coin and $n$ heads did actually occur in $N$ flips. What is the distribution of $p|n$? The answer is given by Bayes' rule: $$ \operatorname{Pr}(p|n) = \frac{\operatorname{Pr}(n|p)\operatorname{Pr}(p)}{\operatorname{Pr}(n)}, $$ where $\operatorname{Pr}(n|p)$ is called the likelihood function, $\operatorname{Pr}(p)$ the prior, $\operatorname{Pr}(p|n)$ the posterior and $\operatorname{Pr}(n)$ the evidence. We have already determined the likelihood function above. The prior is an input to the problem, and the evidence can be determined by normalization of probability.
The input, the prior, can be anything. A typical choice for generic problems is the uniform prior $\operatorname{Pr}(p) = 1/V$, where $V$ is the volume of the parameter space, in this case $V = 1$ since the bias can be between 0 and 1. For the uniform prior, the posterior is $$ \operatorname{Pr}(p|n) = \frac{p^n (1-p)^{N-n}}{B(n+1,N-n+1)}, $$ where $B$ is the Beta function. (See a derivation of a slightly more general form here).
Suppose we flip our coin 10 times. What does $\operatorname{Pr}(n|p)$ look like?
# Create a variable for the number of flips (go ahead and change this if you want).
N = 10
# This creates an array containing [0, 1, ..., N].
n = np.arange(0, N + 1)
# This sets up the figure as a grid of subplots with 1 row and 3 columns.
fig, axarr = plt.subplots(1, 3, figsize=(15, 5))
# This is the probability mass function (pmf) of the binomial distribution for p = 0.
dist = sp.stats.binom.pmf(n, N, 0)
# On the first (0 indexed!) subplot, make a bar plot with x-axis given by the array n and y-axis given by the pmf.
axarr[0].bar(n, dist)
# The rest is just annotation and repeating for different biases.
axarr[0].set_xlabel("$n$")
axarr[0].set_ylabel(r"$\operatorname{Pr}(n|p)$")
axarr[0].set_title("$p=0$")
dist = sp.stats.binom.pmf(n, N, 0.1)
axarr[1].bar(n, dist)
axarr[1].set_xlabel("$n$")
axarr[1].set_title("$p = 0.1$")
dist = sp.stats.binom.pmf(n, N, 0.5)
axarr[2].bar(n, dist)
axarr[2].set_xlabel("$n$")
axarr[2].set_title("$p = 0.5$")
In reality, we often have data, in this case $n$, so we are interested not in $\operatorname{Pr}(n|p)$ but in $\operatorname{Pr}(p|n)$. We still use $\operatorname{Pr}(n|p)$ in Bayes' rule but think of it as a function of $p$ for fixed data $n$. Below we plot what that looks like.
# This creates an array with 100 values between 0 and 1.
p = np.linspace(0, 1, 100)
fig, axarr = plt.subplots(1, 3, figsize=(15, 5))
likelihood = sp.stats.binom.pmf(0, N, p)
axarr[0].plot(p, likelihood, linewidth=3)
axarr[0].set_xlabel("$p$")
axarr[0].set_ylabel(r"$\operatorname{Pr}(n|p)$")
axarr[0].set_title("$n = 0$")
likelihood = sp.stats.binom.pmf(1, N, p)
axarr[1].plot(p, likelihood, linewidth=3)
axarr[1].set_xlabel("$p$")
axarr[1].set_title("$n = 1$")
likelihood = sp.stats.binom.pmf(5, N, p)
axarr[2].plot(p, likelihood, linewidth=3)
axarr[2].set_xlabel("$p$")
axarr[2].set_title("$n = 5$")
Bayes' rule states: $$ \operatorname{Pr}(p|n) = \frac{\operatorname{Pr}(n|p)\operatorname{Pr}(p)}{\operatorname{Pr}(n)}. $$ For the uniform prior, $\operatorname{Pr}(p) = 1$, so that $$ \operatorname{Pr}(p|n) = \frac{p^n (1-p)^{N-n}}{B(n+1,N-n+1)}. $$ Since the posterior is proportional to the likelihood, we should expect the posterior plots to look just like the likelihood plots above.
(Note that $\operatorname{Pr}(p|n)$ is not strictly a probability but a probability density, and it can be greater than 1 at particular values of $p$; probabilities are obtained only by integrating the density over intervals of $p$.)
# Rather than write it out many times, we define a function to compute the posterior which compresses further code.
def posterior(p, n, N):
    # Posterior density for the bias p given n heads in N flips, assuming a uniform prior.
    return p ** n * (1 - p) ** (N - n) / sp.special.beta(n + 1, N - n + 1)
# Note that notebooks have access to the variables defined in previous run cells, so we do not need to redefine N and p.
fig, axarr = plt.subplots(1, 3, figsize=(15, 5))
axarr[0].plot(p, posterior(p, 0, N), linewidth=3)
axarr[0].set_xlabel("$p$")
axarr[0].set_ylabel(r"$\operatorname{Pr}(p|n)$")
axarr[0].set_title("$n = 0$")
axarr[1].plot(p, posterior(p, 1, N), linewidth=3)
axarr[1].set_xlabel("$p$")
axarr[1].set_title("$n = 1$")
axarr[2].plot(p, posterior(p, 5, N), linewidth=3)
axarr[2].set_xlabel("$p$")
axarr[2].set_title("$n = 5$")
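As a quick numerical check of the note above, the posterior density integrates to 1 over $[0,1]$ even though its peak can exceed 1, and probabilities are obtained by integrating the density over an interval of $p$. Here is a minimal sketch reusing the posterior function and N defined above; the interval $[0.4, 0.6]$ is an arbitrary choice for illustration.
import scipy.integrate
# The density can exceed 1, but it integrates to 1 over [0, 1].
total_probability, _ = sp.integrate.quad(posterior, 0, 1, args=(5, N))
print("Integral of the posterior density over [0, 1]:", total_probability)
# Probabilities come from integrating the density over an interval of p.
interval_probability, _ = sp.integrate.quad(posterior, 0.4, 0.6, args=(5, N))
print("Pr(0.4 <= p <= 0.6 | n = 5):", interval_probability)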
For more complicated models, such an analytic solution is not obtainable and we must turn to numerical algorithms. The algorithm we use is called sequential Monte Carlo, or the particle filter. We begin by drawing $k$ samples $\{p_i\}$ from the prior and approximating it as $$ \operatorname{Pr}(p) \approx \frac{1}{k}\sum_{i=1}^k \delta(p-p_i). $$ The samples are called particles. Each particle is associated with a weight $w_i$, conventionally set to $1/k$ initially. When data $n$ arrive, the weights are updated via Bayes' rule by multiplying each by its likelihood: $$ w_i(n) \mapsto w_i \Pr(n|p_i) = \Pr(n|p_i)/k. $$ The resulting weights $\{w_i(n)\}$ are not normalized, so we make one final assignment $$ w_i(n) \mapsto \frac{w_i(n)}{\sum_j w_j(n)}. $$ Then the posterior is approximated by $$ \operatorname{Pr}(p|n) \approx \sum_{i=1}^k w_i(n) \delta(p-p_i). $$
That's it; that's the basics!
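Before turning to QInfer, here is a minimal NumPy sketch of a single sequential Monte Carlo update for the coin example. The particle count, true bias, and variable names are illustrative choices, not part of QInfer; the package automates all of these steps for us below.
# A minimal sequential Monte Carlo sketch for the coin-flipping example.
k = 1000                                  # number of particles
particles = np.random.uniform(0, 1, k)    # draw particles from the uniform prior
weights = np.ones(k) / k                  # initial weights 1/k
# Suppose we observe n heads in N flips of a coin with (hidden) bias 0.7.
true_p = 0.7
N_flips = 10
n_heads = np.random.binomial(N_flips, true_p)
# Bayes update: multiply each weight by the likelihood of the data at that particle...
weights *= sp.stats.binom.pmf(n_heads, N_flips, particles)
# ...and renormalize.
weights /= np.sum(weights)
# The posterior mean and standard deviation are weighted sums over the particles.
mean = np.sum(weights * particles)
std = np.sqrt(np.sum(weights * particles ** 2) - mean ** 2)
print("Estimated bias: {:.3f} +/- {:.3f} (true value {})".format(mean, std, true_p))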
QInfer is a Python package that implements this numerical approach to approximate Bayesian inference. Beyond flipping coins, we treat Bayesian inference as the problem of inferring a set of parameters $\boldsymbol{x} \in \mathbb R^n$ from data $d$ conditioned on an experiment $e$. That is, we want to know $\Pr(\boldsymbol{x}|d,e)$. Data are generated from a distribution given by a known physical model, $\Pr(d|\boldsymbol{x},e)$. In order to compute the posterior, we also need a prior $\Pr(\boldsymbol{x})$.
QInfer mirrors these objects with the following classes: a Distribution represents the prior $\Pr(\boldsymbol{x})$, a Model represents the likelihood $\Pr(d|\boldsymbol{x},e)$, and an SMCUpdater maintains the sequential Monte Carlo approximation of the posterior $\Pr(\boldsymbol{x}|d,e)$.
QInfer implements many commonly used distributions and models from quantum information science, including state tomography, frequency (phase) estimation, and randomized benchmarking.
We'll jump right in and learn how to use QInfer to solve problems in these domains.
To define the model, we start with a basis for traceless Hermitian operators $\{B_j\}_{j=1}^{d^2-1}$. In the case of a qubit, this could be the basis of Pauli matrices, for example. Then, any state $\rho$ can be written $$ \rho = \frac{I}{d}+\sum_{j=1}^{d^2-1} \theta_j B_j, $$ for some vector of parameters $\vec\theta$. These parameters must be constrained such that $\rho\geq0$. Defining a basis turns the quantum problem into an effectively classical one.
QInfer's TomographyModel abstracts many of the implementation details of this problem, exposing tomographic models and estimates in terms of QuTiP objects, which you will learn about next lecture. For now, the added functionality provided by QuTiP will be hidden.
Tomography support in QInfer requires one of the bases mentioned above in order to parameterize the state. Many common choices of basis are included as TomographyBasis objects, such as the Pauli or Gell-Mann bases.
Let us look at an example for a single qubit. Take the Pauli basis: $$ \{\sigma_x/\sqrt{2}, \sigma_y/\sqrt{2}, \sigma_z/\sqrt{2}\}. $$ As written above, any state can now be described by 3 real numbers $(x,y,z)$, often called the Bloch vector, as follows: $$ \rho = 1/\sqrt{2}\left(\frac{I}{\sqrt{2}} + x\sigma_x +y\sigma_y + z\sigma_z\right). $$
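To see that this parameterization behaves as claimed, here is a short NumPy check (independent of QInfer) that builds $\rho$ from a Bloch vector $(x, y, z)$ using the formula above and verifies it is a valid state; the particular vector is an arbitrary illustrative choice.
# Pauli matrices.
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_y = np.array([[0, -1j], [1j, 0]], dtype=complex)
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)
# An arbitrary Bloch-like vector (its length must be at most 1/sqrt(2) in this scaling).
x, y, z = 0.3, 0.1, -0.4
# rho = (I / sqrt(2) + x sx + y sy + z sz) / sqrt(2), as in the formula above.
rho = (np.eye(2) / np.sqrt(2) + x * sigma_x + y * sigma_y + z * sigma_z) / np.sqrt(2)
print("Trace:       ", np.trace(rho).real)                 # should be 1
print("Hermitian:   ", np.allclose(rho, rho.conj().T))     # should be True
print("Eigenvalues: ", np.linalg.eigvalsh(rho))            # must be nonnegative for a valid state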
Many of the most commonly used priors are already implemented as Distribution subclasses. Here we will focus on the GinibreDistribution because it most closely resembles a "flat" prior.
basis = qi.tomography.pauli_basis(1)
prior = qi.tomography.distributions.GinibreDistribution(basis)
In doing simulations, we need to select a "true" state. We can use our prior distribution to select an unbiased "random" state for us.
true_state = prior.sample()
As a qubit is a point in a 3-dimensional space, for visualization purposes it is more convenient to consider a rebit: a qubit constrained to the $X$-$Z$ plane. The Tomography module implements the visualization of distributions through the methods plot_rebit_prior and plot_rebit_posterior.
qi.tomography.plot_rebit_prior(prior, rebit_axes=[1, 3], true_state=true_state, true_size=500)
Brief side note: the plot above projects the qubit state down to the specified pair of axes. To get a better idea of what the GinibreDistribution "looks like", we can use the GinibreReditDistribution, which generates rebit states directly and so does not project down from another axis.
qi.tomography.plot_rebit_prior(
qi.tomography.distributions.GinibreReditDistribution(basis),
rebit_axes=[1, 3]
)
In the simplest case, we can consider two-outcome measurements represented by the pair $\{E,I-E\}$. The Born rule defines the likelihood function $$ \Pr(E|\rho)=\operatorname{Tr}(\rho E). $$ For multiple measurements, we simply iterate. For many trials of the same measurement, we can use a derived model, as discussed below. The actual measurement performed ($E$) is specified as a set of experiment parameters, which we will let QInfer choose automatically below.
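As a quick illustration of this likelihood (again in plain NumPy, not QInfer), we can compute the Born-rule probability of the $+1$ outcome of a $\sigma_z$ measurement for a simple example state; both the effect $E$ and the state chosen here are illustrative assumptions.
# Projector onto |0>, i.e. the +1 outcome of a sigma_z measurement.
E = np.array([[1, 0], [0, 0]], dtype=complex)
# A maximally mixed state as a simple example.
rho_example = np.eye(2, dtype=complex) / 2
# Born rule: Pr(E | rho) = Tr(rho E).
print("Pr(E|rho) =", np.trace(np.dot(rho_example, E)).real)   # 0.5 for the maximally mixed state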
model = qi.tomography.TomographyModel(basis)
The core class in QInfer is the SMCUpdater, which performs the Bayesian updates. It is instantiated with a model, a number of particles, and a prior. It internally holds a representation of the current distribution as a set of weighted particles; the number of particles controls the quality of the approximation and must be specified when instantiating.
Note that the complexity of computing the posterior is roughly linear in the number of particles.
n_particles = 10000
updater = qi.SMCUpdater(model, n_particles, prior)
For simulations, common randomized measurement choices are already implemented. For example, RandomPauliHeuristic chooses random Pauli measurements for qubit tomography.
heuristic = qi.tomography.RandomPauliHeuristic(updater)
Now we simulate experiments and update our prior knowledge using QInfer.
n_measurements = 100
for idx_meas in range(n_measurements):
    # Ask the heuristic which Pauli measurement to perform next.
    expparams = heuristic()
    # Simulate the outcome of that measurement on the true state...
    datum = model.simulate_experiment(true_state, expparams)
    # ...and update the posterior with the observed datum.
    updater.update(datum, expparams)
Finally, we plot the posterior, its mean and a 95% credible ellipse.
qi.tomography.plot_rebit_posterior(updater,
true_state=true_state, rebit_axes=[1, 3],
legend=False, region_est_method='cov'
)
plt.legend(ncol=1, numpoints=1, scatterpoints=1, bbox_to_anchor=(1.9, 0.5), loc='right')
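Beyond the plot, we can also read numerical estimates directly off the updater. A brief sketch using the updater's est_mean and est_covariance_mtx methods, which summarize the posterior over the model parameters in the chosen basis:
# Posterior mean of the model parameters (the state's expansion coefficients in the chosen basis).
est_mean = updater.est_mean()
print("Estimated state parameters:", est_mean)
# Posterior covariance matrix; small diagonal entries indicate a tight estimate.
est_cov = updater.est_covariance_mtx()
print("Posterior covariance:")
print(est_cov)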