%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(42)
# Set the properties of the line that we'll be fitting:
true_slope = 0.3
true_intercept = 0.0
x = np.linspace(0, 10, 50)
y = np.random.randn(len(x)) + true_slope * x + true_intercept
yerr = np.random.rand(len(x))/10 + 1
plt.errorbar(x, y, yerr, fmt='.')
plt.plot(x, true_slope * x + true_intercept, color='k')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
We define a simple linear model to describe the data, with parameters $\theta = \{m, b\}$:
def linear_model(theta, x):
    """
    Simple linear model.
    """
    m, b = theta
    return m * x + b
def chi2(theta, x, y, yerr, model):
    """
    Compute the chi^2 by comparing `model` evaluated at `theta`
    to `y` at each value of `x`.
    """
    return np.sum((model(theta, x) - y)**2 / yerr**2)
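In other words, for data points $y_i$ with uncertainties $\sigma_i$ (stored in `yerr`), this computes $\chi^2 = \sum_i \left[f(\theta, x_i) - y_i\right]^2 / \sigma_i^2$, where $f$ is whichever `model` we pass in (here, `linear_model`).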
Let the proposal step simply be a Gaussian draw added to the current position:
def proposal(theta, scale=1):
    """
    Generate a proposal step by adding a draw from a
    Gaussian distribution to the current position.
    """
    return theta + scale * np.random.randn(len(theta))
We will decide whether or not to accept new proposal steps using the Metropolis-Hastings algorithm, as defined by Ford (2005):
def metropolis_hastings(init_theta, x, y, yerr, acceptance, scale=0.05):
    """
    Metropolis-Hastings algorithm, a la Ford (2005).

    1) Generate a proposal step
    2) Compare the chi^2 of the proposal step to the current step
    3) Draw a random number `u` on [0, 1]
    4) Compute alpha = min([exp(-0.5 * (chi2(new) - chi2(old))), 1])
    5) If u <= alpha, accept the proposed step, otherwise keep the current step
    """
    # Generate a proposal step:
    proposed_theta = proposal(init_theta, scale=scale)

    # Compare chi^2 of proposed step to current step:
    chi2_init_step = chi2(init_theta, x, y, yerr, linear_model)
    chi2_proposed_step = chi2(proposed_theta, x, y, yerr, linear_model)

    relative_likelihood = np.exp(-0.5 * (chi2_proposed_step - chi2_init_step))
    alpha = np.min([relative_likelihood, 1])

    # If U(0, 1) <= alpha, accept the step:
    if np.random.rand() <= alpha:
        return proposed_theta, acceptance + 1
    else:
        return init_theta, acceptance
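For Gaussian measurement uncertainties, $e^{-\chi^2/2}$ is proportional to the likelihood, so with a symmetric (Gaussian) proposal and an implicit uniform prior the acceptance probability above is just the likelihood ratio,

$$\alpha = \min\left[\frac{\mathcal{L}(\theta_{\rm new})}{\mathcal{L}(\theta_{\rm old})},\, 1\right] = \min\left[e^{-\frac{1}{2}\left(\chi^2_{\rm new} - \chi^2_{\rm old}\right)},\, 1\right].$$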
def sampler(x, y, yerr, init_theta, n_steps, scale=0.05):
    """
    Markov Chain Monte Carlo sampler.
    """
    current_theta = np.copy(init_theta)

    # Allocate memory for samples:
    samples = np.zeros((n_steps, len(init_theta)))
    samples[0, :] = init_theta

    acceptance = 0
    for i in range(1, n_steps):
        # Run the M-H algorithm to determine next step:
        current_theta, acceptance = metropolis_hastings(current_theta,
                                                        x, y, yerr, acceptance,
                                                        scale=scale)
        # Record the result:
        samples[i, :] = current_theta

    # Compute the final acceptance rate (n_steps - 1 proposals were made):
    acceptance_rate = acceptance / (n_steps - 1)
    return samples, acceptance_rate
Make an initial guess (which is wrong!) and let the MCMC algorithm find the correct solution:
init_parameters = [1, 0.5] # slope, intercept
n_steps = 50000
# This tweakable parameter determines how
# far new steps should be taken away from
# previous steps. Increase `scale` to
# decrease your acceptance rate:
scale = 0.06
samples, acceptance_rate = sampler(x, y, yerr, init_parameters,
n_steps, scale=scale)
What is the acceptance rate? Ideally this should be near 45%:
print(acceptance_rate)
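If the acceptance rate is far from that target, `scale` is the knob to turn. As a rough illustration (a sketch only; the trial scales and the shorter 5000-step chains are arbitrary choices, reusing the `sampler` defined above):

# Quick sweep over a few proposal scales (arbitrary trial values) to show
# how `scale` controls the acceptance rate; shorter chains for speed:
for trial_scale in [0.01, 0.06, 0.3]:
    _, trial_acceptance_rate = sampler(x, y, yerr, init_parameters,
                                       5000, scale=trial_scale)
    print(f"scale = {trial_scale:.2f}: acceptance rate = {trial_acceptance_rate:.2f}")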
Let's plot a few random draws from the posterior probability distribution functions for each parameter:
for i in range(100):
    random_step = np.random.randint(0, samples.shape[0])
    random_theta = samples[random_step, :]
    plt.plot(x, linear_model(random_theta, x), alpha=0.05, color='k')
plt.errorbar(x, y, yerr, fmt='.')
plt.xlabel('x')
plt.ylabel('y')
You can see that the uncertainty in the measurements is reflected in the uncertainty in the slope and intercept parameters.
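We can also summarize each marginal posterior numerically, for example with the median and 16th/84th percentiles (a minimal sketch; it assumes the `samples` array from above and discards the same 1000-step burn-in used below):

# Report the median and central 68% interval for each parameter,
# discarding the first 1000 steps as burn-in:
for name, chain in zip(['m', 'b'], samples[1000:].T):
    lower, median, upper = np.percentile(chain, [16, 50, 84])
    print(f"{name} = {median:.3f} (+{upper - median:.3f} / -{median - lower:.3f})")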
We normally see the posterior probability distribution functions displayed in a "corner" plot like the one below:
from corner import corner
burned_in_samples = samples[1000:]
corner(burned_in_samples, labels=['m', 'b'], truths=[true_slope, true_intercept]);
Note that the posterior distribution for each parameter is consistent with the true value within the uncertainties! We are accurately measuring each parameter and its uncertainty.
But the parameters $m$ and $b$ are correlated with one another: small values of $m$ correspond to large values of $b$, and vice versa. We can remove this degeneracy by reparameterizing the model in terms of the angle of the line, $\arctan(m)$, and its perpendicular offset from the origin; see Hogg et al. (2010).
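We can quantify that correlation directly from the chain (a quick check using the `burned_in_samples` array defined above):

# Sample correlation coefficient between the m and b chains:
m_samples, b_samples = burned_in_samples.T
print("correlation(m, b) =", np.corrcoef(m_samples, b_samples)[0, 1])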
def ensemble_sampler(x, y, yerr, init_thetas, n_steps, n_walkers, n_dim, scale=0.05):
    """
    Markov Chain Monte Carlo sampler with multiple walkers.
    """
    current_theta = np.copy(init_thetas)

    # Allocate memory for samples:
    samples = np.zeros((n_steps, n_walkers, n_dim))
    samples[0, :, :] = current_theta

    acceptance = 0
    for i in range(1, n_steps):
        for j in range(n_walkers):
            # Run the M-H algorithm to determine the next step for walker j:
            current_theta, acceptance = metropolis_hastings(samples[i-1, j, :],
                                                            x, y, yerr, acceptance,
                                                            scale=scale)
            # Record the result:
            samples[i, j, :] = current_theta

    # Compute the final acceptance rate ((n_steps - 1) * n_walkers proposals were made):
    acceptance_rate = acceptance / ((n_steps - 1) * n_walkers)
    return samples, acceptance_rate
# `n_dim` is the number of free parameters
n_dim = 2
# `n_walkers` is the number of (independent) samplers
n_walkers = 5
# `n_steps` is the number of steps per walker
n_steps = 10000
# Create a group of initial parameters, one per walker, clustered
# around the initial guess for the maximum-likelihood parameters:
init_params_ensemble = [np.array(init_parameters) + 1e-5 * np.random.randn(n_dim)
                        for i in range(n_walkers)]
samples, acceptance_rate = ensemble_sampler(x, y, yerr, init_params_ensemble,
n_steps, n_walkers, n_dim)
Plot the "trace" of each walker, i.e. how each walker evolves from step to step:
fig, ax = plt.subplots(2, 1, figsize=(4, 5), sharex=True)
ax[0].plot(samples[:, :, 0])
ax[0].set(ylabel='m')
ax[1].plot(samples[:, :, 1])
ax[1].set(xlabel='Step', ylabel='b')
fig.tight_layout()
plt.show()
From the "trace" plot above, you can see that the chains appear to be oscillate around their final values after the first 1000 steps or so.
Plot the corner plot for all chains:
# Skip the first 1000 steps to allow chains to reach convergence
burned_in_samples = samples[1000:]
shape = burned_in_samples.shape
# We need to reshape the 3D array into a 2D array to make the corner plot:
corner(burned_in_samples.reshape((shape[0]*n_walkers, shape[2])),
labels=['m', 'b'], truths=[true_slope, true_intercept])
plt.show()
Have the chains "converged"? One way to check is by computing the autocorrelation length of the chains, which we do below:
def autocorrelation(x):
    """
    Calculate the autocorrelation function of array `x`.
    """
    result = np.correlate(x, x, mode='full')
    return result[result.size//2:]
def first_zero_crossing(acf):
    """
    Find the index of the first zero-crossing of the autocorrelation function.
    """
    return np.argwhere(np.diff(np.sign(acf)))[0][0]
effective_chain_lengths = []

# Estimate an effective chain length for each walker's slope (m) chain:
for walker in burned_in_samples[:, :, 0].T:
    acf = autocorrelation(walker - walker.mean())
    ind = first_zero_crossing(acf)
    effective_chain_lengths.append(burned_in_samples.shape[0]/ind)

print(effective_chain_lengths)
These effective chain lengths should be large to ensure convergence. If they're small (in the range of a few to ten), you haven't sampled enough steps to reach convergence.
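The loop above only examines the slope ($m$) chains; the same diagnostic can be run for every parameter, for example:

# Repeat the effective-chain-length estimate for each parameter and walker,
# reusing the `autocorrelation` and `first_zero_crossing` helpers from above:
for k, name in enumerate(['m', 'b']):
    lengths = []
    for walker in burned_in_samples[:, :, k].T:
        acf = autocorrelation(walker - walker.mean())
        lengths.append(burned_in_samples.shape[0] / first_zero_crossing(acf))
    print(name, np.round(lengths, 1))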