Author(s): Paul Miles | Date Created: June 18, 2018
This example is from P.M. Berthouex and L.C. Brown: "Statistics for Environmental Engineers", CRC Press, 2002.
We fit the Monod model $$y(t;q) = q_1 \frac{t}{q_2 + t} + \epsilon, \quad \epsilon\sim N(0,\sigma^2), \quad q = [\mu_{max}, K_x]$$ to observations:
x (mg/L COD) | y (1/h) |
---|---|
28 | 0.053 |
55 | 0.060 |
83 | 0.112 |
110 | 0.105 |
138 | 0.099 |
225 | 0.122 |
375 | 0.125 |
import numpy as np
import scipy.optimize
from pymcmcstat.MCMC import MCMC
import matplotlib.pyplot as plt
import pymcmcstat
print(pymcmcstat.__version__)
1.9.0
# Initialize MCMC object
mcstat = MCMC()
# Next, create a data structure for the observations and control
# variables. Typically one could make a structure |data| that
# contains fields |xdata| and |ydata|.
ndp = 7
x = np.array([28, 55, 83, 110, 138, 225, 375]) # (mg / L COD)
x = x.reshape(ndp, 1) # enforce column vector
y = np.array([0.053, 0.060, 0.112, 0.105, 0.099, 0.122, 0.125]) # (1 / h)
y = y.reshape(ndp, 1) # enforce column vector
# data structure
mcstat.data.add_data_set(x,y)
For the MCMC run we need the sum of squares function. For the plots we shall also need a function that returns the model. Both the model and the sum of squares functions are easy to write as follows
def modelfun(x, theta):
return theta[0]*x/(theta[1] + x)
def ssfun(theta,data):
res = data.ydata[0] - modelfun(data.xdata[0], theta)
return (res ** 2).sum(axis=0)
We define a residuals function and minimize it using scipy's optimize least-squares.
def residuals(p, x, y):
return y - modelfun(x, p)
theta0, ssmin = scipy.optimize.leastsq(
residuals,
x0=[0.15, 100],
args=(mcstat.data.xdata[0].reshape(ndp,),
mcstat.data.ydata[0].reshape(ndp,)))
n = mcstat.data.n[0] # number of data points in model
p = len(theta0); # number of model parameters (dof)
ssmin = ssfun(theta0, mcstat.data) # calculate the sum-of-squares error
mse = ssmin/(n-p) # estimate for the error variance
The Jacobian matrix of the model function is easy to calculate so we use it to produce estimate of the covariance of theta. This can be used as the initial proposal covariance for the MCMC simulation by assigning it to qcov
in the simulation_options
(see below).
J = np.array([[x/(theta0[1]+x)], [-theta0[0]*x/(theta0[1]+x)**2]])
J = J.transpose()
J = J.reshape(n,p)
tcov = np.linalg.inv(np.dot(J.transpose(), J))*mse
print('tcov = {}'.format(tcov))
tcov = [[2.44707212e-04 2.50115314e-01] [2.50115314e-01 3.20837579e+02]]
mcstat.parameters.add_model_parameter(
name='$\mu_{max}$',
theta0=theta0[0],
minimum=0)
mcstat.parameters.add_model_parameter(
name='$K_x$',
theta0=theta0[1],
minimum=0)
mcstat.simulation_options.define_simulation_options(
nsimu=5.0e3,
updatesigma=1,
qcov=tcov)
mcstat.model_settings.define_model_settings(
sos_function=ssfun,
sigma2=0.01**2)
mcstat.run_simulation()
Sampling these parameters: name start [ min, max] N( mu, sigma^2) $\mu_{max}$: 0.15 [ 0.00e+00, inf] N( 0.00e+00, inf) $K_x$: 49.05 [ 0.00e+00, inf] N( 0.00e+00, inf) [-----------------100%-----------------] 5000 of 5000 complete in 1.2 sec
results = mcstat.simulation_results.results
names = results['names']
chain = results['chain']
s2chain = results['s2chain']
names = results['names'] # parameter names
mcstat.chainstats(chain, results)
------------------------------ name : mean std MC_err tau geweke $\mu_{max}$: 0.1565 0.0360 0.0031 34.2737 0.9768 $K_x$ : 65.6914 47.4967 3.9469 32.1899 0.8879 ------------------------------ ============================== Acceptance rate information --------------- Results dictionary: Stage 1: 21.40% Stage 2: 51.36% Net : 72.76% -> 3638/5000 --------------- Chain provided: Net : 72.76% -> 3638/5000 --------------- Note, the net acceptance rate from the results dictionary may be different if you only provided a subset of the chain, e.g., removed the first part for burnin-in. ------------------------------
# A point estimate of the model parameters can be calculated from the
# mean of the |chain|. Here we plot the fitted model using the
# posterior means of the parameters.
xmod = np.linspace(1e-4,400)
plt.figure(1)
hmodel, = plt.plot(xmod,modelfun(xmod,np.mean(chain, 0)),
'-k', label='model')
hdata, = plt.plot(mcstat.data.xdata[0], mcstat.data.ydata[0],
's', label='data');
plt.xlim([0, 400])
plt.xlabel('x (mg/L COD)')
plt.ylabel('y (1/h)')
plt.legend(handles=[hdata, hmodel]);
plot_chain_panel
: plots the sampling history of the simulationplot_pairwise_correlation_panel
: chains plotted against each otherfrom pymcmcstat import mcmcplot as mcp
# plot chain panel
mcp.plot_chain_panel(chain, names)
# The |'pairs'| options makes pairwise scatterplots of the columns of
# the |chain|.
pwfig = mcp.plot_pairwise_correlation_panel(
chain, names, settings=dict(fig=dict(figsize=(4,4))))
Instead of just a point estimate of the fit, we should also study the predictive posterior distribution of the model. We can calculate the model fit for a randomly selected subset of the chain and calculate the predictive envelope of the model. The grey areas in the plot correspond to 50%, 90%, 95%, and 99% posterior regions.
def predmodelfun(data, theta):
return modelfun(data.xdata[0], theta)
mcstat.PI.setup_prediction_interval_calculation(
results=results,
data=mcstat.data,
modelfunction=predmodelfun)
mcstat.PI.generate_prediction_intervals(
nsample=500,
calc_pred_int=False)
# plot prediction intervals
mcstat.PI.plot_prediction_intervals(adddata=True)
plt.xlabel('x (mg/L COD)', Fontsize=20)
plt.xticks(Fontsize=20)
plt.ylabel('y (1/h)', Fontsize=20)
plt.yticks(Fontsize=20)
plt.title('Predictive envelopes of the model', Fontsize=20)
Generating credible/prediction intervals: Interval generation complete
Text(0.5, 1, 'Predictive envelopes of the model')
As of version 1.9.0, the recommend method for generating these type of intervals plots is to use the propagation
module.
from pymcmcstat.propagation import calculate_intervals, plot_intervals
def newmodelfun(q, data):
x = data.xdata[0]
a, b = q
return a*x/(b + x)
data = mcstat.data
intervals = calculate_intervals(chain, results, data, newmodelfun,
s2chain=s2chain, nsample=500, waitbar=True)
f, ax = plot_intervals(intervals, data.xdata[0], data.ydata[0],
addprediction=False, adddata=True, limits=[50, 90, 95, 99])
f.tight_layout()
[-----------------100%-----------------] 500 of 500 complete in 0.0 sec
See tutorial on Advanced Interval Plotting for details on how to customize the appearance.