Monod Example

Author(s): Paul Miles | Date Created: June 18, 2018

This example is from P.M. Berthouex and L.C. Brown: "Statistics for Environmental Engineers", CRC Press, 2002.

We fit the Monod model $$y(t;q) = q_1 \frac{t}{q_2 + t} + \epsilon, \quad \epsilon\sim N(0,\sigma^2), \quad q = [\mu_{max}, K_x]$$ to observations:

  x (mg/L COD)    y (1/h)
            28      0.053
            55      0.060
            83      0.112
           110      0.105
           138      0.099
           225      0.122
           375      0.125

Import required packages

In [1]:
import numpy as np
import scipy.optimize
from pymcmcstat.MCMC import MCMC
import matplotlib.pyplot as plt
import pymcmcstat
print(pymcmcstat.__version__)
1.9.0

Initialize MCMC object and create data structure

In [2]:
# Initialize MCMC object
mcstat = MCMC()
# Next, create a data structure for the observations and control
# variables. The data structure stores the observations in the
# fields xdata and ydata.
ndp = 7
x = np.array([28, 55, 83, 110, 138, 225, 375])  # (mg / L COD)
x = x.reshape(ndp, 1) # enforce column vector
y = np.array([0.053, 0.060, 0.112, 0.105, 0.099, 0.122, 0.125]) # (1 / h)
y = y.reshape(ndp, 1) # enforce column vector
# add the data set to the data structure
mcstat.data.add_data_set(x, y)

Define model and sum-of-squares function

For the MCMC run we need the sum-of-squares function. For the plots we will also need a function that returns the model response. Both are straightforward to write:
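Explicitly, the sum-of-squares evaluated by the sampler is $$SS(q) = \sum_{i=1}^{n}\left(y_i - \frac{q_1 x_i}{q_2 + x_i}\right)^2.$$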

In [3]:
def modelfun(x, theta):
    # Monod model response with theta = [mu_max, K_x]
    return theta[0]*x/(theta[1] + x)

def ssfun(theta, data):
    # sum of squared residuals between observations and model response
    res = data.ydata[0] - modelfun(data.xdata[0], theta)
    return (res ** 2).sum(axis=0)

Perform an initial optimization to estimate the error variance and compute the covariance matrix

We define a residuals function and minimize it with scipy.optimize.leastsq.
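With $n = 7$ observations and $p = 2$ parameters, the minimized sum of squares yields the error-variance estimate $$\hat{\sigma}^2 = \mathrm{mse} = \frac{SS(\hat{q})}{n - p},$$ which is what the last line of the next cell computes.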

In [4]:
def residuals(p, x, y):
    return y - modelfun(x, p)

theta0, ier = scipy.optimize.leastsq(
    residuals,
    x0=[0.15, 100],
    args=(mcstat.data.xdata[0].reshape(ndp,),
          mcstat.data.ydata[0].reshape(ndp,)))
n = mcstat.data.n[0]  # number of data points
p = len(theta0)  # number of model parameters
ssmin = ssfun(theta0, mcstat.data)  # sum-of-squares at the optimum
mse = ssmin/(n - p)  # estimate of the error variance

The Jacobian matrix of the model function is easy to calculate, so we use it to produce an estimate of the covariance of theta. This estimate can serve as the initial proposal covariance for the MCMC simulation by assigning it to qcov in the simulation options (see below).
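For the Monod model the Jacobian has one row per observation, $$J_{i1} = \frac{\partial y}{\partial q_1}\bigg|_{x_i} = \frac{x_i}{q_2 + x_i}, \qquad J_{i2} = \frac{\partial y}{\partial q_2}\bigg|_{x_i} = -\frac{q_1 x_i}{(q_2 + x_i)^2},$$ and the proposal covariance estimate is $$\mathrm{tcov} = \hat{\sigma}^2\,(J^T J)^{-1}.$$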

In [5]:
# columns of J hold the partial derivatives of the model with respect
# to theta[0] and theta[1], evaluated at the observed x values
J = np.array([[x/(theta0[1]+x)], [-theta0[0]*x/(theta0[1]+x)**2]])
J = J.transpose()
J = J.reshape(n, p)
tcov = np.linalg.inv(np.dot(J.transpose(), J))*mse
print('tcov = {}'.format(tcov))
tcov = [[2.44707212e-04 2.50115314e-01]
 [2.50115314e-01 3.20837579e+02]]

Set up parameters, model settings, and simulation options

  • Parameters:
    • $\mu_{max} \in [0, \infty)$
    • $K_x \in [0, \infty)$
  • Model Settings:
    • $\sigma^2_0 = 0.01^2$ (initial error variance)
    • sum-of-squares function defined above
  • Simulation Options:
    • Number of simulations: 5000
    • Update error variance
    • Initial covariance matrix from least squares optimization
In [6]:
mcstat.parameters.add_model_parameter(
    name=r'$\mu_{max}$',
    theta0=theta0[0],
    minimum=0)
mcstat.parameters.add_model_parameter(
    name='$K_x$',
    theta0=theta0[1],
    minimum=0)
In [7]:
mcstat.simulation_options.define_simulation_options(
    nsimu=5.0e3,
    updatesigma=1,
    qcov=tcov)
mcstat.model_settings.define_model_settings(
    sos_function=ssfun,
    sigma2=0.01**2)

Run Simulation

In [8]:
mcstat.run_simulation()
Sampling these parameters:
      name      start [      min,       max] N(       mu,   sigma^2)
$\mu_{max}$:      0.15 [ 0.00e+00,       inf] N( 0.00e+00,      inf)
     $K_x$:     49.05 [ 0.00e+00,       inf] N( 0.00e+00,      inf)
 [-----------------100%-----------------] 5000 of 5000 complete in 1.2 sec

Extract results:

  • Display chain statistics
In [9]:
results = mcstat.simulation_results.results
names = results['names']
chain = results['chain']
s2chain = results['s2chain']
names = results['names'] # parameter names
mcstat.chainstats(chain, results)
------------------------------
name      :       mean        std     MC_err        tau     geweke
$\mu_{max}$:     0.1565     0.0360     0.0031    34.2737     0.9768
$K_x$     :    65.6914    47.4967     3.9469    32.1899     0.8879
------------------------------
==============================
Acceptance rate information
---------------
Results dictionary:
Stage 1: 21.40%
Stage 2: 51.36%
Net    : 72.76% -> 3638/5000
---------------
Chain provided:
Net    : 72.76% -> 3638/5000
---------------
Note, the net acceptance rate from the results dictionary
may be different if you only provided a subset of the chain,
e.g., removed the first part for burnin-in.
------------------------------
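As the note in the output suggests, the statistics are often recomputed on a chain with the initial samples removed. A minimal sketch, using an arbitrary 50% burn-in for illustration:

# discard the first half of the chain as burn-in (arbitrary choice
# for illustration) and recompute the chain statistics
burnin = chain.shape[0] // 2
mcstat.chainstats(chain[burnin:, :], results)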

Plot mean model response and compare with data

In [10]:
# A point estimate of the model parameters can be calculated from the
# mean of the |chain|. Here we plot the fitted model using the
# posterior means of the parameters.
xmod = np.linspace(1e-4, 400)
plt.figure(1)
hmodel, = plt.plot(xmod, modelfun(xmod, np.mean(chain, 0)),
                   '-k', label='model')
hdata, = plt.plot(mcstat.data.xdata[0], mcstat.data.ydata[0],
                  's', label='data');
plt.xlim([0, 400])
plt.xlabel('x (mg/L COD)')
plt.ylabel('y (1/h)')
plt.legend(handles=[hdata, hmodel]);

Visualize parameter chains and pairwise correlation:

  • plot_chain_panel: plots the sampling history of the simulation
  • plot_pairwise_correlation_panel: chains plotted against each other
In [11]:
from pymcmcstat import mcmcplot as mcp
# plot chain panel
mcp.plot_chain_panel(chain, names)
# The |'pairs'| options makes pairwise scatterplots of the columns of
# the |chain|.
pwfig = mcp.plot_pairwise_correlation_panel(
    chain, names, settings=dict(fig=dict(figsize=(4,4))))
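A marginal density view of each parameter can complement the chain and pairwise plots. A minimal sketch, assuming this version of pymcmcstat provides plot_density_panel in the mcmcplot module:

# marginal posterior density of each parameter
# (assumes mcp.plot_density_panel is available in this version)
mcp.plot_density_panel(chain, names)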

Generate credible intervals

Instead of just a point estimate of the fit, we should also study the predictive posterior distribution of the model. We can calculate the model fit for a randomly selected subset of the chain and calculate the predictive envelope of the model. The grey areas in the plot correspond to 50%, 90%, 95%, and 99% posterior regions.
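The idea behind the envelope can be sketched with plain numpy. This is an illustration only, not the library's implementation; the grid size and percentile levels here are arbitrary choices:

# sample parameter sets from the chain, evaluate the model on a grid,
# and take percentiles of the responses to form a credible envelope
nsample = 500
idx = np.random.choice(chain.shape[0], nsample, replace=False)
xgrid = np.linspace(1e-4, 400, 100).reshape(-1, 1)
responses = np.hstack([modelfun(xgrid, chain[i, :]) for i in idx])
lower, upper = np.percentile(responses, [2.5, 97.5], axis=1)  # 95% envelope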

In [12]:
def predmodelfun(data, theta):
    return modelfun(data.xdata[0], theta)

mcstat.PI.setup_prediction_interval_calculation(
    results=results,
    data=mcstat.data,
    modelfunction=predmodelfun)
mcstat.PI.generate_prediction_intervals(
    nsample=500,
    calc_pred_int=False)
# plot prediction intervals
mcstat.PI.plot_prediction_intervals(adddata=True)
plt.xlabel('x (mg/L COD)', fontsize=20)
plt.xticks(fontsize=20)
plt.ylabel('y (1/h)', fontsize=20)
plt.yticks(fontsize=20)
plt.title('Predictive envelopes of the model', fontsize=20)
Generating credible/prediction intervals:
Interval generation complete

Out[12]:
Text(0.5, 1, 'Predictive envelopes of the model')

Alternative Approach

As of version 1.9.0, the recommended method for generating these interval plots is to use the propagation module.

In [14]:
from pymcmcstat.propagation import calculate_intervals, plot_intervals

def newmodelfun(q, data):
    x = data.xdata[0]
    a, b = q
    return a*x/(b + x)

data = mcstat.data
intervals = calculate_intervals(chain, results, data, newmodelfun,
                                s2chain=s2chain, nsample=500, waitbar=True)
f, ax = plot_intervals(intervals, data.xdata[0], data.ydata[0],
                       addprediction=False, adddata=True,
                       limits=[50, 90, 95, 99])
f.tight_layout()
 [-----------------100%-----------------] 500 of 500 complete in 0.0 sec
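Because s2chain was supplied to calculate_intervals, prediction envelopes that account for the observation error can also be displayed. A minimal sketch, assuming the returned intervals object includes prediction intervals in this version:

# show prediction intervals in addition to the data
# (assumes `intervals` contains prediction intervals because s2chain
# was provided above)
f, ax = plot_intervals(intervals, data.xdata[0], data.ydata[0],
                       addprediction=True, adddata=True, limits=[95])
ax.set_xlabel('x (mg/L COD)')
ax.set_ylabel('y (1/h)')
f.tight_layout()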

See tutorial on Advanced Interval Plotting for details on how to customize the appearance.