Setting the Random Seed

Author(s): Paul Miles | Date Created: July 20, 2019

When testing an MCMC simulation on a particular problem, it is often useful to check whether the results are repeatable. This can be accomplished by setting the seed for the random number generator used within pymcmcstat. This tutorial outlines how to set the seed and demonstrates the repeatability of the results.

First we import the required packages, define our model/sum-of-squares functions, and define a data set.

In [1]:
import numpy as np
from pymcmcstat.MCMC import MCMC
import pymcmcstat
print(pymcmcstat.__version__)
np.seterr(over='ignore')
1.9.0
Out[1]:
{'divide': 'warn', 'over': 'warn', 'under': 'ignore', 'invalid': 'warn'}
In [2]:
# define test model function
def test_modelfun(xdata, theta):
    m = theta[0]
    b = theta[1]
    nrow, ncol = xdata.shape
    y = np.zeros([nrow, 1])
    y[:,0] = m*xdata.reshape(nrow,) + b
    return y

def test_ssfun(theta, data):
    xdata = data.xdata[0]
    ydata = data.ydata[0]
    # eval model
    ymodel = test_modelfun(xdata, theta)
    # calc sos
    ss = sum((ymodel[:, 0] - ydata[:, 0])**2)
    return ss

# define data
nds = 100
x = np.linspace(2, 3, num=nds)
y = 2.*x + 3. + 0.1*np.random.standard_normal(x.shape)

Initialize MCMC object and set seed

By default, no seed is specified. To specify a seed, simply pass a numeric value via the keyword argument rngseed.

In [3]:
# Initialize MCMC object
mcstat = MCMC(rngseed=1234)

Set up and run the rest of the simulation

We set the

  • data
  • simulation options
  • model settings
  • model parameters

just like we would for any other simulation.

In [4]:
mcstat.data.add_data_set(x, y)
mcstat.simulation_options.define_simulation_options(
    nsimu=int(5.0e3), updatesigma=True, method='dram')
# update model settings
mcstat.model_settings.define_model_settings(sos_function=test_ssfun)
mcstat.parameters.add_model_parameter(
    name='m',
    theta0=2.,
    minimum=-10,
    maximum=np.inf,
    sample=True)
mcstat.parameters.add_model_parameter(
    name='b',
    theta0=-5.,
    minimum=-10,
    maximum=100,
    sample=True)
# run mcmc
mcstat.run_simulation()
Sampling these parameters:
      name      start [      min,       max] N(       mu,   sigma^2)
         m:      2.00 [   -10.00,       inf] N( 0.00e+00,      inf)
         b:     -5.00 [   -10.00,    100.00] N( 0.00e+00,      inf)
 [-----------------100%-----------------] 5000 of 5000 complete in 1.5 sec

Extract Results and Display Chainstats

In addition, we check the last row of the chain to see where the simulation ended.

In [5]:
# Extract results
results = mcstat.simulation_results.results
chain = results['chain']
s2chain = results['s2chain']
sschain = results['sschain']
names = results['names']
# define burnin
burnin = 2000
# display chain statistics
mcstat.chainstats(chain[burnin:, :], results)
print('chain[-1, :] = {}'.format(chain[-1, :]))
------------------------------
name      :       mean        std     MC_err        tau     geweke
m         :     1.9850     0.0286     0.0028    38.2546     0.9896
b         :     3.0562     0.0704     0.0069    40.3475     0.9821
------------------------------
==============================
Acceptance rate information
---------------
Results dictionary:
Stage 1: 0.94%
Stage 2: 3.00%
Net    : 3.94% -> 197/5000
---------------
Chain provided:
Net    : 2.60% -> 78/3000
---------------
Note, the net acceptance rate from the results dictionary
may be different if you only provided a subset of the chain,
e.g., removed the first part for burnin-in.
------------------------------
chain[-1, :] = [1.98462128 3.06389946]

Check Repeatability

To check repeatability, we simply create a new MCMC object with the same random seed, rerun the simulation, and compare the results.
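Before rerunning, it can help to keep a copy of the first chain so the two runs can be compared programmatically at the end; chain1 below is an illustrative name and not part of the original workflow.

# keep a copy of the first run's chain for comparison after the second run
chain1 = chain.copy()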

In [6]:
# Initialize MCMC object
mcstat = MCMC(rngseed=1234)
mcstat.data.add_data_set(x, y)
mcstat.simulation_options.define_simulation_options(
    nsimu=int(5.0e3), updatesigma=True, method='dram')
# update model settings
mcstat.model_settings.define_model_settings(sos_function=test_ssfun)
mcstat.parameters.add_model_parameter(
    name='m',
    theta0=2.,
    minimum=-10,
    maximum=np.inf,
    sample=True)
mcstat.parameters.add_model_parameter(
    name='b',
    theta0=-5.,
    minimum=-10,
    maximum=100,
    sample=True)
# run mcmc
mcstat.run_simulation()
# Extract results
results = mcstat.simulation_results.results
chain = results['chain']
s2chain = results['s2chain']
sschain = results['sschain']
names = results['names']
# define burnin
burnin = 2000
# display chain statistics
mcstat.chainstats(chain[burnin:, :], results)
print('chain[-1, :] = {}'.format(chain[-1, :]))
Sampling these parameters:
      name      start [      min,       max] N(       mu,   sigma^2)
         m:      2.00 [   -10.00,       inf] N( 0.00e+00,      inf)
         b:     -5.00 [   -10.00,    100.00] N( 0.00e+00,      inf)
 [-----------------100%-----------------] 5000 of 5000 complete in 1.5 sec

------------------------------
name      :       mean        std     MC_err        tau     geweke
m         :     1.9850     0.0286     0.0028    38.2546     0.9896
b         :     3.0562     0.0704     0.0069    40.3475     0.9821
------------------------------
==============================
Acceptance rate information
---------------
Results dictionary:
Stage 1: 0.94%
Stage 2: 3.00%
Net    : 3.94% -> 197/5000
---------------
Chain provided:
Net    : 2.60% -> 78/3000
---------------
Note, the net acceptance rate from the results dictionary
may be different if you only provided a subset of the chain,
e.g., removed the first part for burnin-in.
------------------------------
chain[-1, :] = [1.98462128 3.06389946]

The chain statistics and final chain values are identical to those from the first simulation, so with a fixed seed the random number generation, and hence the simulation, is repeatable.
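Beyond comparing the printed statistics by eye, the two chains can be checked element-wise; a minimal sketch, assuming the first run's chain was saved as chain1 (as above) and chain now holds the second run's samples:

# element-wise comparison of the two chains
print('Chains identical: {}'.format(np.array_equal(chain1, chain)))
# or allow for floating-point tolerance, if desired
print('Chains match within tolerance: {}'.format(np.allclose(chain1, chain)))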