Author(s): Paul Miles | Date Created: May 5, 2020
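Many models are time consuming to evaluate. Because MCMC simulations require many model evaluations, it can be useful to periodically save the chain elements to a file. This is useful for a variety of reasons; for example, it allows you to restart a simulation from the last saved chain element.
This is especially important when working on remote systems where computation time may be limited. This tutorial demonstrates how to save chain log files together with a .json restart file, and how to use them to restart a simulation.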
Similar or related topics are also discussed in the tutorial Chain Log Files.
Import the required packages.
import numpy as np
from pymcmcstat.MCMC import MCMC
from datetime import datetime
import pymcmcstat
print(pymcmcstat.__version__)
1.9.0
Define a simple model and sum-of-squares function.
# define test model function
def test_modelfun(xdata, theta):
    m = theta[0]
    b = theta[1]
    nrow, ncol = xdata.shape
    y = np.zeros([nrow, 1])
    y[:, 0] = m*xdata.reshape(nrow,) + b
    return y

def test_ssfun(theta, data):
    xdata = data.xdata[0]
    ydata = data.ydata[0]
    # eval model
    ymodel = test_modelfun(xdata, theta)
    # calc sos
    ss = sum((ymodel[:, 0] - ydata[:, 0])**2)
    return ss
Initialize MCMC object:
# Initialize MCMC object
mcset = MCMC()
# Add data
nds = 100
x = np.linspace(2, 3, num=nds)
y = 2.*x + 3. + 0.1*np.random.standard_normal(x.shape)
mcset.data.add_data_set(x, y)
# update model settings
mcset.model_settings.define_model_settings(sos_function=test_ssfun)
mcset.parameters.add_model_parameter(
    name='m',
    theta0=2.,
    minimum=-10,
    maximum=np.inf,
    sample=True)
mcset.parameters.add_model_parameter(
    name='b',
    theta0=-5.,
    minimum=-10,
    maximum=100,
    sample=True)
The following keyword arguments of the simulation options allow you to set up the log files.
savedir: Directory in which to store the log files. If it is not specified but log files are turned on, the files are saved to a directory named with the convention 'YYYYMMDD_hhmmss_chain_log'.
save_to_bin: Save log files in binary format. Uses the h5py package for binary read/write.
save_to_txt: Save log files in text format. Uses the numpy package for text read/write.
By default, save_to_bin and save_to_txt are both set to False. You can save to either format or to both. Regardless of which format is used to save the chain, a text log file is also included, which records a date/time stamp with the corresponding chain indices. This will be explained in more detail later.
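To make the default naming convention concrete, the sketch below builds a directory name of the form 'YYYYMMDD_hhmmss_chain_log' with datetime.strftime. The helper default_log_dirname is hypothetical and only mirrors the convention described above; it is not taken from pymcmcstat's source.

```python
from datetime import datetime


def default_log_dirname(now=None):
    """Hypothetical helper: build a 'YYYYMMDD_hhmmss_chain_log' directory name."""
    now = datetime.now() if now is None else now
    return '{}_chain_log'.format(now.strftime('%Y%m%d_%H%M%S'))


print(default_log_dirname(datetime(2020, 5, 5, 21, 36, 22)))  # 20200505_213622_chain_log
```

Because the time stamp has one-second resolution, two runs started within the same second would map to the same name, which is one reason to pass an explicit savedir when launching several simulations at once.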
We choose to save to a time-stamped subdirectory of the resources directory, and to save in text format only (save_to_txt=True). To accommodate restarting, it is important to also set save_to_json=True. By default, the .json file contains all information, including the chains. However, since we are saving the chains to log files, we include the argument save_lightly=True. As a result, only the required metadata is saved to the .json file instead of the potentially large chains.
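The idea behind savesize and save_to_txt can be sketched with numpy alone: rather than holding the whole chain until the end, rows are appended to a text file every savesize samples, so the file on disk stays close to up to date. This is a simplified illustration under our own assumptions (a toy random "chain", a temporary directory, and a hypothetical append_chunk helper), not pymcmcstat's implementation.

```python
import os
import tempfile

import numpy as np


def append_chunk(filename, chunk):
    """Append the rows of a chain chunk to a whitespace-delimited text file."""
    with open(filename, 'a') as f:
        np.savetxt(f, np.atleast_2d(chunk))


# Toy stand-in for an MCMC chain: 10 samples of 2 parameters
rng = np.random.default_rng(0)
chain = rng.standard_normal((10, 2))
savesize = 4  # flush to disk every 4 samples (the last chunk has only 2)

logdir = tempfile.mkdtemp()
chainfile = os.path.join(logdir, 'chainfile.txt')
for start in range(0, chain.shape[0], savesize):
    append_chunk(chainfile, chain[start:start + savesize])

# Reading the file back recovers every saved sample, in order
saved = np.loadtxt(chainfile, ndmin=2)
print(saved.shape)  # (10, 2)
```

In this sketch, appending in chunks trades a little I/O overhead for the guarantee that, if the job is killed, at most savesize samples are lost.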
import os
datestr = datetime.now().strftime('%Y%m%d_%H%M%S')
savedir = os.path.join('resources', '{}_{}'.format(datestr, 'demo_restart'))
mcset.simulation_options.define_simulation_options(
    nsimu=int(5e3), updatesigma=1, method='dram',
    savesize=1000, save_to_json=True,
    verbosity=0, waitbar=True, save_to_txt=True,
    save_lightly=True, savedir=savedir)
mcset.run_simulation()
[-----------------100%-----------------] 5000 of 5000 complete in 5.4 sec
To verify the restart procedure, we display the final chain values for our two parameters.
results = mcset.simulation_results.results
chain = results['chain']
print('Final chain values: {}'.format(chain[-1, :]))
Final chain values: [1.95678129 3.13920213]
We observe that the folder 20200505_213622_demo_restart matches the naming pattern we specified for savedir (a date/time stamp followed by demo_restart), and we display its contents:
ls resources/20200505_213622_demo_restart
20200505_213622_mcmc_simulation.json covchainfile.txt sschainfile.txt chainfile.txt s2chainfile.txt txtlogfile.txt
As expected, the log files are saved in text (.txt) format. There is also a .json file that contains all the metadata needed to restart the simulation. Note that if you run this simulation on your machine, the results folder will have a different name because of the date/time stamp.
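Because the chains were saved as text, they can be inspected without pymcmcstat, for example with numpy.loadtxt. The sketch below assumes each *chainfile.txt holds plain whitespace-delimited numbers, which is what a numpy text write produces; the helper load_txt_logs and the demo directory are hypothetical. Applied to this tutorial's savedir, the last row of logs['chain'] would be expected to match the final chain values printed earlier.

```python
import glob
import os
import tempfile

import numpy as np


def load_txt_logs(savedir):
    """Load every '*chainfile.txt' in savedir into a dict of 2-D arrays."""
    logs = {}
    for path in sorted(glob.glob(os.path.join(savedir, '*chainfile.txt'))):
        # 'chainfile.txt' -> 'chain', 's2chainfile.txt' -> 's2chain', ...
        key = os.path.basename(path)[:-len('file.txt')]
        logs[key] = np.loadtxt(path, ndmin=2)
    return logs


# Demo with two placeholder log files
demo = tempfile.mkdtemp()
np.savetxt(os.path.join(demo, 'chainfile.txt'), np.array([[1.0, 2.0], [1.5, 2.5]]))
np.savetxt(os.path.join(demo, 's2chainfile.txt'), np.array([[0.01], [0.02]]))
logs = load_txt_logs(demo)
print(sorted(logs))          # ['chain', 's2chain']
print(logs['chain'][-1, :])  # last saved sample
```

The glob pattern deliberately skips txtlogfile.txt, which holds date/time stamps rather than chain values.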
# Discard the first MCMC object and build a new one, as if starting a new session
del mcset
mcset = MCMC()
# Add data (this draws new random noise, so the data differ slightly from the first run)
nds = 100
x = np.linspace(2, 3, num=nds)
y = 2.*x + 3. + 0.1*np.random.standard_normal(x.shape)
mcset.data.add_data_set(x, y)
# update model settings
mcset.model_settings.define_model_settings(sos_function=test_ssfun)
mcset.parameters.add_model_parameter(
    name='m',
    theta0=2.,
    minimum=-10,
    maximum=np.inf,
    sample=True)
mcset.parameters.add_model_parameter(
    name='b',
    theta0=-5.,
    minimum=-10,
    maximum=100,
    sample=True)
To access the previous simulation, we must define the directory where it was saved and the name of the .json restart file.
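Rather than typing the time-stamped file name by hand, the restart file can be located by pattern. Based on the folder listing above, we assume the restart file is the only file in savedir ending in _mcmc_simulation.json; find_restart_file is a hypothetical helper, and the demo uses a temporary directory with an empty placeholder file.

```python
import glob
import os
import tempfile


def find_restart_file(savedir):
    """Return the single '*_mcmc_simulation.json' file in savedir."""
    pattern = os.path.join(savedir, '*_mcmc_simulation.json')
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError('No file matching {}'.format(pattern))
    if len(matches) > 1:
        raise ValueError('Ambiguous restart files: {}'.format(matches))
    return matches[0]


# Demo: an empty placeholder that follows the naming pattern seen in the listing
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, '20200505_213622_mcmc_simulation.json'), 'w').close()
restart_file = find_restart_file(demo_dir)
print(os.path.basename(restart_file))  # 20200505_213622_mcmc_simulation.json
```

The result could then be passed as json_restart_file=find_restart_file(savedir). Raising on multiple matches avoids silently restarting from the wrong run.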
import os
savedir = os.path.join('resources', '20200505_213622_demo_restart')
restart_file = os.path.join(savedir, '20200505_213622_mcmc_simulation.json')
mcset.simulation_options.define_simulation_options(
    nsimu=int(5e3), updatesigma=1, method='dram',
    savesize=1000, save_to_json=True,
    verbosity=0, waitbar=True, save_to_txt=True,
    save_lightly=True,
    json_restart_file=restart_file)
mcset.run_simulation()
[-----------------100%-----------------] 5000 of 5000 complete in 4.4 sec
The simulation ran, but did it actually use the saved information? We can check by displaying the first chain element and comparing it with the final element from the first simulation.
results = mcset.simulation_results.results
chain = results['chain']
print('{}'.format(chain[0, :]))
[1.95678129 3.13920213]
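The same check can be made programmatically. Note that chain is reassigned for the restarted run, so the final sample of the first run must be kept beforehand (or read back from its chainfile.txt). The helper continues_from is a hypothetical sketch; the demo arrays reuse the values printed above, and the other rows are placeholders.

```python
import numpy as np


def continues_from(previous_chain, restarted_chain):
    """True if the restarted chain starts at the previous run's final sample."""
    previous_chain = np.atleast_2d(previous_chain)
    restarted_chain = np.atleast_2d(restarted_chain)
    return bool(np.allclose(previous_chain[-1, :], restarted_chain[0, :]))


# Placeholder rows around the values printed in this tutorial
first_run = np.array([[2.0, 3.0], [1.95678129, 3.13920213]])
second_run = np.array([[1.95678129, 3.13920213], [1.96, 3.12]])
print(continues_from(first_run, second_run))  # True
```

np.allclose is used instead of exact equality so that values that passed through a text file with limited precision still compare equal.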
While this isn't the most streamlined approach to restarting simulations, it is a possible solution.
Caveats:
It should be noted that this approach does not use the final element of the s2chain when initializing the error variance for the restarted simulation; this is an artifact of the source code on which the Python package is based. You could add this feature by reading the s2chainfile and extracting its final row, then including that value in the simulation by specifying sigma2 when defining the model_settings.
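A sketch of that workaround, assuming s2chainfile.txt is plain numeric text with one row per saved sample and one column per data set: read it with numpy.loadtxt and take the last row. The helper last_s2 is hypothetical; the commented line shows where the value would go, via the sigma2 argument of define_model_settings mentioned above.

```python
import os
import tempfile

import numpy as np


def last_s2(savedir, name='s2chainfile.txt'):
    """Return the final row of a saved error-variance chain as a 1-D array."""
    # ndmin=2 keeps a single-column file 2-D, so [-1, :] always works
    s2chain = np.loadtxt(os.path.join(savedir, name), ndmin=2)
    return s2chain[-1, :]


# Demo with a placeholder s2chainfile.txt (one data set, three saved samples)
demo_dir = tempfile.mkdtemp()
np.savetxt(os.path.join(demo_dir, 's2chainfile.txt'),
           np.array([[0.011], [0.009], [0.010]]))
sigma2 = last_s2(demo_dir)
print(sigma2)

# In the tutorial, the value would then be passed when defining the model settings:
# mcset.model_settings.define_model_settings(sos_function=test_ssfun, sigma2=sigma2)
```

For a single data set, as in this tutorial, the result is a one-element array.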
Another current limitation is that each simulation is exported to its own folder and files. The pymcmcstat package does not currently provide a simple procedure for splicing these result sets together, but the chain results can be merged with standard Python tools such as numpy.
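As one way to merge results, the sketch below stacks chains from consecutive runs with numpy.vstack. Because a restarted run begins at the previous run's final sample (as the comparison of chain values above showed), that repeated first row is dropped so it is not counted twice. The helper splice_chains and the toy arrays are hypothetical; in practice the inputs could be the chainfile.txt contents of each output folder.

```python
import numpy as np


def splice_chains(chains, drop_repeated_start=True):
    """Stack chains from consecutive (restarted) runs into one array."""
    chains = [np.atleast_2d(c) for c in chains]
    pieces = [chains[0]]
    for prev, nxt in zip(chains[:-1], chains[1:]):
        # A restarted run starts at the previous run's final sample
        if drop_repeated_start and np.allclose(prev[-1, :], nxt[0, :]):
            nxt = nxt[1:]
        pieces.append(nxt)
    return np.vstack(pieces)


run1 = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
run2 = np.array([[2.0, 2.0], [3.0, 3.0]])  # restarted from run1's last row
merged = splice_chains([run1, run2])
print(merged.shape)  # (4, 2)
```

Passing drop_repeated_start=False keeps every row, which may be preferable if the runs were not restarts of one another.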