import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter('ignore')
%matplotlib notebook
import paramonte as pm
import numpy as np
import scipy as sp
import pandas as pd
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt
print('\n'.join(f'{m.__name__} {m.__version__}' for m in globals().values() if getattr(m, '__version__', None)))
sns.set()
For a more comprehensive discussion of different aspects and attributes of the sampler and how to run the simulation in parallel, see the multivariate Normal distribution Jupyter Notebook example in the same folder as this Jupyter file exists.
Suppose we want to sample random points from a standard univariate Gaussian function. The following Python function getLogFunc()
returns the natural logarithm of the Probability Density Function of the univariate standard Gaussian distribution.
import numpy as np
logSqrt2Pi = np.log(np.sqrt(2*np.pi))
def getLogFunc(x): return -0.5*x**2 - logSqrt2Pi
Since the mathematical objective functions (e.g., probability density functions) can take extremely small or large values, we often work with their natural logarithms instead. This is the reason behind the naming convention used in the ParaMonte library for the user's objective functions: getLogFunc, implying that the user must provide a function that returns the natural logarithm of the target objective function.
See this Jupyter Notebook for an in-depth discussion of why we need to work with the logarithm of mathematical objective functions in optimization and sampling problems.
We will sample random points from this objective function by calling the ParaDRAM sampler (Delayed-Rejection Adaptive Metropolis-Hastings Markov Chain Monte Carlo sampler) of the ParaMonte library.
The simplest scenario would be to run the simulation with the default specifications that are appropriately determined by the ParaDRAM sampler.
To run the sampler in parallel, you will have to first save the MPI-enabled script as an external file. Visit the ParaMonte library's documentation website for more information, or see this parallel ParaDRAM simulation Jupyter notebook example.
However, for further clarity of this particular example, we will specify an output folder for the automatically-named output files of the simulation.
import paramonte as pm
print( pm.version.interface.get() )
print( pm.version.kernel.get() )
pm.checkForUpdate() # check for new versions of the ParaMonte library
pmpd = pm.ParaDRAM() # define a ParaMonte sampler instance
pmpd.spec.chainSize = 10000 # request a much smaller sample size than the default value (100000)
pmpd.spec.randomSeed = 31951 # initialize the random seed to generate reproducible results
pmpd.spec.overwriteRequested = True # overwrite existing old simulation files with the same names
pmpd.spec.outputFileName = "./out/gaussian" # output files prefix
# run the ParaDRAM sampler
pmpd.runSampler ( ndim = 1
, getLogFunc = getLogFunc
)
The realtime simulation progress information will printed on your Anaconda prompt window (NOT inside your Jupyter notebook). Once the simulation is finished, the ParaDRAM routine generates 5 output files, each of which contains information about certain aspects of the simulation, all available here to view.
pmpd.readSample()
sample = pmpd.sampleList[0] # shorten the component name to work with.
To quickly visualize the generated sample as a histogram, try,
sample.plot.histplot()
If the variable names are specified for the sampler before running the simulations, the sampler will automatically assign names to each variable. To change the x-label, for example, you can try the following,
sample.plot.histplot()
sample.plot.histplot.currentFig.axes.set_xlabel("Standard Gaussian Random Value")
We can also add target values to the plot (we can also reset the plot to the default settings, which we do here),
sample.plot.histplot.reset()
sample.plot.histplot()
sample.plot.histplot.currentFig.axes.set_xlabel("Standard Gaussian Random Value")
sample.plot.histplot.target() # add a line corresponding to the maxLogFunc (mode) of the sampled points.
By default, if no target value is specified, the mode of the sampled states will be used the target value. We can also add any other value of interest. For example, let's add the 1-sigma standard deviation lines to the plot,
sample.plot.histplot.reset()
sample.plot.histplot()
sample.plot.histplot.currentFig.axes.set_xlabel("Standard Gaussian Random Value")
avg = sample.df["SampleVariable1"].mean()
sample.plot.histplot.target( value = avg )
std = sample.df["SampleVariable1"].std()
sample.plot.histplot.target.axvline.kws.linestyle = "--"
sample.plot.histplot.target( value = avg - std )
sample.plot.histplot.target( value = avg + std )
In the above figure, we are now showing the mean and the 1-sigma distribution of the sampled points around the mean.
To make a trace-plot of the sample, try,
sample.plot.line()
To change the scale of the x-axis, try,
sample.plot.line()
sample.plot.line.currentFig.axes.set_xscale("log")
By default, the color of the line in the trace-plot will represent the value returned by getLogFunc()
at the given sampled point. To turn the color off, you can instead try,
sample.plot.line.ccolumns
sample.plot.line.ccolumns = None
sample.plot.line()
There are many other properties of the plot that can be set or modified via the attributes of the pmpd.sampleList[0].plot.line
object. To see them all, see the documentation of the object,
sample.plot.line.helpme()