In [1]:

```
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter('ignore')
```

In [1]:

```
%matplotlib notebook
import paramonte as pm
import numpy as np
import scipy as sp
import pandas as pd
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt
print('\n'.join(f'{m.__name__} {m.__version__}' for m in globals().values() if getattr(m, '__version__', None)))
sns.set()
```

**The logic of Monte Carlo sampling** is very simple:

Suppose we have a mathematical objective function defined on an `ndim`-dimensional domain. Typically, this domain is defined on the space of real numbers. In general, understanding and visualizing the structure of such objective functions is an extremely difficult task, if not impossible. As a better alternative, we can employ Monte Carlo techniques that randomly explore the structure of the objective function and find the function's extrema.
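As a deliberately minimal illustration of this idea (not part of the ParaMonte library; the objective function and all names here are hypothetical), the following sketch randomly explores the domain of a simple 1D function and locates its mode:

```
import numpy as np

# A crude Monte Carlo exploration of a 1D objective function:
# draw random points across the domain and keep the one with the
# highest function value as an estimate of the mode.
rng = np.random.default_rng(12345)

def objective(x):
    """A simple unimodal objective with its maximum at x = 2."""
    return np.exp(-(x - 2.0)**2)

points = rng.uniform(-10.0, 10.0, size=100000)  # random exploration of the domain
best = points[np.argmax(objective(points))]     # best point found so far
print(f"estimated mode: {best:.3f}")            # close to the true mode at 2
```

Real-world objective functions are rarely this tame, which is why more sophisticated samplers such as the one demonstrated below are needed.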

In the example below, one of the simplest such objective functions, the **Multivariate Normal Distribution (MVN)**, is constructed and sampled using the ParaMonte library samplers, here the **ParaDRAM** sampler (**Delayed-Rejection Adaptive Metropolis-Hastings Markov Chain Monte Carlo sampler**).

Suppose we want to sample random points from a MultiVariate Normal (MVN) Probability Density Function (PDF). The PDF equation of MVN is the following,

$$
\large
\pi (x, \mu, \Sigma, k = 4) = (2\pi)^{-\frac{k}{2}} ~ |\Sigma|^{-\frac{1}{2}} ~
\exp
\bigg(
-\frac{1}{2} (x-\mu)^T ~ \Sigma^{-1} ~ (x-\mu)
\bigg)
$$

The following Python function `getLogFunc()` returns the natural logarithm of the Probability Density Function of the multivariate Normal distribution.

Since the mathematical objective functions (e.g., probability density functions) can take extremely small or large values, we often work with their natural logarithms instead.

This is the reason behind the naming convention used in the ParaMonte library for the user's objective functions: `getLogFunc`, implying that the user must provide a function that returns the natural logarithm of the target objective function.

See this Jupyter Notebook for an in-depth discussion of why we need to work with the logarithm of mathematical objective functions in optimization and sampling problems.
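For a quick illustration of the underlying numerical issue, consider this minimal sketch: multiplying many small probabilities underflows double precision to zero, while summing their logarithms remains perfectly finite,

```
import numpy as np

# Why we work with log-probabilities: the product of many small
# probabilities underflows in double precision (smallest normal
# double is ~1e-308), but the sum of their logarithms stays finite.
probs = np.full(1000, 1e-5)       # 1000 small probability values
product = np.prod(probs)          # underflows: 1e-5000 is far below ~1e-308
log_sum = np.sum(np.log(probs))   # finite: 1000 * log(1e-5)
print(product)                    # 0.0
print(log_sum)                    # approximately -11512.9
```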

In [2]:

```
import numpy as np
NDIM = 4 # number of dimensions of the domain of the MVN PDF
MEAN = np.double([-10, 15., 20., 0.0]) # This is the mean of the MVN PDF.
COVMAT = np.double( [ [1.0,.45,-.3,0.0] # This is the covariance matrix of the MVN PDF.
                    , [.45,1.0,0.3,-.2]
                    , [-.3,0.3,1.0,0.6]
                    , [0.0,-.2,0.6,1.0]
                    ] )
INVCOV = np.linalg.inv(COVMAT) # This is the inverse of the covariance matrix of the MVN distribution.
# The following is the log of the coefficient used in the definition of the MVN.
MVN_COEF = NDIM * np.log( 1. / np.sqrt(2.*np.pi) ) + np.log( np.sqrt(np.linalg.det(INVCOV)) )
# the logarithm of the objective function: log(MVN)
def getLogFunc(point):
    """
    Return the logarithm of the MVN PDF.
    """
    normedPoint = MEAN - point
    return MVN_COEF - 0.5 * ( np.dot(normedPoint, np.matmul(INVCOV, normedPoint)) )
```
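As an optional sanity check (an addition to this tutorial, assuming `scipy` is available, as imported at the top of this notebook), we can verify the hand-written log-PDF against `scipy.stats.multivariate_normal.logpdf` at a few random points. The code below is self-contained and repeats the definitions from the cell above:

```
import numpy as np
from scipy.stats import multivariate_normal

# Repeat the MVN definitions from the cell above, so this check is standalone.
MEAN = np.double([-10, 15., 20., 0.0])
COVMAT = np.double( [ [1.0,.45,-.3,0.0]
                    , [.45,1.0,0.3,-.2]
                    , [-.3,0.3,1.0,0.6]
                    , [0.0,-.2,0.6,1.0]
                    ] )
INVCOV = np.linalg.inv(COVMAT)
MVN_COEF = 4 * np.log( 1. / np.sqrt(2.*np.pi) ) + np.log( np.sqrt(np.linalg.det(INVCOV)) )

def getLogFunc(point):
    """Return the logarithm of the MVN PDF."""
    normedPoint = MEAN - point
    return MVN_COEF - 0.5 * np.dot(normedPoint, np.matmul(INVCOV, normedPoint))

# Compare against scipy's reference implementation at a few random points.
rng = np.random.default_rng(0)
mvn = multivariate_normal(mean=MEAN, cov=COVMAT)
for point in rng.normal(size=(5, 4)):
    assert np.isclose(getLogFunc(point), mvn.logpdf(point))
print("getLogFunc matches scipy.stats.multivariate_normal.logpdf")
```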

We will sample random points from the objective function by calling the **ParaDRAM** sampler (**Delayed-Rejection Adaptive Metropolis-Hastings Markov Chain Monte Carlo sampler**) of the ParaMonte library.
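Before invoking ParaDRAM, it may help to see what a Metropolis-type sampler does internally. The following is a deliberately minimal random-walk Metropolis sketch for a 1D standard Gaussian; it is **not** the ParaDRAM algorithm (which adds delayed rejection and proposal adaptation), and all names in it are illustrative only:

```
import numpy as np

# Toy random-walk Metropolis: propose a nearby point, then accept it
# with probability min(1, exp(logf_new - logf_old)); on rejection, the
# previous point is repeated in the chain.
rng = np.random.default_rng(3751)

def logfunc(x):
    """Standard 1D Gaussian log-density, up to an additive constant."""
    return -0.5 * x * x

chain = [0.0]
for _ in range(20000):
    proposal = chain[-1] + rng.normal(scale=1.0)   # random-walk proposal
    if np.log(rng.uniform()) < logfunc(proposal) - logfunc(chain[-1]):
        chain.append(proposal)                     # accept the proposal
    else:
        chain.append(chain[-1])                    # reject: repeat the old point
chain = np.array(chain)
print(f"sample mean (expect ~0): {chain.mean():.2f}")
print(f"sample std  (expect ~1): {chain.std():.2f}")
```

The repeated points produced by rejections are exactly why MCMC output is auto-correlated, a point we will return to when discussing sample refinement below.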

To run the sampler in parallel, you will have to first save the MPI-enabled script as an external file. Visit the ParaMonte library's documentation website for more information, or see this parallel ParaDRAM simulation Jupyter notebook example.

In [3]:

```
import paramonte as pm
```

First, we will check which version of the ParaMonte library you are using,

In [4]:

```
print( pm.version.kernel.get() )
print( pm.version.interface.get() )
```

We can also **dump** the ParaMonte library kernel and interface versions,

In [5]:

```
print( pm.version.kernel.dump() )
print( pm.version.interface.dump() )
```

Checking for the new versions of the ParaMonte library is also easy and recommended,

In [6]:

```
pm.checkForUpdate()
```

We can also verify the existence of the essential components of the current installation of the ParaMonte Python library on the system,

In [7]:

```
pm.verify()
```

In [8]:

```
pmpd = pm.ParaDRAM()
```

The **simplest scenario** is to **run the simulation with the default specifications** that are appropriately determined by the ParaDRAM sampler. However, for the purpose of clarity and reproducibility, we will specify a few simulation specs,

For a complete list of all ParaDRAM simulation specifications, visit the ParaDRAM simulation specifications webpage.

We will store all output files in a directory and with a file prefix matching this Jupyter Notebook's name.

By default, all ParaDRAM simulation output files are named properly (see this page for a review of ParaDRAM simulation output files).

In general, it is a good habit to specify an output file prefix for any ParaDRAM simulation. This way, restarting an incomplete simulation, should it ever happen for some reason, is seamless: simply rerun the simulation, without any change to any part of the code or the simulation.

In [9]:

```
pmpd.spec.outputFileName = "./out/mvn_serial"
```

By default, the ParaMonte samplers do not overwrite old existing simulation files for obvious reasons. For this example, however, we will let the sampler overwrite the simulation output files if they already exist at the specified path,

In [10]:

```
pmpd.spec.overwriteRequested = True
```

We will also initialize the random seed to a fixed number to ensure the reproducibility of the simulation results in the future,

In [11]:

```
pmpd.spec.randomSeed = 3751
```

Although not essential, we will also specify a `chainSize` here, 30000, which is the number of **unique points** to be sampled from the objective function. This number is much smaller than the sampler's default of 100,000 unique sample points.

In [12]:

```
pmpd.spec.chainSize = 30000
```

For the sake of illustration, we will also request an ASCII-format restart output file,

In [13]:

```
pmpd.spec.restartFileFormat = "ascii"
```

Note that none of the above specification settings are necessary, but it is in general a good habit to define them prior to the simulation.

We now call the ParaDRAM sampler routine to run the sampling simulation,

In [14]:

```
# run the ParaDRAM sampler
pmpd.runSampler ( ndim = 4
                , getLogFunc = getLogFunc
                )
```

This will print the realtime simulation progress on your **Anaconda prompt window** (not inside your Jupyter notebook). Once the simulation is finished, the ParaDRAM routine generates 5 output files, each of which contains information about certain aspects of the simulation:

- `mvn_serial_process_1_report.txt`: contains reports about all aspects of the simulation, from the very beginning to the very last step, including the postprocessing of the results and the final sample refinement.
- `mvn_serial_process_1_progress.txt`: contains dynamic realtime information about the simulation progress. The frequency of the progress reports to this file depends on the value of the simulation specification `progressReportPeriod`.
- `mvn_serial_process_1_sample.txt`: contains the final fully-refined decorrelated i.i.d. random sample from the objective function.
- `mvn_serial_process_1_chain.txt`: contains the Markov chain, generally in either binary or condensed (compact) ASCII format to improve the storage efficiency of the chain on your computer.
- `mvn_serial_process_1_restart.txt`: contains the information needed to seamlessly restart an incomplete simulation (here requested in ASCII format via the `restartFileFormat` specification above).

Now, we can read the generated output sample, contained in the file suffixed with `*_sample.txt`.

In [15]:

```
pmpd.readSample() # generates pmpd.sampleList
sample = pmpd.sampleList[0]
```

By default, the sampler generates a highly refined decorrelated final sample. Essentially, **the ParaDRAM sampler keeps refining the Markov chain for as long as there is any residual auto-correlation in any of the individual variables of the Markov chain that has been sampled**. This is an extremely strong and conservative refinement strategy (and it is done so in the ParaDRAM algorithm for a good reason: to **ensure independent and identically distributed (i.i.d.) samples from the objective function**).
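To see what residual auto-correlation looks like, here is a minimal, self-contained illustration (not ParaMonte code) comparing the lag-1 auto-correlation of a strongly correlated AR(1) chain, which resembles a raw Markov chain, with that of an i.i.d. sample, which is what the refinement aims to deliver:

```
import numpy as np

rng = np.random.default_rng(2020)

def lag1_autocorr(x):
    """Sample lag-1 auto-correlation of a 1D sequence."""
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

# AR(1) chain: each point depends strongly on the previous one,
# mimicking the correlated output of an MCMC sampler.
chain = np.empty(50000)
chain[0] = 0.0
for i in range(1, chain.size):
    chain[i] = 0.9 * chain[i - 1] + rng.normal()

iid = rng.normal(size=50000)  # an independent sample, for contrast
print(f"correlated chain lag-1 auto-correlation: {lag1_autocorr(chain):.2f}")  # near 0.9
print(f"i.i.d. sample lag-1 auto-correlation:    {lag1_autocorr(iid):.2f}")    # near 0.0
```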

This extreme refinement can be relaxed and controlled by the user, by setting the following ParaDRAM simulation specification variable prior to the simulation run,

In [16]:

```
pmpd.spec.sampleRefinementCount = 1
```

If you rerun the simulation with this specification set, you will notice about a **three-fold** increase in the final sample size. Note that the above specification has to be set **before** running the simulation if you really want to use it (also, do not forget to specify a new output file prefix for the new simulation, because by default simulations cannot be overwritten). A value of `1` will cause the ParaDRAM sampler to refine the Markov chain for only one round, which is what most people in the MCMC community do.

To quickly visualize the generated sample as a histogram, try,

In [17]:

```
sample.plot.histplot()
```

How about adding a target value to this histogram plot? By default, if no `value` is specified for the target object, it will add the state corresponding to the mode of the objective function to the plot, as inferred from the sample file,

In [18]:

```
sample.plot.histplot.reset()
sample.plot.histplot()
sample.plot.histplot.currentFig.axes.set_xlabel("Standard Gaussian Random Value")
sample.plot.histplot.target() # add a line corresponding to the maxLogFunc (mode) of the sampled points.
```

However, assigning other values is also possible. For example, suppose we want to add the mean and 1-sigma limits to the plot, instead of the single mode,

In [19]:

```
sample.plot.histplot()
sample.plot.histplot.currentFig.axes.set_xlabel("Standard Gaussian Random Value")
avg = sample.df.SampleVariable1.mean()
sample.plot.histplot.target( value = avg )
std = sample.df.SampleVariable1.std()
sample.plot.histplot.target.axvline.kws.linestyle = "--"
sample.plot.histplot.target( value = avg - std )
sample.plot.histplot.target( value = avg + std )
```

We can also plot all of the variables at the same time in a single figure,

In [20]:

```
sample.df.head()
```

Out[20]:

In [21]:

```
sample.plot.histplot( xcolumns = [1,2,3,4] )
```

Perhaps we would also want to add a legend and change the x-axis title,

In [22]:

```
sample.plot.histplot.reset("hard") # resetting to the default is not necessary, but does not hurt either.
sample.plot.histplot.legend.enabled = True
sample.plot.histplot( xcolumns = [1,2,3,4] )
sample.plot.histplot.currentFig.axes.set_xlabel("Multivariate Normal Variables")
```

Out[22]:

To make a trace-plot of the sample, try,

In [23]:

```
sample.plot.line()
```

To change the scale of the x-axis, try,

In [24]:

```
sample.plot.line()
sample.plot.line.currentFig.axes.set_xscale("log")
```

By default, the color of the line in the trace-plot represents the value returned by `getLogFunc()` at the given sampled point.

In [26]:

```
sample.plot.line.ccolumns
```

Out[26]:

To turn the color off, we can instead try,

In [27]:

```
sample.plot.line.ccolumns = None
sample.plot.line()
sample.plot.line.currentFig.axes.set_xscale("log")
```