In [1]:

```
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter('ignore')
```

In [2]:

```
%matplotlib notebook
import paramonte as pm
import numpy as np
import scipy as sp
import pandas as pd
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt
print('\n'.join(f'{m.__name__} {m.__version__}' for m in globals().values() if getattr(m, '__version__', None)))
sns.set()
```

There are **two parallelism paradigms** currently implemented **in the ParaDRAM sampler**:

- **Single-chain parallelism**: only a single Markov chain is generated, but all processors contribute to the construction of this chain.
- **Multi-chain parallelism**: each processor builds its Markov chain independently of the rest. At the end of the simulation, however, all processors communicate with each other to compute the probability that convergence to the target density has occurred and that all processors have sampled the same region of high probability in the domain of the objective function.

- The **single-chain parallelism** is most useful for large-scale problems that are highly computationally demanding.
- The **multi-chain parallelism** is useful when you suspect that the target objective function to be sampled is multi-modal. In such cases, the multi-chain parallelism can provide further evidence on whether convergence to a single-mode or multi-modal target density function has occurred.

In either parallelism case, the ParaMonte library currently uses the MPI library for inter-process communications. As such, if you want to run a ParaDRAM simulation in parallel, you will have to first save your Python scripts in external Python files and then call them from the command line via the MPI launcher application.

To see how this can be done, consider the simple toy problem of sampling a 4-dimensional Multivariate Normal (MVN) distribution as described in this Jupyter notebook.
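As a sanity check before running anything in parallel, the hand-coded MVN log-density used throughout this notebook can be compared against SciPy's reference implementation. This is a minimal sketch; the constants mirror those in the parallel script further below, and the evaluation point is arbitrary:

```python
import numpy as np
from scipy.stats import multivariate_normal

NDIM = 4
MEAN = np.double([-10, 15., 20., 0.0])
COVMAT = np.double( [ [1.0,.45,-.3,0.0]
                    , [.45,1.0,0.3,-.2]
                    , [-.3,0.3,1.0,0.6]
                    , [0.0,-.2,0.6,1.0]
                    ] )
INVCOV = np.linalg.inv(COVMAT)
MVN_COEF = NDIM * np.log( 1. / np.sqrt(2.*np.pi) ) + np.log( np.sqrt(np.linalg.det(INVCOV)) )

def getLogFunc(point):
    """Return the natural logarithm of the 4-D MVN PDF at `point`."""
    normedPoint = MEAN - point
    return MVN_COEF - 0.5 * np.dot(normedPoint, np.matmul(INVCOV, normedPoint))

# compare with SciPy's reference log-pdf at an arbitrary point
point = np.double([-9.5, 14.8, 20.2, 0.1])
assert np.isclose(getLogFunc(point), multivariate_normal(MEAN, COVMAT).logpdf(point))
```

At the mode, the deviation term vanishes, so `getLogFunc(MEAN)` equals `MVN_COEF` exactly.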

In [3]:

```
import paramonte as pm
pm.verify()
```

In [4]:

```
pm.checkForUpdate() # check for new versions of the library
```

We will save our parallel script in a file bearing the same name as this Jupyter notebook,

In [5]:

```
with open("./sampling_multivariate_normal_distribution_via_paradram_parallel_singleChain.py", "w") as file:
    contents = """
import numpy as np
NDIM = 4 # number of dimensions of the domain of the MVN PDF
MEAN = np.double([-10, 15., 20., 0.0]) # This is the mean of the MVN PDF.
COVMAT = np.double( [ [1.0,.45,-.3,0.0] # This is the covariance matrix of the MVN PDF.
                    , [.45,1.0,0.3,-.2]
                    , [-.3,0.3,1.0,0.6]
                    , [0.0,-.2,0.6,1.0]
                    ] )
INVCOV = np.linalg.inv(COVMAT) # This is the inverse of the covariance matrix of the MVN distribution.
# The following is the log of the coefficient used in the definition of the MVN.
MVN_COEF = NDIM * np.log( 1. / np.sqrt(2.*np.pi) ) + np.log( np.sqrt(np.linalg.det(INVCOV)) )
# the logarithm of the objective function: log(MVN)
def getLogFunc(point):
    '''
    Return the logarithm of the MVN PDF.
    '''
    normedPoint = MEAN - point
    return MVN_COEF - 0.5 * ( np.dot(normedPoint, np.matmul(INVCOV, normedPoint)) )
import paramonte as pm
pmpd = pm.ParaDRAM() # define a ParaMonte sampler instance
pmpd.mpiEnabled = True # This is essential as it enables the invocation of the MPI-parallelized ParaDRAM routines.
pmpd.spec.overwriteRequested = True # overwrite existing output files if needed
pmpd.spec.randomSeed = 3751 # initialize the random seed to generate reproducible results
pmpd.spec.outputFileName = "./out/mvn_parallel_singleChain"
pmpd.spec.progressReportPeriod = 20000
pmpd.spec.chainSize = 30000 # the default 100,000 unique points is too large for this simple example
# call the ParaDRAM sampler
pmpd.runSampler( ndim = 4
               , getLogFunc = getLogFunc
               )
"""
    file.write(contents)
```

Here is the saved output MPI-parallelized Python script.

Note the only difference between the above parallel script and the serial version: the extra Python statement `pmpd.mpiEnabled = True`. This flag tells the ParaDRAM sampler to initiate the simulation in parallel and to silence all output messages that would otherwise be printed by all processes.

**IMPORTANT**: At this point, we have assumed that you already have an MPI runtime library installed on your system. We highly recommend the Intel MPI library on Windows and Linux, and Open-MPI on macOS. You can run `pm.verify()` on your Python command line, just as described in the Jupyter notebook for the serial sampling of the MVN distribution, to verify the existence of the MPI library on your system.

We will now run this code in parallel on 3 processors. We will invoke the `mpiexec` launcher to run the code in parallel; however, depending on your system, your platform, or the supercomputer on which you are running this code, you may need a different MPI launcher (e.g., `ibrun`, `mpirun`, ...). In the following, we will assume that you will be using the Intel MPI library if your operating system is Windows (as implied by the flag `-localonly`).

Now, we run the MPI-enabled Python script in parallel on three cores, **on the terminal** (not in the Python session). On Linux or macOS, we can try the following command,

```
mpiexec -n 3 python sampling_multivariate_normal_distribution_via_paradram_parallel_singleChain.py
```

On Windows, if you are using the Intel MPI library, we can try the following,

```
mpiexec -localonly -n 3 python sampling_multivariate_normal_distribution_via_paradram_parallel_singleChain.py
```

Otherwise, the same syntax and flags as used on Linux and macOS should work fine. To understand the meaning of the extra `-localonly` flag, see the ParaMonte library documentation page.

The following command combines the above two commands into a single line that works whether you are using a Windows machine or Linux/macOS,

In [6]:

```
!ls && \
mpiexec -n 3 python sampling_multivariate_normal_distribution_via_paradram_parallel_singleChain.py || \
mpiexec -localonly -n 3 python sampling_multivariate_normal_distribution_via_paradram_parallel_singleChain.py
```

The sampler has now generated 5 output files, all prefixed with `mvn_parallel_singleChain_*`. In particular, the simulation report file contains a lot of interesting information about the performance of the parallel simulation. We can process these files in the same way we did for the serial version of sampling the MVN PDF via the ParaDRAM sampler. For example, to parse the contents of the report file, we can try,

In [7]:

```
import paramonte as pm
print(pm.version.interface.get())
print(pm.version.kernel.get())
pmpd = pm.ParaDRAM()
pmpd.readReport("./out/mvn_parallel_singleChain")
report = pmpd.reportList[0]
```

There is a lot of detailed information about different aspects of the parallel simulation in this file. Here is a glance at some of the information extracted from the file,

In [8]:

```
print(report.stats.time.perFuncCall.value)
print(report.stats.time.perFuncCall.description)
```

In [9]:

```
print(report.stats.time.perInterProcessCommunication.value)
print(report.stats.time.perInterProcessCommunication.description)
```

In [10]:

```
print(report.stats.parallelism.current.numProcess.value)
print(report.stats.parallelism.current.numProcess.description)
```

In [11]:

```
print(report.stats.parallelism.current.speedup.value)
print(report.stats.parallelism.current.speedup.description)
```

In [12]:

```
print(report.stats.parallelism.optimal.current.speedup.value)
print(report.stats.parallelism.optimal.current.speedup.description)
```

In [13]:

```
print(report.stats.parallelism.optimal.absolute.speedup.value)
print(report.stats.parallelism.optimal.absolute.speedup.description)
```

In [14]:

```
print(report.stats.parallelism.processContribution.value)
print(report.stats.parallelism.processContribution.description)
```

The ParaDRAM sampler also automatically computes the strong scaling behavior of the parallel simulation under the current and absolutely optimal simulation conditions. For example, we can plot these scaling results like the following,

In [15]:

```
print(report.stats.parallelism.optimal.current.scaling.strong.speedup.value)
print(report.stats.parallelism.optimal.current.scaling.strong.speedup.description)
```

In [16]:

```
print(report.stats.parallelism.optimal.absolute.scaling.strong.speedup.value)
print(report.stats.parallelism.optimal.absolute.scaling.strong.speedup.description)
```

In [17]:

```
%matplotlib notebook
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
fig = plt.figure()
ax = fig.gca()
ax.plot(report.stats.parallelism.optimal.current.scaling.strong.speedup.value)
ax.plot(report.stats.parallelism.optimal.absolute.scaling.strong.speedup.value)
ax.set_ylabel("Speedup")
ax.set_xlabel("Number of Processors")
ax.legend(labels = ["current","absolutely-optimal"])
```

Out[17]:

As we see in the above plot and the information extracted from the report file, the **Estimated maximum speedup gained via singleChain parallelization model compared to serial mode** was only moderate (less than a factor of two). This is partly because the example objective function here is too cheap to compute and partly because this simulation was performed on a decently fast quad-core processor.

But, more importantly, note the **Predicted absolute optimal maximum speedup** gained via singleChain parallelization model, **under any MCMC sampling efficiency**, which tells us that no matter how you configure this simulation, the speedup gained by running this simulation in parallel can be **at most** a factor of ~2 better than the performance of the serial run of the same problem with the same simulation specifications, **regardless of how many CPU cores you may use for the parallel simulation**.

**The ability of the sampler to give us such detailed efficiency reports is remarkable, as it can help us set up our parallel simulations more reasonably and optimally, without wasting computational resources on runs that yield no efficiency gain**.

When you are working with expensive large-scale simulations, it is, therefore, a good idea to run some tests of your simulation and check the output of the report file to find the predicted optimal number of physical cores for the parallel simulation and then, request the same number of cores as predicted when invoking the MPI launcher.
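The saturation of the speedup can be understood with a crude Amdahl-style model: on `n` processes the (parallelizable) objective-function evaluations shrink by a factor of `n`, while the per-step inter-process communication overhead does not. The sketch below uses hypothetical timing values (not the sampler's actual, more detailed model; the real per-function-call and per-communication timings come from the report file):

```python
def singleChainSpeedup(timePerFuncCall, timePerComm, numProcess):
    """Crude Amdahl-style speedup estimate for single-chain parallelism:
    function-evaluation time divides across processes, communication does not."""
    serialTime = timePerFuncCall
    parallelTime = timePerFuncCall / numProcess + timePerComm
    return serialTime / parallelTime

# hypothetical timings (seconds); with comm cost half the function-call cost,
# the speedup can never exceed a factor of 2, no matter how many cores are used
for n in (2, 3, 4, 8, 64):
    print(n, round(singleChainSpeedup(1.e-6, 0.5e-6, n), 2))
```

In this toy setting the asymptotic ceiling is `timePerFuncCall / timePerComm = 2`, mirroring the ~2x bound reported above.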

Similar to the report file, the rest of the simulation data can also be parsed and analyzed. However, such tasks are identical to the case of the serial simulation, so we simply direct the reader to the Jupyter notebook for the serial version of this simulation problem.

There is another mode of parallelization, the **multiChain** or **multi-chain** mode, by which the ParaDRAM sampler can sample points from the objective function. In this mode, the sampler will generate multiple chains, each corresponding to one physical core on the computer. Each of these chains will independently explore the domain of the objective function.

To make the exploration even more interesting and robust (i.e., to ensure convergence to the target objective function by all independent chains), we can also let the initial starting point of each MCMC chain be chosen at random. To do so, we must also specify a domain from which the initial random start points will be sampled; otherwise, the default domain extends from negative infinity to positive infinity, which is problematic for computer simulations,
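Conceptually, this random initialization amounts to each process drawing its start point uniformly from the specified hypercube. The following sketch illustrates the idea with plain NumPy (the variable names are illustrative only and are not part of the ParaMonte API):

```python
import numpy as np

NDIM = 4
lowerLim = np.double(NDIM * [-25])  # mirrors randomStartPointDomainLowerLimitVec below
upperLim = np.double(NDIM * [+25])  # mirrors randomStartPointDomainUpperLimitVec below

rng = np.random.default_rng(3751)
# one independent random start point per MPI process (3 processes in this example)
startPoints = rng.uniform(low=lowerLim, high=upperLim, size=(3, NDIM))
assert np.all(startPoints >= lowerLim) and np.all(startPoints <= upperLim)
```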

In [18]:

```
with open("./sampling_multivariate_normal_distribution_via_paradram_parallel_multiChain.py", "w") as file:
    contents = """
import numpy as np
NDIM = 4 # number of dimensions of the domain of the MVN PDF
MEAN = np.double([-10, 15., 20., 0.0]) # This is the mean of the MVN PDF.
COVMAT = np.double( [ [1.0,.45,-.3,0.0] # This is the covariance matrix of the MVN PDF.
                    , [.45,1.0,0.3,-.2]
                    , [-.3,0.3,1.0,0.6]
                    , [0.0,-.2,0.6,1.0]
                    ] )
INVCOV = np.linalg.inv(COVMAT) # This is the inverse of the covariance matrix of the MVN distribution.
# The following is the log of the coefficient used in the definition of the MVN.
MVN_COEF = NDIM * np.log( 1. / np.sqrt(2.*np.pi) ) + np.log( np.sqrt(np.linalg.det(INVCOV)) )
# the logarithm of the objective function: log(MVN)
def getLogFunc(point):
    '''
    Return the logarithm of the MVN PDF.
    '''
    normedPoint = MEAN - point
    return MVN_COEF - 0.5 * ( np.dot(normedPoint, np.matmul(INVCOV, normedPoint)) )
import paramonte as pm
pmpd = pm.ParaDRAM() # define a ParaMonte sampler instance
pmpd.mpiEnabled = True # This is essential as it enables the invocation of the MPI-parallelized ParaDRAM routines.
pmpd.spec.overwriteRequested = True # overwrite existing output files if needed
pmpd.spec.randomSeed = 3751 # initialize the random seed to generate reproducible results
pmpd.spec.outputFileName = "./out/mvn_parallel_multiChain"
pmpd.spec.progressReportPeriod = 20000
pmpd.spec.chainSize = 30000 # the default 100,000 unique points is too large for this simple example
# Set up a random initial starting point for each of the independent MCMC chains
# by defining the domain of the random start points. The following defines the
# boundaries of the NDIM(=four)-dimensional hypercube from within which the
# random initial start points will be drawn by the sampler.
pmpd.spec.randomStartPointRequested = True # This is essential; otherwise, random initialization won't happen.
pmpd.spec.randomStartPointDomainLowerLimitVec = NDIM * [-25]
pmpd.spec.randomStartPointDomainUpperLimitVec = NDIM * [+25]
# set the parallelization model to multichain
pmpd.spec.parallelizationModel = "multi chain" # the value is case- and whitespace-insensitive
# call the ParaDRAM sampler
pmpd.runSampler( ndim = 4
               , getLogFunc = getLogFunc
               )
"""
    file.write(contents)
```

Now, we run the MPI-enabled Python script in parallel on three cores, **on the terminal** (not in the Python session),

In [19]:

```
!ls && \
mpiexec -n 3 python sampling_multivariate_normal_distribution_via_paradram_parallel_multiChain.py || \
mpiexec -localonly -n 3 python sampling_multivariate_normal_distribution_via_paradram_parallel_multiChain.py
```

Unlike the other simulation modes, whether serial or `singleChain`-parallel, the `multiChain`-parallel ParaDRAM simulation generates `5 * number_of_cores` output files (prefixed by `mvn_parallel_multiChain*`) on the system, separated from each other by their processor IDs (starting from 1).

By looking at the end of any of the output `_report.txt` files, we will notice that the Kolmogorov-Smirnov (KS) probabilities of the similarities of pairs of these independent samples from independent MCMC chains are generally quite high, indicating a high level of similarity between the independent samples obtained from the independent MCMC chains. **This means that there is no evidence of a lack of convergence of the MCMC samples to the target objective function**. We know this for sure in this particular example because the structure of the objective function is known to us. In other problems, however, this may never be known; in other words, *we can only hope that convergence has occurred*.

Here is a KS-probability-table excerpt from the first processor's output `*_report.txt` file,

In [20]:

```
import paramonte as pm
print(pm.version.interface.get())
print(pm.version.kernel.get())
pmpd = pm.ParaDRAM()
report = pmpd.readReport( "./out/mvn_parallel_multiChain"
                        , renabled = True
                        )[0] # keep only the first report file's contents
```

In [21]:

```
print(report.stats.chain.refined.kstest.prob.value)
print(report.stats.chain.refined.kstest.prob.description)
```

We can also read and visualize the results of these parallel runs just as before,

In [22]:

```
%matplotlib notebook
import paramonte as pm
pmpd = pm.ParaDRAM()
markovList = pmpd.readMarkovChain( "./out/mvn_parallel_multiChain_*"
                                 , renabled = True
                                 )
```

In [24]:

```
markovList[0].plot.line()
```

Let's take the log of the x-axis for better visualization,

In [25]:

```
markovList[0].plot.line()
markovList[0].plot.line.currentFig.axes.set_xscale("log")
```

We can compare this plot with the resulting Markov chain from, say, processor #3,

In [26]:

```
markovList[2].plot.line()
markovList[2].plot.line.currentFig.axes.set_xscale("log")
```

Or perhaps, compare all of the independent (compact) chains (of uniquely sampled points) on the same plot,

In [27]:

```
import matplotlib.pyplot as plt
import paramonte as pm
pmpd = pm.ParaDRAM()
pmpd.readReport("./out/mvn_parallel_multiChain")
pmpd.readChain("./out/mvn_parallel_multiChain")
plt.figure() # one figure for all plots
for process in range(pmpd.reportList[0].stats.parallelism.current.numProcess.value):
    pmpd.chainList[process].plot.line.figure.enabled = False # all plots appear in the same figure
    pmpd.chainList[process].plot.line.ccolumns = None # turn off color-mapping
    pmpd.chainList[process].plot.line()
    pmpd.chainList[process].plot.line.currentFig.axes.set_xscale("log")
```

Impressively, all independent chains, even though they start at random locations in the domain of the objective function, end up at the same sole peak of the objective function, as expected and as illustrated in the above figure.

There are many more functionalities and features of the ParaMonte library that were neither explored nor mentioned in this example Jupyter notebook. You can explore them by checking the existing components of each attribute of the ParaDRAM sampler class and by visiting the ParaMonte library's documentation website.
