In [1]:

```
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter('ignore')
```

In [2]:

```
%matplotlib notebook
import paramonte as pm
import numpy as np
import scipy as sp
import pandas as pd
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt
print('\n'.join(f'{m.__name__} {m.__version__}' for m in globals().values() if getattr(m, '__version__', None)))
sns.set()
```

Suppose we have collected a dataset comprising the following $(x_i,y_i), \{i = 1, \ldots, 6\}$ pairs of observations,

\begin{eqnarray} X &=& [\, 0.5,\, 2.4,\, 3.2,\, 4.9,\, 6.5,\, 7.8\, ]\, , \\ Y &=& [\, 0.8,\, 9.3,\, 37.9,\, 68.2,\, 155,\, 198\, ]\, , \end{eqnarray}

and we would like to model the dependence of $Y$ on $X$. The first thing to do in such problems is to **explore and visualize the data** in search of any potential patterns, as we do below.

In [3]:

```
X = np.double([0.5, 2.4, 3.2, 4.9, 6.5, 7.8])
Y = np.double([0.8, 9.3, 37.9, 68.2, 155, 198])
fig = plt.figure( figsize = (4.5, 4)
                , dpi = 200
                )
ax = fig.add_subplot(1, 1, 1)
ax.scatter( X
          , Y
          , color = "red"
          , s = 5 # symbol size
          )
```

Out[3]:

Obviously, the relationship between $X$ and $Y$ in this dataset is non-linear. A fairly experienced Data Scientist could immediately guess the underlying relationship in this dataset: a **power law**, which can be readily confirmed by checking the $X$-$Y$ relationship on a log-log plot. This works because if,

$$ {\large y = \alpha \, x^{\beta} \, ,} $$

then, $$ {\large \log(y) = \log(\alpha) + \beta \log(x) \, .} $$
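This identity is easy to verify numerically. The following standalone sketch (with arbitrary $\alpha$ and $\beta$, not the values fitted later) confirms that power-law data falls exactly on a line in log-log space:

```python
import numpy as np

alpha, beta = 2.7, 2.0  # arbitrary power-law parameters, for illustration only
x = np.linspace(0.5, 8.0, 6)
y = alpha * x**beta

# The log-transformed data lies exactly on a line
# with slope beta and intercept log(alpha).
assert np.allclose(np.log(y), np.log(alpha) + beta * np.log(x))
```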

In [4]:

```
fig = plt.figure(figsize = (4.5,4), dpi = 200)
ax = fig.add_subplot(1,1,1)
ax.scatter(np.log(X), np.log(Y), color = "red", s = 5)
```

Out[4]:

However, a simple line does not perfectly fit the (log-transformed) data either. Therefore, we must also account for the **inherent variability** of our data around the linear fit.
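As a quick check of this point, we can fit a line to the log-transformed data by ordinary least squares with `numpy.polyfit` (a standalone sketch, separate from the MCMC analysis that follows; the data arrays are the ones defined above):

```python
import numpy as np

X = np.double([0.5, 2.4, 3.2, 4.9, 6.5, 7.8])
Y = np.double([0.8, 9.3, 37.9, 68.2, 155, 198])

# Ordinary least-squares line through the log-transformed data.
slope, intercept = np.polyfit(np.log(X), np.log(Y), deg = 1)
residuals = np.log(Y) - (intercept + slope * np.log(X))

print(f"slope     ~ {slope:.2f}")
print(f"intercept ~ {intercept:.2f}")
print(f"residual scatter ~ {residuals.std():.2f}") # nonzero: there is noise left to model.
```

The residual scatter does not vanish, which is exactly the variability the noise model below is meant to capture.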

Although not always correct, the simplest and most popular assumption about the noise in data is that $y$ is distributed according to a Normal distribution around the best-fit line to our log-transformed data,

$$ {\large \pi(\log(y) \;|\; \log(x),\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\bigg[-\frac{1}{2}\bigg(\frac{\log(y) - \mu}{\sigma}\bigg)^2\bigg] } \;. $$

The unknown parameters of this Normal distribution, the mean ($\mu$) and the standard deviation ($\sigma$), would then have to be determined by fitting. Fortunately, the mean of the Normal distribution can simply be taken to be the prediction of our best linear fit. Therefore, the mean is a function of the $x$ value and of the slope and intercept of our linear fit to the log-transformed data. The standard deviation, however, has to be fit and constrained separately.
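As a sanity check on the density above, `scipy.stats.norm.logpdf` (used in the likelihood implementation below) agrees with a hand-coded version of the log of this PDF. A standalone sketch with arbitrary test values:

```python
import numpy as np
from scipy.stats import norm

def normal_logpdf(y, mu, sigma):
    # Log of the Normal PDF above, written out term by term.
    return -np.log(sigma * np.sqrt(2 * np.pi)) - 0.5 * ((y - mu) / sigma)**2

y, mu, sigma = 1.3, 0.5, 2.0 # arbitrary test values
print(normal_logpdf(y, mu, sigma))             # hand-coded
print(norm.logpdf(y, loc = mu, scale = sigma)) # scipy reference
```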

Now that we have constructed our model and its noise, we can proceed to use the principle of Maximum Likelihood to constrain the unknown parameters of the model, using the Markov Chain Monte Carlo approach that is available from the ParaMonte library via the `ParaDRAM()` sampler class.

To summarize, the unknown parameters are the slope and intercept of the linear fit and the standard deviation (`sigma`) of the Normal distribution.

The probability of observing a single data point is given by the above Normal Probability Density Function (PDF). Therefore, the probability of observing our entire dataset, assuming the observations are independent and identically distributed (i.i.d.), is simply the product of all the individual probabilities,

\begin{eqnarray} \large \mathcal{L}(\log(Y),\log(X) \;|\; \text{intercept, slope, }\sigma) &=& \large \prod_{i=1}^{N=6}\; \pi(\log(y_i)\;|\;\log(x_i),\mu,\sigma) \; , \\ \large \mu(\log(x),\text{intercept, slope}) &=& \large \text{intercept} + \text{slope} \times \log(x) \;. \end{eqnarray}

The likelihood is often difficult to work with directly, because its value is frequently so tiny that it causes underflow on computers. Therefore, we usually prefer to work with the log-transformation of the likelihood, which does not suffer from the same problem,

\begin{eqnarray} {\large \log\mathcal{L}(\log(Y),\log(X) \;|\; \text{intercept, slope, }\sigma) = \sum_{i=1}^{N=6} \; \log\pi(\log(y_i)\;|\;\log(x_i),\mu,\sigma) \;. } \end{eqnarray}

We now go on to implement this log-likelihood as a Python function. Before doing so, however, we shall take care of an extremely important issue: **the possible range of values of the parameters**.

If you pay attention to the parameters, you will notice that the intercept and slope can, in general, take any real value from negative infinity to positive infinity. The third parameter, however, the standard deviation of the Normal distribution, is a strictly positive-valued number. This range asymmetry can cause instabilities in our Markov Chain sampler.

**How can we solve this and symmetrize the range of sigma?** The answer is simple: we will sample $\log(\sigma)$ instead of sigma itself ($\sigma$). So our third sampling parameter will be `logSigma` instead of `sigma`, which has a symmetric possible range from negative infinity to positive infinity. Then, inside our likelihood function, we will convert `logSigma` back to `sigma`.
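The transformation works because the exponential maps any real number to a strictly positive one, so the sampler never proposes an invalid standard deviation. A minimal illustration:

```python
import numpy as np

logSigma = np.array([-3.0, 0.0, 2.5]) # any real value is allowed in the sampling space
sigma = np.exp(logSigma)              # always strictly positive in the model space

assert np.all(sigma > 0)
```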

In [12]:

```
from scipy.stats import norm
logX = np.log(X)
logY = np.log(Y)
def getLogLike(param):
    """
    Return the natural logarithm of the likelihood of observing the (X,Y) dataset defined
    globally outside of this function given the input parameter vector ``param``.

    Parameters
    ==========

        param
            A numpy 64-bit real vector containing the parameters of the model in the following order:
            intercept, slope, log(sigma) (the log of the standard deviation of the Normal distribution).

    Returns
    =======

        logLike
            The natural logarithm of the likelihood of observing the dataset (X,Y) given ``param``.
    """
    # Compute the expected value of y, given each value of x.
    # This is the mean of the Normal distribution given the x values.
    mean = param[0] + param[1] * logX
    # Compute the log-PDFs of observing each data point given the input parameters.
    logProbDensities = norm.logpdf(logY, loc = mean, scale = np.exp(param[2])) # Note that param[2] is logSigma.
    # Compute and return the log-likelihood.
    return np.sum(logProbDensities)
```

Now let's test this likelihood function on a trial input value of the parameters to ensure it works correctly. We will pick our best guess, obtained simply by visual inspection of the log-log plot of the data.

In [13]:

```
getLogLike(param = np.double([0.8,2,0.1]))
```

Out[13]:
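Before running the MCMC sampler, we can also cross-check the likelihood with a quick maximum-likelihood fit via `scipy.optimize.minimize` (a standalone sketch, not part of the original analysis; it redefines the data and model locally so it can run on its own):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

logX = np.log(np.double([0.5, 2.4, 3.2, 4.9, 6.5, 7.8]))
logY = np.log(np.double([0.8, 9.3, 37.9, 68.2, 155, 198]))

def getLogLike(param):
    # Same model as above: Normal noise around the linear fit in log-log space.
    mean = param[0] + param[1] * logX
    return np.sum(norm.logpdf(logY, loc = mean, scale = np.exp(param[2])))

# Minimize the negative log-likelihood, starting from the visual guess above.
res = minimize( lambda param: -getLogLike(param)
              , x0 = np.double([0.8, 2.0, 0.1])
              , method = "Nelder-Mead"
              )
print("MLE (intercept, slope, logSigma):", res.x)
```

The maximum-likelihood point gives a useful reference against which to compare the posterior sample that the MCMC run produces below.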

We can now proceed to set up a ParaDRAM Markov Chain Monte Carlo sampler to constrain the three unknown parameters of the model.

In [14]:

```
import paramonte as pm
pmpd = pm.ParaDRAM() # instantiate an object of the ParaDRAM sampler class. Paradram() or paradram() would also work.
```

The following settings are optional but recommended.

In [15]:

```
pmpd.spec.overwriteRequested = True # overwrite the existing output files just in case they already exist.
pmpd.spec.outputFileName = "./out/regression_powerlaw" # specify the output file prefixes.
pmpd.spec.randomSeed = 12345 # set the random seed for the sake of reproducibility.
pmpd.spec.variableNameList = ["intercept", "slope", "logSigma"] # set the output names of the parameters.
pmpd.spec.chainSize = 20000 # set the number of uniquely sampled points from the likelihood function.
```

Since we did not set the starting point of the MCMC sampler, the default (origin) will be used as the starting point of our search, which is not too bad. Finally, we run the sampler,

In [16]:

```
pmpd.runSampler( ndim = 3 # the number of parameters
               , getLogFunc = getLogLike # the objective function to sample points from.
               )
```

Done. We can now read the generated sample to analyze and visualize the results. Before doing so, however, we will make sure there is no lack of convergence in the MCMC chain by inspecting the **adaptation measure** of the chain, available in the output chain file.

In [17]:

```
chain = pmpd.readChain(renabled = True)[0]
```

In [18]:

```
chain.plot.scatter.scatter.kws.cmap = "winter" # change the color mapping from the default "autumn".
chain.plot.scatter( ycolumns = "AdaptationMeasure" # choose the AdaptationMeasure column of the output chain file to plot.
                  , ccolumns = [] # set the point coloring to none, instead of the default logLikelihood-based coloring.
                  )
chain.plot.scatter.currentFig.axes.set_ylim([1.e-5, 1])
chain.plot.scatter.currentFig.axes.set_yscale("log")
```

The power-law decay of the adaptation measure tells us that everything has gone well with our simulation. Had there been any signs of a lack of convergence, we would not see this power-law decaying behavior. The decay implies that the proposal adaptation diminished consistently throughout the entire simulation, as we wished.

We can now visualize the resulting refined sample by reading and plotting it from the output sample file.

In [19]:

```
sample = pmpd.readSample(renabled = True)[0]
```

In [20]:

```
# plot the traces of the sampled variables.
for colname in sample.df.columns:
    sample.plot.line.ycolumns = colname
    sample.plot.line()
    sample.plot.line.currentFig.axes.set_xlabel("MCMC Count")
    sample.plot.line.currentFig.axes.set_ylabel(colname)
    sample.plot.line.savefig( fname = "./out/traceplot_" + colname )
# plot the histograms of the sampled parameters.
for colname in sample.df.columns:
    sample.plot.histplot(xcolumns = colname)
    sample.plot.histplot.currentFig.axes.set_xlabel(colname)
    sample.plot.histplot.currentFig.axes.set_ylabel("MCMC Count")
    sample.plot.histplot.savefig( fname = "./out/histogram_" + colname )
```
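Finally, the posterior sample can be summarized into point estimates and the original power-law parameters recovered, since $\alpha = \exp(\text{intercept})$ and $\beta = \text{slope}$. The sketch below uses a synthetic stand-in DataFrame so that it runs on its own; in practice, replace `df` with the real `sample.df` from the simulation above:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for sample.df: three columns of mock posterior draws.
# Replace with the actual sample.df read from the output sample file.
rng = np.random.default_rng(12345)
df = pd.DataFrame( { "intercept": rng.normal(1.0, 0.2, 1000)
                   , "slope": rng.normal(2.0, 0.1, 1000)
                   , "logSigma": rng.normal(-1.3, 0.3, 1000)
                   } )

# Posterior point estimates and uncertainties for each parameter.
print(df.mean())
print(df.std())

# Recover the power-law form y = alpha * x**beta from the log-space fit.
alpha = np.exp(df["intercept"].mean())
beta = df["slope"].mean()
print(f"y ~ {alpha:.2f} * x**{beta:.2f}")
```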