In [1]:

```
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter('ignore')
```

In [2]:

```
%matplotlib notebook
import paramonte as pm
import numpy as np
import scipy as sp
import pandas as pd
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt
print('\n'.join(f'{m.__name__} {m.__version__}' for m in globals().values() if getattr(m, '__version__', None)))
sns.set()
```

Supposed we have observed a dataset comprised of $1000$ events with two attributes `variable1`

and `variable2`

in this data.csv file. Plotting these points would yield a red-colored bivariate distribution similar to the following figure,

The pale black points represent the missing data from our observational dataset. These are points that we could NOT observe (or detect) because of some instrumental bias and sample incompleteness as represented by the black line.

**Statement of problem**

Now, our goal is to form a hypothesis about this dataset, that is, a hypothesis about the distribution of the red points in the above plot. To make a correct assessment, we will have to also carefully consider the effects of the detection threshold (the black line) in our inference.

The first thing we can do is to obtain a better visualization of this dataset. **A Good visualization of data is essential to form a good hypothesis (model) for our problem**. The bivariate data in the above plot looks highly skewed to the right. Therefore, it makes sense to take the logarithm of this dataset in the hope that it can help us better understand the possible underlying generating distribution of this data,

Just by looking at the observed (red) data points, we can form a relatively good hypothesis about the distribution of the data: **If the detection threshold did not exist, the complete dataset (including the points in the black points) would have likely very well resembled a bivariate lognormal distribution (or a normal distribution on the logarithmic axes)**.

However, this dataset is affected by the detection threshold and we need to take a model of the detection threshold also into account. The logarithmic transformation makes it crystal-clear to us that the detection threshold is likely best modeled as a line.

Therefore, we form the following hypothesis,

**Hypothesis**: Our data comes from a bivariate Lognormal distribution subject to censorship by a sharp hard cutoff threshold in the form of a line. The proposition that the data is jointly lognormally distributed is mathematically equivalent to the proposition stating that `log(data)`

is normally distributed.

**Goal**: To **estimate the unknown parameters** of the bivariate Normal distribution which is hypothesized to well fit the bivariate distribution of the *log-transformed* data. These parameters include the **mean vector** of length two and the **covariance matrix** of the full bivariate Normal distribution ($\mu, \Sigma$), as well as the intercept and the slope ($c_0,c_1$) of the **sharp linear detection threshold** on our data set ($\eta$). A two-dimensional covariance matrix has 3 free parameters because of its symmetry. Therefore, our censored model has overall 7 unknown parameters which must be constrained.

**Methodology**: We will use the maximum likelihood approach and the **ParaDRAM** (**Delayed-Rejection Adaptive Metropolis-Hastings Markov Chain Monte Carlo**) sampler of the `paramonte`

library to estimate the 7 unknown parameters of our hypothesis.

First read the data using Pandas library, then log-transform data to make it look like a bivariate Normal distribution.

In [3]:

```
url = "https://github.com/cdslaborg/paramontex/raw/main/Python/Jupyter/regression_censored_bivariate_lognormal_data_paradram/data.csv"
df = pd.read_csv(url) # log-transform data to be fit by a Normal distribution
X = np.double(df.iloc[:,0])
Y = np.double(df.iloc[:,1])
logX = np.log(X)
logY = np.log(Y)
df.head()
```

Out[3]:

In [4]:

```
# plot the bivariate distribution of data
fig = plt.figure()
ax = fig.gca()
plt.scatter(X,Y,s=1,c="red")
plt.xlabel('Variable 1')
plt.ylabel('Variable 2')
plt.title('The Bivariate Distribution of Data')
ax.set_xlim([0,10])
plt.grid(True)
plt.show()
# plot the bivariate distribution of log(data)
fig = plt.figure()
ax = fig.gca()
ax.set_xscale("log")
ax.set_yscale("log")
plt.scatter(X,Y,s=1,c="red")
plt.xlabel("Variable 1")
plt.ylabel("Variable 2")
plt.title('The Bivariate Distribution of Log-transformed Data')
plt.grid(True)
plt.show()
```

We will write a class that takes the `log(data)`

as input and has three methods, most importantly `getLogLike(param)`

. This method must take the set of the parameters of our censored model as a vector. **We want to explore the shape of this 7-dimensional likelihood function** to find its mode and its width along the axes, each of which represents one parameter of our model.

Here is the mathematical equation of the MultiVariate Normal (MVN) Probability Density Function (PDF), $\large\pi(\cdot)$,

$$ \large \pi ( \vec{x} ~|~ \mu, \Sigma, k = 2) = (2\pi)^{-\frac{k}{2}} ~ |\Sigma|^{-\frac{1}{2}} ~ \exp \bigg( -\frac{1}{2} (x-\mu)^T ~ \Sigma^{-1} ~ (x-\mu) \bigg) $$where $k=2$ represents the number of dimensions of our data set, and in this paricular problem,

$$ \large \begin{eqnarray} \mu ~&=&~ \big[ \mu_{x}, \mu_{y} \big] ~, \\ \Sigma ~&=&~ \begin{bmatrix} \sigma_{x} ~,~ \rho\sigma_{x}\sigma_{y} \\ \rho\sigma_{x}\sigma_{y} ~,~ \sigma_{y} \end{bmatrix} ~, \end{eqnarray} $$where $\sigma_{x}$ and $\sigma_{y}$ represent the standard deviations of log-transformations of `variable1`

and `variable2`

in our dataset, and $\rho$ represents the amount of corelation between the two variables.

However, our problem is more complicated than the above equation. The above equation gives the probability of obtaining a specific observation with a pair of attributes ($x,y$) from a *full* MVN distribution, whereas the MVN distribution in our problem is heavily censored on its right. We must therefore take this censorship into account by renormalizing the above equation such that the integral of the *censored* MVN PDF becomes unity again just as is the case for the above equation.

For a given censorship cutoff line ($\eta$) with intercept and slope ($c_0,c_1$), the intergal of the censored MVN PDF does not have a closed-form analytical solution (except in trivial cases, such as when the detection line is vertical or horizontal). Therefore, we will have to use numerical integration methods to intergate the censored MVN PDF over the entire area above the censorship line.

We will take the following strategy to integrate our censored 2D MVN PDF model: The integral of a univariate normal distribution is analytically known as a function of the **error function**. For a given value of `variable2`

along the y-axis and an upper ($\eta$) limit for the value of `variable1`

(which is dictated by the detection threshold), the differential probability of having an observation with a value of $x_2$ for `variable2`

is the probability of observing the value $y$ marginalized over all possible values of `variable1`

(subject to the detection threshold).

where ($\mu_{x|y},\sigma_{x|y}$) represent the mean and the standard deviation of the conditional distribution of `variable2`

given a particular value for `variable1`

,

Therefore, the full integral of the censored MVN PDF for a given set of parameters ($\mu,\Sigma,c_0,c_1$) can be computed as,

$$ \large I = \int^{-\infty}_{+\infty} ~ \pi (y ~|~ \mu, \Sigma, c_0, c_1) ~ dy ~, $$implying that we will have to multiply all PDF values by,

$$ \large A = \frac{1}{I} ~, $$to properly normalize the censored MVN PDF. The above intergal will have to computed numerically and we will do so via Reomberg's method of integration. In practice, much of the integrand is concentrated close to the mean. Therefore, integrating the integrad within a few standard deviations neighborhood of the mean of the integrand should be accurate enough for the purpose of this example.

All in all, the probability of observing each pair of ($x,y$) represented by the vector $\vec{x}$ in our data set, subject to a given detection threshold, can be computed via the following equation,

Now, with an equation at hand to compute the probabilities of individual data points for a given set of model parameters, we can go on to write the likelihood of observing the entire dataset of $N=1000$ observations for a given set of input parameters,

$$ \large \begin{eqnarray} \mathcal{L} ( \text{data} ~;~ \mu, \Sigma, c_0, c_1, k = 2) ~&\equiv&~ \pi ( \text{data} ~|~ \mu, \Sigma, c_0, c_1, k = 2) \\ ~&=&~ \prod_{i=1}^{1000} ~ \pi \bigg( (x_i,y_i) ~\bigg|~ \mu, \Sigma, c_0, c_1, k = 2 \bigg) \end{eqnarray} $$We can now implement the above likelihood function to find the most likely set of parameters for our hypothesis.

Since the mathematical objective functions (e.g., probability density functions and likelihood functions) can take extremely small or large values, we often work with their natural logarithms instead.

This is the reason behind the naming convention used in the ParaMonte library forthe user's objective functions:getLogFunc, implying thatthe user must provide a function that returns the natural logarithm of the target objective function.

See this Jupyter Notebook for an in-depth discussion of why we need to work with the logarithm of mathematical objective functions in optimization and sampling problems.

**Tip**

Before implementing the likelihood function, let's ask ourselves a few questions,

What is the possible range of the mean of a Normal distribution? $[-\infty, +\infty]$

What are the possible ranges of the intercept and slope of the censorship line in our problem? $[-\infty, +\infty]$, although practically, a range of slope and intercept values for which the observed data falls below the censirship line cannot be valid, because data cannot exist below the detection threshold.

What is the possible range of the standard deviation ($\sigma$) of a Normal distribution? By definition, $[0, +\infty]$

What is the possible range of a correlation coefficient ($\rho$) between two attributes of a dataset? By definition, $[-1, +1]$

Now, compare the range of the standard deviation of a Normal distribution with the ranges of the other parameters in the above.

Did you notice the asymmetric range of the standard deviation compared to the others?

This may not seem like a big deal, but wouldn't it be much nicer if the *theoretical* range of `std`

also looked symmetric like the other parameters? This is surely possible.

**Symmetrizing the range of the standard deviation values**

To make the range of a standard deviation ($\sigma$) parameter span from negative infinity to positive infinity, we can take the logarithm of $\sigma$ and instead try to find the best-fit $\log(\sigma)$ whose range is $[-\infty, +\infty]$.

**Symmetrizing the range of the correlation coefficient values**

To make the range of a correlation coefficient ($\rho$) parameter also span from negative infinity to positive infinity, we can use the Fisher transform of $\rho$ to find the best-fit $z\equiv\text{Fisher-transform}(\rho)$ whose range is $[-\infty, +\infty]$.

Here is an implementation of our censored Normal Hypothesis class,

In [5]:

```
from scipy.integrate import romberg
from scipy.stats import norm
class Normal():
def __init__(self, logX, logY):
if len(logX) != len(logY): return print("Error: dataset lengths not equivalent")
self.data = np.transpose(np.double([logX,logY]))
self.ndata = len(logX)
self.NVAR = 2
self.coef0 = self.NVAR * np.log( 1. / np.sqrt(2.*np.pi) )
self._rho = None
self._stdX = None
self._stdY = None
self._covMat = None
self._meanVec = None
self._stdRatio = None
self._stdXGivenY = None
self._cutoffIntercept = None
self._cutoffSlopeInverse = None
self._lowerLimY = None # for romberg intergation
self._upperLimY = None # for romberg intergation
def integrateXGivenY(self,yval):
avgXGivenY = self._meanVec[0] + self._stdRatio * self._rho * (yval - self._meanVec[1])
threshX = (yval - self._cutoffIntercept) * self._cutoffSlopeInverse
return norm.pdf(x = yval, loc = self._meanVec[1], scale = self._stdY) \
* norm.cdf(x = threshX, loc = avgXGivenY, scale = self._stdXGivenY)
def getNormFac(self, vec_func = False):
return romberg ( function = self.integrateXGivenY
, a = self._lowerLimY
, b = self._upperLimY
, vec_func = vec_func
, tol = 1.e-4
, divmax = 8
)
def getLogLike_slow(self, param):
"""
Return the natural logarithm of the likelihood of
observing the input data given the input parameters.
This function is a naive slow implementation of the
likelihood function which relies on looping over data.
As such, it is about 100 times slower than the equivalent
vectorized fast version of this method named getLogLike() below.
This function exists only for the purpose of illustrating the importance of
avoiding loops in Python and vectorizing your calculations as much as possible.
Order of variables in the input param vector:
param[0] = avg(x)
param[1] = avg(y)
param[2] = logStd(x)
param[3] = logStd(y)
param[4] = FirsherTrans(self._rho)
param[5] = cutoffSlope
param[6] = cutoffIntercept
"""
self._rho = np.tanh(param[4])
self._meanVec = np.double(param[0:2])
self._covMat = np.zeros([self.NVAR,self.NVAR])
self._stdX = np.exp(param[2])
self._stdY = np.exp(param[3])
self._covMat[0,0] = (self._stdX)**2
self._covMat[1,1] = (self._stdY)**2
self._covMat[1,0] = self._rho * self._stdX * self._stdY
self._covMat[0,1] = self._covMat[1,0]
self._stdXGivenY = np.sqrt( self._covMat[0,0] * (1-self._rho**2) )
self._stdRatio = self._stdX / self._stdY
self._cutoffSlopeInverse = 1 / param[5]
self._cutoffIntercept = param[6]
significance = 3 * self._stdY
self._lowerLimY = self._meanVec[1] - significance
self._upperLimY = self._meanVec[1] + significance
if np.linalg.det(self._covMat)==0:
sys.exit("The impossible happended. Covariance matrix not positive-definite.")
invCovMat = np.linalg.inv(self._covMat) # This is the inverse of the covariance matrix of the MVN distribution.
# The following is the log of the coefficient used in the definition of the MVN.
coef = self.coef0 + np.log( np.sqrt(np.linalg.det(invCovMat)) )
# the logarithm of objective function: log(MVN)
logLike = 0.
for obs in self.data:
if obs[1] < (param[5] * obs[0] + param[6]):
# It is impossible to observe any data point that falls below the
# censorship line. Therefore, if the input parameters result in a
# censorship line that is above even a single observation, then,
# return negative infinity to indicate impossible set of model parameters.
return -1.e100 # -np.inf
else:
normedPoint = self._meanVec - obs
logLike += coef - 0.5 * ( np.dot(normedPoint, np.matmul(invCovMat,normedPoint)) )
return logLike - np.log(self.getNormFac())*self.ndata
def getLogLike_fast(self, param):
"""
Return the natural logarithm of the likelihood of
observing the input data given the input parameters.
Order of variables in the input param vector:
param[0] = avg(x)
param[1] = avg(y)
param[2] = logStd(x)
param[3] = logStd(y)
param[4] = FirsherTrans(self._rho)
param[5] = cutoffSlope
param[6] = cutoffIntercept
"""
self._rho = np.tanh(param[4])
self._meanVec = np.double(param[0:2])
self._covMat = np.zeros([self.NVAR,self.NVAR])
self._stdX = np.exp(param[2])
self._stdY = np.exp(param[3])
self._covMat[0,0] = (self._stdX)**2
self._covMat[1,1] = (self._stdY)**2
self._covMat[1,0] = self._rho * self._stdX * self._stdY
self._covMat[0,1] = self._covMat[1,0]
self._stdXGivenY = np.sqrt( self._covMat[0,0] * (1-self._rho**2) )
self._stdRatio = self._stdX / self._stdY
self._cutoffSlopeInverse = 1 / param[5]
self._cutoffIntercept = param[6]
significance = 3 * self._stdY
self._lowerLimY = self._meanVec[1] - significance
self._upperLimY = self._meanVec[1] + significance
if np.linalg.det(self._covMat)==0:
sys.exit("The impossible happended. Covariance matrix not positive-definite.")
invCovMat = np.linalg.inv(self._covMat) # This is the inverse of the covariance matrix of the MVN distribution.
# The following is the log of the coefficient used in the definition of the MVN.
coef = self.coef0 + np.log( np.sqrt(np.linalg.det(invCovMat)) )
# the logarithm of objective function: log(MVN)
if np.any( normal.data[:,1] < (param[5] * normal.data[:,0] + param[6]) ):
# It is impossible to observe any data point that falls below the
# censorship line. Therefore, if the input parameters result in a
# censorship line that is above even a single observation, then,
# return negative infinity to indicate impossible set of model parameters.
return -1.e100 # -np.inf
else:
# normalize all observations with respect to the mean vector
normedPoint = np.subtract(normal._meanVec, normal.data).reshape(normal.ndata,normal.NVAR,1)
# Create a stack of multiplications of each normalized data with the inverse covariance matrix
invMatTimesNormedPoint = np.matmul(invCovMat,normedPoint)
# Compute the log(likelihood) adding all log(probabilities) minus
# the logarithm of the normalization constant of each computed probability.
logLike = normal.ndata * ( coef - np.log(self.getNormFac(vec_func = True)) ) \
- 0.5 * np.sum( np.multiply(normedPoint,invMatTimesNormedPoint) )
return logLike
```

**Tip: The importance of avoiding loops and vectorizing functions and calculations.**

Note the definitions of the two different instances of the same class method in the above code: `getLogFunc_slow()`

and `getLogFunc_fast()`

. The naive implementation of the likelihood function (`getLogFunc_slow()`

), which uses scalar function calls and loops over data, is on average **one order of magnitude** slower than the vectorized version of the same likelihood function (`getLogFunc_fast()`

). The latter fast version avoids looping over data and vectorizes all computations, including calls to the method `integrateXGivenY()`

.

To see the speed difference, we will time both methods below. But first, we will instantiate an object of `Normal()`

class,

In [6]:

```
normal = Normal(logX,logY)
```

In [7]:

```
# Define the parameter names and our best-guess values for them.
# This will be later used as input to the ParaDRAM sampler below.
bestGuess = { "AverageLogX" : 0
, "AverageLogY" : 0
, "LogStandardDeviationX" : 0
, "LogStandardDeviationY" : 0
, "FirsherTrans(rho)" : 0.7
, "cutoffSlope" : 1.5
, "cutoffIntercept" : 0
}
```

In [8]:

```
%%timeit
"""
Time the slow implmentation of the likelihhod function.
"""
normal.getLogLike_slow(list(bestGuess.values()))
```

In [9]:

```
%%timeit
"""
Time the fast vectorized implmentation of the likelihhod function.
"""
normal.getLogLike_fast(list(bestGuess.values()))
```

**The above benchmark results highlight again the importance of vectorization and avoiding loops for scientific computing with Python.**

In [10]:

```
# sample the log-likelihood function via the ParaDRAM MCMC sampler
import paramonte as pm
pmpd = pm.ParaDRAM() # create a ParaDRAM sampler object
pmpd.spec.randomSeed = 3571 # to make the simulation fully reproducible
pmpd.spec.chainSize = 10000 # change the number of sampled points from default 100,000 to a smaller value.
pmpd.spec.overwriteRequested = True # overwrite old existing simulation files with the same name
pmpd.spec.outputFileName = "./out/censored"
pmpd.spec.variableNameList = list(bestGuess.keys())
pmpd.spec.startPointVec = list(bestGuess.values())
pmpd.spec.targetAcceptanceRate = [0.1,0.3] # ensure the MCMC sampling efficiency does not become too large or too small.
pmpd.spec.sampleSize = 2000 # the final output sample size (optional)
# call the MCMC sampler
pmpd.runSampler( ndim = 7
, getLogFunc = normal.getLogLike_fast
)
```

**Done**. The simulation output files are now stored in the current working directory of Python. You can also see the generated output files on this GitHub page.

We will now follow the guidelines provided by the sampler to read the resulting MCMC chain, in compact format, from the output `*_chain.txt`

.

In [11]:

```
chain = pmpd.readChain(renabled = True)[0] # enable return. Pick the contents of the first chain file.
```

Then, we will ensure that there is no evidence for a lack-of-convergence. We will do so by visualizing the `AdaptationMeasure`

column in th output chain file, which represents the amount of adaptation of the proposal distribution of the MCMC sampler throughout the simulation.

In [12]:

```
chain.plot.scatter.scatter.kws.cmap = "winter"
chain.plot.scatter(ycolumns="AdaptationMeasure",ccolumns=[])
chain.plot.scatter.currentFig.axes.set_ylim([1.e-5,1])
chain.plot.scatter.currentFig.axes.set_yscale("log")
```

**The above graph looks quite reassuring**, since it clearly indicates that **the amount of adaptation of the proposal distribution decays toward zero very fast, like a power-law**. The initial sharp drop in the `AdaptationMeasure`

represent the simulation episode during which the proposal distribution of the MCM sampler adjusts its shape to the shape of the target likelihood function.

We will now read the resulting sample from the output `*_sample.txt`

file.

In [13]:

```
pmpd.readSample() # read the final refined sample from the MCMC simulation
sample = pmpd.sampleList[0]
```

In [14]:

```
# plot the sampled variables
for colname in pmpd.sampleList[0].df.columns:
sample.plot.line.ycolumns = colname
sample.plot.line()
sample.plot.line.currentFig.axes.set_xlabel("MCMC Count")
sample.plot.line.currentFig.axes.set_ylabel(colname)
sample.plot.line.savefig( fname = "./out/traceplot_" + colname )
# plot the histograms of the sampled parameters
for colname in pmpd.sampleList[0].df.columns:
sample.plot.histplot(xcolumns = colname)
sample.plot.histplot.currentFig.axes.set_xlabel(colname)
sample.plot.histplot.currentFig.axes.set_ylabel("MCMC Count")
sample.plot.histplot.savefig( fname = "./out/histogram_" + colname )
# report the average parameters
pmpd.sampleList[0].df.mean()
```