Why do we need to work with the logarithm of the mathematical objective functions?

Short Answer:

In optimization and Monte Carlo sampling problems, since the mathematical objective functions (e.g., probability density functions) can take extremely small or large values, we often work with their natural logarithms instead. This is the reason behind the naming convention used in the ParaMonte library for the user's objective functions: getLogFunc, implying that the user must provide a function that returns the natural logarithm of the target objective function.

Long Answer:

Why do we often prefer to work with the logarithm of functions in optimization and Monte Carlo sampling problems?

To find out, consider the following simple mathematical objective function getLogFunc_bad() that implements a 4-dimensional Standard MultiVariate Normal (MVN) distribution, whose generic Probability Density Function is given by the following equation (see also this Wikipedia article),

$$ \large \pi (x, \mu, \Sigma, k = 4) = (2\pi)^{-\frac{k}{2}} ~ |\Sigma|^{-\frac{1}{2}} ~ \exp \bigg( -\frac{1}{2} (x-\mu)^T ~ \Sigma^{-1} ~ (x-\mu) \bigg) $$
In [13]:
import numpy as np
from scipy.stats import multivariate_normal

NDIM = 4 # the number of dimensions of the domain of the objective function: MVN

mvn = multivariate_normal   ( mean =  np.zeros(4) # This is the mean of the MVN distribution.
                            , cov = np.eye(4)     # This is the covariance matrix of the MVN distribution.
                            )

def getLogFunc_bad(point): return np.log(mvn.pdf(point))

Suppose we want to sample this MVN Probability Density Function (PDF) via the ParaMonte Monte Carlo simulation library. To do so, the ParaMonte library sampler routines will make repeated calls to this objective function that we have implmented in the above.

However, notice in the above implementation that we have suffixed the objective function with _bad. There are several good important reasons for such naming:

  • The evaluation of this function involves a log(exp()) term in its definition. If the origin of the exp() term is not clear to you, take a look at the definition of the MVN distribution in the equation provided in the above. This is computationally very expensive and in general, is considered a bad implementation.
  • The evaluation of the function as implemented in the above requires an inverse-covariance matrix computation on each call made to getLogFunc_bad(). This is completely redundant as the value of the covariance matrix -- and therefore, its inverse -- does not change throughout the simulation. Consequently, a lot of computational resources are wasted for nothing.
  • The above implementation of the MVN distribution is quite prone to numerical overflow and underflow, which could cause the simulations to crash at runtime. For example, when the input value for point happens to be too far from the mean of the MVN distribution, it is likely for the resulting MVN PDF value to be so small that the computer rounds it to zero. Then np.log(0.0) in getLogFunc_bad() becomes undefined and the simulation crashes. That would only be the best-case scenario in which the crash will alert you about the problem. But more frequently, your code may not even complain that it is working with infinities and not doing any useful work.

It is, therefore, important to implement a numerically efficient, fault-tolerant MVN PDF calculator that resolves all of the above concerns. This is possible if we take a second look at the equation for the MVN PDF in the above and try directly implement its logarithm as our computational objective function,

In [15]:
import numpy as np

NDIM = 4              # the number of dimensions of the domain of the objective function: MVN
MEAN = np.zeros(NDIM) # This is the mean of the MVN distribution.
COVMAT = np.eye(NDIM) # This is the covariance matrix of the MVN distribution.
INVCOV = np.linalg.inv(COVMAT) # This is the inverse of the covariance matrix of MVN.

# This is the log of the coefficient used in the definition of the MVN.

MVN_COEF = NDIM * np.log( 1. / np.sqrt(2.*np.pi) ) + np.log( np.sqrt(np.linalg.det(INVCOV)) )

def getLogFunc(point): 
    """
    Return the logarithm of the MVN PDF.
    """
    normedPoint = MEAN - point
    return MVN_COEF - 0.5 * ( np.dot( normedPoint, np.matmul(INVCOV,normedPoint) ) )

Performance benchmark

Now, let's compare the performance of the two implementations getLogFunc() and getLogFunc_bad() in the above,

In [7]:
%timeit getLogFunc(np.double([0,0,0,0]))
5.71 µs ± 146 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [8]:
%timeit getLogFunc_bad(np.double([0,0,0,0]))
24.3 µs ± 483 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

The good implementation is on average three to five times more efficient than the naive implementation of the objective function!

Fault tolerance

Now, let's compare the fault-tolerance of the two implementations by assigning large values to the elements of the input point array,

In [10]:
getLogFunc(point = NDIM*[10000])
Out[10]:
-200000003.67575413
In [11]:
getLogFunc_bad(point = NDIM*[10000])
C:\Users\shahmoradia\Anaconda3\lib\site-packages\ipykernel_launcher.py:10: RuntimeWarning: divide by zero encountered in log
  # Remove the CWD from sys.path while we load stuff.
Out[11]:
-inf

Now, this may seem like being too meticulous, but, a good fault-tolerant implementation of the objective function is absolutely essential in Monte Carlo simulations, and this example objective function here is no exception. The -inf value that getLogFunc_bad() yields, will certainly lead to a severe catastrophic crash of a Monte Carlo simulation (You can try it with the ParaMonte library's ParaDRAM sampler at your own risk!).

In [ ]: