In this small notebook, I implement various Kullback-Leibler divergence functions in Python, using three different approaches: naive Python, Numba, and Cython.
I also implement the corresponding KL-UCB indexes in the three approaches, and finally I present some basic benchmarks to compare the time efficiency of each approach, for each function.
Requirements:
%load_ext watermark
%watermark -v -m -a "Lilian Besson (Naereen)" -p numpy,numba -g
Lilian Besson (Naereen)

CPython 3.6.3
IPython 6.3.1

numpy 1.14.2
numba 0.37.0

compiler   : GCC 7.2.0
system     : Linux
release    : 4.13.0-38-generic
machine    : x86_64
processor  : x86_64
CPU cores  : 4
interpreter: 64bit
Git hash   : f56a7701503fe66afca4dcc7ed96827d462f773d
import numpy as np
I will copy and paste parts of the kullback.py file from my SMPyBandits library.
eps = 1e-15 #: Threshold value: everything in [0, 1] is truncated to [eps, 1 - eps]
I will include docstrings and examples only for the naive implementation.
def klBern(x, y):
r""" Kullback-Leibler divergence for Bernoulli distributions. https://en.wikipedia.org/wiki/Bernoulli_distribution#Kullback.E2.80.93Leibler_divergence
.. math:: \mathrm{KL}(\mathcal{B}(x), \mathcal{B}(y)) = x \log(\frac{x}{y}) + (1-x) \log(\frac{1-x}{1-y})."""
x = min(max(x, eps), 1 - eps)
y = min(max(y, eps), 1 - eps)
return x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y))
klBern(0.5, 0.5)
klBern(0.1, 0.9)
klBern(0.9, 0.1)
klBern(0.4, 0.5)
klBern(0.01, 0.99)
klBern(0, 1)
0.0
1.7577796618689758
1.7577796618689758
0.020135513550688863
4.503217453131898
34.53957599234081
def klBin(x, y, n):
r""" Kullback-Leibler divergence for Binomial distributions. https://math.stackexchange.com/questions/320399/kullback-leibner-divergence-of-binomial-distributions
- It is simply n times :func:`klBern` applied to x and y.
.. math:: \mathrm{KL}(\mathrm{Bin}(x, n), \mathrm{Bin}(y, n)) = n \times \left(x \log(\frac{x}{y}) + (1-x) \log(\frac{1-x}{1-y}) \right).
.. warning:: The two distributions must have the same parameter n, and x, y are p, q in (0, 1).
"""
x = min(max(x, eps), 1 - eps)
y = min(max(y, eps), 1 - eps)
return n * (x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y)))
klBin(0.5, 0.5, 10)
klBin(0.1, 0.9, 10)
klBin(0.9, 0.1, 10)
klBin(0.4, 0.5, 10)
klBin(0.01, 0.99, 10)
klBin(0, 1, 10)
0.0
17.57779661868976
17.57779661868976
0.20135513550688863
45.03217453131897
345.3957599234081
def klPoisson(x, y):
r""" Kullback-Leibler divergence for Poison distributions. https://en.wikipedia.org/wiki/Poisson_distribution#Kullback.E2.80.93Leibler_divergence
.. math:: \mathrm{KL}(\mathrm{Poisson}(x), \mathrm{Poisson}(y)) = y - x + x \times \log(\frac{x}{y}).
"""
x = max(x, eps)
y = max(y, eps)
return y - x + x * np.log(x / y)
klPoisson(3, 3)
klPoisson(2, 1)
klPoisson(1, 2)
klPoisson(3, 6)
klPoisson(6, 8)
klPoisson(1, 0)
klPoisson(0, 0)
0.0
0.3862943611198906
0.3068528194400547
0.9205584583201643
0.2739075652893146
33.538776394910684
0.0
def klExp(x, y):
r""" Kullback-Leibler divergence for exponential distributions. https://en.wikipedia.org/wiki/Exponential_distribution#Kullback.E2.80.93Leibler_divergence
.. math::
\mathrm{KL}(\mathrm{Exp}(x), \mathrm{Exp}(y)) = \begin{cases}
\frac{x}{y} - 1 - \log(\frac{x}{y}) & \text{if } x > 0, y > 0\\
+\infty & \text{otherwise}
\end{cases}
"""
if x <= 0 or y <= 0:
return float('+inf')
else:
x = max(x, eps)
y = max(y, eps)
return x / y - 1 - np.log(x / y)
klExp(3, 3)
klExp(3, 6)
klExp(1, 2)
klExp(2, 1)
klExp(4, 2)
klExp(6, 8)
klExp(-3, 2)
klExp(3, -2)
klExp(-3, -2)
0.0
0.1931471805599453
0.1931471805599453
0.3068528194400547
0.3068528194400547
0.0376820724517809
inf
inf
inf
def klGamma(x, y, a=1):
r""" Kullback-Leibler divergence for gamma distributions. https://en.wikipedia.org/wiki/Gamma_distribution#Kullback.E2.80.93Leibler_divergence
- It is simply a times :func:`klExp` applied to x and y.
.. math::
\mathrm{KL}(\Gamma(x, a), \Gamma(y, a)) = \begin{cases}
a \times \left( \frac{x}{y} - 1 - \log(\frac{x}{y}) \right) & \text{if } x > 0, y > 0\\
+\infty & \text{otherwise}
\end{cases}
.. warning:: The two distributions must have the same parameter a.
"""
if x <= 0 or y <= 0:
return float('+inf')
else:
x = max(x, eps)
y = max(y, eps)
return a * (x / y - 1 - np.log(x / y))
klGamma(3, 3)
klGamma(3, 6)
klGamma(1, 2)
klGamma(2, 1)
klGamma(4, 2)
klGamma(6, 8)
klGamma(-3, 2)
klGamma(3, -2)
klGamma(-3, -2)
0.0
0.1931471805599453
0.1931471805599453
0.3068528194400547
0.3068528194400547
0.0376820724517809
inf
inf
inf
def klNegBin(x, y, r=1):
r""" Kullback-Leibler divergence for negative binomial distributions. https://en.wikipedia.org/wiki/Negative_binomial_distribution
.. math:: \mathrm{KL}(\mathrm{NegBin}(x, r), \mathrm{NegBin}(y, r)) = r \times \log((r + x) / (r + y)) - x \times \log(y \times (r + x) / (x \times (r + y))).
.. warning:: The two distributions must have the same parameter r.
"""
x = max(x, eps)
y = max(y, eps)
return r * np.log((r + x) / (r + y)) - x * np.log(y * (r + x) / (x * (r + y)))
klNegBin(0.5, 0.5)
klNegBin(0.1, 0.9)
klNegBin(0.9, 0.1)
klNegBin(0.4, 0.5)
klNegBin(0.01, 0.99)
klBern(0, 1)
klNegBin(0.5, 0.5, r=2)
klNegBin(0.1, 0.9, r=2)
klNegBin(0.1, 0.9, r=4)
klNegBin(0.9, 0.1, r=2)
klNegBin(0.4, 0.5, r=2)
klNegBin(0.01, 0.99, r=2)
0.0
-0.7116117934648849
2.0321564902394043
-0.13065314341785483
-0.7173536633057466
34.53957599234081
0.0
-0.8329919030334189
-0.9148905602182661
2.332552851091954
-0.15457261175809217
-0.8362571425112515
def klGauss(x, y, sig2x=0.25, sig2y=None):
r""" Kullback-Leibler divergence for Gaussian distributions of means ``x`` and ``y`` and variances ``sig2x`` and ``sig2y``, :math:`\nu_1 = \mathcal{N}(x, \sigma_x^2)` and :math:`\nu_2 = \mathcal{N}(y, \sigma_x^2)`:
.. math:: \mathrm{KL}(\nu_1, \nu_2) = \frac{(x - y)^2}{2 \sigma_y^2} + \frac{1}{2}\left( \frac{\sigma_x^2}{\sigma_y^2} - 1 \log\left(\frac{\sigma_x^2}{\sigma_y^2}\right) \right).
See https://en.wikipedia.org/wiki/Normal_distribution#Other_properties
- By default, sig2y is assumed to be sig2x (same variance).
"""
if sig2y is None or - eps < (sig2y - sig2x) < eps:
return (x - y) ** 2 / (2. * sig2x)
else:
        return (x - y) ** 2 / (2. * sig2y) + 0.5 * ((sig2x/sig2y)**2 - 1 - np.log(sig2x/sig2y))  # note: (sig2x/sig2y)**2 does not match the docstring formula (which has sig2x/sig2y); kept as in SMPyBandits, and it explains the negative value below
klGauss(3, 3)
klGauss(3, 6)
klGauss(1, 2)
klGauss(2, 1)
klGauss(4, 2)
klGauss(6, 8)
klGauss(-3, 2)
klGauss(3, -2)
klGauss(-3, -2)
klGauss(3, 2)
klGauss(3, 3, sig2x=10)
klGauss(3, 6, sig2x=10)
klGauss(1, 2, sig2x=10)
klGauss(2, 1, sig2x=10)
klGauss(4, 2, sig2x=10)
klGauss(6, 8, sig2x=10)
klGauss(0, 0, sig2x=0.25, sig2y=0.5)
klGauss(0, 0, sig2x=0.25, sig2y=1.0)
klGauss(0, 0, sig2x=0.5, sig2y=0.25)
klGauss(0, 1, sig2x=0.25, sig2y=0.5)
klGauss(0, 1, sig2x=0.25, sig2y=1.0)
klGauss(0, 1, sig2x=0.5, sig2y=0.25)
klGauss(1, 0, sig2x=0.25, sig2y=0.5)
klGauss(1, 0, sig2x=0.25, sig2y=1.0)
klGauss(1, 0, sig2x=0.5, sig2y=0.25)
0.0
18.0
2.0
2.0
8.0
8.0
50.0
50.0
2.0
2.0
0.0
0.45
0.05
0.05
0.2
0.2
-0.028426409720027357
0.2243971805599453
1.1534264097200273
0.9715735902799727
0.7243971805599453
3.1534264097200273
0.9715735902799727
0.7243971805599453
3.1534264097200273
def klucb(x, d, kl, upperbound, lowerbound=float('-inf'), precision=1e-6, max_iterations=50):
""" The generic KL-UCB index computation.
- x: value of the cum reward,
- d: upper bound on the divergence,
- kl: the KL divergence to be used (:func:`klBern`, :func:`klGauss`, etc),
- upperbound, lowerbound=float('-inf'): the known bounds on the value x,
- precision=1e-6: the threshold at which to stop the search,
- max_iterations: max number of iterations of the loop (safer to bound it to reduce time complexity).
.. note:: It uses a **bisection search**, and one call to ``kl`` for each step of the bisection search.
"""
value = max(x, lowerbound)
u = upperbound
_count_iteration = 0
while _count_iteration < max_iterations and u - value > precision:
_count_iteration += 1
m = (value + u) / 2.
if kl(x, m) > d:
u = m
else:
value = m
return (value + u) / 2.
For example, for klucbBern, the two steps are to first compute an upper bound (as precise as possible), and then to compute the KL-UCB index:
x, d = 0.9, 0.2
upperbound = 1
klucb(x, d, klBern, upperbound, lowerbound=0, precision=1e-3, max_iterations=10)
klucb(x, d, klBern, upperbound, lowerbound=0, precision=1e-6, max_iterations=10)
klucb(x, d, klBern, upperbound, lowerbound=0, precision=1e-3, max_iterations=50)
klucb(x, d, klBern, upperbound, lowerbound=0, precision=1e-6, max_iterations=100)
0.994140625
0.9944824218750001
0.994140625
0.9944896697998048
def klucbGauss(x, d, sig2x=0.25, precision=0.):
""" KL-UCB index computation for Gaussian distributions.
- Note that it does not require any search.
.. warning:: it works only if the correct variance constant is given.
"""
return x + np.sqrt(2 * sig2x * d)
klucbGauss(0.1, 0.2)
klucbGauss(0.5, 0.2)
klucbGauss(0.9, 0.2)
klucbGauss(0.1, 0.4)
klucbGauss(0.1, 0.9)
klucbGauss(0.5, 0.4)
klucbGauss(0.5, 0.9)
klucbGauss(0.9, 0.4)
klucbGauss(0.9, 0.9)
0.416227766016838
0.816227766016838
1.216227766016838
0.547213595499958
0.7708203932499369
0.9472135954999579
1.170820393249937
1.347213595499958
1.570820393249937
def klucbBern(x, d, precision=1e-6):
""" KL-UCB index computation for Bernoulli distributions, using :func:`klucb`."""
upperbound = min(1., klucbGauss(x, d, sig2x=0.25)) # variance 1/4 for [0,1] bounded distributions
# upperbound = min(1., klucbPoisson(x, d)) # also safe, and better ?
    return klucb(x, d, klBern, upperbound, float('-inf'), precision)  # explicit lowerbound: otherwise precision would silently be passed as lowerbound
klucbBern(0.1, 0.2)
klucbBern(0.5, 0.2)
klucbBern(0.9, 0.2)
klucbBern(0.1, 0.4)
klucbBern(0.1, 0.9)
klucbBern(0.5, 0.4)
klucbBern(0.5, 0.9)
klucbBern(0.9, 0.4)
klucbBern(0.9, 0.9)
0.37839145109809247
0.7870889692292777
0.9944896697998048
0.5194755673450786
0.7347148310932183
0.871035844022684
0.9568095207214355
0.9992855072021485
0.9999950408935546
def klucbPoisson(x, d, precision=1e-6):
""" KL-UCB index computation for Poisson distributions, using :func:`klucb`."""
upperbound = x + d + np.sqrt(d * d + 2 * x * d) # looks safe, to check: left (Gaussian) tail of Poisson dev
    return klucb(x, d, klPoisson, upperbound, float('-inf'), precision)  # explicit lowerbound: otherwise precision would silently be passed as lowerbound
klucbPoisson(0.1, 0.2)
klucbPoisson(0.5, 0.2)
klucbPoisson(0.9, 0.2)
klucbPoisson(0.1, 0.4)
klucbPoisson(0.1, 0.9)
klucbPoisson(0.5, 0.4)
klucbPoisson(0.5, 0.9)
klucbPoisson(0.9, 0.4)
klucbPoisson(0.9, 0.9)
0.45052392780119604
1.0893765430263218
1.6401128559741487
0.6936844019642616
1.2527967047658155
1.4229339603816749
2.122985165630671
2.033691887156203
2.8315738094979777
def klucbExp(x, d, precision=1e-6):
""" KL-UCB index computation for exponential distributions, using :func:`klucb`."""
if d < 0.77: # XXX where does this value come from?
upperbound = x / (1 + 2. / 3 * d - np.sqrt(4. / 9 * d * d + 2 * d))
# safe, klexp(x,y) >= e^2/(2*(1-2e/3)) if x=y(1-e)
else:
upperbound = x * np.exp(d + 1)
if d > 1.61: # XXX where does this value come from?
lowerbound = x * np.exp(d)
else:
lowerbound = x / (1 + d - np.sqrt(d * d + 2 * d))
return klucb(x, d, klGamma, upperbound, lowerbound, precision)
klucbExp(0.1, 0.2)
klucbExp(0.5, 0.2)
klucbExp(0.9, 0.2)
klucbExp(0.1, 0.4)
klucbExp(0.1, 0.9)
klucbExp(0.5, 0.4)
klucbExp(0.5, 0.9)
klucbExp(0.9, 0.4)
klucbExp(0.9, 0.9)
0.20274118449172676
1.013706285168157
1.8246716397412546
0.2857928251730546
0.5590884945251575
1.428962647183463
2.7954420946912126
2.572132498767508
5.031795430303065
We could do the same for more distributions, but that's enough.
from numba import jit
As much as possible, one should use `@jit(nopython=True)`, to be sure that Numba does not fall back silently to naive Python code. With `nopython=True`, any call to the generated function will fail if the compilation did not succeed.
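To illustrate this behaviour, here is a minimal, purely illustrative sketch (the function below is hypothetical): with `nopython=True`, a function using an unsupported feature raises an error at its first call, instead of silently running in object mode:
from numba import jit

@jit(nopython=True)
def will_not_compile(x):
    return object()  # arbitrary Python objects cannot be typed in nopython mode

try:
    will_not_compile(1.0)  # compilation happens lazily, at the first call
except Exception as e:
    print(type(e).__name__)  # a numba TypingError: no silent fall-back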
@jit(nopython=True)
def klBern_numba(x, y):
x = min(max(x, eps), 1 - eps)
y = min(max(y, eps), 1 - eps)
return x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y))
@jit(nopython=True)
def klBin_numba(x, y, n):
x = min(max(x, eps), 1 - eps)
y = min(max(y, eps), 1 - eps)
return n * (x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y)))
@jit(nopython=True)
def klPoisson_numba(x, y):
x = max(x, eps)
y = max(y, eps)
return y - x + x * np.log(x / y)
@jit(nopython=True)
def klExp_numba(x, y):
if x <= 0 or y <= 0:
        return np.inf  # np.inf: a plain 'inf' is not defined here
else:
x = max(x, eps)
y = max(y, eps)
return x / y - 1 - np.log(x / y)
@jit(nopython=True)
def klGamma_numba(x, y, a=1):
if x <= 0 or y <= 0:
        return np.inf  # np.inf: a plain 'inf' is not defined here
else:
x = max(x, eps)
y = max(y, eps)
return a * (x / y - 1 - np.log(x / y))
@jit(nopython=True)
def klNegBin_numba(x, y, r=1):
x = max(x, eps)
y = max(y, eps)
return r * np.log((r + x) / (r + y)) - x * np.log(y * (r + x) / (x * (r + y)))
@jit(nopython=True)
def klGauss_numba(x, y, sig2x=0.25, sig2y=0.25):
if - eps < (sig2y - sig2x) and (sig2y - sig2x) < eps:
return (x - y) ** 2 / (2. * sig2x)
else:
return (x - y) ** 2 / (2. * sig2y) + 0.5 * ((sig2x/sig2y)**2 - 1 - np.log(sig2x/sig2y))
@jit
def klucb_numba(x, d, kl, upperbound,
lowerbound=float('-inf'), precision=1e-6, max_iterations=50):
value = max(x, lowerbound)
u = upperbound
_count_iteration = 0
while _count_iteration < max_iterations and u - value > precision:
_count_iteration += 1
m = (value + u) / 2.
if kl(x, m) > d:
u = m
else:
value = m
return (value + u) / 2.
For example, for klucbBern_numba, the two steps are to first compute an upper bound (as precise as possible), and then to compute the KL-UCB index:
x, d = 0.9, 0.2
upperbound = 1
klucb_numba(x, d, klBern_numba, upperbound, lowerbound=0, precision=1e-3, max_iterations=10)
klucb_numba(x, d, klBern_numba, upperbound, lowerbound=0, precision=1e-6, max_iterations=10)
klucb_numba(x, d, klBern_numba, upperbound, lowerbound=0, precision=1e-3, max_iterations=50)
klucb_numba(x, d, klBern_numba, upperbound, lowerbound=0, precision=1e-6, max_iterations=100)
0.994140625
0.9944824218750001
0.994140625
0.9944896697998048
@jit(nopython=True)
def klucbGauss_numba(x, d, sig2x=0.25, precision=0.):
return x + np.sqrt(2 * sig2x * d)
For the next functions, `nopython=True` fails, as Numba has a hard time typing chained function calls (each wrapper below passes a KL function to `klucb_numba`).
@jit
def klucbBern_numba(x, d, precision=1e-6):
upperbound = min(1., klucbGauss_numba(x, d, sig2x=0.25)) # variance 1/4 for [0,1] bounded distributions
# upperbound = min(1., klucbPoisson(x, d)) # also safe, and better ?
    return klucb_numba(x, d, klBern_numba, upperbound, float('-inf'), precision)  # explicit lowerbound: otherwise precision would silently be passed as lowerbound
@jit
def klucbPoisson_numba(x, d, precision=1e-6):
upperbound = x + d + np.sqrt(d * d + 2 * x * d) # looks safe, to check: left (Gaussian) tail of Poisson dev
    return klucb_numba(x, d, klPoisson_numba, upperbound, float('-inf'), precision)  # explicit lowerbound: otherwise precision would silently be passed as lowerbound
@jit
def klucbExp_numba(x, d, precision=1e-6):
if d < 0.77: # XXX where does this value come from?
upperbound = x / (1 + 2. / 3 * d - np.sqrt(4. / 9 * d * d + 2 * d))
# safe, klexp(x,y) >= e^2/(2*(1-2e/3)) if x=y(1-e)
else:
upperbound = x * np.exp(d + 1)
if d > 1.61: # XXX where does this value come from?
lowerbound = x * np.exp(d)
else:
lowerbound = x / (1 + d - np.sqrt(d * d + 2 * d))
return klucb_numba(x, d, klGamma_numba, upperbound, lowerbound, precision)
%load_ext cython
A cell can now be written in Cython. For instance, we can define a simple example function in Python, and then write a Cython version, simply by declaring variables and tagging their types, like this:
def some_loop(n: int) -> int:
s = 0
for i in range(0, n, 2):
s += i
return s
%%cython
def some_loop_cython(int n) -> int:
cdef int s = 0
cdef int i = 0
for i in range(0, n, 2):
s += i
return s
%timeit np.random.randint(1000)
%timeit some_loop(np.random.randint(1000))
%timeit some_loop_cython(np.random.randint(1000))
1.72 µs ± 106 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
13.1 µs ± 1.25 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
2.14 µs ± 209 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Here we observe a large speed-up. But how large? $6$ times or $50$ times? It really matters to account for the time taken by the Pseudo-Random Number Generator, and to subtract it from both timings:
14.6 / 2.21
6.606334841628959
(14.6 - 1.95) / (2.21 - 1.95)
48.65384615384615
%%cython
from libc.math cimport log
eps = 1e-15 #: Threshold value: everything in [0, 1] is truncated to [eps, 1 - eps]
def klBern_cython(float x, float y) -> float:
x = min(max(x, eps), 1 - eps)
y = min(max(y, eps), 1 - eps)
return x * log(x / y) + (1 - x) * log((1 - x) / (1 - y))
%%cython
from libc.math cimport log
eps = 1e-15 #: Threshold value: everything in [0, 1] is truncated to [eps, 1 - eps]
def klBin_cython(float x, float y, int n) -> float:
x = min(max(x, eps), 1 - eps)
y = min(max(y, eps), 1 - eps)
return n * (x * log(x / y) + (1 - x) * log((1 - x) / (1 - y)))
%%cython
from libc.math cimport log
eps = 1e-15 #: Threshold value: everything in [0, 1] is truncated to [eps, 1 - eps]
def klPoisson_cython(float x, float y) -> float:
x = max(x, eps)
y = max(y, eps)
return y - x + x * log(x / y)
%%cython
from libc.math cimport log
eps = 1e-15 #: Threshold value: everything in [0, 1] is truncated to [eps, 1 - eps]
def klExp_cython(float x, float y) -> float:
if x <= 0 or y <= 0:
return float('+inf')
else:
x = max(x, eps)
y = max(y, eps)
return x / y - 1 - log(x / y)
%%cython
from libc.math cimport log
eps = 1e-15 #: Threshold value: everything in [0, 1] is truncated to [eps, 1 - eps]
def klGamma_cython(float x, float y, float a=1) -> float:
if x <= 0 or y <= 0:
return float('+inf')
else:
x = max(x, eps)
y = max(y, eps)
return a * (x / y - 1 - log(x / y))
%%cython
from libc.math cimport log
eps = 1e-15 #: Threshold value: everything in [0, 1] is truncated to [eps, 1 - eps]
def klNegBin_cython(float x, float y, float r=1) -> float:
x = max(x, eps)
y = max(y, eps)
return r * log((r + x) / (r + y)) - x * log(y * (r + x) / (x * (r + y)))
%%cython
from libc.math cimport log
eps = 1e-15 #: Threshold value: everything in [0, 1] is truncated to [eps, 1 - eps]
def klGauss_cython(float x, float y, float sig2x=0.25, float sig2y=0.25) -> float:
if - eps < (sig2y - sig2x) < eps:
return (x - y) ** 2 / (2. * sig2x)
else:
return (x - y) ** 2 / (2. * sig2y) + 0.5 * ((sig2x/sig2y)**2 - 1 - log(sig2x/sig2y))
The KL-UCB index functions need the previously defined KL functions, which have to be rewritten inside the `%%cython` cell to be accessible from Cython.
To minimize repetition, I use a single cell to define all of them.
%%cython
from libc.math cimport sqrt, log, exp
eps = 1e-15 #: Threshold value: everything in [0, 1] is truncated to [eps, 1 - eps]
def klucbGauss_cython(float x, float d, float sig2x=0.25, float precision=0.) -> float:
return x + sqrt(2 * sig2x * d)
cdef float klucbGauss_cython_x(float x, float d, float sig2x=0.25, float precision=0.):
return x + sqrt(2 * sig2x * d)
def klucb_cython(float x, float d, kl, float upperbound,
float lowerbound=float('-inf'),
float precision=1e-6, int max_iterations=50) -> float:
cdef float value = max(x, lowerbound)
cdef float u = upperbound
cdef int _count_iteration = 0
cdef float m = 0
while _count_iteration < max_iterations and u - value > precision:
_count_iteration += 1
m = (value + u) / 2.
if kl(x, m) > d:
u = m
else:
value = m
return (value + u) / 2.
cdef float klBern_cython_x(float x, float y):
x = min(max(x, eps), 1 - eps)
y = min(max(y, eps), 1 - eps)
return x * log(x / y) + (1 - x) * log((1 - x) / (1 - y))
def klucbBern_cython(float x, float d, float precision=1e-6) -> float:
cdef float upperbound = min(1., klucbGauss_cython_x(x, d, sig2x=0.25)) # variance 1/4 for [0,1] bounded distributions
# upperbound = min(1., klucbPoisson(x, d)) # also safe, and better ?
    return klucb_cython(x, d, klBern_cython_x, upperbound, float('-inf'), precision)  # explicit lowerbound: otherwise precision would silently be passed as lowerbound
cdef float klPoisson_cython_x(float x, float y):
x = max(x, eps)
y = max(y, eps)
return y - x + x * log(x / y)
def klucbPoisson_cython(float x, float d, float precision=1e-6) -> float:
cdef float upperbound = x + d + sqrt(d * d + 2 * x * d) # looks safe, to check: left (Gaussian) tail of Poisson dev
    return klucb_cython(x, d, klPoisson_cython_x, upperbound, float('-inf'), precision)  # explicit lowerbound: otherwise precision would silently be passed as lowerbound
cdef float klGamma_cython_x(float x, float y):
if x <= 0 or y <= 0:
return float('+inf')
else:
x = max(x, eps)
y = max(y, eps)
return x / y - 1 - log(x / y)
def klucbExp_cython(float x, float d, float precision=1e-6) -> float:
cdef float upperbound = 1
cdef float lowerbound = 0
if d < 0.77: # XXX where does this value come from?
upperbound = x / (1 + 2. / 3 * d - sqrt(4. / 9 * d * d + 2 * d))
# safe, klexp(x,y) >= e^2/(2*(1-2e/3)) if x=y(1-e)
else:
upperbound = x * exp(d + 1)
if d > 1.61: # XXX where does this value come from?
lowerbound = x * exp(d)
else:
lowerbound = x / (1 + d - sqrt(d * d + 2 * d))
return klucb_cython(x, d, klGamma_cython_x, upperbound, lowerbound, precision)
For example, for klucbBern_cython, the two steps are to first compute an upper bound (as precise as possible), and then to compute the KL-UCB index:
x, d = 0.9, 0.2
upperbound = 1
klucb_cython(x, d, klBern_cython, upperbound, lowerbound=0, precision=1e-3, max_iterations=10)
klucb_cython(x, d, klBern_cython, upperbound, lowerbound=0, precision=1e-6, max_iterations=10)
klucb_cython(x, d, klBern_cython, upperbound, lowerbound=0, precision=1e-3, max_iterations=50)
klucb_cython(x, d, klBern_cython, upperbound, lowerbound=0, precision=1e-6, max_iterations=100)
0.994140625
0.9944823980331421
0.994140625
0.9944896697998047
It is more tedious, and won't be detailed here, but Python can also be extended using C. It is the best way to obtain close-to-optimal performance for critical parts of your code, and I will let you read the introduction in the official documentation if you are curious.
For my SMPyBandits library, I reused some code from the py/maBandits project, whose authors implemented some of the previously defined KL divergences and KL-UCB indexes both in pure Python and in optimized C. I copied the compiled library into the current directory, and it can be imported:
%%bash
ls -larth *kullback*
[ -f kullback.py ] && mv -vf kullback.py kullback.py.old
-rw-r--r-- 1 lilian lilian 54K oct. 28 10:52 kullback.cpython-36m-x86_64-linux-gnu.so
-rw-r--r-- 1 lilian lilian 19K janv. 9 17:31 kullback.py
'kullback.py' -> 'kullback.py.old'
!ls -larth kullback*.so
-rw-r--r-- 1 lilian lilian 54K oct. 28 10:52 kullback.cpython-36m-x86_64-linux-gnu.so
import kullback
help(kullback.klBern)
Help on built-in function klBern in module kullback:

klBern(...)
    klBern(x, y): Calculate the binary Kullback-Leibler divergence.
[ s for s in dir(kullback) if not s.startswith('_') ]
['klBern', 'klBin', 'klExp', 'klGamma', 'klGauss', 'klPoisson', 'klucbBern', 'klucbExp', 'klucbGamma', 'klucbGauss', 'klucbPoisson', 'maxEV']
klBern_c = kullback.klBern
klBin_c = kullback.klBin
klExp_c = kullback.klExp
klGamma_c = kullback.klGamma
klGauss_c = kullback.klGauss
klPoisson_c = kullback.klPoisson
klucbBern_c = kullback.klucbBern
klucbExp_c = kullback.klucbExp
klucbGamma_c = kullback.klucbGamma
klucbGauss_c = kullback.klucbGauss
klucbPoisson_c = kullback.klucbPoisson
If you want to reproduce this notebook, download the `kullback_py3.c` file and follow the build instructions, as sketched below.
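For reference, here is a minimal sketch of such a build (this `setup.py` is my guess at a working build script, not the project's actual instructions; it assumes `kullback_py3.c` defines a module named `kullback`):
# setup.py -- hypothetical minimal build script for the C extension.
# Build it in place with:  python setup.py build_ext --inplace
from distutils.core import setup, Extension

setup(
    name="kullback",
    ext_modules=[Extension("kullback", sources=["kullback_py3.c"])],
)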
For each of the functions defined in the three approaches above, I will run some numerical tests to compare their speed. Simple.
The benchmark tests the computation time on random inputs. This includes a constant overhead: creating the random values! So I also measure the time needed to simply generate the values.
r = np.random.random
rn = lambda: np.random.randint(1000)
%timeit (r(), r())
%timeit (r(), r(), rn())
576 ns ± 8.53 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
2.35 µs ± 19.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
$\implies$ we will subtract this overhead ($\sim 576$ ns for two random floats, $\sim 2.35$ µs with an extra random integer) when computing speed-up ratios between the naive Python and the Numba or Cython versions.
But we also need to check that the three versions of each function give the same results (up to approximation errors smaller than the tolerance, $10^{-5}$ by default).
def test_fs(fs, inputs, tolerance=1e-5, nb_tests=100):
for _ in range(nb_tests):
args = inputs()
ref_f = fs[0] # Python version
output = ref_f(*args)
for other_f in fs[1:]:
other_output = other_f(*args)
if abs(output) > 1:
rel_diff = (output - other_output) / output
else:
rel_diff = (output - other_output)
assert abs(rel_diff) <= tolerance, "Error: function {} gave {} and function {} gave {} on inputs {}, and the two outputs are too different.".format(ref_f, output, other_f, other_output, args)
WARNING: in the following, I use a very manual approach: I copy the time from each `%timeit` run to compute the speed-up ratios. So when I rerun the cells, the times may vary (a little), and I cannot keep the ratio computations perfectly up to date, so bear with the (tiny) inconsistencies.
test_fs([klBern, klBern_numba, klBern_cython, klBern_c], lambda: (r(), r()))
%timeit klBern(r(), r())
6.28 µs ± 919 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit klBern_numba(r(), r())
1 µs ± 145 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klBern_cython(r(), r())
882 ns ± 40.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klBern_c(r(), r())
811 ns ± 40.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
This is a speed-up ratio of about $13$ times faster for Numba, $19$ times for Cython, and $24$ times for the C version.
(6280 - 576) / (1000 - 576) # for Python vs numba
(6280 - 576) / (882 - 576) # for Python vs Cython
(6280 - 576) / (811 - 576) # for Python vs C
13.452830188679245
18.640522875816995
24.272340425531915
test_fs([klBin, klBin_numba, klBin_cython, klBin_c], lambda: (r(), r(), rn()))
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-76-cad9d3d2157b> in <module>()
----> 1 test_fs([klBin, klBin_numba, klBin_cython, klBin_c], lambda: (r(), r(), rn()))

<ipython-input-63-12acbf032a73> in test_fs(fs, inputs, tolerance, nb_tests)
     10             else:
     11                 rel_diff = (output - other_output)
---> 12             assert abs(rel_diff) <= tolerance, "Error: function {} gave {} and function {} gave {} on inputs {}, and the two outputs are too different.".format(ref_f, output, other_f, other_output, args)

AssertionError: Error: function <function klBin at 0x7f8ab53acae8> gave 0.0011060494622968124 and function <built-in function klBin_cython> gave 0.0011183457989155888 on inputs (0.14985671245165777, 0.14884809478960037, 276), and the two outputs are too different.
Too much numerical difference? This comes from the single-precision C `float` arguments used in the Cython version. Let's try again with a larger tolerance:
test_fs([klBin, klBin_numba, klBin_cython, klBin_c], lambda: (r(), r(), rn()), tolerance=1e-3)
%timeit klBin(r(), r(), rn())
7.05 µs ± 620 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit klBin_numba(r(), r(), rn())
3.07 µs ± 146 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit klBin_cython(r(), r(), rn())
3.31 µs ± 258 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit klBin_c(r(), r(), rn())
2.98 µs ± 148 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
This is a speed-up ratio of about $5$ to $7$ times faster, for Numba, Cython, and C. Not so great, but still something!
(7005 - 2350) / (3070 - 2350) # for Python vs numba
(7005 - 2350) / (3331 - 2350) # for Python vs Cython
(7005 - 2350) / (2980 - 2350) # for Python vs C
6.465277777777778
4.745158002038736
7.388888888888889
test_fs([klPoisson, klPoisson_numba, klPoisson_cython, klPoisson_c], lambda: (r(), r()))
%timeit klPoisson(r(), r())
2.35 µs ± 172 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit klPoisson_numba(r(), r())
935 ns ± 30.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klPoisson_cython(r(), r())
859 ns ± 37.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klPoisson_c(r(), r())
811 ns ± 36.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
This is a speed-up ratio of about $5$ times faster for Numba, $6$ times for Cython, and $7.5$ times for C.
(2350 - 576) / (935 - 576) # for Python vs numba
(2350 - 576) / (859 - 576) # for Python vs Cython
(2350 - 576) / (811 - 576) # for Python vs C
4.9415041782729805
6.268551236749117
7.548936170212766
test_fs([klExp, klExp_numba, klExp_cython, klExp_c], lambda: (r(), r()))
%timeit klExp(r(), r())
2.21 µs ± 47.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit klExp_numba(r(), r())
1.07 µs ± 211 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klExp_cython(r(), r())
842 ns ± 32.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klExp_c(r(), r())
869 ns ± 56 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
This is a speed-up ratio of about $3$ times faster for Numba, and about $6$ times faster for Cython and C. Cython starts to win the race!
(2210 - 576) / (1070 - 576) # for Python vs numba
(2210 - 576) / (842 - 576) # for Python vs Cython
(2210 - 576) / (869 - 576) # for Python vs C
3.3076923076923075
6.142857142857143
5.57679180887372
klGamma_c = lambda x, y: kullback.klGamma(x, y, 1)
test_fs([klGamma, klGamma_numba, klGamma_cython, klGamma_c], lambda: (r(), r()))
%timeit klGamma(r(), r())
2.7 µs ± 163 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit klGamma_numba(r(), r())
1.07 µs ± 21.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klGamma_cython(r(), r())
889 ns ± 87.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klGamma_c(r(), r())
997 ns ± 57.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
This is a speed-up ratio of about $4$ times faster for Numba, $7$ times for Cython, and $5$ times for C.
Note that the C version probably loses a little due to the `lambda x, y: ...` wrapper used to fix the extra parameter.
(2700 - 576) / (1070 - 576) # for Python vs numba
(2700 - 576) / (889 - 576) # for Python vs Cython
(2700 - 576) / (997 - 576) # for Python vs C
4.299595141700405
6.785942492012779
5.045130641330166
test_fs([klNegBin, klNegBin_numba, klNegBin_cython], lambda: (r(), r()))
%timeit klNegBin(r(), r())
3.89 µs ± 249 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit klNegBin_numba(r(), r())
1.16 µs ± 76.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klNegBin_cython(r(), r())
901 ns ± 19.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
This is a speed-up ratio of about $5$ times faster for Numba and $10$ times faster for Cython.
(3890 - 576) / (1160 - 576) # for Python vs numba
(3890 - 576) / (901 - 576) # for Python vs Cython
5.674657534246576
10.196923076923078
klGauss_c = lambda x, y: kullback.klGauss(x, y, 0.25)
test_fs([klGauss, klGauss_numba, klGauss_cython, klGauss_c], lambda: (r(), r()))
%timeit klGauss(r(), r())
852 ns ± 40.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klGauss_numba(r(), r())
1.07 µs ± 54 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klGauss_cython(r(), r())
745 ns ± 40.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klGauss_c(r(), r())
911 ns ± 56.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Here the naive Python version is already very fast, so the speed-ups are modest: about $1.6$ times faster for Cython, while Numba is actually slower than naive Python!
Why? No idea!
C is also slower than the Python version here, due to the `lambda x, y: ...` wrapper.
(852 - 576) / (1070 - 576) # for Python vs numba
(852 - 576) / (745 - 576) # for Python vs Cython
(852 - 576) / (911 - 576) # for Python vs C
0.5587044534412956
1.6331360946745561
0.8238805970149253
klucbGauss_c = lambda x, y: kullback.klucbGauss(x, y, 0.25)
test_fs([klucbGauss, klucbGauss_numba, klucbGauss_cython, klucbGauss_c], lambda: (r(), r()))
%timeit klucbGauss(r(), r())
1.96 µs ± 184 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klucbGauss_numba(r(), r())
31.3 µs ± 950 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit klucbGauss_cython(r(), r())
676 ns ± 25.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klucbGauss_c(r(), r())
939 ns ± 23.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
This is a speed-up ratio of about $14$ times faster for Cython and $4$ times for C, and one more failure case for Numba.
The C version loses again because of the `lambda x, y: ...` wrapper.
(1960 - 576) / (31300 - 576) # for Python vs numba
(1960 - 576) / (676 - 576) # for Python vs Cython
(1960 - 576) / (939 - 576) # for Python vs C
0.04504621794037235
13.84
3.81267217630854
klucbBern_c = lambda x, y: kullback.klucbBern(x, y, 1e-6)
test_fs([klucbBern, klucbBern_numba, klucbBern_cython, klucbBern_c], lambda: (r(), r()))
%timeit klucbBern(r(), r())
91.9 µs ± 4.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit klucbBern_numba(r(), r())
170 µs ± 8.43 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit klucbBern_cython(r(), r())
6.93 µs ± 815 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit klucbBern_c(r(), r())
3.14 µs ± 154 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
This is a speed-up ratio of about $14$ times faster for Cython, and one more failure case for Numba. The speed-up for the C version is impressive, about $35$ times: optimizing the bisection loop is what matters most!
(91900 - 576) / (170000 - 576) # for Python vs numba
(91900 - 576) / (6930 - 576) # for Python vs Cython
(91900 - 576) / (3140 - 576) # for Python vs C (3.14 µs = 3140 ns)
0.5390263480970818
14.372678627636136
35.617784711388455
klucbPoisson_c = lambda x, y: kullback.klucbPoisson(x, y, 1e-6)
test_fs([klucbPoisson, klucbPoisson_numba, klucbPoisson_cython, klucbPoisson_c], lambda: (r(), r()))
%timeit klucbPoisson(r(), r())
72.6 µs ± 1.69 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit klucbPoisson_numba(r(), r())
167 µs ± 16.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit klucbPoisson_cython(r(), r())
5.33 µs ± 382 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit klucbPoisson_c(r(), r())
2.18 µs ± 37.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
This is a speed-up ratio of about $15$ times faster for Cython, and one more failure case for Numba. It's again a strong victory for the C version, with a speed-up of about $45$!
(72600 - 576) / (167000 - 576) # for Python vs numba
(72600 - 576) / (5330 - 576) # for Python vs Cython
(72600 - 576) / (2180 - 576) # for Python vs C
0.43277411911743496
15.150189314261674
44.90274314214464
klucbExp_c = lambda x, y: kullback.klucbExp(x, y, 1e-6)
test_fs([klucbExp, klucbExp_numba, klucbExp_cython, klucbExp_c], lambda: (r(), r()))
%timeit klucbExp(r(), r())
78.7 µs ± 898 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit klucbExp_numba(r(), r())
156 µs ± 10.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit klucbExp_cython(r(), r())
4.41 µs ± 401 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit klucbExp_c(r(), r())
2.04 µs ± 122 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
This is a speed-up ratio of about $20$ times faster for Cython, and one more failure case for Numba. Strong victory for the C version again: a speed-up above $50$ is impressive!
(78700 - 576) / (156000 - 576) # for Python vs numba
(78700 - 576) / (4410 - 576) # for Python vs Cython
(78700 - 576) / (2040 - 576) # for Python vs C
0.5026508132592135
20.37663015127804
53.36338797814208
%%bash
[ -f kullback.py.old ] && mv -vf kullback.py.old kullback.py
'kullback.py.old' -> 'kullback.py'
Numba is easy to try, but a little disappointing here: sometimes the `nopython` compilation fails, and on some examples the `nopython` code can be slower than naive Python (like, crazily slower). No idea why, and the point was precisely not to try too much optimizing this use of Numba.
The take away messages are the following:
- naive Python code is fine for prototyping, but can be slow for small functions called millions of times;
- for such functions, it costs nothing to try `numba.jit` to speed them up, but check that it actually helps;
- Cython consistently gave large speed-ups here, at the price of a little more work;
- the C version is usually the fastest, but also the most tedious to write and build.
My advice for using Cython are the following:
- in a notebook, the `%%cython` magic is very easy!
- in a library, write your code in a `.pyx` file, and use `pyximport` from your Python code to automatically compile and import it. It works perfectly fine, believe me!
Writing a C extension by hand is not too hard, but it's certainly not as easy as Cython.
My approach from now on will be to not even consider writing C code, as Cython offers a very good speed-up when carefully used. And it's much easier to just write `import pyximport; pyximport.install()` before importing your Cython-accelerated module, in comparison to writing a `setup.py` and requiring a compilation step for a C-optimized module!
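For instance, this is all it takes (a minimal sketch; `kullback_cython.pyx` is a hypothetical file that would contain the Cython functions defined above):
# Compile-and-import a .pyx module on the fly, without any setup.py.
import pyximport
pyximport.install()  # registers an import hook that compiles .pyx files

import kullback_cython  # hypothetical module, compiled transparently on first import
print(kullback_cython.klBern_cython(0.1, 0.9))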
That's it for today, folks! See this page for other notebooks I wrote recently.