In this small notebook, I implement various Kullback-Leibler divergence functions in Python, using three different approaches: naive Python, Numba, and Cython.
I also implement the corresponding KL-UCB indexes in the three approaches, and finally I present some basic benchmarks to compare the time efficiency of each approach, for each function.
Requirements:
%load_ext watermark
%watermark -v -m -a "Lilian Besson (Naereen)" -p numpy,numba -g
Lilian Besson (Naereen)

CPython 3.6.3
IPython 6.3.1

numpy 1.14.2
numba 0.37.0

compiler   : GCC 7.2.0
system     : Linux
release    : 4.13.0-38-generic
machine    : x86_64
processor  : x86_64
CPU cores  : 4
interpreter: 64bit
Git hash   : f56a7701503fe66afca4dcc7ed96827d462f773d
import numpy as np
I will copy and paste parts of the kullback.py file from my SMPyBandits library.
eps = 1e-15 #: Threshold value: everything in [0, 1] is truncated to [eps, 1 - eps]
I will include docstrings and examples only for the naive implementation.
def klBern(x, y):
r""" Kullback-Leibler divergence for Bernoulli distributions. https://en.wikipedia.org/wiki/Bernoulli_distribution#Kullback.E2.80.93Leibler_divergence
.. math:: \mathrm{KL}(\mathcal{B}(x), \mathcal{B}(y)) = x \log(\frac{x}{y}) + (1-x) \log(\frac{1-x}{1-y})."""
x = min(max(x, eps), 1 - eps)
y = min(max(y, eps), 1 - eps)
return x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y))
klBern(0.5, 0.5)
klBern(0.1, 0.9)
klBern(0.9, 0.1)
klBern(0.4, 0.5)
klBern(0.01, 0.99)
klBern(0, 1)
0.0
1.7577796618689758
1.7577796618689758
0.020135513550688863
4.503217453131898
34.53957599234081
def klBin(x, y, n):
r""" Kullback-Leibler divergence for Binomial distributions. https://math.stackexchange.com/questions/320399/kullback-leibner-divergence-of-binomial-distributions
- It is simply n times :func:`klBern` applied to x and y.
.. math:: \mathrm{KL}(\mathrm{Bin}(x, n), \mathrm{Bin}(y, n)) = n \times \left(x \log(\frac{x}{y}) + (1-x) \log(\frac{1-x}{1-y}) \right).
.. warning:: The two distributions must have the same parameter n, and x, y are p, q in (0, 1).
"""
x = min(max(x, eps), 1 - eps)
y = min(max(y, eps), 1 - eps)
return n * (x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y)))
klBin(0.5, 0.5, 10)
klBin(0.1, 0.9, 10)
klBin(0.9, 0.1, 10)
klBin(0.4, 0.5, 10)
klBin(0.01, 0.99, 10)
klBin(0, 1, 10)
0.0
17.57779661868976
17.57779661868976
0.20135513550688863
45.03217453131897
345.3957599234081
def klPoisson(x, y):
r""" Kullback-Leibler divergence for Poison distributions. https://en.wikipedia.org/wiki/Poisson_distribution#Kullback.E2.80.93Leibler_divergence
.. math:: \mathrm{KL}(\mathrm{Poisson}(x), \mathrm{Poisson}(y)) = y - x + x \times \log(\frac{x}{y}).
"""
x = max(x, eps)
y = max(y, eps)
return y - x + x * np.log(x / y)
klPoisson(3, 3)
klPoisson(2, 1)
klPoisson(1, 2)
klPoisson(3, 6)
klPoisson(6, 8)
klPoisson(1, 0)
klPoisson(0, 0)
0.0
0.3862943611198906
0.3068528194400547
0.9205584583201643
0.2739075652893146
33.538776394910684
0.0
def klExp(x, y):
r""" Kullback-Leibler divergence for exponential distributions. https://en.wikipedia.org/wiki/Exponential_distribution#Kullback.E2.80.93Leibler_divergence
.. math::
\mathrm{KL}(\mathrm{Exp}(x), \mathrm{Exp}(y)) = \begin{cases}
\frac{x}{y} - 1 - \log(\frac{x}{y}) & \text{if } x > 0, y > 0\\
+\infty & \text{otherwise}
\end{cases}
"""
if x <= 0 or y <= 0:
return float('+inf')
else:
x = max(x, eps)
y = max(y, eps)
return x / y - 1 - np.log(x / y)
klExp(3, 3)
klExp(3, 6)
klExp(1, 2)
klExp(2, 1)
klExp(4, 2)
klExp(6, 8)
klExp(-3, 2)
klExp(3, -2)
klExp(-3, -2)
0.0
0.1931471805599453
0.1931471805599453
0.3068528194400547
0.3068528194400547
0.0376820724517809
inf
inf
inf
def klGamma(x, y, a=1):
r""" Kullback-Leibler divergence for gamma distributions. https://en.wikipedia.org/wiki/Gamma_distribution#Kullback.E2.80.93Leibler_divergence
- It is simply a times :func:`klExp` applied to x and y.
.. math::
\mathrm{KL}(\Gamma(x, a), \Gamma(y, a)) = \begin{cases}
a \times \left( \frac{x}{y} - 1 - \log(\frac{x}{y}) \right) & \text{if } x > 0, y > 0\\
+\infty & \text{otherwise}
\end{cases}
.. warning:: The two distributions must have the same parameter a.
"""
if x <= 0 or y <= 0:
return float('+inf')
else:
x = max(x, eps)
y = max(y, eps)
return a * (x / y - 1 - np.log(x / y))
klGamma(3, 3)
klGamma(3, 6)
klGamma(1, 2)
klGamma(2, 1)
klGamma(4, 2)
klGamma(6, 8)
klGamma(-3, 2)
klGamma(3, -2)
klGamma(-3, -2)
0.0
0.1931471805599453
0.1931471805599453
0.3068528194400547
0.3068528194400547
0.0376820724517809
inf
inf
inf
def klNegBin(x, y, r=1):
r""" Kullback-Leibler divergence for negative binomial distributions. https://en.wikipedia.org/wiki/Negative_binomial_distribution
.. math:: \mathrm{KL}(\mathrm{NegBin}(x, r), \mathrm{NegBin}(y, r)) = r \times \log((r + x) / (r + y)) - x \times \log(y \times (r + x) / (x \times (r + y))).
.. warning:: The two distributions must have the same parameter r.
"""
x = max(x, eps)
y = max(y, eps)
return r * np.log((r + x) / (r + y)) - x * np.log(y * (r + x) / (x * (r + y)))
klNegBin(0.5, 0.5)
klNegBin(0.1, 0.9)
klNegBin(0.9, 0.1)
klNegBin(0.4, 0.5)
klNegBin(0.01, 0.99)
klBern(0, 1)
klNegBin(0.5, 0.5, r=2)
klNegBin(0.1, 0.9, r=2)
klNegBin(0.1, 0.9, r=4)
klNegBin(0.9, 0.1, r=2)
klNegBin(0.4, 0.5, r=2)
klNegBin(0.01, 0.99, r=2)
0.0
-0.7116117934648849
2.0321564902394043
-0.13065314341785483
-0.7173536633057466
34.53957599234081
0.0
-0.8329919030334189
-0.9148905602182661
2.332552851091954
-0.15457261175809217
-0.8362571425112515
def klGauss(x, y, sig2x=0.25, sig2y=None):
r""" Kullback-Leibler divergence for Gaussian distributions of means ``x`` and ``y`` and variances ``sig2x`` and ``sig2y``, :math:`\nu_1 = \mathcal{N}(x, \sigma_x^2)` and :math:`\nu_2 = \mathcal{N}(y, \sigma_x^2)`:
.. math:: \mathrm{KL}(\nu_1, \nu_2) = \frac{(x - y)^2}{2 \sigma_y^2} + \frac{1}{2}\left( \frac{\sigma_x^2}{\sigma_y^2} - 1 \log\left(\frac{\sigma_x^2}{\sigma_y^2}\right) \right).
See https://en.wikipedia.org/wiki/Normal_distribution#Other_properties
- By default, sig2y is assumed to be sig2x (same variance).
"""
if sig2y is None or - eps < (sig2y - sig2x) < eps:
return (x - y) ** 2 / (2. * sig2x)
else:
        return (x - y) ** 2 / (2. * sig2y) + 0.5 * ((sig2x/sig2y)**2 - 1 - np.log(sig2x/sig2y))  # note: (sig2x/sig2y)**2 does not match the docstring formula (which has sig2x/sig2y); kept as in SMPyBandits, and it explains the negative value below
klGauss(3, 3)
klGauss(3, 6)
klGauss(1, 2)
klGauss(2, 1)
klGauss(4, 2)
klGauss(6, 8)
klGauss(-3, 2)
klGauss(3, -2)
klGauss(-3, -2)
klGauss(3, 2)
klGauss(3, 3, sig2x=10)
klGauss(3, 6, sig2x=10)
klGauss(1, 2, sig2x=10)
klGauss(2, 1, sig2x=10)
klGauss(4, 2, sig2x=10)
klGauss(6, 8, sig2x=10)
klGauss(0, 0, sig2x=0.25, sig2y=0.5)
klGauss(0, 0, sig2x=0.25, sig2y=1.0)
klGauss(0, 0, sig2x=0.5, sig2y=0.25)
klGauss(0, 1, sig2x=0.25, sig2y=0.5)
klGauss(0, 1, sig2x=0.25, sig2y=1.0)
klGauss(0, 1, sig2x=0.5, sig2y=0.25)
klGauss(1, 0, sig2x=0.25, sig2y=0.5)
klGauss(1, 0, sig2x=0.25, sig2y=1.0)
klGauss(1, 0, sig2x=0.5, sig2y=0.25)
0.0
18.0
2.0
2.0
8.0
8.0
50.0
50.0
2.0
2.0
0.0
0.45
0.05
0.05
0.2
0.2
-0.028426409720027357
0.2243971805599453
1.1534264097200273
0.9715735902799727
0.7243971805599453
3.1534264097200273
0.9715735902799727
0.7243971805599453
3.1534264097200273
def klucb(x, d, kl, upperbound, lowerbound=float('-inf'), precision=1e-6, max_iterations=50):
""" The generic KL-UCB index computation.
- x: value of the cum reward,
- d: upper bound on the divergence,
- kl: the KL divergence to be used (:func:`klBern`, :func:`klGauss`, etc),
- upperbound, lowerbound=float('-inf'): the known bounds on the value x,
- precision=1e-6: the threshold at which to stop the search,
- max_iterations: max number of iterations of the loop (safer to bound it to reduce time complexity).
.. note:: It uses a **bisection search**, and one call to ``kl`` for each step of the bisection search.
"""
value = max(x, lowerbound)
u = upperbound
_count_iteration = 0
while _count_iteration < max_iterations and u - value > precision:
_count_iteration += 1
m = (value + u) / 2.
if kl(x, m) > d:
u = m
else:
value = m
return (value + u) / 2.
For example, for klucbBern, the two steps are to first compute an upper bound (as precise as possible), and then to compute the KL-UCB index:
x, d = 0.9, 0.2
upperbound = 1
klucb(x, d, klBern, upperbound, lowerbound=0, precision=1e-3, max_iterations=10)
klucb(x, d, klBern, upperbound, lowerbound=0, precision=1e-6, max_iterations=10)
klucb(x, d, klBern, upperbound, lowerbound=0, precision=1e-3, max_iterations=50)
klucb(x, d, klBern, upperbound, lowerbound=0, precision=1e-6, max_iterations=100)
0.994140625
0.9944824218750001
0.994140625
0.9944896697998048
def klucbGauss(x, d, sig2x=0.25, precision=0.):
""" KL-UCB index computation for Gaussian distributions.
- Note that it does not require any search.
.. warning:: it works only if the correct variance constant is given.
"""
return x + np.sqrt(2 * sig2x * d)
klucbGauss(0.1, 0.2)
klucbGauss(0.5, 0.2)
klucbGauss(0.9, 0.2)
klucbGauss(0.1, 0.4)
klucbGauss(0.1, 0.9)
klucbGauss(0.5, 0.4)
klucbGauss(0.5, 0.9)
klucbGauss(0.9, 0.4)
klucbGauss(0.9, 0.9)
0.416227766016838
0.816227766016838
1.216227766016838
0.547213595499958
0.7708203932499369
0.9472135954999579
1.170820393249937
1.347213595499958
1.570820393249937
def klucbBern(x, d, precision=1e-6):
""" KL-UCB index computation for Bernoulli distributions, using :func:`klucb`."""
upperbound = min(1., klucbGauss(x, d, sig2x=0.25)) # variance 1/4 for [0,1] bounded distributions
# upperbound = min(1., klucbPoisson(x, d)) # also safe, and better ?
    return klucb(x, d, klBern, upperbound, float('-inf'), precision)  # explicit lowerbound: otherwise precision would silently be passed as lowerbound
klucbBern(0.1, 0.2)
klucbBern(0.5, 0.2)
klucbBern(0.9, 0.2)
klucbBern(0.1, 0.4)
klucbBern(0.1, 0.9)
klucbBern(0.5, 0.4)
klucbBern(0.5, 0.9)
klucbBern(0.9, 0.4)
klucbBern(0.9, 0.9)
0.37839145109809247
0.7870889692292777
0.9944896697998048
0.5194755673450786
0.7347148310932183
0.871035844022684
0.9568095207214355
0.9992855072021485
0.9999950408935546
def klucbPoisson(x, d, precision=1e-6):
""" KL-UCB index computation for Poisson distributions, using :func:`klucb`."""
upperbound = x + d + np.sqrt(d * d + 2 * x * d) # looks safe, to check: left (Gaussian) tail of Poisson dev
    return klucb(x, d, klPoisson, upperbound, float('-inf'), precision)  # explicit lowerbound: otherwise precision would silently be passed as lowerbound
klucbPoisson(0.1, 0.2)
klucbPoisson(0.5, 0.2)
klucbPoisson(0.9, 0.2)
klucbPoisson(0.1, 0.4)
klucbPoisson(0.1, 0.9)
klucbPoisson(0.5, 0.4)
klucbPoisson(0.5, 0.9)
klucbPoisson(0.9, 0.4)
klucbPoisson(0.9, 0.9)
0.45052392780119604
1.0893765430263218
1.6401128559741487
0.6936844019642616
1.2527967047658155
1.4229339603816749
2.122985165630671
2.033691887156203
2.8315738094979777
def klucbExp(x, d, precision=1e-6):
""" KL-UCB index computation for exponential distributions, using :func:`klucb`."""
if d < 0.77: # XXX where does this value come from?
upperbound = x / (1 + 2. / 3 * d - np.sqrt(4. / 9 * d * d + 2 * d))
# safe, klexp(x,y) >= e^2/(2*(1-2e/3)) if x=y(1-e)
else:
upperbound = x * np.exp(d + 1)
if d > 1.61: # XXX where does this value come from?
lowerbound = x * np.exp(d)
else:
lowerbound = x / (1 + d - np.sqrt(d * d + 2 * d))
return klucb(x, d, klGamma, upperbound, lowerbound, precision)
klucbExp(0.1, 0.2)
klucbExp(0.5, 0.2)
klucbExp(0.9, 0.2)
klucbExp(0.1, 0.4)
klucbExp(0.1, 0.9)
klucbExp(0.5, 0.4)
klucbExp(0.5, 0.9)
klucbExp(0.9, 0.4)
klucbExp(0.9, 0.9)
0.20274118449172676
1.013706285168157
1.8246716397412546
0.2857928251730546
0.5590884945251575
1.428962647183463
2.7954420946912126
2.572132498767508
5.031795430303065
We could do the same for more distributions, but that's enough.
from numba import jit
As much as possible, one should use `@jit(nopython=True)`, to be sure that Numba does not fall back silently to naive Python code. With `nopython=True`, any call to the generated function will fail if the compilation did not succeed.
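To illustrate this behaviour, here is a minimal, purely illustrative sketch (the function below is hypothetical): with `nopython=True`, a function using an unsupported feature raises an error at its first call, instead of silently running in object mode:
from numba import jit

@jit(nopython=True)
def will_not_compile(x):
    return object()  # arbitrary Python objects cannot be typed in nopython mode

try:
    will_not_compile(1.0)  # compilation happens lazily, at the first call
except Exception as e:
    print(type(e).__name__)  # a numba TypingError: no silent fall-back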
@jit(nopython=True)
def klBern_numba(x, y):
x = min(max(x, eps), 1 - eps)
y = min(max(y, eps), 1 - eps)
return x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y))
@jit(nopython=True)
def klBin_numba(x, y, n):
x = min(max(x, eps), 1 - eps)
y = min(max(y, eps), 1 - eps)
return n * (x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y)))
@jit(nopython=True)
def klPoisson_numba(x, y):
x = max(x, eps)
y = max(y, eps)
return y - x + x * np.log(x / y)
@jit(nopython=True)
def klExp_numba(x, y):
if x <= 0 or y <= 0:
        return np.inf  # np.inf: a plain 'inf' is not defined here
else:
x = max(x, eps)
y = max(y, eps)
return x / y - 1 - np.log(x / y)
@jit(nopython=True)
def klGamma_numba(x, y, a=1):
if x <= 0 or y <= 0:
        return np.inf  # np.inf: a plain 'inf' is not defined here
else:
x = max(x, eps)
y = max(y, eps)
return a * (x / y - 1 - np.log(x / y))
@jit(nopython=True)
def klNegBin_numba(x, y, r=1):
x = max(x, eps)
y = max(y, eps)
return r * np.log((r + x) / (r + y)) - x * np.log(y * (r + x) / (x * (r + y)))
@jit(nopython=True)
def klGauss_numba(x, y, sig2x=0.25, sig2y=0.25):
if - eps < (sig2y - sig2x) and (sig2y - sig2x) < eps:
return (x - y) ** 2 / (2. * sig2x)
else:
return (x - y) ** 2 / (2. * sig2y) + 0.5 * ((sig2x/sig2y)**2 - 1 - np.log(sig2x/sig2y))
@jit
def klucb_numba(x, d, kl, upperbound,
lowerbound=float('-inf'), precision=1e-6, max_iterations=50):
value = max(x, lowerbound)
u = upperbound
_count_iteration = 0
while _count_iteration < max_iterations and u - value > precision:
_count_iteration += 1
m = (value + u) / 2.
if kl(x, m) > d:
u = m
else:
value = m
return (value + u) / 2.
For example, for klucbBern_numba, the two steps are to first compute an upper bound (as precise as possible), and then to compute the KL-UCB index:
x, d = 0.9, 0.2
upperbound = 1
klucb_numba(x, d, klBern_numba, upperbound, lowerbound=0, precision=1e-3, max_iterations=10)
klucb_numba(x, d, klBern_numba, upperbound, lowerbound=0, precision=1e-6, max_iterations=10)
klucb_numba(x, d, klBern_numba, upperbound, lowerbound=0, precision=1e-3, max_iterations=50)
klucb_numba(x, d, klBern_numba, upperbound, lowerbound=0, precision=1e-6, max_iterations=100)
0.994140625
0.9944824218750001
0.994140625
0.9944896697998048
@jit(nopython=True)
def klucbGauss_numba(x, d, sig2x=0.25, precision=0.):
return x + np.sqrt(2 * sig2x * d)
For the next functions, `nopython=True` fails, as Numba has a hard time typing chained function calls (each wrapper below passes a KL function to `klucb_numba`).
@jit
def klucbBern_numba(x, d, precision=1e-6):
upperbound = min(1., klucbGauss_numba(x, d, sig2x=0.25)) # variance 1/4 for [0,1] bounded distributions
# upperbound = min(1., klucbPoisson(x, d)) # also safe, and better ?
    return klucb_numba(x, d, klBern_numba, upperbound, float('-inf'), precision)  # explicit lowerbound: otherwise precision would silently be passed as lowerbound
@jit
def klucbPoisson_numba(x, d, precision=1e-6):
upperbound = x + d + np.sqrt(d * d + 2 * x * d) # looks safe, to check: left (Gaussian) tail of Poisson dev
    return klucb_numba(x, d, klPoisson_numba, upperbound, float('-inf'), precision)  # explicit lowerbound: otherwise precision would silently be passed as lowerbound
@jit
def klucbExp_numba(x, d, precision=1e-6):
if d < 0.77: # XXX where does this value come from?
upperbound = x / (1 + 2. / 3 * d - np.sqrt(4. / 9 * d * d + 2 * d))
# safe, klexp(x,y) >= e^2/(2*(1-2e/3)) if x=y(1-e)
else:
upperbound = x * np.exp(d + 1)
if d > 1.61: # XXX where does this value come from?
lowerbound = x * np.exp(d)
else:
lowerbound = x / (1 + d - np.sqrt(d * d + 2 * d))
return klucb_numba(x, d, klGamma_numba, upperbound, lowerbound, precision)
%load_ext cython
A cell can now be written in Cython. For instance, we can define a simple example function in Python, and then write a Cython version, simply by declaring variables and tagging their types, like this:
def some_loop(n: int) -> int:
s = 0
for i in range(0, n, 2):
s += i
return s
%%cython
def some_loop_cython(int n) -> int:
cdef int s = 0
cdef int i = 0
for i in range(0, n, 2):
s += i
return s
%timeit np.random.randint(1000)
%timeit some_loop(np.random.randint(1000))
%timeit some_loop_cython(np.random.randint(1000))
1.72 µs ± 106 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
13.1 µs ± 1.25 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
2.14 µs ± 209 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Here we observe a large speed-up. But how large? $6$ times or $50$ times? It really matters to account for the time taken by the Pseudo-Random Number Generator, and to subtract it from both timings:
14.6 / 2.21
6.606334841628959
(14.6 - 1.95) / (2.21 - 1.95)
48.65384615384615
%%cython
from libc.math cimport log
eps = 1e-15 #: Threshold value: everything in [0, 1] is truncated to [eps, 1 - eps]
def klBern_cython(float x, float y) -> float:
x = min(max(x, eps), 1 - eps)
y = min(max(y, eps), 1 - eps)
return x * log(x / y) + (1 - x) * log((1 - x) / (1 - y))
%%cython
from libc.math cimport log
eps = 1e-15 #: Threshold value: everything in [0, 1] is truncated to [eps, 1 - eps]
def klBin_cython(float x, float y, int n) -> float:
x = min(max(x, eps), 1 - eps)
y = min(max(y, eps), 1 - eps)
return n * (x * log(x / y) + (1 - x) * log((1 - x) / (1 - y)))
%%cython
from libc.math cimport log
eps = 1e-15 #: Threshold value: everything in [0, 1] is truncated to [eps, 1 - eps]
def klPoisson_cython(float x, float y) -> float:
x = max(x, eps)
y = max(y, eps)
return y - x + x * log(x / y)
%%cython
from libc.math cimport log
eps = 1e-15 #: Threshold value: everything in [0, 1] is truncated to [eps, 1 - eps]
def klExp_cython(float x, float y) -> float:
if x <= 0 or y <= 0:
return float('+inf')
else:
x = max(x, eps)
y = max(y, eps)
return x / y - 1 - log(x / y)
%%cython
from libc.math cimport log
eps = 1e-15 #: Threshold value: everything in [0, 1] is truncated to [eps, 1 - eps]
def klGamma_cython(float x, float y, float a=1) -> float:
if x <= 0 or y <= 0:
return float('+inf')
else:
x = max(x, eps)
y = max(y, eps)
return a * (x / y - 1 - log(x / y))
%%cython
from libc.math cimport log
eps = 1e-15 #: Threshold value: everything in [0, 1] is truncated to [eps, 1 - eps]
def klNegBin_cython(float x, float y, float r=1) -> float:
x = max(x, eps)
y = max(y, eps)
return r * log((r + x) / (r + y)) - x * log(y * (r + x) / (x * (r + y)))
%%cython
from libc.math cimport log
eps = 1e-15 #: Threshold value: everything in [0, 1] is truncated to [eps, 1 - eps]
def klGauss_cython(float x, float y, float sig2x=0.25, float sig2y=0.25) -> float:
if - eps < (sig2y - sig2x) < eps:
return (x - y) ** 2 / (2. * sig2x)
else:
return (x - y) ** 2 / (2. * sig2y) + 0.5 * ((sig2x/sig2y)**2 - 1 - log(sig2x/sig2y))
The KL-UCB index functions need the previously defined KL functions, which have to be rewritten inside the `%%cython` cell to be accessible from Cython.
To minimize repetition, I use a single cell to define all of them.
%%cython
from libc.math cimport sqrt, log, exp
eps = 1e-15 #: Threshold value: everything in [0, 1] is truncated to [eps, 1 - eps]
def klucbGauss_cython(float x, float d, float sig2x=0.25, float precision=0.) -> float:
return x + sqrt(2 * sig2x * d)
cdef float klucbGauss_cython_x(float x, float d, float sig2x=0.25, float precision=0.):
return x + sqrt(2 * sig2x * d)
def klucb_cython(float x, float d, kl, float upperbound,
float lowerbound=float('-inf'),
float precision=1e-6, int max_iterations=50) -> float:
cdef float value = max(x, lowerbound)
cdef float u = upperbound
cdef int _count_iteration = 0
cdef float m = 0
while _count_iteration < max_iterations and u - value > precision:
_count_iteration += 1
m = (value + u) / 2.
if kl(x, m) > d:
u = m
else:
value = m
return (value + u) / 2.
cdef float klBern_cython_x(float x, float y):
x = min(max(x, eps), 1 - eps)
y = min(max(y, eps), 1 - eps)
return x * log(x / y) + (1 - x) * log((1 - x) / (1 - y))
def klucbBern_cython(float x, float d, float precision=1e-6) -> float:
cdef float upperbound = min(1., klucbGauss_cython_x(x, d, sig2x=0.25)) # variance 1/4 for [0,1] bounded distributions
# upperbound = min(1., klucbPoisson(x, d)) # also safe, and better ?
    return klucb_cython(x, d, klBern_cython_x, upperbound, float('-inf'), precision)  # explicit lowerbound: otherwise precision would silently be passed as lowerbound
cdef float klPoisson_cython_x(float x, float y):
x = max(x, eps)
y = max(y, eps)
return y - x + x * log(x / y)
def klucbPoisson_cython(float x, float d, float precision=1e-6) -> float:
cdef float upperbound = x + d + sqrt(d * d + 2 * x * d) # looks safe, to check: left (Gaussian) tail of Poisson dev
    return klucb_cython(x, d, klPoisson_cython_x, upperbound, float('-inf'), precision)  # explicit lowerbound: otherwise precision would silently be passed as lowerbound
cdef float klGamma_cython_x(float x, float y):
if x <= 0 or y <= 0:
return float('+inf')
else:
x = max(x, eps)
y = max(y, eps)
return x / y - 1 - log(x / y)
def klucbExp_cython(float x, float d, float precision=1e-6) -> float:
cdef float upperbound = 1
cdef float lowerbound = 0
if d < 0.77: # XXX where does this value come from?
upperbound = x / (1 + 2. / 3 * d - sqrt(4. / 9 * d * d + 2 * d))
# safe, klexp(x,y) >= e^2/(2*(1-2e/3)) if x=y(1-e)
else:
upperbound = x * exp(d + 1)
if d > 1.61: # XXX where does this value come from?
lowerbound = x * exp(d)
else:
lowerbound = x / (1 + d - sqrt(d * d + 2 * d))
return klucb_cython(x, d, klGamma_cython_x, upperbound, lowerbound, precision)
For example, for klucbBern_cython, the two steps are to first compute an upper bound (as precise as possible), and then to compute the KL-UCB index:
x, d = 0.9, 0.2
upperbound = 1
klucb_cython(x, d, klBern_cython, upperbound, lowerbound=0, precision=1e-3, max_iterations=10)
klucb_cython(x, d, klBern_cython, upperbound, lowerbound=0, precision=1e-6, max_iterations=10)
klucb_cython(x, d, klBern_cython, upperbound, lowerbound=0, precision=1e-3, max_iterations=50)
klucb_cython(x, d, klBern_cython, upperbound, lowerbound=0, precision=1e-6, max_iterations=100)
0.994140625
0.9944823980331421
0.994140625
0.9944896697998047
It is more tedious, and won't be detailed here, but Python can also be extended using C. It is the best way to obtain close-to-optimal performance for critical parts of your code, and I will let you read the introduction in the official documentation if you are curious.
For my SMPyBandits library, I reused some code from the py/maBandits project, whose authors implemented some of the previously defined KL divergences and KL-UCB indexes both in pure Python and in optimized C. I copied the compiled library into the current directory, and it can be imported:
%%bash
ls -larth *kullback*
[ -f kullback.py ] && mv -vf kullback.py kullback.py.old
-rw-r--r-- 1 lilian lilian 54K oct. 28 10:52 kullback.cpython-36m-x86_64-linux-gnu.so
-rw-r--r-- 1 lilian lilian 19K janv. 9 17:31 kullback.py
'kullback.py' -> 'kullback.py.old'
!ls -larth kullback*.so
-rw-r--r-- 1 lilian lilian 54K oct. 28 10:52 kullback.cpython-36m-x86_64-linux-gnu.so
import kullback
help(kullback.klBern)
Help on built-in function klBern in module kullback:

klBern(...)
    klBern(x, y): Calculate the binary Kullback-Leibler divergence.
[ s for s in dir(kullback) if not s.startswith('_') ]
['klBern', 'klBin', 'klExp', 'klGamma', 'klGauss', 'klPoisson', 'klucbBern', 'klucbExp', 'klucbGamma', 'klucbGauss', 'klucbPoisson', 'maxEV']
klBern_c = kullback.klBern
klBin_c = kullback.klBin
klExp_c = kullback.klExp
klGamma_c = kullback.klGamma
klGauss_c = kullback.klGauss
klPoisson_c = kullback.klPoisson
klucbBern_c = kullback.klucbBern
klucbExp_c = kullback.klucbExp
klucbGamma_c = kullback.klucbGamma
klucbGauss_c = kullback.klucbGauss
klucbPoisson_c = kullback.klucbPoisson
If you want to reproduce this notebook, download the `kullback_py3.c` file and follow the build instructions, as sketched below.
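For reference, here is a minimal sketch of such a build (this `setup.py` is my guess at a working build script, not the project's actual instructions; it assumes `kullback_py3.c` defines a module named `kullback`):
# setup.py -- hypothetical minimal build script for the C extension.
# Build it in place with:  python setup.py build_ext --inplace
from distutils.core import setup, Extension

setup(
    name="kullback",
    ext_modules=[Extension("kullback", sources=["kullback_py3.c"])],
)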
For each of the functions defined in the three approaches above, I will run some numerical tests to compare their speed. Simple.
The benchmark tests the computation time on random inputs. This includes a constant overhead: creating the random values! So I also measure the time needed to simply generate the values.
r = np.random.random
rn = lambda: np.random.randint(1000)
%timeit (r(), r())
%timeit (r(), r(), rn())
576 ns ± 8.53 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
2.35 µs ± 19.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
$\implies$ we will subtract this overhead ($\sim 576$ ns for two random floats, $\sim 2.35$ µs with an extra random integer) when computing speed-up ratios between the naive Python and the Numba or Cython versions.
But we also need to check that the three versions of each function give the same results (up to approximation errors smaller than the tolerance, $10^{-5}$ by default).
def test_fs(fs, inputs, tolerance=1e-5, nb_tests=100):
for _ in range(nb_tests):
args = inputs()
ref_f = fs[0] # Python version
output = ref_f(*args)
for other_f in fs[1:]:
other_output = other_f(*args)
if abs(output) > 1:
rel_diff = (output - other_output) / output
else:
rel_diff = (output - other_output)
assert abs(rel_diff) <= tolerance, "Error: function {} gave {} and function {} gave {} on inputs {}, and the two outputs are too different.".format(ref_f, output, other_f, other_output, args)
WARNING: in the following, I use a very manual approach: I copy the time from each `%timeit` run to compute the speed-up ratios. So when I rerun the cells, the times may vary (a little), and I cannot keep the ratio computations perfectly up to date, so bear with the (tiny) inconsistencies.
test_fs([klBern, klBern_numba, klBern_cython, klBern_c], lambda: (r(), r()))
%timeit klBern(r(), r())
6.28 µs ± 919 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit klBern_numba(r(), r())
1 µs ± 145 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klBern_cython(r(), r())
882 ns ± 40.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klBern_c(r(), r())
811 ns ± 40.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
This is a speed-up ratio of about $13$ times faster for Numba, $19$ times for Cython, and $24$ times for the C version.
(6280 - 576) / (1000 - 576) # for Python vs numba
(6280 - 576) / (882 - 576) # for Python vs Cython
(6280 - 576) / (811 - 576) # for Python vs C
13.452830188679245
18.640522875816995
24.272340425531915
test_fs([klBin, klBin_numba, klBin_cython, klBin_c], lambda: (r(), r(), rn()))
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-76-cad9d3d2157b> in <module>()
----> 1 test_fs([klBin, klBin_numba, klBin_cython, klBin_c], lambda: (r(), r(), rn()))

<ipython-input-63-12acbf032a73> in test_fs(fs, inputs, tolerance, nb_tests)
     10             else:
     11                 rel_diff = (output - other_output)
---> 12             assert abs(rel_diff) <= tolerance, "Error: function {} gave {} and function {} gave {} on inputs {}, and the two outputs are too different.".format(ref_f, output, other_f, other_output, args)

AssertionError: Error: function <function klBin at 0x7f8ab53acae8> gave 0.0011060494622968124 and function <built-in function klBin_cython> gave 0.0011183457989155888 on inputs (0.14985671245165777, 0.14884809478960037, 276), and the two outputs are too different.
Too much numerical difference? This comes from the single-precision C `float` arguments used in the Cython version. Let's try again with a larger tolerance:
test_fs([klBin, klBin_numba, klBin_cython, klBin_c], lambda: (r(), r(), rn()), tolerance=1e-3)
%timeit klBin(r(), r(), rn())
7.05 µs ± 620 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit klBin_numba(r(), r(), rn())
3.07 µs ± 146 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit klBin_cython(r(), r(), rn())
3.31 µs ± 258 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit klBin_c(r(), r(), rn())
2.98 µs ± 148 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
This is a speed-up ratio of about $5$ to $7$ times faster, for Numba, Cython, and C. Not so great, but still something!
(7005 - 2350) / (3070 - 2350) # for Python vs numba
(7005 - 2350) / (3331 - 2350) # for Python vs Cython
(7005 - 2350) / (2980 - 2350) # for Python vs C
6.465277777777778
4.745158002038736
7.388888888888889
test_fs([klPoisson, klPoisson_numba, klPoisson_cython, klPoisson_c], lambda: (r(), r()))
%timeit klPoisson(r(), r())
2.35 µs ± 172 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit klPoisson_numba(r(), r())
935 ns ± 30.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klPoisson_cython(r(), r())
859 ns ± 37.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klPoisson_c(r(), r())
811 ns ± 36.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
This is a speed-up ratio of about $5$ times faster for Numba, $6$ times for Cython, and $7.5$ times for C.
(2350 - 576) / (935 - 576) # for Python vs numba
(2350 - 576) / (859 - 576) # for Python vs Cython
(2350 - 576) / (811 - 576) # for Python vs C
4.9415041782729805
6.268551236749117
7.548936170212766
test_fs([klExp, klExp_numba, klExp_cython, klExp_c], lambda: (r(), r()))
%timeit klExp(r(), r())
2.21 µs ± 47.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit klExp_numba(r(), r())
1.07 µs ± 211 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klExp_cython(r(), r())
842 ns ± 32.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klExp_c(r(), r())
869 ns ± 56 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
This is a speed-up ratio of about $3$ times faster for Numba, and about $6$ times faster for Cython and C. Cython starts to win the race!
(2210 - 576) / (1070 - 576) # for Python vs numba
(2210 - 576) / (842 - 576) # for Python vs Cython
(2210 - 576) / (869 - 576) # for Python vs C
3.3076923076923075
6.142857142857143
5.57679180887372
klGamma_c = lambda x, y: kullback.klGamma(x, y, 1)
test_fs([klGamma, klGamma_numba, klGamma_cython, klGamma_c], lambda: (r(), r()))
%timeit klGamma(r(), r())
2.7 µs ± 163 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit klGamma_numba(r(), r())
1.07 µs ± 21.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klGamma_cython(r(), r())
889 ns ± 87.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klGamma_c(r(), r())
997 ns ± 57.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
This is a speed-up ratio of about $4$ times faster for Numba, $7$ times for Cython, and $5$ times for C.
Note that the C version probably loses a little due to the `lambda x, y: ...` wrapper used to fix the extra parameter.
(2700 - 576) / (1070 - 576) # for Python vs numba
(2700 - 576) / (889 - 576) # for Python vs Cython
(2700 - 576) / (997 - 576) # for Python vs C
4.299595141700405
6.785942492012779
5.045130641330166
test_fs([klNegBin, klNegBin_numba, klNegBin_cython], lambda: (r(), r()))
%timeit klNegBin(r(), r())
3.89 µs ± 249 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit klNegBin_numba(r(), r())
1.16 µs ± 76.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klNegBin_cython(r(), r())
901 ns ± 19.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
This is a speed-up ratio of about $5$ times faster for Numba and $10$ times faster for Cython.
(3890 - 576) / (1160 - 576) # for Python vs numba
(3890 - 576) / (901 - 576) # for Python vs Cython
5.674657534246576
10.196923076923078
klGauss_c = lambda x, y: kullback.klGauss(x, y, 0.25)
test_fs([klGauss, klGauss_numba, klGauss_cython, klGauss_c], lambda: (r(), r()))
%timeit klGauss(r(), r())
852 ns ± 40.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klGauss_numba(r(), r())
1.07 µs ± 54 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klGauss_cython(r(), r())
745 ns ± 40.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klGauss_c(r(), r())
911 ns ± 56.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Here the naive Python version is already very fast, so the speed-ups are modest: about $1.6$ times faster for Cython, while Numba is actually slower than naive Python!
Why? No idea!
C is also slower than the Python version here, due to the `lambda x, y: ...` wrapper.
(852 - 576) / (1070 - 576) # for Python vs numba
(852 - 576) / (745 - 576) # for Python vs Cython
(852 - 576) / (911 - 576) # for Python vs C
0.5587044534412956
1.6331360946745561
0.8238805970149253
klucbGauss_c = lambda x, y: kullback.klucbGauss(x, y, 0.25)
test_fs([klucbGauss, klucbGauss_numba, klucbGauss_cython, klucbGauss_c], lambda: (r(), r()))
%timeit klucbGauss(r(), r())
1.96 µs ± 184 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klucbGauss_numba(r(), r())
31.3 µs ± 950 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit klucbGauss_cython(r(), r())
676 ns ± 25.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit klucbGauss_c(r(), r())
939 ns ± 23.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
This is a speed-up ratio of about $14$ times faster for Cython and $4$ times for C, and one more failure case for Numba.
The C version loses again because of the `lambda x, y: ...` wrapper.
(1960 - 576) / (31300 - 576) # for Python vs numba
(1960 - 576) / (676 - 576) # for Python vs Cython
(1960 - 576) / (939 - 576) # for Python vs C
0.04504621794037235
13.84
3.81267217630854
klucbBern_c = lambda x, y: kullback.klucbBern(x, y, 1e-6)
test_fs([klucbBern, klucbBern_numba, klucbBern_cython, klucbBern_c], lambda: (r(), r()))
%timeit klucbBern(r(), r())
91.9 µs ± 4.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit klucbBern_numba(r(), r())
170 µs ± 8.43 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit klucbBern_cython(r(), r())
6.93 µs ± 815 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit klucbBern_c(r(), r())
3.14 µs ± 154 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
This is a speed-up ratio of about $14$ times faster for Cython, and one more failure case for Numba. The speed-up for the C version is impressive, about $35$ times: optimizing the bisection loop is what matters most!
(91900 - 576) / (170000 - 576) # for Python vs numba
(91900 - 576) / (6930 - 576) # for Python vs Cython
(91900 - 576) / (3140 - 576) # for Python vs C (3.14 µs = 3140 ns)
0.5390263480970818
14.372678627636136
35.617784711388455
klucbPoisson_c = lambda x, y: kullback.klucbPoisson(x, y, 1e-6)
test_fs([klucbPoisson, klucbPoisson_numba, klucbPoisson_cython, klucbPoisson_c], lambda: (r(), r()))
%timeit klucbPoisson(r(), r())
72.6 µs ± 1.69 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit klucbPoisson_numba(r(), r())
167 µs ± 16.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit klucbPoisson_cython(r(), r())
5.33 µs ± 382 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit klucbPoisson_c(r(), r())
2.18 µs ± 37.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
This is a speed-up ratio of about $15$ times faster for Cython, and one more failure case for Numba. It's again a strong victory for the C version, with a speed-up of about $45$!
(72600 - 576) / (167000 - 576) # for Python vs numba
(72600 - 576) / (5330 - 576) # for Python vs Cython
(72600 - 576) / (2180 - 576) # for Python vs C
0.43277411911743496
15.150189314261674
44.90274314214464
klucbExp_c = lambda x, y: kullback.klucbExp(x, y, 1e-6)
test_fs([klucbExp, klucbExp_numba, klucbExp_cython, klucbExp_c], lambda: (r(), r()))
%timeit klucbExp(r(), r())
78.7 µs ± 898 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit klucbExp_numba(r(), r())
156 µs ± 10.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit klucbExp_cython(r(), r())
4.41 µs ± 401 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit klucbExp_c(r(), r())
2.04 µs ± 122 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
This is a speed-up ratio of about $20$ times faster for Cython, and one more failure case for Numba. Strong victory for the C version again: a speed-up above $50$ is impressive!
(78700 - 576) / (156000 - 576) # for Python vs numba
(78700 - 576) / (4410 - 576) # for Python vs Cython
(78700 - 576) / (2040 - 576) # for Python vs C
0.5026508132592135
20.37663015127804
53.36338797814208
%%bash
[ -f kullback.py.old ] && mv -vf kullback.py.old kullback.py
'kullback.py.old' -> 'kullback.py'
Numba is easy to try, but a little disappointing here: sometimes the `nopython` compilation fails, and on some examples the `nopython` code can be slower than naive Python (like, crazily slower). No idea why, and the point was precisely not to try too much optimizing this use of Numba.
The take away messages are the following:
- naive Python code is fine for prototyping, but can be slow for small functions called millions of times;
- for such functions, it costs nothing to try `numba.jit` to speed them up, but check that it actually helps;
- Cython consistently gave large speed-ups here, at the price of a little more work;
- the C version is usually the fastest, but also the most tedious to write and build.
My advice for using Cython are the following:
- in a notebook, the `%%cython` magic is very easy!
- in a library, write your code in a `.pyx` file, and use `pyximport` from your Python code to automatically compile and import it. It works perfectly fine, believe me!
Writing a C extension by hand is not too hard, but it's certainly not as easy as Cython.
My approach from now on will be to not even consider writing C code, as Cython offers a very good speed-up when carefully used. And it's much easier to just write `import pyximport; pyximport.install()` before importing your Cython-accelerated module, in comparison to writing a `setup.py` and requiring a compilation step for a C-optimized module!
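For instance, this is all it takes (a minimal sketch; `kullback_cython.pyx` is a hypothetical file that would contain the Cython functions defined above):
# Compile-and-import a .pyx module on the fly, without any setup.py.
import pyximport
pyximport.install()  # registers an import hook that compiles .pyx files

import kullback_cython  # hypothetical module, compiled transparently on first import
print(kullback_cython.klBern_cython(0.1, 0.9))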
That's it for today, folks! See this page for other notebooks I wrote recently.