Simple question: Let's say you have an array (nicely laid out in memory), but you have to loop over it. Which is faster, np.log or math.log?
import ctypes
import math
import numba
import numpy as np
import scipy.integrate
arr = np.random.rand(10_000)
%%timeit
for item in arr:
    np.log(item)
%%timeit
for item in arr:
    math.log(item)
Of course, if we use array processing:
%%timeit
np.log(arr)
Aside¶
NumPy arrays were designed to work at larger scales. Under about 10 elements, you may even get higher performance from Python lists! Normally, the nice syntax is still worth it, and you can usually find a way to scale out to array processing by adding another dimension, but something to keep in mind.
Let's cast the above problem in a slightly different use case, one you are much more likely to run into: functions that take a function. Let's say we have a processing function that takes an array and a function:
def array_map(array, func):
    out_array = np.empty_like(array)
    # Note the comma: unpacks the 1D shape tuple into a single int
    (size,) = array.shape
    for i in range(size):
        out_array[i] = func(array[i])
    return out_array
Quick check to see if the above effect is still true:
%%timeit
array_map(arr, np.log)
%%timeit
array_map(arr, math.log)
Now, let's imagine that array_map actually contained a compiled loop, and that our function was compiled too. What would happen inside the loop, though, is:
Compiled -> Python -> Compiled
which kills our performance. What we'd like to do is skip the Python middleman in this case.
Hey, we have Numba, we don't have to imagine:
@numba.njit
def numba_array_map(array, func):
    out_array = np.empty_like(array)
    # Note the comma: unpacks the 1D shape tuple into a single int
    (size,) = array.shape
    for i in range(size):
        out_array[i] = func(array[i])
    return out_array
Numba helpfully will not allow us to pass in a plain Python function. But if we pass in a Numba-jitted function:
@numba.njit
def numba_log(v):
    return math.log(v)
%%timeit
numba_array_map(arr, numba_log)
Success! We can pass one jit function into another, and they call each other without going back through Python! But wouldn't it be nice to be able to do this without a jitted master function?
Let's look at Scipy. It is a large library with lots of routines that can take functions and iterate with them. They have implemented a standard way to interact with compiled function pointers through what they call a LowLevelCallable interface; if you have a function pointer, you can completely skip the Python middle man!
Note that the LowLevelCallable is just a standard interface they proposed inside Scipy to handle callables from three different sources (PyCapsule, ctypes, and cffi), and to bundle in the idea of user data (absent in ctypes and cffi). The idea can be used in other places, usually with just the ctypes interface.
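As a minimal sketch of the interface described above, a ctypes function pointer can be wrapped directly (the `square` callback is a hypothetical example; since it is still a Python function under the hood, it demonstrates the interface, not the speedup):

```python
import ctypes

import scipy
import scipy.integrate

# Build a C-style function pointer from plain Python via ctypes;
# the signature double f(double) is one of the forms quad accepts.
CB_T = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)

@CB_T
def square(x):
    return x * x

# LowLevelCallable reads the signature from the ctypes type info
llc = scipy.LowLevelCallable(square)

# quad dispatches through the function pointer
y, abserr = scipy.integrate.quad(llc, 0.0, 1.0)
print(y)  # integral of x^2 over [0, 1]
```

Swap the ctypes callback for a real compiled pointer (as we do below with Numba) and the same wrapper skips Python entirely.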
Let's try the following integral:
$$ \int _{-\infty} ^ {\infty} e^{-a x ^2} dx = \sqrt{\frac{\pi}{a}} $$
# @numba.vectorize([numba.double(numba.double, numba.double)])
def integrand(x, a):
    return np.exp(-a * x**2)
@np.vectorize
def gauss_py(a):
    y, abserr = scipy.integrate.quad(integrand, -np.inf, np.inf, (a,))
    return y
Note:¶
Since you may not have seen it before, np.vectorize is a pure-Python analogue of numba.vectorize; you don't get a performance benefit from it, but it simplifies calling a scalar function on an array.
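For example (a hypothetical `halfwave` function, not from the original), np.vectorize lets an ordinary scalar function accept and return arrays:

```python
import numpy as np

@np.vectorize
def halfwave(x):
    # an ordinary scalar function; np.vectorize loops it over array inputs
    return x if x > 0 else 0.0

print(halfwave(np.array([-1.0, 0.5, 2.0])))
```

Under the hood it is still a Python-level loop, which is exactly why the timing below is disappointing.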
%%time
a = np.linspace(0.1, 10, 10_000)
py_result = gauss_py(a)
print(py_result)
print(np.sqrt(np.pi / a))
The results are correct, but the performance is not great for just 10K points. Even if we add Numba, not much changes, because we are still calling integrand through Python in a loop inside the quad routine.
Let's check the LowLevelCallable signature:
# scipy.integrate.quad?
Here's the key part:
func : {function, scipy.LowLevelCallable}
A Python function or method to integrate. If `func` takes many
arguments, it is integrated along the axis corresponding to the
first argument.
If the user desires improved integration performance, then `f` may
be a `scipy.LowLevelCallable` with one of the signatures::
double func(double x)
double func(double x, void *user_data)
double func(int n, double *xx)
double func(int n, double *xx, void *user_data)
The ``user_data`` is the data contained in the `scipy.LowLevelCallable`.
In the call forms with ``xx``, ``n`` is the length of the ``xx``
array which contains ``xx[0] == x`` and the rest of the items are
numbers contained in the ``args`` argument of quad.
In addition, certain ctypes call signatures are supported for
backward compatibility, but those should not be used in new code.
Okay, double(double) sounds easy, but we need to pass in one more piece of information: the value of a. Let's try doing that first using args (nicer), and then using user data (ugly).
This is the signature we expect:
double func(int n, double *xx)
@numba.cfunc(numba.double(numba.int32, numba.types.CPointer(numba.double)))
def integrand(n, x_ptr):
    x, a = numba.carray(x_ptr, (n,), np.double)  # Fails if n != 2, but that's good
    return np.exp(-a * x**2)
Now the numba function provides a ctypes interface through the .ctypes property, so we can use that in LowLevelCallable:
c = scipy.LowLevelCallable(integrand.ctypes)
@np.vectorize
def gauss_py(a):
    y, abserr = scipy.integrate.quad(c, -np.inf, np.inf, (a,))
    return y
%%time
a = np.linspace(0.1, 10, 10_000)
py_result = gauss_py(a)
print(py_result)
print(np.sqrt(np.pi / a))
Much better! We've now avoided calling Python at all once we enter the integrate loop. This should perform close to a full Fortran or C implementation, and it just took adding 2-3 lines of code.
Included as an example. Don't do it this way. Just don't.
This is the signature we expect:
double func(double x, void *user_data)
@numba.cfunc(numba.double(numba.double, numba.types.voidptr))
def integrand(x, user_ptr):
    (a,) = numba.carray(user_ptr, (1,), np.double)
    return np.exp(-a * x**2)
a_array = np.array([0.0])
c = scipy.LowLevelCallable(integrand.ctypes, a_array.ctypes.data_as(ctypes.c_void_p))
@np.vectorize
def gauss_py(a):
    a_array[0] = a
    y, abserr = scipy.integrate.quad(c, -np.inf, np.inf)
    return y
%%time
a = np.linspace(0.1, 10, 10_000)
py_result = gauss_py(a)
print(py_result)
print(np.sqrt(np.pi / a))
Note to self: you can get the address of the callable directly from the ctypes object with:
ctypes.cast(integrand.ctypes, ctypes.c_void_p)