Numba is designed to be used with NumPy and complement its capabilities. Numba supports:
import numpy as np
import numba
from numba import jit
Numba automatically uses multiple dispatch on compiled functions to allow different specialized implementations of the same function to be used. Suppose we have a function that clamps values to zero if they are below a particular magnitude:
@jit(nopython=True)
def zero_clamp(x, threshold):
    # assume 1D array. See later in this notebook for more general function
    out = np.empty_like(x)
    # out.shape is a tuple, so index it to get the length of the first axis
    for i in range(out.shape[0]):
        if np.abs(x[i]) > threshold:
            out[i] = x[i]
        else:
            out[i] = 0
    return out

a_small = np.linspace(0, 1, 50)
zero_clamp(a_small, 0.3)
Now let's benchmark some different kinds of array inputs. We'll try:
n = 10000
a_int16 = np.arange(n).astype(np.int16)
a_float32 = np.linspace(0, 1, n, dtype=np.float32)
a_float32_strided = np.linspace(0, 1, 2*n, dtype=np.float32)[::2]  # view of every other element

%timeit zero_clamp(a_int16, 1600)
%timeit zero_clamp(a_float32, 0.3)
%timeit zero_clamp(a_float32_strided, 0.3)
We see different performance characteristics for each of these cases, even though they have the same number of input elements. Numba generated different machine code for each situation, which we can see if we look at the .signatures attribute of the compiled function:
When printed as strings, Numba array types have the form array(dtype, dimensions, layout). The first signature therefore corresponds to a 1D array of float64 with C-style layout (row-major order, no gaps between elements). The next two signatures are similar, but for int16 and float32 arrays. The final signature indicates an "any" layout array, which usually results from slicing an array so that it no longer has a C or Fortran memory layout.
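The difference between the contiguous and the "any" layout can be seen from NumPy alone. The sketch below uses only standard ndarray attributes (strides and flags). The variable names are illustrative, and the mapping from these flags to Numba's layout letter is as described above rather than something this snippet verifies.

```python
import numpy as np

n = 10000
contiguous = np.linspace(0, 1, n, dtype=np.float32)
strided = np.linspace(0, 1, 2 * n, dtype=np.float32)[::2]  # view of every other element

for name, arr in [("contiguous", contiguous), ("strided", strided)]:
    # strides: bytes to step between consecutive elements along each axis
    print(name, arr.strides, arr.flags["C_CONTIGUOUS"], arr.flags["F_CONTIGUOUS"])

# Copying into a packed buffer restores a gap-free layout, at the cost of a copy
packed = np.ascontiguousarray(strided)
print("packed", packed.strides, packed.flags["C_CONTIGUOUS"])
```

The strided view steps 8 bytes between 4-byte elements, so it is neither C nor Fortran contiguous, which is the situation the "any" layout is meant to cover.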
We can compare to a pure NumPy implementation and see the speed improvement that Numba has achieved through a combination of specialization and elimination of temporary arrays:
def np_zero_clamp(x, threshold):
    return np.where(np.abs(x) > threshold, x, 0)

%timeit np_zero_clamp(a_int16, 1600)
%timeit np_zero_clamp(a_float32, 0.3)
%timeit np_zero_clamp(a_float32_strided, 0.3)
Universal functions, typically called "ufuncs" for short, are functions that broadcast an elementwise operation across input arrays of varying numbers of dimensions. Most NumPy functions are ufuncs, and Numba makes it easy to compile custom ufuncs using the @vectorize decorator:
from numba import vectorize

@vectorize(nopython=True)
def ufunc_zero_clamp(x, threshold):
    if np.abs(x) > threshold:
        return x
    else:
        return 0

%timeit ufunc_zero_clamp(a_int16, 1600)
%timeit ufunc_zero_clamp(a_float32, 0.3)
%timeit ufunc_zero_clamp(a_float32_strided, 0.3)
Note that for this simple ufunc, Numba is not as fast as the function with the manual looping, and in some cases is the same speed as the example that called NumPy directly. This is not surprising, as this function is very simple and NumPy also uses compiled ufuncs. Numba @vectorize is generally most effective when creating ufuncs that are not a simple combination of existing NumPy operations.
Numba supports many, but not all, NumPy functions. Some functions also have limitations that prevent the use of some of the optional arguments in nopython mode. A full description can be found in the Supported NumPy Features page in the Numba Reference Manual.
Note that when using NumPy functions on arrays, Numba will also compile and optimize array expressions:
def numpy_mpe(x, true):
    return (((x - true)/true)**2).mean()

numba_mpe = jit(nopython=True)(numpy_mpe)  # using jit as a function rather than a decorator
We can confirm both versions give the same answer:
true_x = 0.1
x = np.random.normal(true_x, 1, size=100000)
numpy_mpe(x, true=true_x), numba_mpe(x, true=true_x)
And see the Numba version is faster:
%timeit numpy_mpe(x, true=0.1)
%timeit numba_mpe(x, true=0.1)
If the scipy package is installed, Numba will also automatically make use of the optimized BLAS/LAPACK implementation that SciPy was compiled with. In the case of Anaconda, this is Intel MKL, but OpenBLAS is also common for builds of scipy. (Note that Numba is not itself compiled and linked against any BLAS implementation.) Most functions in numpy.linalg will be accelerated this way, as well as
Numba will not run any faster than NumPy for individual linear algebra routines (since both translate to calls to the same underlying library), but you are able to use linear algebra calls inside your Numba-compiled functions without any loss of performance.