%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'
This is a fun problem, for several reasons.
Let's import some libraries. Note that to see plots automatically, you may sometimes have to run `%matplotlib inline` (or `%matplotlib notebook`, or `%matplotlib widget`); with the recommended setup, you should be fine without these.
import matplotlib.pyplot as plt
# Extra performance libraries for later
# import numexpr
import numba
import numpy as np
You can generate a Mandelbrot fractal by repeatedly applying the transform

$$ z_{n+1} = z_{n}^{2} + c $$

to a regular matrix of complex numbers $c$, recording the iteration number at which the value $|z|$ surpasses some bound, usually 2. You start at $z_0 = c$.
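As a quick illustration before we vectorize anything, here is the recurrence applied to a single point. (This is my own sketch; `escape_time` is a hypothetical helper, not something defined elsewhere in this notebook.)

```python
def escape_time(c, maxiterations=50):
    z = c  # start at z_0 = c
    for i in range(1, maxiterations + 1):
        z = z**2 + c
        if abs(z) > 2:  # |z| surpassed the bound: the point diverges
            return i
    return maxiterations  # never diverged within the iteration limit


print(escape_time(1 + 1j))  # diverges on the very first iteration
print(escape_time(0 + 0j))  # 0 is in the Mandelbrot set: never diverges
```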
Let's set up some initial parameters and a helper matrix:
1j + 1  # Python's literal syntax for complex numbers
maxiterations = 50
# 300 x 400 matrix of complex numbers from [-1.5j, 1.5j] x [-2, 2]
c = np.sum(np.broadcast_arrays(*np.ogrid[-1.5j:1.5j:300j, -2:2:400j]), axis=0)
Note that in the interest of absolute brevity, I've taken advantage of the fact that `ogrid` works with complex numbers; `mgrid`, however, does not. `ogrid` is faster anyway.
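To see what that one-liner is doing, here's a minimal sketch on a tiny 3 × 5 grid instead of 300 × 400 (my own illustration). With `ogrid`, an imaginary "step" acts as a point count, like `np.linspace`, and the two open grids broadcast against each other:

```python
import numpy as np

# Imaginary step = number of points, so these behave like linspace
ys, xs = np.ogrid[-1.5j:1.5j:3j, -2:2:5j]
print(ys.shape, xs.shape)  # open (broadcastable) grids: (3, 1) and (1, 5)
print(xs)                  # [[-2. -1.  0.  1.  2.]]

# Broadcasting fills in the full 3 x 5 complex grid; this is equivalent
# to the np.sum(np.broadcast_arrays(...)) form used above
c_small = ys + xs
print(c_small.shape)       # (3, 5)
```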
Let's make sure we have the correct `c`:
fig, axs = plt.subplots(1, 2, figsize=(6, 3))
axs[0].pcolormesh(c.real, c.imag, c.real)
axs[1].pcolormesh(c.real, c.imag, c.imag);
Okay, let's make the fractal as simply as possible in numpy:
fractal = np.zeros_like(c, dtype=np.int32)
z = c.copy()
for i in range(1, maxiterations + 1):
    z = z**2 + c          # Compute z
    diverge = abs(z) > 2  # Divergence criterion
    z[diverge] = 2        # Keep the numbers from growing too large
    fractal[~diverge] = i  # Record the iteration number for non-diverged points
fig, ax = plt.subplots(figsize=(4, 3))
ax.pcolormesh(c.real, c.imag, fractal);
In a notebook, you often start with raw code (like the above) for easy debugging, but once it works, you put it in a function, like the one below:
def fractal_numpy(c, maxiterations):
    f = np.zeros_like(c, dtype=np.int32)
    z = c.copy()
    for i in range(1, maxiterations + 1):
        z = z * z + c                  # Compute z
        diverge = np.abs(z**2) > 2**2  # Divergence criterion (equivalent to |z| > 2)
        z[diverge] = 2                 # Keep the numbers from growing too large
        f[~diverge] = i                # Record the iteration number for non-diverged points
    return f
%%timeit
fractal_numpy(c, maxiterations)
While we wouldn't normally do this, expecting it to be much slower, here is the pure Python version:
def fractal_pure(c, maxiterations):
    fractal = np.zeros_like(c, dtype=np.int32)
    for yi in range(c.shape[0]):
        for xi in range(c.shape[1]):
            z = cxy = c[yi, xi]
            for i in range(1, maxiterations + 1):
                z = z**2 + cxy
                if abs(z) > 2:
                    break
            fractal[yi, xi] = i
    return fractal
%%timeit
fractal_pure(c, maxiterations)
I don't know about you, but that was much faster than I would have naively expected. Why? What is different about the algorithm?
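One way to investigate (my own sketch, using the same grid as above): track how many points are still "alive" as the iterations proceed. The pure Python version can `break` out of the inner loop as soon as a point diverges, while the NumPy version does a full-array pass on every one of the 50 iterations, so if most points diverge early, NumPy wastes most of its work:

```python
import numpy as np

c = np.sum(np.broadcast_arrays(*np.ogrid[-1.5j:1.5j:300j, -2:2:400j]), axis=0)
z = c.copy()
alive = np.ones(c.shape, dtype=bool)  # points that have not diverged yet
for i in range(1, 51):
    z[alive] = z[alive] ** 2 + c[alive]  # only update the live points
    alive &= np.abs(z) <= 2              # once dead, a point stays dead
    if i in (1, 5, 50):
        print(f"after {i:2d} iterations: {alive.mean():6.1%} still not diverged")
```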
For later use, and for better design, let's break the above function into two pieces: the "on each" part, and the part that applies it to each element (vectorization).
def on_each_python(cxy, maxiterations):
    z = cxy
    for i in range(1, maxiterations + 1):
        z = z * z + cxy
        if abs(z) > 2:
            return i
    return i

def fractal_python(c, maxiterations):
    fractal = np.zeros_like(c, dtype=np.int32)
    for yi in range(c.shape[0]):
        for xi in range(c.shape[1]):
            fractal[yi, xi] = on_each_python(c[yi, xi], maxiterations)
    return fractal
%%timeit
fractal_python(c, maxiterations)
While we'll look at something much more interesting soon, here is NumPy's `vectorize`. It is not supposed to do much except replace the outer function we had to define manually (though I've actually found it to be noticeably faster).
fractal_npvectorize = np.vectorize(on_each_python)
%%timeit
fractal_npvectorize(c, maxiterations)
We can plot any of these to make sure they work:
fractal = fractal_npvectorize(c, maxiterations)
fig, ax = plt.subplots(figsize=(4, 3))
ax.pcolormesh(c.real, c.imag, fractal);
Never optimize until you have profiled! If code becomes uglier/harder to maintain, you must have a solid reason for doing so.
Let's look at the `line_profiler` package, which has fairly nice IPython magic support. First, let's load the magic:
%load_ext line_profiler
Now, we run the magic with `-f function_to_profile` and the command to profile. Only the lines of the function we specify will show up:
%lprun -f fractal_numpy fractal_numpy(c, maxiterations)
If you don't have external packages available, the built-in `cProfile` is also usable, though not as pretty.

Note

Most standard library modules with paired names like `something`/`cSomething` were merged in Python 3, with the faster compiled implementation being selected automatically. This one was not, since `cProfile` and `profile` are not quite identical; `profile` is much slower.
import cProfile
cProfile.run("fractal_numpy(c, maxiterations)", sort=2)
(Note: Numba takes full advantage of the instruction set on your system, since it does not expect to be compiled on one machine and run on another; thus Numba will often be faster than conventionally compiled code.)
How can we make NumPy faster? Expressions are slow in NumPy because they usually create lots of temporary arrays, and memory allocations are costly. To avoid this, you could manually reuse memory, but that would require lots of ugly rewriting, such as taking `a + b + c` and writing `t = a + b; t += c`.
Starting with NumPy 1.13, some simple expressions like the one above avoid making memory copies (generally on Unix only).
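A minimal sketch of the manual-reuse idea, using in-place operators (my own illustration; I use `c2` for the third array to avoid clobbering the complex grid `c` from earlier):

```python
import numpy as np

a, b, c2 = np.random.random_sample(size=(3, 100_000))

# Naive form: a + b allocates a temporary, then + c2 allocates the result
naive = a + b + c2

# Manual reuse: one allocation for the intermediate, then accumulate in place
t = np.add(a, b)  # t = a + b (single allocation)
t += c2           # reuses t's buffer instead of allocating again

assert np.allclose(naive, t)
```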
There's a second issue; even with avoiding unneeded temporaries, you still have to run multiple kernels (computation functions) - it would be nicer if you could just do the full calculation on each input and produce one output, with no in-between steps.
One way to do this is with numexpr. This is an odd little library that can compile small expressions just-in-time (JIT). Here's what it looks like in practice:
import numexpr
a, b = np.random.random_sample(size=(2, 100_000))
print("classic", 2 * a + 3 * b)
print("numexpr", numexpr.evaluate("2 * a + 3 * b"))
%%timeit
c = 2 * a + 3 * b # Try 2 * a**2 + 3 * b**3
%%timeit
c = numexpr.evaluate("2 * a + 3 * b")
However, numexpr is very limited. It has a small set of data types, a small collection of supported operators and basic functions, and it works one line at a time. You can make it less magical by feeding in explicit variable dictionaries if you want.
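For instance, `numexpr.evaluate` accepts a `local_dict` argument, so variables come from an explicit dictionary rather than being pulled out of the calling frame (a small sketch; the variable name `v` is my own choice):

```python
import numexpr
import numpy as np

x = np.arange(5.0)

# Explicitly feed the variables instead of relying on frame inspection
r = numexpr.evaluate("2 * v + 1", local_dict={"v": x})
print(r)  # [1. 3. 5. 7. 9.]
```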
If that managed to whet your appetite, let's look at Numba, a Python-to-LLVM JIT compiler! We'll see it again, but for now, here's a little demo:
@numba.vectorize
def f(a, b):
    return 2 * a + 3 * b
f
%%time
c = f(a, b)
%%timeit
c = f(a, b)
It took the function we defined, pulled it apart, turned it into LLVM (Low Level Virtual Machine) code, and compiled it. No special strings or special syntax; it is just a (large) subset of Python and NumPy, and users and libraries can extend it too.
It is almost always as fast as or faster than any other compiled solution (minus the JIT time). A couple of years ago it became much easier to install (via pip, with llvmlite's lightweight and independent LLVM build).
Try some of the following:

- Use `numexpr` to replace parts of the above calculation. Why is this not very effective?
- `@numba.njit`
- `@numba.vectorize`
Further reading: