In [1]:
import time
print('Last updated: %s' %time.strftime('%d/%m/%Y'))
Last updated: 24/06/2014


I would be happy to hear your comments and suggestions.
Please feel free to drop me a note via twitter, email, or google+.


Day 7 - One Python Benchmark per Day

Speeding up NumPy array expressions with Numexpr

Numexpr is a "fast numerical expression evaluator for NumPy". It has a internal Virtual Machine to rewrite code more efficiently by breaking down operations into smaller chunks for the CPU cache. By using a JIT (just-in-time compiler), compilation at runtime is not necessary.

As listed in the documentation, it currently supports the following operators:

Logical operators: &, |, ~
Comparison operators: <, <=, ==, !=, >=, >
Unary arithmetic operators: -
Binary arithmetic operators: +, -, *, /, **, %, <<, >>

The usage is quite simple, we just have to provide the numerical expression that we want to evaluate as a string to the evaluate function, e.g.,

import numexpr
import numpy
a = numpy.array([[1, 2]])
numexpr.evaluate('a ** 2')
>> array([[1, 4]], dtype=int64)

For this benchmark, we will compare different arithmetic operators for simple matrix calculations using NumPy arrays to their Numexpr equivalents.

In [2]:
import numpy as np
from numexpr import evaluate



NumPy functions

In [3]:
def numpy_modulus(A, B):
    return A % B

def numpy_difference(A, B):
    return A - B

def numpy_multiplication(A, B):
    return A * B

def numpy_division(A, B):
    return A / B

def numpy_power(A, B):
    return A ** B

def numpy_squareroot(A, B=None):
    return np.sqrt(A)

def numpy_sum(A, B=None):
    return np.sum(A)

def numpy_log(A, B=None):
    return np.log(A)

def numpy_logic_operator(A, B):
    return A < B

def numpy_complex_expr(A, B):
    return(A*B-4.1*A > 2.5*B)



Numexpr equivalents

In [4]:
def numexpr_modulus(A, B):
    return evaluate('A % B')

def numexpr_difference(A, B):
    return evaluate('A - B')

def numexpr_multiplication(A, B):
    return evaluate('A * B')

def numexpr_division(A, B):
    return evaluate('A / B')

def numexpr_power(A, B):
    return evaluate('A ** B')

def numexpr_squareroot(A, B=None):
    return evaluate('sqrt(A)')

def numexpr_sum(A, B=None):
    return evaluate('sum(A)')

def numexpr_log(A, B=None):
    return evaluate('log(A)')

def numexpr_logic_operator(A, B):
    return evaluate('A < B')

def numexpr_complex_expr(A, B):
    return evaluate('A*B-4.1*A > 2.5*B')



Timing

In [6]:
import timeit

orders_n = [10**n for n in range(1, 5)]

funcs = ['modulus', 'difference', 'multiplication', 
         'division', 'power', 'log', 'sum', 'squareroot',
         'logic_operator', 'complex_expr']

timings_np = {f:[] for f in funcs}
timings_ne = {f:[] for f in funcs}

for n in orders_n:
    for f in funcs:
        A = np.random.rand(n,n)
        B = np.random.rand(n,n)
        timings_np[f].append(min(timeit.Timer('numpy_%s(A, B)' %f, 
                      'from __main__ import A, B, numpy_%s' %f)
                              .repeat(repeat=3, number=1)))
        timings_ne[f].append(min(timeit.Timer('numexpr_%s(A, B)' %f, 
                      'from __main__ import A, B, numexpr_%s' %f)
                              .repeat(repeat=3, number=1)))



Setting up plots

In [17]:
import platform
import datetime
import numexpr
import multiprocessing


def print_sysinfo():
    print('\ndate:    :', str(datetime.date.today()))
    print('\nsystem   :', platform.system())
    print('release  :', platform.release())
    print('machine  :', platform.machine())
    print('processor:', platform.processor())
    print('CPU count  :', multiprocessing.cpu_count())
    print('interpreter:', platform.architecture()[0])

    print('\nPython version', platform.python_version())
    print('compiler', platform.python_compiler())
    print('NumPy version', np.__version__)
    print('Numexpr version', numexpr.__version__)
    print('\n\n')
In [18]:
import prettytable

def summary_table(funcs):
    fit_table = prettytable.PrettyTable(['n=%s' %orders_n[-1], 
                                         'Numexpr function' ,
                                         'rel. performance gain via Numexpr'])
    fit_table.align['Numexpr function'] = 'l'
    for f in funcs:
        fit_table.add_row(['', f, '{:.2f}x'.format(timings_np[f][-1]/timings_ne[f][-1])])
    print(fit_table)
In [19]:
%matplotlib inline
In [22]:
import matplotlib.pyplot as plt

def plot_figures(funcs): 
    fig = plt.figure(figsize=(12,20))
    for i,f in enumerate(funcs):
        plt.subplot(len(funcs)/2, 2, i)
        plt.plot([i**2 for i in orders_n], [i*100 for i in timings_np[f]], 
                            alpha=0.5, label='NumPy %s' %f, marker='o')
        plt.plot([i**2 for i in orders_n], [i*100 for i in timings_ne[f]], 
                            alpha=0.5, label='Numexpr %s' %f, marker='o')
        plt.legend(loc='upper left')
        plt.xscale('log')
        plt.yscale('log')
        plt.grid()
        plt.xlabel('number of elements per matrix')
        plt.ylabel('time in milliseconds')    
    plt.show()



Results

In [23]:
print_sysinfo()
summary_table(funcs)
plot_figures(funcs)
date:    : 2014-06-24

system   : Darwin
release  : 13.2.0
machine  : x86_64
processor: i386
CPU count  : 4
interpreter: 64bit

Python version 3.4.1
compiler GCC 4.2.1 (Apple Inc. build 5577)
NumPy version 1.8.1
Numexpr version 2.3.1



+---------+------------------+-----------------------------------+
| n=10000 | Numexpr function | rel. performance gain via Numexpr |
+---------+------------------+-----------------------------------+
|         | modulus          |               3.16x               |
|         | difference       |               1.32x               |
|         | multiplication   |               1.28x               |
|         | division         |               1.43x               |
|         | power            |               2.23x               |
|         | log              |               2.27x               |
|         | sum              |               0.90x               |
|         | squareroot       |               1.39x               |
|         | logic_operator   |               1.19x               |
|         | complex_expr     |              593.85x              |
+---------+------------------+-----------------------------------+

Since numexpr makes use of multi-threading (thereby bypassing Python's GIL - Global Interpreter Lock) we can see an increase in performance for the especially CPU-heavy tasks when we are using numexpr, i.e., for larger matrix sizes and especially for more complex vectorized expressions (see the plot in the lower left).

In [ ]: