In [1]:
import time
print('Last updated: %s' %time.strftime('%d/%m/%Y'))

Last updated: 24/06/2014


# Day 15 - One Python Benchmark per Day

## Array indexing in NumPy: Extracting rows and columns

There are multiple ways in NumPy to extract particular rows or columns of interest from a NumPy array.

For example, given the 3x3 matrix A shown below,

In [2]:
import numpy as np
A = np.array([[1,2,3],[4,5,6], [7,8,9]])
print(A)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


we want to extract the first and the last column. The most intuitive way is probably direct indexing with a list of column indices:

In [3]:
print(A[:,[0,-1]])

[[1 3]
 [4 6]
 [7 9]]


Alternatively, we could use the numpy.take function:

In [4]:
print(np.take(A, indices=[0,-1], axis=1))

[[1 3]
 [4 6]
 [7 9]]


Similarly, to extract the first and the last row:

In [5]:
print(A[[0,-1],:])

[[1 2 3]
 [7 8 9]]

In [6]:
print(np.take(A, indices=[0,-1], axis=0))

[[1 2 3]
 [7 8 9]]
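
As a quick sanity check, the two approaches can be compared programmatically instead of by eye. The snippet below is a minimal sketch; the helper names `idx`, `cols_match`, and `rows_match` are only illustrative and do not appear in the original notebook.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
idx = [0, -1]  # first and last position along the chosen axis

# Compare list-based indexing with np.take for columns (axis=1) and rows (axis=0)
cols_match = np.array_equal(A[:, idx], np.take(A, indices=idx, axis=1))
rows_match = np.array_equal(A[idx, :], np.take(A, indices=idx, axis=0))

print(cols_match, rows_match)  # True True
```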


Both approaches return a new array (a copy), not a view of the existing array. The following example demonstrates this: modifying B and C leaves A unchanged.

In [7]:
print('A before:\n', A)

B = A[[0,-1],:]
B[0,0] = 100

C = np.take(A, indices=[0,-1], axis=0)
C[0,0] = 150

print('\nB:\n', B)
print('\nC:\n', C)

print('\nA after:\n', A)

A before:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

B:
 [[100   2   3]
 [  7   8   9]]

C:
 [[150   2   3]
 [  7   8   9]]

A after:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
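
The copy-versus-view behavior can also be checked without modifying any data. The sketch below assumes a NumPy version that provides `np.shares_memory` (newer than the 1.8.1 used for the benchmarks in this notebook). The variable names `view`, `fancy`, and `taken` are illustrative. A basic slice is included for contrast, since basic slicing returns a view.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

view = A[0:2, :]                              # basic slicing returns a view
fancy = A[[0, -1], :]                         # indexing with a list returns a copy
taken = np.take(A, indices=[0, -1], axis=0)   # np.take returns a copy as well

print(np.shares_memory(A, view))    # True
print(np.shares_memory(A, fancy))   # False
print(np.shares_memory(A, taken))   # False
```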


Note

C = np.take(A, indices=[0,-1], axis=0)
produces the same result as
np.take(A, indices=[0,-1], axis=0, out=C)
provided that C already exists with a matching shape and dtype. However, the former approach is slightly faster:

In [8]:
%timeit C = np.take(A, indices=[0,-1], axis=0)
%timeit np.take(A, indices=[0,-1], axis=0, out=C)

100000 loops, best of 3: 3.79 µs per loop
100000 loops, best of 3: 5.47 µs per loop
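
To make the `out` variant concrete, here is a minimal sketch that preallocates the output buffer before calling `np.take`. The name `buf` is hypothetical. The NumPy documentation for `take` notes that `out` is buffered when `mode='raise'` (the default), which may be one reason the `out` variant is not faster in the timing above; that explanation is an assumption, not something measured here.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# The buffer needs the shape and dtype of the expected result:
# two selected rows, and as many columns as A has.
buf = np.empty((2, A.shape[1]), dtype=A.dtype)

np.take(A, indices=[0, -1], axis=0, out=buf)  # the result is written into buf
print(buf)
```

Reusing a preallocated buffer mainly helps when allocation should be avoided inside a loop; for a single call, the plain assignment form is simpler.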


# timeit benchmarks

In the following benchmark, we will see how fast the two approaches are at returning three columns or rows at once (the first, a randomly chosen one, and the last) from 2-dimensional arrays consisting of NxN elements, with N ranging from 10 to 10,000.

In [11]:
import timeit
from numpy import take as np_take

orders_n = [10**i for i in range(1, 5)]

idx_slice_col, idx_slice_row, np_take_col, np_take_row = [], [], [], []

for n in orders_n:

    rand_idx = np.random.randint(n)
    A = np.random.randn(n, n)
    idx_slice_col.append(min(timeit.Timer('A[:, [0, rand_idx, -1]]',
            'from __main__ import A, rand_idx').repeat(repeat=3, number=1)))
    np_take_col.append(min(timeit.Timer('np_take(A, indices=[0, rand_idx, -1], axis=1)',
            'from __main__ import A, rand_idx, np_take').repeat(repeat=3, number=1)))
    idx_slice_row.append(min(timeit.Timer('A[[0, rand_idx, -1], :]',
            'from __main__ import A, rand_idx').repeat(repeat=3, number=1)))
    np_take_row.append(min(timeit.Timer('np_take(A, indices=[0, rand_idx, -1], axis=0)',
            'from __main__ import A, rand_idx, np_take').repeat(repeat=3, number=1)))
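
The same measurement can be sketched with callables instead of statement strings, which avoids the `from __main__ import ...` setup. This is an alternative formulation, not the code used to produce the results in this notebook. The helper `best_time` and the names `idx`, `t_index`, and `t_take` are hypothetical, and a single array size is used to keep the example short.

```python
import timeit
import numpy as np

def best_time(func, repeat=3, number=1):
    """Return the fastest of `repeat` timings of `number` calls to func."""
    return min(timeit.repeat(func, repeat=repeat, number=number))

n = 100
rand_idx = np.random.randint(n)
A = np.random.randn(n, n)
idx = [0, rand_idx, -1]

# Time column extraction with both approaches
t_index = best_time(lambda: A[:, idx])
t_take = best_time(lambda: np.take(A, indices=idx, axis=1))

print('indexing: %.2e s, np.take: %.2e s' % (t_index, t_take))
```

With `number=1`, each timing covers a single call, so results for small arrays can be noisy; raising `number` averages over more calls at the cost of a longer run.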


## Preparing to plot the results

In [12]:
import platform
import multiprocessing

def print_sysinfo():

    print('\nPython version  :', platform.python_version())
    print('compiler        :', platform.python_compiler())
    print('Numpy version   :', np.__version__)

    print('\nsystem     :', platform.system())
    print('release    :', platform.release())
    print('machine    :', platform.machine())
    print('processor  :', platform.processor())
    print('CPU count  :', multiprocessing.cpu_count())
    print('interpreter:', platform.architecture()[0])
    print('\n\n')

In [13]:
%matplotlib inline

In [18]:
import matplotlib.pyplot as plt

def plot():

    f, ax = plt.subplots(1, 2, figsize=(15,5))

    ax[0].plot(orders_n, idx_slice_col, alpha=0.4, lw=3, marker='o',
            label='Classic indexing\nA[:, [0, rand_idx, -1]]')
    ax[0].plot(orders_n, np_take_col, alpha=0.4, lw=3, marker='o',
            label='NumPy.take\nnp.take(A, indices=[0, rand_idx, -1], axis=1)')
    ax[0].set_title('Getting columns from a NumPy array')

    ax[1].plot(orders_n, idx_slice_row, alpha=0.4, lw=3, marker='o',
            label='Classic indexing\nA[[0, rand_idx, -1], :]')
    ax[1].plot(orders_n, np_take_row, alpha=0.4, lw=3, marker='o',
            label='NumPy.take\nnp_take(A, indices=[0, rand_idx, -1], axis=0)')
    ax[1].set_title('Getting rows from a NumPy array')

    for x in ax:
        x.legend(loc='upper left')
        x.set_ylabel('time in seconds')
        x.set_xlabel('NumPy array size')

    plt.tight_layout()
    plt.show()


# Results

In [19]:
plot()
print_sysinfo()

Python version  : 3.4.1
compiler        : GCC 4.2.1 (Apple Inc. build 5577)
Numpy version   : 1.8.1

system     : Darwin
release    : 13.2.0
machine    : x86_64
processor  : i386
CPU count  : 4
interpreter: 64bit