%load_ext watermark
%watermark -d -v -u -t -z -p numpy
Last updated: 05/07/2014 18:39:56 EDT

CPython 3.4.1
IPython 2.1.0

numpy 1.8.1
More information about the watermark magic command extension.
There are at least two ways to stack NumPy arrays vertically (row-wise): either via numpy.concatenate(tup, axis=0) or via the more specific numpy.vstack(tup) function. Although the NumPy documentation claims that they are equivalent, there are rumors that numpy.concatenate is the faster approach.
The same applies to numpy.hstack vs. numpy.concatenate(tup, axis=1) for stacking arrays column-wise (horizontally), and there is a third way, numpy.append(a, b). Note that without an axis argument, numpy.append flattens both inputs before joining them.
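As a quick illustration of that last point, here is a small sketch of how numpy.append behaves with and without its axis argument, using two 2×3 arrays of the same kind as in the check that follows (the variable names flat, rows, and cols are only illustrative):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[9, 8, 7], [7, 8, 9]])

# Without an axis argument, np.append ravels both inputs first,
# so the result is a flat 1-D array rather than a stacked 2-D one.
flat = np.append(a, b)
print(flat.shape)  # (12,)

# With axis=0, np.append stacks row-wise and matches np.vstack.
rows = np.append(a, b, axis=0)
print(rows.shape)  # (4, 3)
print(np.array_equal(rows, np.vstack((a, b))))  # True

# With axis=1, it stacks column-wise and matches np.hstack for 2-D inputs.
cols = np.append(a, b, axis=1)
print(cols.shape)  # (2, 6)
print(np.array_equal(cols, np.hstack((a, b))))  # True
```

So numpy.append only behaves like the stacking functions when an axis is given explicitly.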
Let's see if those rumors are true...
Before we do the actual benchmarks, let us quickly check that these methods indeed produce the same results:
import numpy as np
# Vertical (row-wise) stacking
a = np.array([[1,2,3],[4,5,6]])
b = np.array([[9,8,7],[7,8,9]])
print(np.concatenate((a,b)), end='\n\n')
print(np.vstack((a,b)))
[[1 2 3]
 [4 5 6]
 [9 8 7]
 [7 8 9]]

[[1 2 3]
 [4 5 6]
 [9 8 7]
 [7 8 9]]
# Horizontal (column-wise) stacking
a = np.array([1,2,3])
b = np.array([4,5,6])
print(np.concatenate((a,b), axis=0), end='\n\n')  # 1-D arrays only have axis 0; axis=1 raises an error in current NumPy
print(np.hstack((a,b)), end='\n\n')
print(np.append(a,b))
[1 2 3 4 5 6]

[1 2 3 4 5 6]

[1 2 3 4 5 6]
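The benchmark below, however, uses 2-D arrays of shape (n, 1) for the column-wise case. A small sketch (with an arbitrary n = 5 and the illustrative name x standing in for nx1_dim) shows that for such inputs numpy.concatenate(..., axis=1) and numpy.hstack agree, while numpy.append without an axis returns a flattened array, so the right-hand panel compares slightly different operations:

```python
import numpy as np

n = 5
x = np.random.randn(n, 1)  # same (n, 1) shape as nx1_dim in the benchmark

hconc = np.concatenate((x, x), axis=1)
hstk = np.hstack((x, x))
app = np.append(x, x)  # no axis: both inputs are raveled first

print(hconc.shape)  # (5, 2)
print(hstk.shape)   # (5, 2)
print(app.shape)    # (10,)
print(np.array_equal(hconc, hstk))  # True

# np.append without an axis equals the raveled row-wise concatenation.
print(np.array_equal(app, np.concatenate((x, x)).ravel()))  # True
```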
timeit
import timeit
from numpy import append as np_append
from numpy import concatenate as np_concatenate
from numpy import hstack as np_hstack
from numpy import vstack as np_vstack
t_append, t_hconc, t_vconc, t_hstack, t_vstack = [], [], [], [], []
orders_5 = [10**i for i in range(1, 5)]
for n in orders_5:
    nxn_dim = np.random.randn(n,n)
    t_vconc.append(min(timeit.Timer('np_concatenate((nxn_dim, nxn_dim))',
                  'from __main__ import nxn_dim, np_concatenate').repeat(repeat=5, number=1)))
    t_vstack.append(min(timeit.Timer('np_vstack((nxn_dim, nxn_dim))',
                  'from __main__ import nxn_dim, np_vstack').repeat(repeat=5, number=1)))
orders_6 = [10**i for i in range(1, 6)]
for n in orders_6:
    nx1_dim = np.random.randn(n,1)
    t_append.append(min(timeit.Timer('np_append(nx1_dim, nx1_dim)',
                  'from __main__ import nx1_dim, np_append').repeat(repeat=5, number=1)))
    t_hconc.append(min(timeit.Timer('np_concatenate((nx1_dim, nx1_dim), axis=1)',
                  'from __main__ import nx1_dim, np_concatenate').repeat(repeat=5, number=1)))
    t_hstack.append(min(timeit.Timer('np_hstack((nx1_dim, nx1_dim))',
                  'from __main__ import nx1_dim, np_hstack').repeat(repeat=5, number=1)))
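The setup strings above import the arrays via `from __main__ import ...`, which works inside a notebook. Since Python 3.5, timeit.Timer also accepts a globals= namespace, which avoids the setup string. The sketch below assumes Python 3.5+; best_of is a hypothetical helper name, and dividing by number gives the time per call:

```python
import timeit

import numpy as np


def best_of(stmt, namespace, repeat=5, number=1):
    """Fastest per-call time of `stmt` over `repeat` runs of `number` calls each.

    Hypothetical helper; relies on the `globals=` parameter (Python 3.5+).
    """
    timer = timeit.Timer(stmt, globals=namespace)
    return min(timer.repeat(repeat=repeat, number=number)) / number


arr = np.random.randn(100, 100)
ns = {'np': np, 'arr': arr}
t_conc = best_of('np.concatenate((arr, arr))', ns, number=10)
t_vstk = best_of('np.vstack((arr, arr))', ns, number=10)
print(t_conc, t_vstk)
```

Taking the minimum over repeats, as the original loops do, reduces the influence of background noise; raising number above 1 additionally helps for very small arrays, where a single call is close to the timer's resolution.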
%matplotlib inline
from matplotlib import pyplot as plt
def plot():
    def settings():
        plt.xlim([min(orders_6) / 10, max(orders_6) * 10])
        plt.grid()
        plt.xticks(fontsize=16)
        plt.yticks(fontsize=16)
        plt.xscale('log')
        plt.yscale('log')
        plt.legend(loc=2, fontsize=14)

    fig = plt.figure(figsize=(15,8))

    plt.subplot(1,2,1)
    plt.plot(orders_5, t_vconc, alpha=0.7, label='np.concatenate((a,b))')
    plt.plot(orders_5, t_vstack, alpha=0.7, label='np.vstack((a,b))')
    plt.xlabel(r'sample size $n$ ($n\times \, n$ NumPy array)', fontsize=14)
    plt.ylabel('time per computation in seconds', fontsize=14)
    plt.title('Vertical stacking of NumPy arrays (row wise)', fontsize=14)
    settings()

    plt.subplot(1,2,2)
    plt.plot(orders_6, t_hconc, alpha=0.7, label='np.concatenate((a,b), axis=1)')
    plt.plot(orders_6, t_hstack, alpha=0.7, label='np.hstack((a,b))')
    plt.plot(orders_6, t_append, alpha=0.7, label='np.append(a,b)')
    plt.xlabel(r'sample size $n$ ($n\times \, 1$ NumPy array)', fontsize=14)
    plt.ylabel('time per computation in seconds', fontsize=14)
    plt.title('Horizontal stacking of NumPy arrays (column wise)', fontsize=14)
    settings()

    plt.tight_layout()
    plt.show()

plot()
%watermark
05/07/2014 19:18:43

CPython 3.4.1
IPython 2.1.0

compiler   : GCC 4.2.1 (Apple Inc. build 5577)
system     : Darwin
release    : 13.2.0
machine    : x86_64
processor  : i386
CPU cores  : 2
interpreter: 64bit
The plots above indicate that the concatenate functions are indeed faster to call for small arrays. However, large arrays are where performance really matters, and there the other functions catch up with numpy.concatenate as the array size grows.
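One plausible explanation, consistent with how NumPy's shape_base module implements these helpers (details differ between versions), is that numpy.vstack is a thin Python wrapper that promotes each input with numpy.atleast_2d and then calls numpy.concatenate along axis 0. That wrapper adds a roughly constant cost per call, which is noticeable for tiny arrays but negligible once copying the data dominates. The sketch below mirrors the idea; vstack_sketch is a hypothetical name, not NumPy's actual code:

```python
import numpy as np


def vstack_sketch(arrays):
    # Promote every input to at least 2-D (a 1-D array of length k becomes
    # shape (1, k)), then concatenate along the first axis. The list
    # comprehension and atleast_2d calls are the extra per-call work.
    return np.concatenate([np.atleast_2d(arr) for arr in arrays], axis=0)


a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(vstack_sketch((a, b)))
# [[1 2 3]
#  [4 5 6]]

m = np.random.randn(4, 3)
print(np.array_equal(vstack_sketch((m, m)), np.vstack((m, m))))  # True
```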