import numpy as np
from math import sin, cos
import matplotlib.pyplot as plt
from time import sleep
%matplotlib inline
%%sh
mamba install --yes memory_profiler line_profiler
Example: given a large 2D array, we will explore different ways to create the array and to calculate its mean, then determine which is fastest using the `%timeit` notebook magic.
We'll make some dummy test data that looks like:
$M_{ij} = \sin(i)\cos(0.1 j)$
and we will construct this array in multiple ways.
def create_array_loop(N, M):
    """a 2D array using nested for-loops"""
    arr = []
    for y in range(M):
        row = []
        for x in range(N):
            row.append(sin(x) * cos(0.1 * y))
        arr.append(row)
    return arr
def create_array_list(N, M):
    """a 2D array using a list-comprehension"""
    return [[sin(x) * cos(0.1 * y) for x in range(N)] for y in range(M)]
def create_array_np(N, M):
    """a 2D array using numpy"""
    X, Y = np.meshgrid(np.arange(N), np.arange(M))
    return np.sin(X) * np.cos(0.1 * Y)
Let's first just plot the arrays, to see if they are the same:
N=30; M=10 # our array dimensions
plt.figure(figsize=(12,5))
plt.subplot(1,3,1)
plt.imshow( create_array_loop(N,M))
plt.subplot(1,3,2)
plt.imshow( create_array_list(N,M))
plt.subplot(1,3,3)
plt.imshow( create_array_np(N,M))
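The plots suggest the three constructions agree, but we can also check numerically. A minimal sketch (not part of the original exercise): `np.allclose` converts the list-of-lists to an array automatically and compares element-wise within a tolerance, which is the right check for floating-point results.

```python
import numpy as np
from math import sin, cos

def create_array_list(N, M):
    """a 2D array using a list-comprehension"""
    return [[sin(x) * cos(0.1 * y) for x in range(N)] for y in range(M)]

def create_array_np(N, M):
    """a 2D array using numpy"""
    X, Y = np.meshgrid(np.arange(N), np.arange(M))
    return np.sin(X) * np.cos(0.1 * Y)

N, M = 30, 10
# np.allclose coerces the list-of-lists to an ndarray, then compares
# element-by-element with a small floating-point tolerance
same = np.allclose(create_array_list(N, M), create_array_np(N, M))
print(same)
```

Exact equality (`==`) would usually also hold here since both call the same underlying `sin`/`cos`, but `allclose` is the safer habit when comparing float arrays.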
Now make a plot of the speed of each! Does the result change much when the array size becomes larger? Try much larger values of N and M.

Hint: use the `%timeit -o` magic to have `%timeit` return its results (see the timeit help). Note: to minimize the uncertainty, try increasing the problem size!
N=30; M=10
t = {} # a place to record our time statistics
t['loop'] = %timeit -o create_array_loop(N,M)
t['list'] = %timeit -o create_array_list(N,M)
t['numpy'] = %timeit -o create_array_np(N,M)
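If you want the same measurement outside IPython, the standard-library `timeit` module gives comparable numbers. A sketch (hypothetical variable names; the per-call average and spread are computed by hand, mimicking the `.average` and `.stdev` attributes of the objects that `%timeit -o` returns):

```python
import timeit
import statistics
from math import sin, cos

def create_array_list(N=30, M=10):
    """a 2D array using a list-comprehension"""
    return [[sin(x) * cos(0.1 * y) for x in range(N)] for y in range(M)]

# timeit.repeat returns one *total* time per repeat; divide by `number`
# to get a per-call time, then summarize across the repeats
number, repeat = 100, 5
raw = timeit.repeat(create_array_list, number=number, repeat=repeat)
per_call = [r / number for r in raw]
average = statistics.mean(per_call)
stdev = statistics.stdev(per_call)
print(f"{average:.2e} s ± {stdev:.2e} s per call")
```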
def plot_performance(time_dict):
    mean = [t.average for t in time_dict.values()]
    std = [t.stdev for t in time_dict.values()]
    x = range(len(time_dict))
    plt.errorbar(x, mean, yerr=std, lw=3, fmt="o")
    plt.xticks(np.arange(len(time_dict)), time_dict.keys())
    plt.ylabel("average execution time (s)")
    plt.grid()
plot_performance(t)
Note that `create_array_list()` and `create_array_loop()` both return a list-of-lists, while `create_array_np()` returns a 2D numpy array. There are multiple ways to compute the mean of these arrays. See again which is fastest!

Try at least:
- the built-in `sum` function with either a for-loop or a list-comprehension
- `array.mean()` (numpy)

N=100; M=100
a_list = create_array_list(N,M)
a_np = create_array_np(N,M)
sum([sum(x) for x in a_list])/(N*M)
a_np.mean()
t2 = {}
t2["numpy a.mean"] = %timeit -o a_np.mean()
t2["numpy mean(a)"] = %timeit -o np.mean(a_np)
t2["sum(a)"] = %timeit -o sum([sum(x) for x in a_list])/(N*M)
plot_performance(t2)
A profiler gives you speed measurements of all the functions in your code at once (including all their dependencies).
def slow_function(x):
    sleep(1)
    return x**2

def faster_function(x):
    return x**2
%%prun -r
for ii in range(5):
x = slow_function(ii)
y = faster_function(ii)
stats = _
stats.sort_stats("call").print_stats()
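`%%prun` is a thin wrapper around the standard-library `cProfile`, so the same measurement works outside a notebook. A sketch (with a shortened sleep so the demo runs quickly; the report is captured into a string here only so it can be inspected programmatically):

```python
import cProfile
import pstats
import io
from time import sleep

def slow_function(x):
    sleep(0.1)  # shortened from 1 s so the demo finishes quickly
    return x ** 2

# collect profile data only for the code between enable() and disable()
profiler = cProfile.Profile()
profiler.enable()
for ii in range(3):
    slow_function(ii)
profiler.disable()

# print the stats sorted by cumulative time, much like prun's report
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```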
The syntax is:
%lprun -f <function(s)> <python statement that uses the function>
%load_ext line_profiler
def run_all():
    for ii in range(5):
        x = slow_function(ii)
        y = faster_function(ii)
%lprun -f run_all run_all()
Try also adding `-f slow_function` to see inside that function as well!
%lprun -f create_array_loop create_array_loop(100,100)
import numpy as np
from math import sin, cos
%load_ext memory_profiler
%memit np.sum(np.sin(np.arange(1000000)))
%memit sum(sin(x) for x in range(1000000))
One problem: in a notebook, `%memit` measures the total memory of the process, not just of the function being run, and is affected by garbage collection. Try instead putting the code in a module:
%%writefile tmp_sum.py
import numpy as np
import math

def do_some_sums(n=100_000):
    sum1 = np.sum(np.sin(np.arange(n)))
    sum2 = sum(math.sin(x) for x in range(n))
    return sum1 + sum2
from tmp_sum import do_some_sums
%mprun -f do_some_sums do_some_sums(500_000)
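As a standard-library alternative to memory_profiler (a sketch, not part of the original exercise): `tracemalloc` reports the peak Python-level allocations between `start()` and `stop()`, and recent numpy versions register their buffer allocations with it. Note how the generator version never materializes the full sequence, so its peak is far smaller:

```python
import tracemalloc
import numpy as np
import math

n = 100_000

# peak allocations for the numpy version: arange + sin each build
# a full n-element array
tracemalloc.start()
np.sum(np.sin(np.arange(n)))
_, peak_np = tracemalloc.get_traced_memory()
tracemalloc.stop()

# the generator version produces one value at a time
tracemalloc.start()
sum(math.sin(x) for x in range(n))
_, peak_gen = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"numpy peak:     {peak_np / 1e6:.2f} MB")
print(f"generator peak: {peak_gen / 1e6:.2f} MB")
```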