Small experiment to see how to easily profile some pure Python and Cython-enhanced Python code, for both time and memory, directly within the Jupyter notebook.
%load_ext watermark
%watermark -v -m -g
CPython 3.5.3 IPython 6.0.0 compiler : GCC 6.3.0 20170118 system : Linux release : 4.10.0-21-generic machine : x86_64 processor : x86_64 CPU cores : 4 interpreter: 64bit Git hash : 0f7c8d70a89e5bc90866eb42b38a01b76665debd
def recip_square(i):
return 1./i**2
def approx_pi(n=10000000):
val = 0.
for k in range(1,n+1):
val += recip_square(k)
return (6 * val)**.5
%time approx_pi()
CPU times: user 4 s, sys: 0 ns, total: 4 s Wall time: 4.02 s
3.1415925580959025
%timeit approx_pi()
4.26 s ± 45.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# Thanks to https://nbviewer.jupyter.org/gist/minrk/7715212
from __future__ import print_function
from IPython.core import page
def myprint(s):
try:
print(s['text/plain'])
except (KeyError, TypeError):
print(s)
page.page = myprint
%prun approx_pi()
10000004 function calls in 4.972 seconds Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) 10000000 3.280 0.000 3.280 0.000 <ipython-input-94-8fdaa89ffea3>:1(recip_square) 1 1.692 1.692 4.972 4.972 <ipython-input-94-8fdaa89ffea3>:4(approx_pi) 1 0.000 0.000 4.972 4.972 {built-in method builtins.exec} 1 0.000 0.000 4.972 4.972 <string>:1(<module>) 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
After installing line_profiler
and enabling the IPython magic, we can use it
%lprun -T /tmp/test_lprun.txt -f recip_square -f approx_pi approx_pi()
%cat /tmp/test_lprun.txt
*** Profile printout saved to text file '/tmp/test_lprun.txt'. Timer unit: 1e-06 s Total time: 5.88048 s File: <ipython-input-94-8fdaa89ffea3> Function: recip_square at line 1 Line # Hits Time Per Hit % Time Line Contents ============================================================== 1 def recip_square(i): 2 10000000 5880485 0.6 100.0 return 1./i**2 Total time: 20.6013 s File: <ipython-input-94-8fdaa89ffea3> Function: approx_pi at line 4 Line # Hits Time Per Hit % Time Line Contents ============================================================== 4 def approx_pi(n=10000000): 5 1 2 2.0 0.0 val = 0. 6 10000001 3911824 0.4 19.0 for k in range(1,n+1): 7 10000000 16689447 1.7 81.0 val += recip_square(k) 8 1 4 4.0 0.0 return (6 * val)**.5
With the memory profiler:
%load_ext memory_profiler
The memory_profiler extension is already loaded. To reload it, use: %reload_ext memory_profiler
%mprun -T /tmp/test_mprun.txt -f recip_square -f approx_pi approx_pi()
%cat /tmp/test_mprun.txt
ERROR: Could not find file <ipython-input-94-8fdaa89ffea3> NOTE: %mprun can only be used on functions defined in physical files, and not in the IPython environment. ERROR: Could not find file <ipython-input-94-8fdaa89ffea3> NOTE: %mprun can only be used on functions defined in physical files, and not in the IPython environment. *** Profile printout saved to text file /tmp/test_mprun.txt.
%memit approx_pi()
peak memory: 77.60 MiB, increment: 0.00 MiB
%load_ext cython
The cython extension is already loaded. To reload it, use: %reload_ext cython
%%cython -3
# cython: profile=True
def recip_square2(int i):
return 1./i**2
def approx_pi2(int n=10000000):
cdef double val = 0.
cdef int k
for k in range(1, n + 1):
val += recip_square2(k)
return (6 * val)**.5
%timeit approx_pi()
%timeit approx_pi2()
3.94 s ± 63.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 610 ms ± 3.03 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%prun approx_pi2()
10000005 function calls in 2.143 seconds Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) 1 1.658 1.658 2.143 2.143 _cython_magic_354fd18f414700095ab07463dec470f5.pyx:6(approx_pi2) 10000000 0.485 0.000 0.485 0.000 _cython_magic_354fd18f414700095ab07463dec470f5.pyx:3(recip_square2) 1 0.000 0.000 2.143 2.143 <string>:1(<module>) 1 0.000 0.000 2.143 2.143 {built-in method builtins.exec} 1 0.000 0.000 2.143 2.143 {_cython_magic_354fd18f414700095ab07463dec470f5.approx_pi2} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
And with just a simple change, by typing the intermediate function:
%%cython -3
# cython: profile=True
cdef double recip_square3(int i):
return 1./(i**2)
def approx_pi3(int n=10000000):
cdef double val = 0.
cdef int k
for k in range(1, n + 1):
val += recip_square3(k)
return (6 * val)**.5
%timeit approx_pi3()
31.5 ms ± 890 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%prun approx_pi3()
10000005 function calls in 1.451 seconds Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) 1 0.953 0.953 1.451 1.451 _cython_magic_e59e914dbc65ad39e8eabcce1f3dd74a.pyx:6(approx_pi3) 10000000 0.497 0.000 0.497 0.000 _cython_magic_e59e914dbc65ad39e8eabcce1f3dd74a.pyx:3(recip_square3) 1 0.000 0.000 1.451 1.451 <string>:1(<module>) 1 0.000 0.000 1.451 1.451 {built-in method builtins.exec} 1 0.000 0.000 1.451 1.451 {_cython_magic_e59e914dbc65ad39e8eabcce1f3dd74a.approx_pi3} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
And with an inlined function?
%%cython -3
cdef inline double recip_square4(int i):
return 1./(i**2)
def approx_pi4(int n=10000000):
cdef double val = 0.
cdef int k
for k in range(1, n + 1):
val += recip_square4(k)
return (6 * val)**.5
%timeit approx_pi4()
13.7 ms ± 187 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%prun approx_pi4()
4 function calls in 0.015 seconds Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) 1 0.014 0.014 0.014 0.014 {_cython_magic_9d338490eeecee00b47e73b849343da2.approx_pi4} 1 0.000 0.000 0.014 0.014 <string>:1(<module>) 1 0.000 0.000 0.014 0.014 {built-in method builtins.exec} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
And with just a simple change, by using i*i
instead of i**2
:
%%cython -3
# cython: profile=True
from __future__ import division
import cython
@cython.profile(False)
cdef inline double recip_square5(int i):
return 1./(i*i)
def approx_pi5(int n=10000000):
cdef double val = 0.
cdef int k
for k in range(1, n + 1):
val += recip_square5(k)
return (6 * val)**.5
%timeit approx_pi4()
13.9 ms ± 237 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%prun approx_pi4()
4 function calls in 0.014 seconds Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) 1 0.014 0.014 0.014 0.014 {_cython_magic_9d338490eeecee00b47e73b849343da2.approx_pi4} 1 0.000 0.000 0.014 0.014 <string>:1(<module>) 1 0.000 0.000 0.014 0.014 {built-in method builtins.exec} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
That's it for today.