TASK 1
Write the code to enumerate items in the list:
Example:
Input
items = ['foo', 'bar', 'baz', 'foo', 'baz', 'bar']
Output
#something like:
[0, 1, 2, 0, 2, 1]
TASK 2
For each element in a list [0, 1, 2, ..., N-1]
build all possible pairs with the other elements of that list.
Example:
Input:
[0, 1, 2, 3] or just 4
Output:
0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3
1, 2, 3, 0, 2, 3, 0, 1, 3, 0, 1, 2
http://docs.python-guide.org/en/latest/
Hard way is easier http://learnpythonthehardway.org
Google python class
https://developers.google.com/edu/python/
https://docs.python.org/2/tutorial/
Learning by doing!
code should be readable first!
style guides
looping through dictionaries http://docs.quantifiedcode.com/python-anti-patterns/performance/index.html
using wildcard imports (from ... import *) http://docs.quantifiedcode.com/python-anti-patterns/maintainability/from_module_import_all_used.html
Using single letter to name your variables http://docs.quantifiedcode.com/python-anti-patterns/maintainability/using_single_letter_as_variable_name.html
Comparing things to None the wrong way http://docs.quantifiedcode.com/python-anti-patterns/readability/comparison_to_none.html
Comparing things to True the wrong way http://docs.quantifiedcode.com/python-anti-patterns/readability/comparison_to_true.html
Using type() to compare types http://docs.quantifiedcode.com/python-anti-patterns/readability/do_not_compare_types_use_isinstance.html
Using an unpythonic loop http://docs.quantifiedcode.com/python-anti-patterns/readability/using_an_unpythonic_loop.html
Using CamelCase in function names http://docs.quantifiedcode.com/python-anti-patterns/readability/using_camelcase_in_function_names.html
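Several of the anti-patterns listed above are easier to remember next to their idiomatic counterparts. The following is only a minimal sketch: the helper `describe` and its arguments are made up for illustration and are not taken from the linked pages.

```python
def describe(value, labels):
    """Return a list of short facts about `value` (hypothetical helper)."""
    facts = []
    # Compare to None with `is`, not with `==`
    if value is None:
        facts.append('missing')
    # Rely on truthiness instead of writing `== True`
    if value:
        facts.append('truthy')
    # Use isinstance() instead of comparing type() results
    if isinstance(value, int):
        facts.append('integer')
    # Loop over the items directly (with enumerate) instead of range(len(...))
    for position, label in enumerate(labels):
        facts.append('{}:{}'.format(position, label))
    return facts


result = describe(3, ['a', 'b'])
```

The function name uses lowercase_with_underscores and descriptive variable names, which covers the naming items from the list as well.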
Verify your python version by running
python --version
This notebook is written in python 2.
a = b = 3
c, d = 4, 5
c, d = d, c
greeting = 'Hello'
guest = "John"
my_string = 'Hello "John"'
named_greeting = 'Hello, {name}'.format(name=guest)
named_greeting2 = '{}, {}'.format(greeting, guest)
print named_greeting
print named_greeting2
Hello, John
Hello, John
for more details see docs: https://docs.python.org/2/tutorial/datastructures.html
fruit_list = ['apple', 'orange', 'peach', 'mango', 'bananas', 'pineapple']
name_length = [len(fruit) for fruit in fruit_list]
print name_length
[5, 6, 5, 5, 7, 9]
name_with_p = [fruit for fruit in fruit_list if fruit[0]=='p'] #even better: fruit.startswith('p')
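As the comment above hints, `startswith` is the safer choice: indexing with `fruit[0]` raises an IndexError on an empty string, while `startswith` simply returns False. A small sketch with a made-up list that contains an empty name:

```python
names = ['peach', '', 'pineapple', 'apple']  # hypothetical input with an empty string

# startswith() copes with the empty string; names[1][0] would raise IndexError
with_p = [name for name in names if name.startswith('p')]
```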
numbered_fruits = []
for i, fruit in enumerate(fruit_list):
    numbered_fruits.append('{}.{}'.format(i, fruit))
numbered_fruits
['0.apple', '1.orange', '2.peach', '3.mango', '4.bananas', '5.pineapple']
Indexing starts with zero.
General indexing rule (mind the brackets): [start:stop:step]
numbered_fruits[0] = None
numbered_fruits[1:4]
['1.orange', '2.peach', '3.mango']
numbered_fruits[1:-1:2]
['1.orange', '3.mango']
numbered_fruits[::-1]
['5.pineapple', '4.bananas', '3.mango', '2.peach', '1.orange', None]
immutable type!
p_fruits = (name_with_p[1], name_with_p[0])
p_fruits[1] = 'mango'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-467-a967184828ef> in <module>()
      1 p_fruits = (name_with_p[1], name_with_p[0])
----> 2 p_fruits[1] = 'mango'

TypeError: 'tuple' object does not support item assignment
single_number_tuple = 3,
single_number_tuple
(3,)
single_number_tuple + (2,) + (1, 0)
(3, 2, 1, 0)
Mutable type (frozenset is the immutable variant). Stores only unique elements.
set([0, 1, 2, 1, 1, 1, 3])
{0, 1, 2, 3}
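A set can be changed in place after it is created, while `frozenset` is the variant that cannot. A minimal sketch (the variable names are made up for illustration):

```python
numbers = set([0, 1, 2, 1, 1, 1, 3])
numbers.add(7)       # sets can be changed in place
numbers.discard(0)

frozen = frozenset(numbers)  # immutable variant, usable as a dictionary key
try:
    frozen.add(9)            # frozenset has no add() method
    frozen_is_immutable = False
except AttributeError:
    frozen_is_immutable = True
```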
fruit_list = ['apple', 'orange', 'mango', 'banana', 'pineapple']
quantities = [3, 5, 2, 3, 4]
order_fruits = {fruit: num
                for fruit, num in zip(fruit_list, quantities)}
order_fruits
{'apple': 3, 'banana': 3, 'mango': 2, 'orange': 5, 'pineapple': 4}
order_fruits['pineapple'] = 2
order_fruits
{'apple': 3, 'banana': 3, 'mango': 2, 'orange': 5, 'pineapple': 2}
print order_fruits.keys()
print order_fruits.values()
['orange', 'mango', 'pineapple', 'apple', 'banana']
[5, 2, 2, 3, 3]
for fruit, amount in order_fruits.iteritems():
    print 'Buy {num} {entity}s'.format(num=amount, entity=fruit)
Buy 5 oranges
Buy 2 mangos
Buy 2 pineapples
Buy 3 apples
Buy 3 bananas
def my_func(var1, var2, default_var1=0, default_var2=False):
    """
    This is a generic example of a python function.
    You can see this string when you call: my_func?
    """
    # do something with vars
    if not default_var2:
        result = var1
    elif default_var1 == 0:
        result = var1
    else:
        result = var1 + var2
    return result
A function is just another object (like almost everything in python)
print 'Function {} has the following docstring:\n{}'\
    .format(my_func.func_name, my_func.func_doc)
Function my_func has the following docstring:

    This is a generic example of a python function.
    You can see this string when you call: my_func?
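The `func_name` and `func_doc` attributes exist only in python 2. The attributes `__name__` and `__doc__` hold the same information and work in both python 2 and python 3. A small sketch with a throwaway function (`add_one` and `registry` are made-up names):

```python
def add_one(x):
    """Return x increased by one."""
    return x + 1


# Functions can be stored in containers and passed around like any other object
registry = {add_one.__name__: add_one}
summary = '{}: {}'.format(add_one.__name__, add_one.__doc__)
value = registry['add_one'](41)
```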
Guidance on how to write meaningful docstrings: https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt#docstring-standard
def function_over_function(func, *args, **kwargs):
    function_result = func(*args, **kwargs)
    return function_result
function_over_function(my_func, 3, 5, default_var1=1, default_var2=True)
8
function_over_function(lambda x, y, factor=10: (x+y)*factor, 1, 2, 5)
15
Don't assign lambda expressions to variables. If you need a named function, define it with def:
my_simple_func = lambda x: x+1
vs
def my_simple_func(x):
    return x + 1
import numpy as np
matrix_from_list = np.array([[1, 3, 4],
[2, 0, 5],
[4, 4, 1],
[0, 1, 0]])
vector_from_list = np.array([2, 1, 3])
print 'The matrix is\n{matrix}\n\nthe vector is\n{vector}'\
.format(vector=vector_from_list, matrix=matrix_from_list)
The matrix is
[[1 3 4]
 [2 0 5]
 [4 4 1]
 [0 1 0]]

the vector is
[2 1 3]
matrix_from_list.dot(vector_from_list)
array([17, 19, 15, 1])
matrix_from_list + vector_from_list
array([[3, 4, 7], [4, 1, 8], [6, 5, 4], [2, 2, 3]])
single_precision_vector = np.array([1, 3, 5, 2], dtype=np.float32)
single_precision_vector.dtype
dtype('float32')
vector_from_list.dtype
dtype('int32')
vector_from_list.astype(np.int16)
array([2, 1, 3], dtype=int16)
mind dimensionality!
row_vector = np.array([[1,2,3]])
print 'New vector {} has dimensionality {}'\
.format(row_vector, row_vector.shape)
print 'The dot-product is: ', matrix_from_list.dot(row_vector)
New vector [[1 2 3]] has dimensionality (1L, 3L)
The dot-product is: 
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-550-286bbd2b9667> in <module>()
      3 print 'New vector {} has dimensionality {}' .format(row_vector, row_vector.shape)
      4 
----> 5 print 'The dot-product is: ', matrix_from_list.dot(row_vector)

ValueError: shapes (4,3) and (1,3) not aligned: 3 (dim 1) != 1 (dim 0)
singleton_vector = row_vector.squeeze()
print 'Squeezed vector {} has shape {}'.format(singleton_vector, singleton_vector.shape)
Squeezed vector [1 2 3] has shape (3L,)
matrix_from_list.dot(singleton_vector)
array([19, 17, 15, 2])
print singleton_vector[:, np.newaxis]
[[1] [2] [3]]
mat = np.arange(12)
mat.reshape(-1, 4)
mat
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
print singleton_vector[:, None]
[[1] [2] [3]]
vector12 = np.arange(12)
vector12
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
Guess what is the output:
vector12[:3]
vector12[-1]
vector12[:-2]
vector12[3:7]
vector12[::2]
vector12[::-1]
matrix43 = vector12.reshape(4, 3)
matrix43
array([[ 0, 1, 2], [ 3, 4, 5], [ 6, 7, 8], [ 9, 10, 11]])
Guess what is the output:
matrix43[:, 0]
matrix43[-1, :]
matrix43[::2, :]
matrix43[:3, :-1]
matrix43[3:, 1]
Unlike Matlab, numpy arrays are row-major (C order) by default, not column-major (Fortran order).
Working with views is more efficient and is the preferred way.
A view is returned whenever basic slicing is used;
more details at http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
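Both the default memory order and the view behaviour of basic slicing can be checked directly. This is a sketch that assumes only that numpy is installed; the array names are made up.

```python
import numpy as np

grid = np.arange(12).reshape(4, 3)

# Default layout is C order (row-major): elements of one row are adjacent in memory
is_c_order = grid.flags['C_CONTIGUOUS']
is_f_order = grid.flags['F_CONTIGUOUS']

# Basic slicing returns a view: writing through it changes the original array
first_column = grid[:, 0]
first_column[0] = 100
shares = np.shares_memory(grid, first_column)
```

After the assignment, `grid[0, 0]` holds 100 because `first_column` points at the same memory.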
Slicing with [:] only gives another view; to make a copy, call copy():
matrix43_copy = matrix43.copy()
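For numpy arrays a full slice `[:]` is still a view on the same data (unlike python lists), whereas `copy()` allocates new memory. A minimal sketch with made-up names:

```python
import numpy as np

original = np.arange(12).reshape(4, 3)

sliced = original[:]      # still a view on the same data
copied = original.copy()  # independent copy

sliced[0, 0] = -1         # visible in `original`
copied[0, 1] = -5         # not visible in `original`
```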
matrix_to_reshape = np.random.randint(10, 99, size=(6, 4))
matrix_to_reshape
array([[34, 93, 79, 92], [39, 80, 92, 78], [91, 67, 78, 73], [90, 78, 51, 66], [86, 29, 60, 30], [88, 58, 10, 35]])
reshaped_matrix = matrix_to_reshape.reshape(8, 3)
reshaped_matrix
array([[34, 93, 79], [92, 39, 80], [92, 78, 91], [67, 78, 73], [90, 78, 51], [66, 86, 29], [60, 30, 88], [58, 10, 35]])
reshape returns a view whenever the memory layout allows it (otherwise it returns a copy)!
reshaped_matrix[-1, 0] = 1
np.set_printoptions(formatter={'all':lambda x: '_{}_'.format(x) if x < 10 else str(x)})
matrix_to_reshape[:]
array([[34, 93, 79, 92], [39, 80, 92, 78], [91, 67, 78, 73], [90, 78, 51, 66], [86, 29, 60, 30], [88, _1_, 10, 35]])
np.set_printoptions()
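Whether `reshape` gives a view depends on the memory layout of its input. A small sketch (names are made up) in which one reshape shares memory with the source and another has to copy:

```python
import numpy as np

base = np.arange(6).reshape(2, 3)

# Contiguous data: reshape can reuse the same memory
as_view = base.reshape(3, 2)

# The transpose is not contiguous in C order, so flattening it needs a copy
as_copy = base.T.reshape(-1)

view_shares = np.shares_memory(base, as_view)
copy_shares = np.shares_memory(base, as_copy)
```

`np.shares_memory` is a convenient check whenever it is unclear whether an operation returned a view.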
idx = matrix43 > 4
matrix43[idx]
array([ 5, 6, 7, 8, 9, 10, 11])
eye, ones, zeros, diag
Example: build a tridiagonal matrix with -2's on the main diagonal and 1's on the subdiagonals
Is this code valid?
def three_diagonal(N):
    # the builtin int is used as dtype: the np.int alias was removed from recent numpy versions
    A = np.zeros((N, N), dtype=int)
    for i in range(N):
        A[i, i] = -2
        if i > 0:
            A[i, i-1] = 1
        if i < N-1:
            A[i, i+1] = 1
    return A
print three_diagonal(5)
[[-2 1 0 0 0] [ 1 -2 1 0 0] [ 0 1 -2 1 0] [ 0 0 1 -2 1] [ 0 0 0 1 -2]]
def numpy_three_diagonal(N):
    main_diagonal = -2 * np.eye(N)
    suddiag_value = np.ones(N-1,)
    lower_subdiag = np.diag(suddiag_value, k=-1)
    upper_subdiag = np.diag(suddiag_value, k=1)
    result = main_diagonal + lower_subdiag + upper_subdiag
    # the builtin int is used: the np.int alias was removed from recent numpy versions
    return result.astype(int)
numpy_three_diagonal(5)
array([[-2, 1, 0, 0, 0], [ 1, -2, 1, 0, 0], [ 0, 1, -2, 1, 0], [ 0, 0, 1, -2, 1], [ 0, 0, 0, 1, -2]])
A = numpy_three_diagonal(5)
A[0, -1] = 5
A[-1, 0] = 3
print A
print A.sum()
print A.min()
print A.max(axis=0)
print A.sum(axis=0)
print A.mean(axis=1)
print (A > 4).any(axis=1)
[[-2  1  0  0  5]
 [ 1 -2  1  0  0]
 [ 0  1 -2  1  0]
 [ 0  0  1 -2  1]
 [ 3  0  0  1 -2]]
6
-2
[3 1 1 1 5]
[2 0 0 0 4]
[ 0.8  0.   0.   0.   0.4]
[ True False False False False]
print np.pi
3.14159265359
args = np.arange(0, 2.5*np.pi, 0.5*np.pi)
print np.sin(args)
[ 0.00000000e+00 1.00000000e+00 1.22464680e-16 -1.00000000e+00 -2.44929360e-16]
print np.round(np.sin(args), decimals=2)
[ 0. 1. 0. -1. 0.]
'{}, {:.1%}, {:e}, {:.2f}, {:.0f}'.format(*np.sin(args))
'0.0, 100.0%, 1.224647e-16, -1.00, -0'
np.set_printoptions(formatter={'all':lambda x: '{:.2f}'.format(x)})
print np.sin(args)
np.set_printoptions()
[0.00 1.00 0.00 -1.00 -0.00]
linspace, meshgrid
Let's evaluate the function $$ f(x, y) = \sin(x+y) $$ on some mesh.
linear_index = np.linspace(0, np.pi, 10, endpoint=True)
mesh_x, mesh_y = np.meshgrid(linear_index, linear_index)
values_3D = np.sin(mesh_x + mesh_y)
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
%matplotlib inline
fig = plt.figure(figsize=(10,6))
ax = fig.gca(projection='3d')
ax.plot_wireframe(mesh_x, mesh_y, values_3D)
ax.view_init(azim=-45, elev=30)
plt.title('The plot of $f(x, y) = sin(x+y)$')
import scipy.sparse as sp
def scipy_three_diagonal(N):
    main_diagonal = -2 * np.ones(N, )
    suddiag_values = np.ones(N-1,)
    diagonals = [main_diagonal, suddiag_values, suddiag_values]
    # Another option: use sp.eye(N) and add subdiagonals
    offsets = [0, 1, -1]
    result = sp.diags(diagonals, offsets, shape=(N, N), format='coo')
    return result
my_sparse_matrix = scipy_three_diagonal(5)
my_sparse_matrix
<5x5 sparse matrix of type '<type 'numpy.float64'>' with 13 stored elements in COOrdinate format>
Sparse matrix stores only non-zero elements (and their indices)
print my_sparse_matrix
  (0, 0)	-2.0
  (1, 1)	-2.0
  (2, 2)	-2.0
  (3, 3)	-2.0
  (4, 4)	-2.0
  (0, 1)	1.0
  (1, 2)	1.0
  (2, 3)	1.0
  (3, 4)	1.0
  (1, 0)	1.0
  (2, 1)	1.0
  (3, 2)	1.0
  (4, 3)	1.0
my_sparse_matrix.toarray()
array([[-2., 1., 0., 0., 0.], [ 1., -2., 1., 0., 0.], [ 0., 1., -2., 1., 0.], [ 0., 0., 1., -2., 1.], [ 0., 0., 0., 1., -2.]])
my_sparse_matrix.A
array([[-2., 1., 0., 0., 0.], [ 1., -2., 1., 0., 0.], [ 0., 1., -2., 1., 0.], [ 0., 0., 1., -2., 1.], [ 0., 0., 0., 1., -2.]])
from scipy.linalg import toeplitz, hankel
hankel(xrange(4), [-1, -2, -3, -4])
array([[ 0, 1, 2, 3], [ 1, 2, 3, -2], [ 2, 3, -2, -3], [ 3, -2, -3, -4]])
toeplitz(xrange(4))
array([[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]])
N = 1000
%timeit three_diagonal(N)
%timeit numpy_three_diagonal(N)
%timeit scipy_three_diagonal(N)
1000 loops, best of 3: 1.53 ms per loop
10 loops, best of 3: 20.6 ms per loop
1000 loops, best of 3: 272 µs per loop
You can also use the %%timeit magic to measure the run time of the whole cell
%%timeit
N = 1000
calc = three_diagonal(N)
calc = scipy_three_diagonal(N)
del calc
100 loops, best of 3: 2.17 ms per loop
Avoid using time.time() or time.clock() directly, as their behaviour differs between platforms; default_timer makes the best choice for you. Note that it measures wall-clock time, i.e. it is not very precise: other processes running on the machine affect the result.
from timeit import default_timer as timer
dims = [300, 1000, 3000, 10000]
bench_names = ['loop', 'numpy', 'scipy']
timings = {bench:[] for bench in bench_names}
for n in dims:
    start_time = timer()
    calc = three_diagonal(n)
    time_delta = timer() - start_time
    timings['loop'].append(time_delta)
    start_time = timer()
    calc = numpy_three_diagonal(n)
    time_delta = timer() - start_time
    timings['numpy'].append(time_delta)
    start_time = timer()
    calc = scipy_three_diagonal(n)
    time_delta = timer() - start_time
    timings['scipy'].append(time_delta)
Let's make the code less redundant
dims = [300, 1000, 3000, 10000]
bench_names = ['loop', 'numpy', 'scipy']
timings = {bench_name: [] for bench_name in bench_names}
def timing_machine(func, *args, **kwargs):
    start_time = timer()
    result = func(*args, **kwargs)
    time_delta = timer() - start_time
    return time_delta
for n in dims:
    timings['loop'].append(timing_machine(three_diagonal, n))
    timings['numpy'].append(timing_machine(numpy_three_diagonal, n))
    timings['scipy'].append(timing_machine(scipy_three_diagonal, n))
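The same idea can be packaged as a decorator, so that the timing is attached to the function itself. This is only a sketch: the names `timed` and `slow_sum` are made up, and unlike `timing_machine` the wrapper returns both the result and the elapsed time.

```python
from functools import wraps
from timeit import default_timer as timer


def timed(func):
    """Wrap `func` so that each call returns (result, elapsed_seconds)."""
    @wraps(func)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        start_time = timer()
        result = func(*args, **kwargs)
        return result, timer() - start_time
    return wrapper


@timed
def slow_sum(n):
    """Sum the integers below n with an explicit loop."""
    total = 0
    for i in range(n):
        total += i
    return total


total, elapsed = slow_sum(1000)
```

Returning the result alongside the time keeps the computed value available, which is handy when the benchmarked call is expensive.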
timeit with the -o parameter
more details on different parameters: https://ipython.org/ipython-doc/dev/interactive/magics.html#magic-timeit
timeit_result = %timeit -q -r 5 -o three_diagonal(10)
print 'Best of {} runs: {:.8f}s'.format(timeit_result.repeat,
timeit_result.best)
Best of 5 runs: 0.00000565s
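Outside IPython, the standard `timeit` module gives comparable numbers. A sketch that uses a stand-in function `build_list` (made up here) instead of `three_diagonal`, so that it runs without numpy:

```python
import timeit


def build_list(n):
    """Stand-in workload: squares of the first n integers."""
    return [i * i for i in range(n)]


# 5 repetitions of 1000 calls each; every entry is the total time of one repetition
totals = timeit.repeat(lambda: build_list(10), repeat=5, number=1000)
best_per_call = min(totals) / 1000
```

Taking the minimum over the repetitions mirrors the "best of" figure reported by the magic command.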
Our new benchmark procedure
dims = [300, 1000, 3000, 10000]
bench_names = ['loop', 'numpy', 'scipy']
bench_funcs = [three_diagonal, numpy_three_diagonal, scipy_three_diagonal]
timings_best = {bench_name: [] for bench_name in bench_names}
for bench_name, bench_func in zip(bench_names, bench_funcs):
    print '\nMeasuring {}'.format(bench_func.func_name)
    for n in dims:
        print n,
        time_result = %timeit -q -o bench_func(n)
        timings_best[bench_name].append(time_result.best)

Measuring three_diagonal
300 1000 3000 10000
Measuring numpy_three_diagonal
300 1000 3000 10000
Measuring scipy_three_diagonal
300 1000 3000 10000
import matplotlib.pyplot as plt
%matplotlib inline
%matplotlib inline ensures all graphs are plotted inside your notebook
# plt.rcParams.update({'axes.labelsize': 'large'})
plt.rcParams.update({'font.size': 14})
plt.figure(figsize=(10,8))
for bench_name, values in timings_best.iteritems():
    plt.semilogy(dims, values, label=bench_name)
plt.legend(loc='best')
plt.title('Benchmarking results with best of timeit', y=1.03)
plt.xlabel('Matrix dimension size')
plt.ylabel('Time, s')
plt.figure(figsize=(10,8))
for bench_name, values in timings.iteritems():
    plt.semilogy(dims, values, label=bench_name)
plt.legend(loc='best')
plt.title('Benchmarking results with default_timer', y=1.03)
plt.xlabel('Matrix dimension size')
plt.ylabel('Time, s')
Think, why:
You might want to read the docs:
Remark: starting from python 3.3 it's recommended to use time.perf_counter() and time.process_time()
https://docs.python.org/3/library/time.html#time.perf_counter
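The difference between the two timers shows up as soon as the code sleeps or waits. A minimal python 3 sketch; `busy_work` is a made-up stand-in workload:

```python
import time


def busy_work(n):
    """Stand-in workload (made up for this sketch)."""
    return sum(i * i for i in range(n))


wall_start = time.perf_counter()  # high-resolution wall-clock timer
cpu_start = time.process_time()   # CPU time of this process, sleeping excluded

busy_work(10000)
time.sleep(0.01)

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start
```

`wall_elapsed` includes the sleep, while `cpu_elapsed` counts only the time the process actually spent computing.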
Also note that for advanced benchmarking it's better to use profiling tools.
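One such tool ships with the standard library: `cProfile`, with `pstats` for reporting. A python 3 sketch with two made-up functions, `inner` and `outer`, standing in for real code:

```python
import cProfile
import io
import pstats


def inner(n):
    """Made-up workload."""
    return sum(i * i for i in range(n))


def outer():
    """Made-up caller that spends its time in inner()."""
    return [inner(1000) for _ in range(50)]


profiler = cProfile.Profile()
profiler.enable()
outer()
profiler.disable()

# Collect the report into a string, sorted by cumulative time
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats(5)
report = stream.getvalue()
```

Printing `report` lists the most expensive calls, which shows where the time goes rather than only how much of it was spent.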
Use plt.plot? to get detailed info on function usage.
Task: given lists of x-values, y-values and plot format strings, plot all three graphs in one line of code.
Hint: use list comprehensions
k = len(timings_best)
iter_xyf = [item for sublist in zip([dims]*k,
timings_best.values(),
list('rgb'))\
for item in sublist]
plt.figure(figsize=(10, 8))
plt.semilogy(*iter_xyf)
plt.legend(timings_best.keys(), loc=2, frameon=False)
plt.title('Benchmarking results - "one-liner"', y=1.03)
plt.xlabel('Matrix dimension size')
plt.ylabel('Time, s')
Even simpler way - also gives you granular control on plot objects
plt.figure(figsize=(10, 8))
figs = [plt.semilogy(dims, values, label=bench_name)\
for bench_name, values in timings.iteritems()];
ax0, = figs[0]
ax0.set_dashes([5, 10, 20, 10, 5, 10])
ax1, = figs[1]
ax1.set_marker('s')
ax1.set_markerfacecolor('r')
ax2, = figs[2]
ax2.set_linewidth(6)
ax2.set_alpha(0.3)
ax2.set_color('m')
matplotlib has a number of different options for styling your plot
all_markers = [
'.', # point
',', # pixel
'o', # circle
'v', # triangle down
'^', # triangle up
'<', # triangle_left
'>', # triangle_right
'1', # tri_down
'2', # tri_up
'3', # tri_left
'4', # tri_right
'8', # octagon
's', # square
'p', # pentagon
'*', # star
'h', # hexagon1
'H', # hexagon2
'+', # plus
'x', # x
'D', # diamond
'd', # thin_diamond
'|', # vline
]
all_linestyles = [
'-', # solid line style
'--', # dashed line style
'-.', # dash-dot line style
':', # dotted line style
'None'# no line
]
all_colors = [
'b', # blue
'g', # green
'r', # red
'c', # cyan
'm', # magenta
'y', # yellow
'k', # black
'w', # white
]
for advanced usage of subplots start here
n = len(timings)
experiment_names = timings.keys()
fig, axes = plt.subplots(1, n, sharey=True, figsize=(16,4))
colors = np.random.choice(list('rgbcmyk'), n, replace=False)
markers = np.random.choice(all_markers, n, replace=False)
lines = np.random.choice(all_linestyles, n, replace=False)
for ax_num, ax in enumerate(axes):
    key = experiment_names[ax_num]
    ax.semilogy(dims, timings[key], label=key,
                color=colors[ax_num],
                marker=markers[ax_num],
                markersize=8,
                linestyle=lines[ax_num],
                lw=3)
    ax.set_xlabel('matrix dimension')
    ax.set_title(key)
axes[0].set_ylabel('Time, s')
plt.suptitle('Benchmarking results', fontsize=16, y=1.03)
plt.figure()
plt.subplot(211)
plt.plot([1,2,3])
plt.subplot(212)
plt.plot([2,5,4])
Task: create a figure with 2 rows and 2 columns of subplots. Leave the bottom left quarter empty. The scipy and numpy benchmarks should go into the top row.
function wrappers and decorators
installing packages
importing modules
ipython magic
qtconsole
environment
extensions
profiles (deprecated in jupyter)
profiling
debugging
cython, numba
openmp
OOP
python 2 vs python 3
plotting in python - palettes and colormaps, styles
pandas (presenting results)
numpy strides, contiguousness, vectorize function, broadcasting, saving output
magic functions (applied to line and to code cell)
jupyter configuration
Task 1
items = ['foo', 'bar', 'baz', 'foo', 'baz', 'bar']
method 1
from collections import defaultdict
item_ids = defaultdict(lambda: len(item_ids))
map(item_ids.__getitem__, items)
[0, 1, 2, 0, 2, 1]
method 2
import pandas as pd
pd.DataFrame({'items': items}).groupby('items', sort=False).grouper.group_info[0]
array([0, 1, 2, 0, 2, 1], dtype=int64)
method 3
import numpy as np
np.unique(items, return_inverse=True)[1]
array([2, 0, 1, 2, 1, 0])
method 4
last = 0
counts = {}
result = []
for item in items:
    try:
        count = counts[item]
    except KeyError:
        counts[item] = count = last
        last += 1
    result.append(count)
result
[0, 1, 2, 0, 2, 1]
Task 2
N = 1000
from itertools import permutations
%timeit list(permutations(xrange(N), 2))
10 loops, best of 3: 78.6 ms per loop
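`permutations` yields the pairs as tuples, while the task statement shows two rows of indices. Transposing with `zip(*...)` gives that layout. A sketch for the small example from the task (`range` is used here so that it also runs on python 3):

```python
from itertools import permutations

n = 4
pairs = list(permutations(range(n), 2))  # [(0, 1), (0, 2), ..., (3, 2)]
first_row, second_row = zip(*pairs)      # transpose into two tuples
```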
Hankel matrix: $a_{ij} = a_{i-1, j+1}$
import numpy as np
from scipy.linalg import hankel
def pairs_idx(n):
    return np.vstack((np.repeat(xrange(n), n-1),
                      hankel(xrange(1, n), xrange(-1, n-1)).ravel()))
%timeit pairs_idx(N)
100 loops, best of 3: 17.6 ms per loop