This notebook shows off the canonical higher-order functions.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
def square(x):
    return x ** 2

def iseven(n):
    return n % 2 == 0

def add(x, y):
    return x + y

def mul(x, y):
    return x * y

def lesser(x, y):
    if x < y:
        return x
    else:
        return y

def greater(x, y):
    if x > y:
        return x
    else:
        return y
# map works like this
map(square, data)
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
# In this way it's like numpy's broadcasting operators
import numpy as np
X = np.arange(1, 11)
X**2
array([ 1, 4, 9, 16, 25, 36, 49, 64, 81, 100])
But map is pure Python, so it works with any Python function, not just the arithmetic that NumPy knows how to broadcast:
def fib(i):
    if i in (0, 1):
        return i
    else:
        return fib(i - 1) + fib(i - 2)
map(fib, data)
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
# Normally we would perform this in the following way
result = []
for item in data:
    result.append(fib(item))
result
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
# looking at the function above gives us a good pattern for how to define `map`
# We just abstract out the function `fib` for a user input
# `map` is easy to define
def map(fn, sequence):
    result = []
    for item in sequence:
        result.append(fn(item))
    return result
Little-known fact: object methods are perfectly valid functions too.
map(str.upper, ['Alice', 'Bob', 'Charlie'])
['ALICE', 'BOB', 'CHARLIE']
The map function is so important that it was given its own syntax: the list comprehension.
[fib(i) for i in data]
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
[name.upper() for name in ['Alice', 'Bob', 'Charlie']]
['ALICE', 'BOB', 'CHARLIE']
The filter higher-order function filters a dataset by a predicate. A predicate is a function that returns True or False. The filter function returns a new list containing only those elements for which the predicate holds.
filter(iseven, data)
[2, 4, 6, 8, 10]
from sympy import isprime # Only works if you have the sympy math library installed
filter(isprime, data)
[2, 3, 5, 7]
def filter(predicate, sequence):
    result = []
    for item in sequence:
        if predicate(item):
            result.append(item)
    return result
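Like map, filter also has a list-comprehension spelling: the if clause plays the role of the predicate. A small sketch using the iseven predicate from above (redefined here so the snippet stands alone):

```python
def iseven(n):
    return n % 2 == 0

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# The comprehension's `if` clause filters, just like the predicate in filter
evens = [item for item in data if iseven(item)]
# evens == [2, 4, 6, 8, 10]
```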
Reduce is the little sibling of map and filter. It is much less popular and is often scolded as being difficult to understand. Despite its social problems, reduce is quite powerful; once you write reduce yourself, you'll understand how it works. More importantly, you will learn how to identify reduction operations and how to pair them with binary operators. Reductions are common in data analytics, particularly when condensing large datasets into small synopses.
To show reduce we'll first implement two common reductions, sum and min. We've written them suggestively with the binary operators add and lesser to highlight their similar structure. Pick out the parts of the following two functions that differ from each other.
def sum(sequence):
    result = 0
    for item in sequence:
        # result = result + item
        result = add(result, item)
    return result
def min(sequence):
    result = 99999999999999  # a really big number
    for item in sequence:
        # result = result if result < item else item
        result = lesser(result, item)
    return result
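The same pattern with greater in place of lesser gives a maximum. A sketch (the name `maximum` is ours, chosen to avoid the builtin) to emphasize that only the starting value and the binary operator change between these functions:

```python
def greater(x, y):
    return x if x > y else y

def maximum(sequence):
    result = -99999999999999  # a really small number
    for item in sequence:
        # only the start value and the binary operator differ from sum/min
        result = greater(result, item)
    return result

# maximum([1, 2, ..., 10]) == 10
```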
Now fill in the blanks below to complete the definition of product, a function that multiplies the elements of the sequence together.
def product(sequence):
    result = ?
    for item in sequence:
        result = ?(result, item)
    return result
assert product([2, 3, 10]) == 60
  File "<ipython-input-16-92db2dd2fc1e>", line 2
    result = ?
             ^
SyntaxError: invalid syntax
Write reduce. Start by copying the pattern of the three functions above. The parts that differ between them are your inputs. Traditionally the arguments of reduce are ordered so that the example calls below work.
def reduce(...):
    ...
  File "<ipython-input-17-880dfea8dc75>", line 1
    def reduce(...):
               ^
SyntaxError: invalid syntax
reduce(add, data, 0)
55
reduce(mul, data, 1)
3628800
reduce(lesser, data, 10000000)
1
reduce(greater, data, -100000000)
10
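If you want to check your answer, here is one possible implementation that matches the calls above; the parts that varied between sum, min, and maximum (the starting value and the binary operator) have become arguments:

```python
def reduce(fn, sequence, start):
    result = start  # the varying initial value becomes a parameter
    for item in sequence:
        result = fn(result, item)  # the varying binary operator becomes a parameter
    return result

def add(x, y):
    return x + y

# reduce(add, [1, ..., 10], 0) == 55
```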
We started this notebook with lots of little definitions like
def add(x, y):
    return x + y
These one-line functions sometimes seem a little silly. The lambda keyword lets us create small functions on the fly. The above definition could be expressed as follows:
add = lambda x, y: x + y
The expression lambda x, y: x + y is a value, just like 3 or 'Alice'. Just like literal ints and strings, lambda expressions can be used on the fly without being stored in variables.
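For instance, a lambda can be called immediately without ever being given a name (a minimal illustration):

```python
# Call an anonymous function inline; nothing is stored in a variable
result = (lambda x, y: x + y)(3, 4)
# result == 7
```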
reduce(add, data, 0)
55
reduce(lambda x, y: x + y, data, 0) # Define `add` on the fly
55
Additionally, we can use lambda to quickly specify functions as specializations of more general ones. In the following we define sum, min, and max.
sum = lambda data: reduce(add, data, 0)
min = lambda data: reduce(lesser, data, 99999999999)
max = lambda data: reduce(greater, data, -999999999999)
sum(data)
55
As an exercise, make product using lambda, reduce, and mul.
product = ...
assert product([2, 3, 10]) == 60
  File "<ipython-input-26-405b2d336b95>", line 1
    product = ...
              ^
SyntaxError: invalid syntax
Groupby can be seen as a more powerful version of filter. Rather than giving you one subset of the data, it divides the data into all relevant subsets.
filter(iseven, data)
[2, 4, 6, 8, 10]
from toolz import groupby
groupby(iseven, data)
{False: [1, 3, 5, 7, 9], True: [2, 4, 6, 8, 10]}
groupby(isprime, data)
{False: [1, 4, 6, 8, 9, 10], True: [2, 3, 5, 7]}
But groupby is not restricted to predicates (functions which return True or False):
groupby(lambda n: n % 3, data)
{0: [3, 6, 9], 1: [1, 4, 7, 10], 2: [2, 5, 8]}
groupby(len, ['Alice', 'Bob', 'Charlie', 'Dan', 'Edith', 'Frank'])
{3: ['Bob', 'Dan'], 5: ['Alice', 'Edith', 'Frank'], 7: ['Charlie']}
Amazingly, groupby is not significantly more costly than filter in the common case. It computes all the groups in a single pass through the data.
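A minimal sketch of how such a single-pass groupby could be written (toolz's actual implementation differs, but the idea is the same):

```python
def groupby(key, sequence):
    groups = {}
    for item in sequence:
        k = key(item)  # compute the group key once per element
        if k not in groups:
            groups[k] = []
        groups[k].append(item)  # one pass: each element lands in exactly one group
    return groups

def iseven(n):
    return n % 2 == 0

# groupby(iseven, [1, ..., 10]) == {False: [1, 3, 5, 7, 9], True: [2, 4, 6, 8, 10]}
```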
Let's bring it all together on a tiny dataset.
likes = """Alice likes Chocolate
Bob likes Chocolate
Bob likes Apples
Charlie likes Apples
Alice likes Peanut Butter
Charlie likes Peanut Butter"""
tuples = map(lambda s: s.split(' likes '), likes.split('\n'))
tuples
[['Alice', 'Chocolate'], ['Bob', 'Chocolate'], ['Bob', 'Apples'], ['Charlie', 'Apples'], ['Alice', 'Peanut Butter'], ['Charlie', 'Peanut Butter']]
groups = groupby(lambda x: x[0], tuples)
groups
{'Alice': [['Alice', 'Chocolate'], ['Alice', 'Peanut Butter']], 'Bob': [['Bob', 'Chocolate'], ['Bob', 'Apples']], 'Charlie': [['Charlie', 'Apples'], ['Charlie', 'Peanut Butter']]}
from toolz import valmap, first, second
valmap(lambda L: map(second, L), groups)
{'Alice': ['Chocolate', 'Peanut Butter'], 'Bob': ['Chocolate', 'Apples'], 'Charlie': ['Apples', 'Peanut Butter']}
valmap(lambda L: map(first, L), groupby(lambda x: x[1], tuples))
{'Apples': ['Bob', 'Charlie'], 'Chocolate': ['Alice', 'Bob'], 'Peanut Butter': ['Alice', 'Charlie']}
from toolz.curried import map, valmap, groupby, first, second, get, curry, compose, pipe
tmap = curry(compose(tuple, map))
pipe(tuples, groupby(second), valmap(tmap(first)))
valmap(tmap(first), groupby(second, tuples))
f = compose(valmap(tmap(first)), groupby(second))
f(tuples)
{'Apples': ('Bob', 'Charlie'), 'Chocolate': ('Alice', 'Bob'), 'Peanut Butter': ('Alice', 'Charlie')}
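For readers without toolz installed, the same grouping can be sketched in plain Python (the pairs from above are inlined here so the snippet stands alone):

```python
pairs = [['Alice', 'Chocolate'], ['Bob', 'Chocolate'], ['Bob', 'Apples'],
         ['Charlie', 'Apples'], ['Alice', 'Peanut Butter'], ['Charlie', 'Peanut Butter']]

fans = {}
for name, food in pairs:
    # append each fan to the tuple for their favorite food
    fans[food] = fans.get(food, ()) + (name,)
# fans == {'Chocolate': ('Alice', 'Bob'), 'Apples': ('Bob', 'Charlie'),
#          'Peanut Butter': ('Alice', 'Charlie')}
```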