📢: This document was used during early development of siuba. See the siuba intro doc.
from siuba.siu import _, explain
f = _['a'] + _['b']
(_ + _)(1)
d = {'a': 1, 'b': 2}
explain(_.somecol.min())
(_['a'] + _['b'])(d)
f = _['a'] + 4
f(d)
_.somecol.min()
5
_()
represents a call rather than executing one; this is called a symbolic call
_ + _
_()
_['a']
~~
. e.g. ~~_.func
Rationale: It is much less common for people to make a call after a binary operation.
For example,
(_.a + _.b)()
(_.a + _.b).sum()
data = ['a','b','c']
# Binary operation
list(map(_ * 2, data))
['aa', 'bb', 'cc']
# Method call
list(map(_.upper(), data))
['A', 'B', 'C']
# Index
get_ax = _['a']['x']
get_ax({'a': {'x': 1}, 'b': 2})
1
# Escaping
from collections import namedtuple
Point = namedtuple('Point', ['x', 'y'])
points = [Point(x = 0, y = 1), Point(x = 1, y = 2)]
# doesn't work: map calls _.x(point), which builds a symbolic call (like _.upper()) instead of fetching the attribute
#list(map(_.x, points))
# works via escaping
list(map(~~_.x, points))
[0, 1]
# needs no escaping, since binary op!
list(map(_.x + _.y, points))
[1, 3]
# contrived complex example of escaping
# access .imag attribute of x + y
list(map(~~(_.x + _.y).imag, points))
[0, 0]
Ready to call:
Not ready to call:
Lambdas lock your code away. You know that calling one will do some work, but you can't see what that work is. Siu expressions can state what they want to do.
f = _.a + _.b / 2 + _.c**_.d >> _ & _
explain(f)
((_.a + (_.b / 2) + _.c**_.d) >> _) & _
By default, siu expressions are represented via a call tree...
(_.a + _.b) / 2
█─/
├─█─+
│ ├─█─.
│ │ ├─_
│ │ └─'a'
│ └─█─.
│   ├─_
│   └─'b'
└─2
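To make the idea of a call tree concrete, here is a minimal sketch in plain Python. It does not use siuba: the `Node` class and `render` function are hypothetical names invented for illustration, and the real siu call objects are richer than this. It only shows how an expression like `(_.a + _.b) / 2` can be stored as nested operations and printed one node per line.

```python
# Toy model only: Node and render are hypothetical, not part of siuba.
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Node:
    """One operation plus its arguments."""
    op: str
    args: Tuple[Any, ...]


def render(node, indent=0):
    """Return one line per node or leaf, indented by depth."""
    pad = "  " * indent
    if not isinstance(node, Node):
        return [pad + repr(node)]
    lines = [pad + node.op]
    for arg in node.args:
        lines.extend(render(arg, indent + 1))
    return lines


# (_.a + _.b) / 2, written out by hand as a tree
tree = Node("/", (
    Node("+", (Node(".", ("_", "a")), Node(".", ("_", "b")))),
    2,
))
tree_lines = render(tree)
print("\n".join(tree_lines))
```

Because the tree is ordinary data, anything that can walk it (a printer, an analyzer, a translator) can inspect the expression without running it.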
While support is still rough, we can run analyses on siu expressions.
symbol = _.a[_.b + 1] + _['c']
# hacky way to go from symbol to call for now
call = symbol.source
call.op_vars()
{'a', 'b', 'c'}
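The kind of analysis shown above can be sketched without siuba. The snippet below is an assumption-laden toy: it represents an expression as nested tuples of the form `(op, *args)` and defines its own `op_vars` function, which only borrows the name from the method above and is not siuba's implementation. It collects every name reached directly from `_` by attribute or item access.

```python
# Toy sketch: expressions are nested tuples (op, *args); "_" is the placeholder.
def op_vars(call):
    """Collect names accessed directly on "_" via '.' or '[]'."""
    found = set()
    if not isinstance(call, tuple):
        return found
    op, *args = call
    if op in (".", "[]") and args[0] == "_" and isinstance(args[1], str):
        found.add(args[1])
    # Recurse so that names nested inside other operations are found too.
    for arg in args:
        found |= op_vars(arg)
    return found


# Hand-written tree for: _.a[_.b + 1] + _['c']
expr = ("+",
        ("[]", (".", "_", "a"), ("+", (".", "_", "b"), 1)),
        ("[]", "_", "c"))
names = op_vars(expr)
print(sorted(names))
```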
Down the road, we can use siu's transparency in execution engines.
People can say what they want to do, and we can optimize how to do it (e.g. in pandas, SQL, etc.).
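As a rough illustration of what an execution engine could do with a transparent expression, the sketch below turns a hand-written tuple tree into a SQL-like string. Everything here is hypothetical: `to_sql` is a made-up helper, it handles only attribute access and binary operators, and it is not how siuba generates SQL.

```python
# Toy translator: nested tuples (op, *args) -> SQL-like text. Not siuba's API.
def to_sql(call):
    """Translate a tiny expression tree into a parenthesized SQL fragment."""
    if not isinstance(call, tuple):
        return str(call)              # literal value
    op, *args = call
    if op == ".":
        return args[1]                # _.a becomes the column name a
    left, right = (to_sql(arg) for arg in args)
    return f"({left} {op} {right})"


# Hand-written tree for: (_.a + _.b) / 2
expr = ("/", ("+", (".", "_", "a"), (".", "_", "b")), 2)
sql = to_sql(expr)
print(sql)
```

A lambda computing the same value could not be translated this way, since its body is not available as data.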
One kind of crazy thing I did was create a meta hook that automatically turns an imported function into one that creates siu expressions... (this feature is currently unused!)
import siuba.meta_hook
from siuba.meta_hook.operator import add, sub
from siuba.meta_hook.pandas import DataFrame
f = add(1, _['a'] + _['b'])
explain(f)
f({'a': 1, 'b': 2})
<built-in function add>(1,_('a') + _('b'))
4
DataFrame({'a': [1,2,3]})
█─'__call__'
├─<class 'pandas.core.frame.DataFrame'>
└─{'a': [1, 2, 3]}
_.a + _.b
█─+
├─█─.
│ ├─_
│ └─'a'
└─█─.
  ├─_
  └─'b'
_.a() + _.b
█─+
├─█─'__call__'
│ └─█─.
│   ├─_
│   └─'a'
└─█─.
  ├─_
  └─'b'
It depends how many times you call it. For many applications you only need to call an expression once (e.g. in pandas). If you call it many times, like in the example below, then it will be slower than using a lambda.
However, for libraries that expect siu expressions, knowing what they want to do means that we can actually speed up operations.
Below I just show the downside, that out of the box they're slower than lambdas :/
def lmap(*args, **kwargs): return list(map(*args, **kwargs))
l = [dict(a = 1) for ii in range(10*6)]
%%timeit
# NBVAL_IGNORE_OUTPUT
x = lmap(_['a'], l)
212 µs ± 50.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%%timeit
# NBVAL_IGNORE_OUTPUT
x = lmap(lambda x: x['a'], l)
7.29 µs ± 199 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
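Outside a notebook, the same kind of comparison can be run with the standard library's `timeit`. The sketch below does not measure siuba itself: `Getter` is a hypothetical stand-in for an expression object that stores a key and looks it up when called, so the numbers it prints only indicate the overhead of calling an object versus a lambda on your machine.

```python
import timeit


class Getter:
    """Hypothetical stand-in for an expression object (not a siu expression)."""

    def __init__(self, key):
        self.key = key

    def __call__(self, data):
        return data[self.key]


rows = [dict(a=1) for ii in range(10 * 6)]

# Both approaches should produce the same values.
via_object = list(map(Getter("a"), rows))
via_lambda = list(map(lambda row: row["a"], rows))

# Total seconds for 1000 repetitions of each approach.
t_object = timeit.timeit(lambda: list(map(Getter("a"), rows)), number=1000)
t_lambda = timeit.timeit(lambda: list(map(lambda row: row["a"], rows)), number=1000)
print(f"object: {t_object:.6f} s, lambda: {t_lambda:.6f} s")
```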
When siu expressions contain list literals, they can't know about any expressions inside those lists. E.g.
f = _ + [_, _, _]
f(['a'])
['a', _, _, _]
This can easily be worked around, though!
from siuba.meta_hook import lazy_func
@lazy_func
def List(*args):
    return list(args)
f = _ + List(_, _, _)
f(['a'])
['a', ['a'], ['a'], ['a']]
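The reason the workaround helps can be sketched with a toy model. The classes `Sym` and `Lazy` and the decorator `lazy_func_sketch` below are hypothetical names made up for this illustration; they are not siuba's `lazy_func` and only mimic the idea. A list literal is built immediately and nothing later looks inside it, whereas a deferred function call keeps its arguments around so placeholders can be filled in when the expression is finally called.

```python
# Toy model: none of these names come from siuba.
class Sym:
    """Placeholder that returns whatever it is called with."""

    def __call__(self, data):
        return data


class Lazy:
    """A function call whose arguments are resolved only when called."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def __call__(self, data):
        resolved = [
            arg(data) if isinstance(arg, (Sym, Lazy)) else arg
            for arg in self.args
        ]
        return self.func(*resolved)


def lazy_func_sketch(func):
    """Wrap func so that calling it builds a Lazy object instead of running."""
    def wrapper(*args):
        return Lazy(func, *args)
    return wrapper


X = Sym()

# A list literal is evaluated right away; the placeholders stay as they are.
literal = [X, X, X]


@lazy_func_sketch
def List(*args):
    return list(args)


# The deferred version substitutes the data for each placeholder on call.
deferred = List(X, X, X)
result = deferred(['a'])
print(result)
```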