The target audience is intermediate Python users looking for a deeper understanding of the language. It attempts to correct some common misperceptions of how Python works. While similar to many other programming languages, Python is quite different from some in subtle and important ways.
Almost all of the material in the video is presented in the interactive Python prompt (aka the Read Eval Print Loop or REPL). I'll be using an IPython notebook but you can use Python without IPython just fine.
I'm using Python 3.4 and I suggest you do the same unless you're familiar with the differences between Python versions 2 and 3 and prefer to use Python 2.x.
There are some intentional code errors in both the regular presentation material and the exercises. The purpose of the intentional errors is to foster learning from how things fail.
Let's go back to square one and be sure we understand the basics about objects in Python.
1
3.14
3.14j
'a string literal'
b'a bytes literal'
(1, 2)
[1, 2]
{'one': 1, 'two': 2}
{'one', 'two'}
False, True
None
NotImplemented, Ellipsis
int, list
any, len
Everything (everything) in Python (at runtime) is an object.
Every object has:
Let's explore each of these in turn.
type(1)
type(3.14)
type(3.14j)
type('a string literal')
type(b'a bytes literal')
type((1, 2))
type([1, 2])
type({'one': 1, 'two': 2})
type({'one', 'two'})
type(True)
type(None)
True.__doc__
'a string literal'.__add__
callable('a string literal'.__add__)
'a string literal'.__add__('!')
True.__class__
True.__class__.__bases__
True.__class__.__bases__[0]
True.__class__.__bases__[0].__bases__[0]
The method resolution order for classes is stored in __mro__ by the class's mro method, which can be overridden.
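To make the method resolution order concrete, here is a small sketch using a diamond-shaped hierarchy. The class names Base, Left, Right and Bottom are made up for illustration and are not part of the presentation.

```python
class Base:
    def who(self):
        return 'Base'

class Left(Base):
    def who(self):
        return 'Left'

class Right(Base):
    def who(self):
        return 'Right'

class Bottom(Left, Right):
    pass

# Attribute lookup on an instance walks the classes in __mro__ order.
mro_names = [cls.__name__ for cls in Bottom.__mro__]
resolved = Bottom().who()
print(mro_names)  # ['Bottom', 'Left', 'Right', 'Base', 'object']
print(resolved)   # Left
```

Left comes before Right because it is listed first in Bottom's bases, and Base is only reached after both of them.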
bool.__mro__
import inspect
inspect.getmro(True)
inspect.getmro(type(True))
inspect.getmro(type(3))
inspect.getmro(type('a string literal'))
id(3)
id(3.14)
id('a string literal')
id(True)
len
callable(len)
len('a string literal')
'a string literal'.__len__
'a string literal'.__len__()
callable(int)
int(3.14)
int()
dict
dict()
dict(pi=3.14, e=2.71)
callable(True)
True()
bool()
5.0
dir(5.0)
5.0.__add__
callable(5.0.__add__)
5.0.__add__()
5.0.__add__(4)
4.__add__
(4).__add__
(4).__add__(5)
import sys
size = sys.getsizeof
print('Size of w is', size('w'), 'bytes.')
size('walk')
size(2)
size(2**30 - 1)
size(2**30)
size(2**60-1)
size(2**60)
size(2**1000)
Every object has (zero or) one or more names, in one or more namespaces.
Understanding names is foundational to understanding Python and using it effectively.
Names refer to objects. Namespaces are like dictionaries.
dir()
IPython adds a lot of names to the global namespace! Let's work around that.
%%writefile dirp.py
def _dir(obj='__secret', _CLUTTER=dir()):
    """
    A version of dir that excludes clutter and private names.
    """
    if obj == '__secret':
        names = globals().keys()
    else:
        names = dir(obj)
    return [n for n in names if n not in _CLUTTER and not n.startswith('_')]
def _dirn(_CLUTTER=dir()):
    """
    Display the current global namespace, ignoring old names.
    """
    return dict([
        (n, v) for (n, v) in globals().items()
        if n not in _CLUTTER and not n.startswith('_')])
%load dirp
_dirn()
a
a = 300
_dirn()
a
Python has variables in the mathematical sense - names that can vary, but not in the sense of boxes that hold values like you may be thinking about them. Imagine instead names or labels that you can add to an object or move to another object.
a = 400
Simple name assignment and re-assignment are not operations on objects, they are namespace operations!
_dirn()
b = a
b
a
_dirn()
id(a), id(b)
id(a) == id(b)
a is b
del a
_dirn()
a
The del statement on a name is a namespace operation, i.e. it does not delete the object. Python will delete objects when they have no more names (when their reference count drops to zero).
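Here is a minimal sketch of that behaviour, assuming CPython, where reference counting frees an object as soon as its last name goes away. The class Thing and the weak reference are only there so we can observe the object without keeping it alive.

```python
import weakref

class Thing:
    """A throwaway class; instances of plain classes can be weakly referenced."""

a = Thing()
b = a                      # a second name for the same object
watcher = weakref.ref(a)   # does not count as a name or a strong reference

del a                      # removes the name a; the object still has the name b
survived_first_del = watcher() is not None

del b                      # the last name is gone, so CPython frees the object
survived_second_del = watcher() is not None

print(survived_first_del, survived_second_del)
```

On other Python implementations the object may be reclaimed later than this, since immediate deletion is a CPython detail.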
Of course, given that the name b is just a name for an object and it's objects that have types, not names, there's no restriction on the type of object that the name b refers to.
b = 'walk'
b
id(b)
del b
_dirn()
Object attributes are also like dictionaries, and "in a sense the set of attributes of an object also form a namespace." (https://docs.python.org/3/tutorial/classes.html#python-scopes-and-namespaces)
class SimpleNamespace:
    pass
SimpleNamespace was added to the types module in Python 3.3.
import sys
if (sys.version_info.major, sys.version_info.minor) >= (3, 3):
    from types import SimpleNamespace
p = SimpleNamespace()
p
p.__dict__
p.x, p.y = 1.0, 2.0
p.__dict__
p.x, p.y
i = 10
j = 10
i is j
i == j
i = 500
j = 500
i is j
i == j
Use == to check for equality. Only use is if you want to check identity, i.e. if two object references or names refer to the same object.
The reason == and is don't always match with int as shown above is that CPython pre-creates some frequently used int objects to increase performance. Which ones are documented in the source code, or we can figure out which ones by looking at their ids.
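The same distinction shows up without relying on CPython's int cache. This sketch uses lists, and the names first, second and alias are just for illustration.

```python
first = [1, 2, 3]
second = [1, 2, 3]   # an equal but separate list object
alias = first        # another name for the first list

equal_values = first == second      # compares contents
same_object = first is second       # compares identity
alias_is_first = alias is first

print(equal_values, same_object, alias_is_first)  # True False True
```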
import itertools
for i in itertools.chain(range(-7, -3), range(254, 259)):
    print(i, id(i))
dir()
_dir = dir
If dir() returns too many names, define and use _dir instead. Or use dirp.py from above. If you're running Python without the IPython notebook, plain old dir should be fine.
def _dir(_CLUTTER=dir()):
    """
    Display the current global namespace, ignoring old names.
    """
    return [n for n in globals()
            if n not in _CLUTTER and not n.startswith('_')]
v = 1
v
_dir()
type(v)
w = v
v is w
m = [1, 2, 3]
m
n = m
n
_dir()
m is n
m[1] = 'two'
m, n
int.__add__
int.__add__ = int.__sub__
from sys import getrefcount as refs
refs(None)
refs(object)
sentinel_value = object()
refs(sentinel_value)
Use object() to create a unique object which is not equal to any other object, for example to use as a sentinel value.
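One place a sentinel earns its keep is distinguishing "no value supplied" from a legitimate None. The following is a sketch only; the names _MISSING, lookup and settings are hypothetical.

```python
_MISSING = object()  # hypothetical module-private sentinel


def lookup(settings, key, default=_MISSING):
    """Return settings[key]; fall back to default only if one was given."""
    value = settings.get(key, _MISSING)
    if value is not _MISSING:   # identity check, so a stored None still counts
        return value
    if default is _MISSING:
        raise KeyError(key)
    return default


settings = {'timeout': None}
print(lookup(settings, 'timeout'))      # None, a real stored value
print(lookup(settings, 'retries', 3))   # 3, the supplied default
```

Because the check uses is rather than ==, no value a caller passes can be mistaken for the sentinel.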
sentinel_value == object()
sentinel_value == sentinel_value
refs(1)
refs(2)
refs(25)
[(i, refs(i)) for i in range(100)]
i, j = 1, 2
i, j
i, j = j, i
i, j
i, j, k = (1, 2, 3)
i, j, k = 1, 2, 3
i, j, k = [1, 2, 3]
i, j, k = 'ijk'
Extended iterable unpacking is only in Python 3:
i, j, k, *rest = 'ijklmnop'
i, j, k, rest
first, *middle, second_last, last = 'abcdefg'
first, middle, second_last, last
i, *middle, j = 'ij'
i, middle, j
Review:
A namespace is a mapping from valid identifier names to objects. Think of it as a dictionary.
Simple assignment (=) and del are namespace operations, not operations on objects.
Terminology and Definitions:
A scope is a section of Python code where a namespace is directly accessible.
For an indirectly accessible namespace you access values via dot notation, e.g. p.x or sys.version_info.major.
The (direct) namespace search order is (from http://docs.python.org/3/tutorial):
The innermost scope contains local names
The namespaces of enclosing functions, searched starting with the nearest enclosing scope; (or the module if outside any function)
The middle scope contains the current module's global names
The outermost scope is the namespace containing built-in names
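The search order above can be sketched with one name defined at several levels. The function names here are invented for the illustration.

```python
import builtins

name = 'module'


def outer():
    name = 'enclosing'

    def finds_local():
        name = 'local'      # found in the innermost scope
        return name

    def finds_enclosing():
        return name         # not local, so the enclosing function's name is used

    return finds_local(), finds_enclosing()


def finds_module_global():
    return name             # no local or enclosing name, so the module's is used


def finds_builtin():
    return abs              # not defined anywhere above, so builtins is searched


print(outer())                        # ('local', 'enclosing')
print(finds_module_global())          # module
print(finds_builtin() is builtins.abs)  # True
```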
All namespace changes happen in the local scope (i.e. in the current scope in which the namespace-changing code executes):
= (i.e. assignment)
del name
import name
def name
class name
def foo(name):
for loop: for name in ...
Exception as name:
with open(filename) as name:
__doc__
You should never reassign built-in names..., but let's do so to explore how name scopes work.
len
def f1():
    def len():
        len = range(3)
        print("In f1's local len(), len is {}".format(len))
        return len
    print('In f1(), len = {}'.format(len))
    result = len()
    print('Returning result: {!r}'.format(result))
    return result
f1()
def f2():
    def len():
        # len = range(3)
        print("In f2's local len(), len is {}".format(len))
        return len
    print('In f2(), len = {}'.format(len))
    result = len()
    print('Returning result: {!r}'.format(result))
    return result
f2()
len
len = 99
len
def print_len(s):
    print('len(s) == {}'.format(len(s)))
print_len('walk')
len
del len
len
print_len('walk')
pass
pass = 3
Keywords at https://docs.python.org/3/reference/lexical_analysis.html#keywords
False class finally is return
None continue for lambda try
True def from nonlocal while
and del global not with
as elif if or yield
assert else import pass
break except in raise
Let's look at some surprising behaviour.
x = 1
def test_outer_scope():
    print('In test_outer_scope x ==', x)
test_outer_scope()
def test_local():
    x = 2
    print('In test_local x ==', x)
x
test_local()
x
def test_unbound_local():
    print('In test_unbound_local ==', x)
    x = 3
x
test_unbound_local()
x
Let's introspect the function test_unbound_local to help us understand this error.
test_unbound_local.__code__
test_unbound_local.__code__.co_argcount # count of positional args
test_unbound_local.__code__.co_name # function name
test_unbound_local.__code__.co_names # names used in bytecode
test_unbound_local.__code__.co_nlocals # number of locals
test_unbound_local.__code__.co_varnames # names of locals
import dis
dis.dis(test_unbound_local.__code__.co_code)
The use of x by LOAD_FAST happens before it's set by STORE_FAST.
"This is because when you make an assignment to a variable in a scope, that variable becomes local to that scope and shadows any similarly named variable in the outer scope. Since the last statement in foo assigns a new value to x, the compiler recognizes it as a local variable. Consequently when the earlier print x attempts to print the uninitialized local variable and an error results." -- https://docs.python.org/3/faq/programming.html#why-am-i-getting-an-unboundlocalerror-when-the-variable-has-a-value
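Here is a small sketch of the same rule that you can run without triggering a traceback. The names total, read_only and assigns_later are made up for this example.

```python
total = 10


def read_only():
    # No assignment to total here, so the outer name is looked up.
    return total + 1


def assigns_later():
    try:
        # The assignment makes total local to the whole function body,
        # so the right-hand side reads a local that is not yet bound.
        total = total + 1
    except UnboundLocalError as error:
        return type(error).__name__
    return total


print(read_only())                          # 11
print(assigns_later())                      # UnboundLocalError
print(assigns_later.__code__.co_varnames)   # includes 'total'
print(read_only.__code__.co_varnames)       # ()
```

The compiler decides that total is local when it compiles the function, which is why co_varnames already lists it before the function ever runs.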
To explore this further on your own compare these two:
import codeop
dis.dis(codeop.compile_command('def t1(): a = b; b = 7'))
dis.dis(codeop.compile_command('def t2(): b = 7; a = b'))
def test_global():
    global x
    print('In test_global before, x ==', x)
    x = 4
    print('In test_global after, x ==', x)
x
test_global()
x
test_global.__code__.co_varnames
def test_nonlocal():
    x = 5
    def test6():
        nonlocal x
        print('test6 before x ==', x)
        x = 6
        print('test6 after x ==', x)
    print('test_nonlocal before x ==', x)
    test6()
    print('test_nonlocal after x ==', x)
x = 1
x
test_nonlocal()
x
Restart Python to unclutter the namespace.
%%javascript
IPython.notebook.kernel.restart();
[n for n in dir() if not n.startswith('_')]
There are lots of built-in names that dir() doesn't show us.
Let's use some Python to explore all the builtin names by category.
import builtins, collections, inspect, textwrap
fill = textwrap.TextWrapper(width=60).fill
def pfill(pairs):
    """Sort and print first of every pair"""
    print(fill(' '.join(list(zip(*sorted(pairs)))[0])))
Collect all members of builtins:
members = set([
    m for m in inspect.getmembers(builtins)
    if not m[0].startswith('_')])
len(members)
Pull out just the exceptions:
exceptions = [
    (name, obj) for (name, obj) in members
    if inspect.isclass(obj) and
    issubclass(obj, BaseException)]
members -= set(exceptions)
len(exceptions), len(members)
pfill(exceptions)
https://docs.python.org/3/library/exceptions.html#exception-hierarchy:
BaseException
 +-- SystemExit
 +-- KeyboardInterrupt
 +-- GeneratorExit
 +-- Exception
      +-- StopIteration
      +-- ArithmeticError
      |    +-- FloatingPointError
      |    +-- OverflowError
      |    +-- ZeroDivisionError
      +-- AssertionError
      +-- AttributeError
      +-- BufferError
      +-- EOFError
      +-- ImportError
      +-- LookupError
      |    +-- IndexError
      |    +-- KeyError
      +-- MemoryError
      +-- NameError
      |    +-- UnboundLocalError
      +-- OSError
      |    +-- BlockingIOError
      |    +-- ChildProcessError
      |    +-- ConnectionError
      |    |    +-- BrokenPipeError
      |    |    +-- ConnectionAbortedError
      |    |    +-- ConnectionRefusedError
      |    |    +-- ConnectionResetError
      |    +-- FileExistsError
      |    +-- FileNotFoundError
      |    +-- InterruptedError
      |    +-- IsADirectoryError
      |    +-- NotADirectoryError
      |    +-- PermissionError
      |    +-- ProcessLookupError
      |    +-- TimeoutError
      +-- ReferenceError
      +-- RuntimeError
      |    +-- NotImplementedError
      +-- SyntaxError
      |    +-- IndentationError
      |         +-- TabError
      +-- SystemError
      +-- TypeError
      +-- ValueError
      |    +-- UnicodeError
      |         +-- UnicodeDecodeError
      |         +-- UnicodeEncodeError
      |         +-- UnicodeTranslateError
      +-- Warning
           +-- DeprecationWarning
           +-- PendingDeprecationWarning
           +-- RuntimeWarning
           +-- SyntaxWarning
           +-- UserWarning
           +-- FutureWarning
           +-- ImportWarning
           +-- UnicodeWarning
           +-- BytesWarning
           +-- ResourceWarning
pfill(members)
Most are one of these two types:
type(int), type(len)
Print them:
bnames = collections.defaultdict(set)
for name, obj in members:
    bnames[type(obj)].add((name, obj))
for typ in [type(int), type(len)]:
    pairs = bnames.pop(typ)
    print(typ)
    pfill(pairs)
    print()
The leftovers:
for typ, pairs in bnames.items():
    print('{}: {}'.format(typ, ' '.join((n for (n, o) in pairs))))
[k for k in locals().keys() if not k.startswith('_')]
[k for k in globals().keys() if not k.startswith('_')]
In the REPL these are the same:
locals() == globals()
The following code is not recommended but it reminds us that namespaces are like dictionaries.
x = 0
x
locals()['x']
locals()['x'] = 1
locals()['x']
x
If you're tempted to use it, try this code which due to "fast locals" doesn't do what you might expect:
def f():
    locals()['x'] = 5
    print(x)
Remember, these change or modify a namespace:
Simple assignment (=) and del
[globals() and locals()]
import
def
class
[for, except, with, and docstrings.]
Next we'll explore import.
%load dirp
_dir()
import pprint
_dir()
pprint
_dir(pprint)
pprint.pformat
pprint.pprint
pprint.foo
pprint.foo = 'Python is dangerous'
pprint.foo
from pprint import pformat as pprint_pformat
_dir()
pprint.pformat is pprint_pformat
pprint
pprint.pformat
del pprint
import pprint as pprint_module
_dir()
pprint_module.pformat is pprint_pformat
math
dir(math)
del math
import math
Why doesn't import math give a NameError?
math
del math
What if we don't know the name of the module until run-time?
import importlib
importlib.import_module('math')
math_module = importlib.import_module('math')
math_module.pi
math
module_name = 'math'
import module_name
import 'math'
import math
import pprint
dir(pprint)
pprint.__doc__
pprint.__file__
pprint.__name__
from pprint import *
[n for n in dir() if not n.startswith('_')]
import importlib
help(importlib.reload)
importlib.reload(csv)
importlib.reload('csv')
import csv
importlib.reload('csv')
importlib.reload(csv)
import sys
sys.path
def f():
pass
f.__name__
f
f.__name__ = 'g'
g
f.__name__
f
f.__qualname__ # Only in Python >= 3.3
f.__qualname__ = 'g'
f
f.__dict__
f.foo = 'bar'
f.__dict__
def f(a, b, k1='k1', k2='k2',
      *args, **kwargs):
    print('a: {!r}, b: {!r}, '
          'k1: {!r}, k2: {!r}'
          .format(a, b, k1, k2))
    print('args:', repr(args))
    print('kwargs:', repr(kwargs))
f.__defaults__
f(1, 2)
f(a=1, b=2)
f(b=1, a=2)
f(1, 2, 3)
f(1, 2, k2=4)
f(1, k1=3) # Fails
f(1, 2, 3, 4, 5, 6)
f(1, 2, 3, 4, keya=7, keyb=8)
f(1, 2, 3, 4, 5, 6, keya=7, keyb=8)
f(1, 2, 3, 4, 5, 6, keya=7, keyb=8, 9)
def g(a, b, *args, c=None):
    print('a: {!r}, b: {!r}, '
          'args: {!r}, c: {!r}'
          .format(a, b, args, c))
g.__defaults__
g.__kwdefaults__
g(1, 2, 3, 4)
g(1, 2, 3, 4, c=True)
Keyword-only arguments in Python 3, i.e. named parameters occurring after *args (or *) in the parameter list, must be specified using keyword syntax in the call. This lets a function take a varying number of arguments and also take options in the form of keyword arguments.
def h(a=None, *args, keyword_only=None):
    print('a: {!r}, args: {!r}, '
          'keyword_only: {!r}'
          .format(a, args, keyword_only))
h.__defaults__
h.__kwdefaults__
h(1)
h(1, 2)
h(1, 2, 3)
h(*range(15))
h(1, 2, 3, 4, keyword_only=True)
h(1, keyword_only=True)
h(keyword_only=True)
def h2(a=None, *, keyword_only=None):
    print('a: {!r}, '
          'keyword_only: {!r}'
          .format(a, keyword_only))
h2()
h2(1)
h2(keyword_only=True)
h2(1, 2)
def f(*args, **kwargs):
    print(repr(args), repr(kwargs))
f(1)
f(1, 2)
f(1, a=3, b=4)
def f2(k1, k2):
    print('f2({}, {})'.format(k1, k2))
t = 1, 2
t
d = dict(k1=3, k2=4)
d
f2(*t)
f2(**d)
f2(*d)
list(d)
f(*t, **d)
m = 'one two'.split()
f(1, 2, *m)
father = 'Dad'
locals()['father']
'Hi {father}'.format(**locals()) # A convenient hack. Only for throwaway code.
def f2(a: 'x', b: 5, c: None, d: list) -> float:
    pass
f2.__annotations__
type(f2.__annotations__)
Create two names for the str object 123, then from it create 1234 and reassign one of the names:
s1 = s2 = '123'
s1 is s2, s1, s2
s2 = s2 + '4'
s1 is s2, s1, s2
We can see this reassigns the second name so it refers to a new object. This works similarly if we start with two names for one list object and then reassign one of the names.
m1 = m2 = [1, 2, 3]
m1 is m2, m1, m2
m2 = m2 + [4]
m1 is m2, m1, m2
If for the str objects we instead use an augmented assignment statement, specifically in-place add +=, we get the same behaviour.
s1 = s2 = '123'
s2 += '4'
s1 is s2, s1, s2
However, for the list objects the behaviour changes.
m1 = m2 = [1, 2, 3]
m2 += [4]
m1 is m2, m1, m2
The += in foo += 1 is not just syntactic sugar for foo = foo + 1. += and other augmented assignment statements have their own bytecodes and methods.
Let's look at the bytecode to confirm this. Notice BINARY_ADD vs. INPLACE_ADD. Note that the runtime types of the objects referred to by a and b are irrelevant to the bytecode that gets produced.
import codeop, dis
dis.dis(codeop.compile_command("a = a + b"))
dis.dis(codeop.compile_command("a += b"))
m2 = [1, 2, 3]
m2
Notice that __iadd__ returns a value
m2.__iadd__([4])
and it also changed the list
m2
s2.__iadd__('4')
So what happened when INPLACE_ADD ran against the str object?
If INPLACE_ADD doesn't find __iadd__ it instead calls __add__ and reassigns s2, i.e. it falls back to __add__.
https://docs.python.org/3/reference/datamodel.html#object.__iadd__:
These methods are called to implement the augmented arithmetic assignments (+=, etc.). These methods should attempt to do the operation in-place (modifying self) and return the result (which could be, but does not have to be, self). If a specific method is not defined, the augmented assignment falls back to the normal methods.
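To see the fallback in isolation, here is a sketch with two small classes, one defining __iadd__ and one defining only __add__. Both classes and all the variable names are invented for this illustration.

```python
class InPlace:
    """Defines __iadd__, so += mutates the existing object."""
    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):
        return InPlace(self.items + list(other))

    def __iadd__(self, other):
        self.items.extend(other)
        return self            # the name is rebound to the same object


class AddOnly:
    """No __iadd__, so += falls back to __add__ and rebinds the name."""
    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):
        return AddOnly(self.items + list(other))


a1 = a2 = InPlace([1])
a2 += [2]
b1 = b2 = AddOnly([1])
b2 += [2]

print(a1 is a2, a1.items)            # True [1, 2]
print(b1 is b2, b1.items, b2.items)  # False [1] [1, 2]
```

InPlace behaves like list and AddOnly behaves like str in the examples above.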
Here's similar behaviour with tuples, but a bit more surprising:
t1 = (7,)
t1
t1[0] += 1
t1[0] = t1[0] + 1
t1
t2 = ([7],)
t2
t2[0] += [8]
What value do we expect t2 to have?
t2
Let's simulate the steps to see why this behaviour makes sense.
m = [7]
t2 = (m,)
t2
temp = m.__iadd__([8])
temp == m
temp is m
temp
t2
t2[0] = temp
For a similar explanation see https://docs.python.org/3/faq/programming.html#faq-augmented-assignment-tuple-error
Can functions modify the arguments passed in to them?
When a caller passes an argument to a function, the function starts execution with a local name (the parameter from its signature) referring to the argument object passed in.
def test_1a(s):
    print('Before:', s)
    s += ' two'
    print('After:', s)
s1 = 'one'
s1
test_1a(s1)
s1
To see more clearly why s1 is still a name for 'one', consider this version which is functionally equivalent but has two changes highlighted in the comments:
def test_1b(s):
    print('Before:', s)
    s = s + ' two'  # Changed from +=
    print('After:', s)
test_1b('one') # Changed from s1 to 'one'
In both cases the name s at the beginning of test_1a and test_1b was a name that referred to the str object 'one', and in both the function-local name s was reassigned to refer to the new str object 'one two'.
Let's try this with a list.
def test_2a(m):
    print('Before:', m)
    m += [4]  # list += list is shorthand for list.extend(list)
    print('After:', m)
m1 = [1, 2, 3]
m1
test_2a(m1)
m1
Conceptually a decorator changes or adds to the functionality of a function either by modifying its arguments before the function is called, or changing its return values afterwards, or both.
First let's look at a simple example of a function that returns another function.
def add(first, second):
    return first + second
add(2, 3)
def create_adder(first):
    def adder(second):
        return add(first, second)
    return adder
add_to_2 = create_adder(2)
add_to_2(3)
Next let's look at a function that receives a function as an argument.
def trace_function(f):
    """Add tracing before and after a function"""
    def new_f(*args):
        """The new function"""
        print(
            'Called {}({!r})'
            .format(f, *args))
        result = f(*args)
        print('Returning', result)
        return result
    return new_f
This trace_function wraps the functionality of whatever existing function is passed to it by returning a new function which calls the original function, but prints some trace information before and after.
traced_add = trace_function(add)
traced_add(2, 3)
We could instead reassign the original name.
add = trace_function(add)
add(2, 3)
Or we can use the decorator syntax to do that for us:
@trace_function
def add(first, second):
    """Return the sum of two arguments."""
    return first + second
add(2, 3)
add
add.__qualname__
add.__doc__
Use @wraps to update the metadata of the returned function and make it more useful.
import functools
def trace_function(f):
    """Add tracing before and after a function"""
    @functools.wraps(f)  # <-- Added
    def new_f(*args):
        """The new function"""
        print(
            'Called {}({!r})'
            .format(f, *args))
        result = f(*args)
        print('Returning', result)
        return result
    return new_f
@trace_function
def add(first, second):
    """Return the sum of two arguments."""
    return first + second
add
add.__qualname__
add.__doc__
Here's another common example of the utility of decorators. Memoization is "an optimization technique... storing the results of expensive function calls and returning the cached result when the same inputs occur again." -- https://en.wikipedia.org/wiki/Memoization
def memoize(f):
    print('Called memoize({!r})'.format(f))
    cache = {}
    @functools.wraps(f)
    def memoized_f(*args):
        print('Called memoized_f({!r})'.format(args))
        if args in cache:
            print('Cache hit!')
            return cache[args]
        if args not in cache:
            result = f(*args)
            cache[args] = result
            return result
    return memoized_f
@memoize
def add(first, second):
    """Return the sum of two arguments."""
    return first + second
add(2, 3)
add(4, 5)
add(2, 3)
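For real code you rarely need to write memoize yourself, since the standard library provides functools.lru_cache, which is applied the same way. Here's a sketch; the fib function and the calls list are made up so we can count how often the undecorated body actually runs.

```python
import functools

calls = []  # records every time the function body really executes


@functools.lru_cache(maxsize=None)
def fib(n):
    """Return the n-th Fibonacci number (fib(0) == 0, fib(1) == 1)."""
    calls.append(n)
    return n if n < 2 else fib(n - 1) + fib(n - 2)


result = fib(30)
info = fib.cache_info()
print(result)       # 832040
print(len(calls))   # 31, one real call for each n from 0 to 30
print(info)
```

Just as with memoize above, the name fib ends up referring to the wrapper that lru_cache returns rather than to the original function.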
Note that this is not a full treatment of decorators, only an introduction, and primarily from the perspective of how they intervene in the namespace operation of function definition. For example it leaves out entirely how to handle decorators that take more than one argument.
A decorator is a function that takes a function as an argument and typically returns a new function, but it can return anything. The following code misuses decorators to help you focus on their mechanics, which are really quite simple.
del x
def return_3(f):
    print('Called return_3({!r})'.format(f))
    return 3
def x():
    pass
x
x = return_3(x)
What object will x refer to now?
x
Here's equivalent code using @decorator syntax:
@return_3
def x():
    pass
x
type(x)
The class statement starts a block of code and creates a new namespace. All namespace changes in the block, e.g. assignment and function definitions, are made in that new namespace. Finally it adds the class name to the namespace where the class statement appears.
Instances of a class are created by calling the class: ClassName() or ClassName(args).
ClassName.__init__(<new object>, ...) is called automatically, and is passed the instance of the class already created by a call to the __new__ method.
Accessing an attribute method_name on a class instance returns a method object, if method_name references a method (in ClassName or its superclasses). A method object binds the class instance as the first argument to the method.
class Number:  # In Python 2.x use "class Number(object):"
    """A number class."""
    __version__ = '1.0'
    def __init__(self, amount):
        self.amount = amount
    def add(self, value):
        """Add a value to the number."""
        print('Call: add({!r}, {})'.format(self, value))
        return self.amount + value
Number
Number.__version__
Number.__doc__
help(Number)
Number.__init__
Number.add
dir(Number)
def dir_public(obj):
    return [n for n in dir(obj) if not n.startswith('__')]
dir_public(Number)
number2 = Number(2)
number2.amount
number2
number2.__init__
number2.add
dir_public(number2)
set(dir(number2)) ^ set(dir(Number)) # symmetric_difference
number2.__dict__
Number.__dict__
'add' in Number.__dict__
number2.add
number2.add(3)
Here's some unusual code ahead which will help us think carefully about how Python works.
Number.add
Will this work? Here's the gist of the method add defined above:
def add(self, value):
    return self.amount + value
Number.add(2)
Number.add(2, 3)
Number.add(number2, 3)
number2.add(3)
Remember, here's how __init__ was defined above:
def __init__(self, amount):
    self.amount = amount
Number.__init__
help(Number.__init__)
Here's some code that's downright risky, but instructive. You should never need to do this in your code.
def set_double_amount(number, amount):
    number.amount = 2 * amount
Number.__init__ = set_double_amount
Number.__init__
help(Number.__init__)
number4 = Number(2)
number4.amount
number4.add(5)
number4.__init__
number2.__init__
def multiply_by(number, value):
    return number.amount * value
Let's add a mul method. However, I will intentionally make a mistake.
number4.mul = multiply_by
number4.mul
number4.mul(5)
number4.mul(number4, 5)
Where's the mistake?
number10 = Number(5)
number10.mul
dir_public(number10)
dir_public(Number)
dir_public(number4)
Number.mul = multiply_by
number10.mul(5)
number4.mul(5)
dir_public(number4)
number4.__dict__
del number4.mul
number4.__dict__
dir_public(number4)
number4.mul
Number.mul
number4.mul(5)
Let's look behind the curtain to see how class instances work in Python.
Number
number4
Number.add
number4.add
Bound methods are handy.
add_to_4 = number4.add
add_to_4(6)
dir_public(number4)
dir(number4.add)
dir_public(number4.add)
set(dir(number4.add)) - set(dir(Number.add))
number4.add.__self__
number4.add.__self__ is number4
number4.add.__func__
number4.add.__func__ is Number.add
number4.add.__func__ is number10.add.__func__
number4.add(5)
So here's approximately how Python executes number4.add(5):
number4.add.__func__(number4.add.__self__, 5)
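The bound method itself comes from the function's __get__ method, part of the descriptor protocol listed under special methods later on. Here's a sketch using a made-up Greeter class rather than Number, so it runs on its own.

```python
class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self, greeting):
        return '{}, {}!'.format(greeting, self.name)


g = Greeter('Ada')

normal = g.greet('Hello')

# Spelling out what the bound method does with __func__ and __self__:
bound = g.greet
via_func = bound.__func__(bound.__self__, 'Hello')

# Building the bound method by hand from the plain function in the class dict:
plain_function = Greeter.__dict__['greet']
via_descriptor = plain_function.__get__(g, Greeter)('Hello')

print(normal, via_func, via_descriptor, sep='\n')
```

All three calls produce the same string, and bound.__func__ is the very function object stored in Greeter.__dict__.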
"The class statement is just a way to call a function, take the result, and put it into a namespace." -- Glyph Lefkowitz in Turtles All The Way Down: Demystifying Deferreds, Decorators, and Declarations at PyCon 2010
type(name, bases, dict) is the default function that gets called when Python reads a class statement.
print(type.__doc__)
Let's use the type function to build a class.
def init(self, amount):
    self.amount = amount
def add(self, value):
    """Add a value to the number."""
    print('Call: add({!r}, {})'.format(self, value))
    return self.amount + value
Number = type(
    'Number', (object,),
    dict(__version__='1.0', __init__=init, add=add))
number3 = Number(3)
type(number3)
number3.__class__
number3.__dict__
number3.amount
number3.add(4)
Remember, here's the normal way to create a class:
class Number:
    __version__ = '1.0'
    def __init__(self, amount):
        self.amount = amount
    def add(self, value):
        return self.amount + value
We can customize how classes get created.
https://docs.python.org/3/reference/datamodel.html#customizing-class-creation
By default, classes are constructed using type(). The class body is executed in a new namespace and the class name is bound locally to the result of type(name, bases, namespace).
The class creation process can be customised by passing the metaclass keyword argument in the class definition line, or by inheriting from an existing class that included such an argument.
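As a more conventional illustration of the metaclass keyword, here's a sketch of a metaclass that subclasses type and records every class it creates. The names Registry, Plugin and CsvPlugin are hypothetical.

```python
class Registry(type):
    """A metaclass that remembers the name of each class it creates."""
    created = []

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        mcls.created.append(name)
        return cls


class Plugin(metaclass=Registry):
    pass


class CsvPlugin(Plugin):   # inherits the metaclass from Plugin
    pass


print(Registry.created)    # ['Plugin', 'CsvPlugin']
print(type(CsvPlugin))     # the Registry metaclass, not type
```

CsvPlugin never mentions Registry, which shows the second route described above: inheriting from an existing class that included the metaclass argument.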
The following makes explicit that the metaclass, i.e. the callable that Python should use to create a class, is the built-in function type.
class Number(metaclass=type):
    def __init__(self, amount):
        self.amount = amount
Test your understanding of the mechanics of class creation with some very unconventional uses of those mechanics.
What does the following code do? Note that return_5 ignores arguments passed to it.
def return_5(name, bases, namespace):
    print('Called return_5({!r})'.format((name, bases, namespace)))
    return 5
return_5(None, None, None)
x = return_5(None, None, None)
x
type(x)
The syntax for specifying a metaclass changed in Python 3 so choose appropriately.
class y(object):  # Python 2.x
    __metaclass__ = return_5
class y(metaclass=return_5):  # Python 3.x
    pass
y
type(y)
We saw how decorators are applied to functions. They can also be applied to classes. What does the following code do?
def return_6(klass):
    print('Called return_6({!r})'.format(klass))
    return 6
return_6(None)
@return_6
class z:
    pass
z
type(z)
This is not a robust decorator.
def class_counter(klass):
    """Modify klass to count class instantiations"""
    klass.count = 0
    klass.__init_orig__ = klass.__init__
    def new_init(self, *args, **kwargs):
        klass.count += 1
        klass.__init_orig__(self, *args, **kwargs)
    klass.__init__ = new_init
    return klass
@class_counter
class TC:
    pass
TC.count
TC()
TC()
TC.count
Python implements operator overloading and many other features via special methods, the "dunder" methods that start and end with double underscores. Here's a very brief summary of them, more information at https://docs.python.org/3/reference/datamodel.html?highlight=co_nlocals#special-method-names.
basic class customization: __new__, __init__, __del__, __repr__, __str__, __bytes__, __format__
rich comparison methods: __lt__, __le__, __eq__, __ne__, __gt__, __ge__
attribute access and descriptors: __getattr__, __getattribute__, __setattr__, __delattr__, __dir__, __get__, __set__, __delete__
callables: __call__
container types: __len__, __length_hint__, __getitem__, __missing__, __setitem__, __delitem__, __iter__, (__next__), __reversed__, __contains__
numeric types: __add__, __sub__, __mul__, __truediv__, __floordiv__, __mod__, __divmod__, __pow__, __lshift__, __rshift__, __and__, __xor__, __or__
reflected operands: __radd__, __rsub__, __rmul__, __rtruediv__, __rfloordiv__, __rmod__, __rdivmod__, __rpow__, __rlshift__, __rrshift__, __rand__, __rxor__, __ror__
inplace operations: __iadd__, __isub__, __imul__, __itruediv__, __ifloordiv__, __imod__, __ipow__, __ilshift__, __irshift__, __iand__, __ixor__, __ior__
unary arithmetic: __neg__, __pos__, __abs__, __invert__
implementing built-in functions: __complex__, __int__, __float__, __round__, __bool__, __hash__
context managers: __enter__, __exit__
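To show a few of these working together, here's a sketch of a tiny class that supports repr(), ==, + (including reflected addition) and truth testing. The Money class is invented for this example.

```python
class Money:
    """An amount in whole cents."""
    def __init__(self, cents):
        self.cents = cents

    def __repr__(self):
        return 'Money({})'.format(self.cents)

    def __eq__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        return self.cents == other.cents

    def __add__(self, other):
        if isinstance(other, Money):
            return Money(self.cents + other.cents)
        if isinstance(other, int):
            return Money(self.cents + other)
        return NotImplemented   # lets Python try the other operand

    __radd__ = __add__          # handles int + Money, e.g. inside sum()

    def __bool__(self):
        return self.cents != 0


total = sum([Money(150), Money(250)])   # starts with 0 + Money(150)
print(repr(total))                      # Money(400)
print(total == Money(400), bool(Money(0)))
```

sum() starts from the int 0, so the first addition is 0 + Money(150). int.__add__ returns NotImplemented for that, and Python then tries Money.__radd__.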
Let's look at a simple example of changing how a class handles attribute access.
class UppercaseAttributes:
    """
    A class that returns uppercase values on uppercase attribute access.
    """
    # Called (if it exists) if an attribute access fails:
    def __getattr__(self, name):
        if name.isupper():
            if name.lower() in self.__dict__:
                return self.__dict__[
                    name.lower()].upper()
        raise AttributeError(
            "'{}' object has no attribute {}."
            .format(self, name))
d = UppercaseAttributes()
d.__dict__
d.foo = 'bar'
d.foo
d.__dict__
d.FOO
d.baz
To add behaviour to specific attributes you can also use properties.
class PropertyEg:
    """@property example"""
    def __init__(self):
        self._x = 'Uninitialized'
    @property
    def x(self):
        """The 'x' property"""
        print('called x getter()')
        return self._x
    @x.setter
    def x(self, value):
        print('called x.setter()')
        self._x = value
    @x.deleter
    def x(self):
        print('called x.deleter')
        self.__init__()
p = PropertyEg()
p._x
p.x
p.x = 'bar'
p.x
del p.x
p.x
p.x = 'bar'
Usually you should just expose attributes and add properties later if you need some measure of control or change of behaviour.
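Here's a sketch of why that advice is safe: code that uses a plain attribute keeps working unchanged when the attribute later becomes a property. AccountV1, AccountV2 and deposit are hypothetical names standing for two versions of the same class and a caller.

```python
class AccountV1:
    """First version: balance is a plain attribute."""
    def __init__(self, balance):
        self.balance = balance


class AccountV2:
    """Later version: balance is a property that validates assignments."""
    def __init__(self, balance):
        self.balance = balance   # goes through the setter below

    @property
    def balance(self):
        return self._balance

    @balance.setter
    def balance(self, value):
        if value < 0:
            raise ValueError('balance cannot be negative')
        self._balance = value


def deposit(account, amount):
    """Caller code, written once and never changed."""
    account.balance = account.balance + amount
    return account.balance


print(deposit(AccountV1(10), 5))   # 15
print(deposit(AccountV2(10), 5))   # 15
```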
Try the following:
class Get:
    def __getitem__(self, key):
        print('called __getitem__({} {})'
              .format(type(key), repr(key)))
g = Get()
g[1]
g[-1]
g[0:3]
g[0:10:2]
g['Jan']
g[g]
m = list('abcdefghij')
m[0]
m[-1]
m[::2]
s = slice(3)
m[s]
m[slice(1, 3)]
m[slice(0, 2)]
m[slice(0, len(m), 2)]
m[::2]
A for loop evaluates an expression to get an iterable and then calls iter() to get an iterator.
The iterator's __next__() method is called repeatedly until StopIteration is raised.
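Written out by hand, a for loop looks roughly like the following sketch. The function manual_for is made up, and it collects the items into a list only so we can inspect the result.

```python
def manual_for(iterable):
    """Do by hand what `for item in iterable:` does for us."""
    collected = []
    iterator = iter(iterable)        # ask the iterable for an iterator
    while True:
        try:
            item = next(iterator)    # calls iterator.__next__()
        except StopIteration:        # the signal that the iterator is exhausted
            break
        collected.append(item)       # the body of the loop
    return collected


print(manual_for('abc'))     # ['a', 'b', 'c']
print(manual_for(range(3)))  # [0, 1, 2]
```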
for i in 'abc':
    print(i)
iterator = iter('ab')
iterator.__next__()
iterator.__next__()
iterator.__next__()
iterator.__next__()
iterator = iter('ab')
next(iterator)
next(iterator)
next(iterator)
next() just calls __next__(), but you can pass it a second argument:
iterator = iter('ab')
next(iterator, 'z')
next(iterator, 'z')
next(iterator, 'z')
next(iterator, 'z')
iter(foo) checks for foo.__iter__() and calls it if it exists. Otherwise it checks for foo.__getitem__() and returns an object which calls it with indexes starting at zero and handles IndexError by raising StopIteration.
class MyList:
    """Demonstrate the iterator protocol"""
    def __init__(self, sequence):
        self.items = sequence
    def __getitem__(self, key):
        print('called __getitem__({})'
              .format(key))
        return self.items[key]
m = MyList('ab')
m.__getitem__(0)
m.__getitem__(1)
m.__getitem__(2)
m[0]
m[1]
m[2]
hasattr(m, '__iter__')
hasattr(m, '__getitem__')
iterator = iter(m)
next(iterator)
next(iterator)
next(iterator)
list(m)
for item in m:
    print(item)
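MyList only has __getitem__(), so iter() falls back to indexing. As a sketch of the other half of the rule, the hypothetical classes below show that __iter__() wins when both methods exist.

```python
class Both:
    """Hypothetical class defining both protocols."""

    def __getitem__(self, key):
        if key >= 2:
            raise IndexError(key)
        return 'from __getitem__'

    def __iter__(self):
        return iter(['from __iter__'])


class OnlyGetitem:
    """Hypothetical class that relies on the indexing fallback."""

    def __getitem__(self, key):
        if key >= 2:
            raise IndexError(key)
        return 'from __getitem__'


print(list(Both()))
print(list(OnlyGetitem()))
```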
m = [1, 2, 3]
it = iter(m)
next(it)
next(it)
next(it)
next(it)
for n in m:
    print(n)
d = {'one': 1, 'two': 2, 'three': 3}
it = iter(d)
list(it)
m1 = [2 * i for i in range(3)]
m1
m2 = (2 * i for i in range(3))
m2
list(m2)
list(m2)
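The second list(m2) call comes back empty because a generator expression produces an iterator, and an iterator can only be consumed once. A small sketch of the difference, using made-up names:

```python
squares_list = [i * i for i in range(4)]   # a list: can be re-read
squares_gen = (i * i for i in range(4))    # a generator: single pass

first_pass = sum(squares_gen)
second_pass = sum(squares_gen)             # already exhausted
print(first_pass, second_pass, sum(squares_list), sum(squares_list))

# A generator is its own iterator, which is why it cannot be restarted.
print(iter(squares_gen) is squares_gen)
```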
def list123():
    print('Before first yield')
    yield 1
    print('Between first and second yield')
    yield 2
    print('Between second and third yield')
    yield 3
    print('After third yield')
list123
list123()
iterator = list123()
next(iterator)
next(iterator)
next(iterator)
next(iterator)
for i in list123():
    print(i)
def even(limit):
    for i in range(0, limit, 2):
        print('Yielding', i)
        yield i
    print('done loop, falling out')
iterator = even(3)
iterator
next(iterator)
next(iterator)
for i in even(3):
    print(i)
list(even(10))
Compare these versions:
def even_1(limit):
    for i in range(0, limit, 2):
        yield i
def even_2(limit):
    result = []
    for i in range(0, limit, 2):
        result.append(i)
    return result
[i for i in even_1(10)]
[i for i in even_2(10)]
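Both versions give the same list here, but they do their work at different times. Here's a sketch of one practical consequence, using itertools.islice to take only the first few values. The names even_lazy and even_eager are stand-ins for the two versions above.

```python
import itertools


def even_lazy(limit):
    for i in range(0, limit, 2):
        yield i


def even_eager(limit):
    return list(range(0, limit, 2))


# Only three values are ever computed, however large the limit is.
first_three = list(itertools.islice(even_lazy(10 ** 12), 3))
print(first_three)

# The eager version must finish the whole list before returning.
print(even_eager(10)[:3])
```

Calling even_eager(10 ** 12) would try to build the entire list in memory first, so the generator version is the one to reach for when the caller may stop early.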
def paragraphs(lines):
    result = ''
    for line in lines:
        if line.strip() == '':
            yield result
            result = ''
        else:
            result += line
    yield result
%%writefile eg.txt
This is some sample
text. It has a couple
of paragraphs.

Each paragraph has at
least one sentence.

Most paragraphs have
two.
list(paragraphs(open('eg.txt')))
len(list(paragraphs(open('eg.txt'))))
Write a generator double(val, n=3) that takes a value and yields it doubled repeatedly, n times in total. The test cases below clarify the expected behaviour.
%load solve_double # To display the solution in IPython
from solve_double import double
def test_double():
    assert list(double('.')) == ['..', '....', '........']
    assert list(double('s.', 2)) == ['s.s.', 's.s.s.s.']
    assert list(double(1)) == [2, 4, 8]
test_double()
A few miscellaneous items:
months = ['jan', 'feb', 'mar', 'apr', 'may']
months[0:2]
months[0:100]
month_num_pairs = list(zip(months, range(1, 100)))
month_num_pairs
list(zip(*month_num_pairs))
{letter: num for letter, num in zip(months, range(1, 100))}
{letter.upper() for letter in 'mississippi'}
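The cells above rely on two properties of zip(): it stops at the shortest input, and zip(*pairs) reverses the pairing. Here's a short sketch with a smaller, made-up list, plus enumerate() as an alternative way to number items.

```python
names = ['jan', 'feb', 'mar']

pairs = list(zip(names, range(1, 100)))      # stops after three pairs
print(pairs)

unzipped_names, unzipped_nums = zip(*pairs)  # "unzip" back into two tuples
print(unzipped_names, unzipped_nums)

# enumerate() gives the same numbering without an arbitrary upper bound.
lookup = {name: num for num, name in enumerate(names, start=1)}
print(lookup)
```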
Python exposes many language features and places almost no constraints on what types data structures can hold.
Here's an example of using a dictionary of functions to create a simple calculator. In some languages the only reasonable solution would require a case or switch statement, or a series of if statements. If you've been using such a language for a while, this example may help you expand the range of solutions you can imagine in Python.
Let's iteratively write code to get this behaviour:
assert calc('7+3') == 10
assert calc('9-5') == 4
assert calc('9/3') == 3
7+3
expr = '7+3'
lhs, op, rhs = expr
lhs, op, rhs
lhs, rhs = int(lhs), int(rhs)
lhs, op, rhs
op, lhs, rhs
def perform_operation(op, lhs, rhs):
    if op == '+':
        return lhs + rhs
    if op == '-':
        return lhs - rhs
    if op == '/':
        return lhs / rhs
perform_operation('+', 7, 3) == 10
The perform_operation function has a lot of boilerplate repetition.
Let's use a data structure instead to use less code and make it easier to extend.
import operator
operator.add(7, 3)
OPERATOR_MAPPING = {
    '+': operator.add,
    '-': operator.sub,
    '/': operator.truediv,
}
OPERATOR_MAPPING['+']
OPERATOR_MAPPING['+'](7, 3)
def perform_operation(op, lhs, rhs):
    return OPERATOR_MAPPING[op](lhs, rhs)
perform_operation('+', 7, 3) == 10
def calc(expr):
    lhs, op, rhs = expr
    lhs, rhs = int(lhs), int(rhs)
    return perform_operation(op, lhs, rhs)
calc('7+3')
calc('9-5')
calc('9/3')
calc('3*4')
OPERATOR_MAPPING['*'] = operator.mul
calc('3*4')
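Because calc() unpacks the string character by character, it only accepts single-digit operands. As a sketch of how the same dictionary-of-functions idea extends, here's a variant that parses with a regular expression. The names calc_multi, OPERATORS and EXPRESSION are made up for this illustration.

```python
import operator
import re

OPERATORS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.truediv,
}

# Digits, one operator character, digits; spaces are allowed around each part.
EXPRESSION = re.compile(r'\s*(\d+)\s*([-+*/])\s*(\d+)\s*$')


def calc_multi(expr):
    """Hypothetical variant of calc() that accepts multi-digit operands."""
    match = EXPRESSION.match(expr)
    if match is None:
        raise ValueError('cannot parse {!r}'.format(expr))
    lhs, op, rhs = match.groups()
    return OPERATORS[op](int(lhs), int(rhs))


print(calc_multi('12 + 30'), calc_multi('9/3'))
```

The lookup in OPERATORS is unchanged from the original; only the parsing step differs.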
Let's look at another example. Suppose we have data in which every line is a fixed-length record made up of fixed-length fields, and we want to pull fields out of it by name:
PYTHON_RELEASES = [
    'Python 3.4.0 2014-03-17',
    'Python 3.3.0 2012-09-29',
    'Python 3.2.0 2011-02-20',
    'Python 3.1.0 2009-06-26',
    'Python 3.0.0 2008-12-03',
    'Python 2.7.9 2014-12-10',
    'Python 2.7.8 2014-07-02',
]
release34 = PYTHON_RELEASES[0]
release = ReleaseFields(release34) # 3.4.0
assert release.name == 'Python'
assert release.version == '3.4.0'
assert release.date == '2014-03-17'
This works:
class ReleaseFields:
    def __init__(self, data):
        self.data = data
    @property
    def name(self):
        return self.data[0:6]
    @property
    def version(self):
        return self.data[7:12]
    @property
    def date(self):
        return self.data[13:23]
release34 = 'Python 3.4.0 2014-03-17'
release = ReleaseFields(release34)
assert release.name == 'Python'
assert release.version == '3.4.0'
assert release.date == '2014-03-17'
However, the following is better, especially if there are many fields or if it is part of a library that handles lots of different record formats:
class ReleaseFields:
    slices = {
        'name': slice(0, 6),
        'version': slice(7, 12),
        'date': slice(13, 23)
    }
    def __init__(self, data):
        self.data = data
    def __getattr__(self, attribute):
        if attribute in self.slices:
            return self.data[self.slices[attribute]]
        raise AttributeError(
            "{!r} has no attribute {!r}"
            .format(self, attribute))
release = ReleaseFields(release34)
assert release.name == 'Python'
assert release.version == '3.4.0'
assert release.date == '2014-03-17'
Confirm that trying to access an attribute that doesn't exist fails correctly. (Note that it won't in Python 2.x unless you add (object) after class ReleaseFields.)
release.foo == 'exception'
If you find yourself writing lots of boilerplate code, as in the first versions of the calculator and the fixed-length record class above, you may want to try changing it to use a Python data structure with first-class objects.
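Once the field layout is plain data, supporting another record format is mostly a matter of supplying another dictionary. Here's a sketch under that assumption. The make_record_class helper is hypothetical and not part of the original material.

```python
def make_record_class(slices):
    """Build a record class from a mapping of field name -> slice.

    This helper is a hypothetical illustration.
    """
    class Record:
        def __init__(self, data):
            self.data = data

        def __getattr__(self, attribute):
            # Only called when normal lookup fails, so self.data is safe here.
            if attribute in slices:
                return self.data[slices[attribute]]
            raise AttributeError(
                '{!r} has no attribute {!r}'.format(self, attribute))

    return Record


Release = make_record_class({
    'name': slice(0, 6),
    'version': slice(7, 12),
    'date': slice(13, 23),
})

r = Release('Python 3.4.0 2014-03-17')
print(r.name, r.version, r.date)
```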
It is often useful to bind data to a function. A method clearly does that, binding the instance's attributes with the method behaviour, but it's not the only way.
def log(severity, message):
    print('{}: {}'.format(severity.upper(), message))
log('warning', 'this is a warning')
log('error', 'this is an error')
Create a new function that specifies one argument.
def warning(message):
    log('warning', message)
warning('this is a warning')
Create a closure from a function that specifies an argument.
def create_logger(severity):
    def logger(message):
        log(severity, message)
    return logger
warning2 = create_logger('warning')
warning2('this is a warning')
Create a partial function.
import functools
warning3 = functools.partial(log, 'warning')
warning3
warning3.func is log
warning3.args, warning3.keywords
warning3('this is a warning')
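functools.partial can freeze more than one positional argument, and keyword arguments too. Here's a small sketch. The log_to function and the records list are made up so that the result can be inspected instead of printed.

```python
import functools


def log_to(sink, severity, message):
    """Hypothetical logger that appends to a list instead of printing."""
    sink.append('{}: {}'.format(severity.upper(), message))


records = []
error = functools.partial(log_to, records, 'error')  # two positional args frozen
error('disk full')

from_binary = functools.partial(int, base=2)         # a keyword arg frozen
print(records, from_binary('101'))
```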
Use a bound method.
SENTENCE_PUNCTUATION = '.?!'
sentence = 'This is a sentence!'
sentence[-1] in SENTENCE_PUNCTUATION
'.' in SENTENCE_PUNCTUATION
SENTENCE_PUNCTUATION.__contains__('.')
SENTENCE_PUNCTUATION.__contains__(',')
is_end_of_a_sentence = SENTENCE_PUNCTUATION.__contains__
is_end_of_a_sentence('.')
is_end_of_a_sentence(',')
Create a class with a __call__ method.
class SentenceEndsWith:
    def __init__(self, characters):
        self.punctuation = characters
    def __call__(self, sentence):
        return sentence[-1] in self.punctuation
is_end_of_a_sentence_dot1 = SentenceEndsWith('.')
is_end_of_a_sentence_dot1('This is a test.')
is_end_of_a_sentence_dot1('This is a test!')
is_end_of_a_sentence_any = SentenceEndsWith('.!?')
is_end_of_a_sentence_any('This is a test.')
is_end_of_a_sentence_any('This is a test!')
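Another way that data, including mutable data, can be bound to a function is through default parameter values. Defaults are evaluated once, when the def statement runs, not on each call, so a mutable default is shared between calls. This is sometimes done by mistake.
def f1(parameter=print('The parameter is initialized now!')):
    if parameter is None:
        print('The parameter is None')
    return parameter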
f1()
f1() is None
f1('Not None')
def f2(parameter=[0]):
    parameter[0] += 1
    return parameter[0]
f2()
f2()
f2()
f2()
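f2 keeps counting because its default list is created once and then reused on every call. When that sharing isn't wanted, a common idiom is to default to None and create the object inside the function. Here's a sketch with made-up function names:

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when the def statement runs.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # A new list is created on each call that omits bucket.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


shared_1 = list(append_shared(1))   # copy, to record the state at this point
shared_2 = list(append_shared(2))
fresh_1 = append_fresh(1)
fresh_2 = append_fresh(2)
print(shared_1, shared_2, fresh_1, fresh_2)

# The shared list lives on the function object itself.
print(append_shared.__defaults__)
```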
import collections
Month = collections.namedtuple(
    'Month', 'name number days',
    verbose=True)  # Prints the definition; verbose was removed in Python 3.7
Month
jan = Month('January', 1, 31)
jan.name, jan.days
jan[0]
feb = Month('February', 2, 28)
mar = Month('March', 3, 31)
apr = Month('April', 4, 30)
months = [jan, feb, mar, apr]
def month_days(month):
    return month.days
month_days(feb)
import operator
month_days = operator.attrgetter('days')
month_days(feb)
month_name = operator.itemgetter(0)
month_name(feb)
sorted(months, key=operator.itemgetter(0))
sorted(months, key=operator.attrgetter('name'))
sorted(months, key=operator.attrgetter('days'))
'hello'.upper()
to_uppercase = operator.methodcaller('upper')
to_uppercase('hello')
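The operator helpers also accept extra arguments. The sketch below sorts on two fields at once with attrgetter and passes an argument through methodcaller. It rebuilds the Month tuples so that it runs on its own.

```python
import collections
import operator

Month = collections.namedtuple('Month', 'name number days')
months = [
    Month('January', 1, 31),
    Month('February', 2, 28),
    Month('March', 3, 31),
    Month('April', 4, 30),
]

# With several names, attrgetter returns a tuple, which sorts field by field.
by_days_then_name = sorted(months, key=operator.attrgetter('days', 'name'))
print([m.name for m in by_days_then_name])

# methodcaller can also carry arguments for the method it will call.
split_on_dash = operator.methodcaller('split', '-')
print(split_on_dash('2014-03-17'))
```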