Python includes many built-in language features to enable concise, easily understood code. Some of these niceties include list/set/dictionary comprehensions, properties, and decorators. For the most part, these "intermediate-level" language features are well documented and easy to learn.
There is one notable exception to this: descriptors. For me at least, descriptors were the feature of the core Python language that remained mysterious for the longest time. There are a few reasons for this:
1. The official documentation on descriptors is rather esoteric, and doesn't include good use cases that show why you might write descriptors (my apologies to Raymond Hettinger, whose other Python articles and videos I have found very helpful).
2. The syntax for writing descriptors is a little weird.
3. Custom descriptors might be the least-utilized feature of the Python language, so it's hard to find good examples in open source projects.
Nevertheless, descriptors do have their uses once you figure them out. This document tries to build the argument for what descriptors do, and why you should care.
Here's what we're building up to: fundamentally, descriptors are properties that you can reuse. That is, descriptors let you write code that looks like this:

```python
f = Foo()
b = f.bar
f.bar = c
del f.bar
```

and, behind the scenes, have custom methods called whenever you access (`b = f.bar`), assign to (`f.bar = c`), or delete (`del f.bar`) an instance attribute.
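As a preview, here is a minimal Python 3 sketch of a class-level object whose methods run on read, assignment, and deletion. The names `Logged` and `calls` are hypothetical, chosen only for illustration, and this toy stores no real data; the point is just that ordinary attribute syntax triggers method calls:

```python
class Logged(object):
    """Hypothetical descriptor that records which hook Python invoked."""

    def __init__(self):
        self.calls = []

    def __get__(self, instance, owner):
        self.calls.append('get')
        return 42

    def __set__(self, instance, value):
        self.calls.append('set %r' % (value,))

    def __delete__(self, instance):
        self.calls.append('delete')


class Foo(object):
    bar = Logged()  # the descriptor lives on the class, not the instance


f = Foo()
b = f.bar        # runs Logged.__get__
f.bar = 'c'      # runs Logged.__set__
del f.bar        # runs Logged.__delete__
print(Foo.__dict__['bar'].calls)  # ['get', "set 'c'", 'delete']
```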
Let's establish why being able to disguise function calls as attribute access is a good thing.
Imagine you are writing some code to organize information about movies (spoiler alert: these projects beat you to it). You might end up with a movie class that looks like this:
```python
class Movie(object):
    def __init__(self, title, rating, runtime, budget, gross):
        self.title = title
        self.rating = rating
        self.runtime = runtime
        self.budget = budget
        self.gross = gross

    def profit(self):
        return self.gross - self.budget
```
You start using this class in other parts of your project, but then you realize something: by mistake, you sometimes assign negative budgets to movies. You decide this is bad, and want the Movie class to forbid this. The first thing you think to try is this:
```python
class Movie(object):
    def __init__(self, title, rating, runtime, budget, gross):
        self.title = title
        self.rating = rating
        self.runtime = runtime
        self.gross = gross
        if budget < 0:
            raise ValueError("Negative value not allowed: %s" % budget)
        self.budget = budget

    def profit(self):
        return self.gross - self.budget
```
But that won't work, because other parts of your code assign values to `budget` directly: this new class catches data-entry errors within the `__init__` method, but not the cases where somebody runs `m.budget = -100` on a pre-existing instance. What's a cinephile Pythonista to do?
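A quick Python 3 check of that loophole, using the same `__init__`-validated class: construction is checked, but a later assignment sails straight through because `budget` is a plain instance attribute.

```python
class Movie(object):
    def __init__(self, title, rating, runtime, budget, gross):
        self.title = title
        self.rating = rating
        self.runtime = runtime
        self.gross = gross
        if budget < 0:
            raise ValueError("Negative value not allowed: %s" % budget)
        self.budget = budget

    def profit(self):
        return self.gross - self.budget


m = Movie('Casablanca', 97, 102, 964000, 1300000)
m.budget = -100   # no check runs here: this is an ordinary attribute write
print(m.budget)   # -100
```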
Luckily, Python properties solve this problem. If you've never seen properties before, here's how they work:
```python
class Movie(object):
    def __init__(self, title, rating, runtime, budget, gross):
        self._budget = None

        self.title = title
        self.rating = rating
        self.runtime = runtime
        self.gross = gross
        self.budget = budget

    @property
    def budget(self):
        return self._budget

    @budget.setter
    def budget(self, value):
        if value < 0:
            raise ValueError("Negative value not allowed: %s" % value)
        self._budget = value

    def profit(self):
        return self.gross - self.budget


m = Movie('Casablanca', 97, 102, 964000, 1300000)
print m.budget       # calls the budget getter and returns its result

try:
    m.budget = -100  # calls the budget setter with -100, which raises ValueError
except ValueError:
    print "Woops. Not allowed"
```

```
964000
Woops. Not allowed
```
We specify a getter method with a `@property` decorator, and a setter method with a `@budget.setter` decorator. When we do that, Python automatically calls the getter whenever anybody tries to access the budget. Likewise, Python automatically calls the setter whenever it encounters code like `m.budget = value`.
Take a moment to appreciate how nice it is that Python does this: if properties didn't exist, we'd have to hide all of our instance attributes, and provide lots of explicit methods like `get_budget` and `set_budget`. Code that uses our classes would constantly be calling these getter/setter methods, and would start to look like crufty Java code. Even worse, if we ignored this coding style and just gave direct access to an instance attribute like `budget`, there would be no clean way to add the non-negativity check later: we would have to retroactively create the `set_budget` method, and search our entire project to change lines like `m.budget = value` to `m.set_budget(value)`. Gross.
So properties let you attach custom code to attribute access and assignment, while maintaining a simple attribute-like interface for your classes. Nice.
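The decorator syntax is shorthand for calling the built-in `property(fget, fset)`. This pared-down Python 3 sketch (the `_get_budget` and `_set_budget` names are illustrative) spells that out, and also shows that a property object itself carries `__get__` and `__set__` methods, which is the hook descriptors build on:

```python
class Movie(object):
    def __init__(self, budget):
        self.budget = budget  # goes through the setter below

    def _get_budget(self):
        return self._budget

    def _set_budget(self, value):
        if value < 0:
            raise ValueError("Negative value not allowed: %s" % value)
        self._budget = value

    # Equivalent to the @property / @budget.setter pair
    budget = property(_get_budget, _set_budget)


m = Movie(964000)
prop = type(m).__dict__['budget']    # fetch the property without triggering it
print(type(prop).__name__)           # property
print(hasattr(property, '__get__'), hasattr(property, '__set__'))  # True True
print(prop.__get__(m, Movie))        # 964000, exactly what m.budget returns
```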
The main downside to properties is that they aren't reusable. For example, let's assume you want to add the non-negativity check to the `rating`, `runtime`, and `gross` fields as well. Here's the new class:
```python
class Movie(object):
    def __init__(self, title, rating, runtime, budget, gross):
        self._rating = None
        self._runtime = None
        self._budget = None
        self._gross = None

        self.title = title
        self.rating = rating
        self.runtime = runtime
        self.gross = gross
        self.budget = budget

    # nice
    @property
    def budget(self):
        return self._budget

    @budget.setter
    def budget(self, value):
        if value < 0:
            raise ValueError("Negative value not allowed: %s" % value)
        self._budget = value

    # ok
    @property
    def rating(self):
        return self._rating

    @rating.setter
    def rating(self, value):
        if value < 0:
            raise ValueError("Negative value not allowed: %s" % value)
        self._rating = value

    # uhh...
    @property
    def runtime(self):
        return self._runtime

    @runtime.setter
    def runtime(self, value):
        if value < 0:
            raise ValueError("Negative value not allowed: %s" % value)
        self._runtime = value

    # is this forever?
    @property
    def gross(self):
        return self._gross

    @gross.setter
    def gross(self, value):
        if value < 0:
            raise ValueError("Negative value not allowed: %s" % value)
        self._gross = value

    def profit(self):
        return self.gross - self.budget
```
That's a lot of code, and a lot of duplicated logic. While properties make the outsides of classes look nice, they don't make the insides of classes look nice.
This is the problem that descriptors solve. Descriptors generalize properties, and let you write separate classes for reusable property logic. Here's an example of how they work (for the moment, don't worry about what's inside `NonNegative`):
```python
from weakref import WeakKeyDictionary


class NonNegative(object):
    """A descriptor that forbids negative values"""

    def __init__(self, default):
        self.default = default
        self.data = WeakKeyDictionary()

    def __get__(self, instance, owner):
        # we get here when someone accesses x.d, and d is a NonNegative instance
        # instance = x
        # owner = type(x)
        return self.data.get(instance, self.default)

    def __set__(self, instance, value):
        # we get here when someone runs x.d = val, and d is a NonNegative instance
        # instance = x
        # value = val
        if value < 0:
            raise ValueError("Negative value not allowed: %s" % value)
        self.data[instance] = value


class Movie(object):

    # always put descriptors at the class level
    rating = NonNegative(0)
    runtime = NonNegative(0)
    budget = NonNegative(0)
    gross = NonNegative(0)

    def __init__(self, title, rating, runtime, budget, gross):
        self.title = title
        self.rating = rating
        self.runtime = runtime
        self.budget = budget
        self.gross = gross

    def profit(self):
        return self.gross - self.budget


m = Movie('Casablanca', 97, 102, 964000, 1300000)
print m.budget      # calls Movie.budget.__get__(m, Movie)
m.rating = 100      # calls Movie.rating.__set__(m, 100)
try:
    m.rating = -1   # calls Movie.rating.__set__(m, -1), which raises ValueError
except ValueError:
    print "Woops, negative value"
```

```
964000
Woops, negative value
```
There's some new syntax in here, so let's look at things piece by piece:

1. `NonNegative` is a descriptor object. It's a descriptor because it defines at least one of the `__get__`, `__set__`, or `__delete__` methods.
2. The `Movie` class looks very clean. We create 4 descriptors at the class level, and treat them like normal (instance-level) attributes everywhere else. And apparently, the descriptors are checking for non-negative values for us.
3. When Python sees the line `print m.budget`, it recognizes that `budget` is a descriptor with a `__get__` method. Instead of passing `m.budget` to print directly, it calls `Movie.budget.__get__`, and feeds the result of that to print. This is similar to what happens when you access a property: Python automatically calls a method, and returns the result.
4. `__get__` receives two arguments: the instance object to the left of the period (that is, the `m` object in `m.budget`), and the type of that instance (`Movie`). In some Python documentation, `Movie` is called the *owner* of the descriptor. If we had asked for `Movie.budget`, Python would have called `Movie.budget.__get__(None, Movie)`; that is, the first argument is either an instance of the owner, or `None`. These input arguments may seem weird to you, but they're there to give you information about what object the descriptor is part of. This will make sense once we look inside the `NonNegative` class.
5. When Python sees `m.rating = 100`, it recognizes that `rating` is a descriptor with a `__set__` method, and it calls `Movie.rating.__set__(m, 100)`. Like `__get__`, the first argument of `__set__` is the instance to the left of the period (the `m` in `m.rating = 100`). The second argument is the value to the right of the equals sign (100).
6. For the sake of completeness, if you call `del m.budget`, Python will call `Movie.budget.__delete__(m)`. (The `NonNegative` class above doesn't define `__delete__`, so this particular `del` would raise `AttributeError`.)
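To connect those points with real calls, here is a Python 3 sketch that invokes the hooks by hand. It reuses `NonNegative` and adds a `__delete__` method, an assumption for illustration only, which simply forgets the per-instance value so that reads fall back to the default:

```python
from weakref import WeakKeyDictionary


class NonNegative(object):
    """The NonNegative descriptor, plus a hypothetical __delete__."""

    def __init__(self, default):
        self.default = default
        self.data = WeakKeyDictionary()

    def __get__(self, instance, owner):
        return self.data.get(instance, self.default)

    def __set__(self, instance, value):
        if value < 0:
            raise ValueError("Negative value not allowed: %s" % value)
        self.data[instance] = value

    def __delete__(self, instance):
        # Forget this instance's value; __get__ then returns the default
        self.data.pop(instance, None)


class Movie(object):
    budget = NonNegative(0)


m = Movie()
descriptor = Movie.__dict__['budget']   # fetch the descriptor without calling __get__

m.budget = 964000                       # same as descriptor.__set__(m, 964000)
print(descriptor.__get__(m, Movie))     # 964000, same as m.budget
del m.budget                            # same as descriptor.__delete__(m)
print(m.budget)                         # 0 (the default)
```

Calling the dunder methods directly is only for illustration; ordinary code should use attribute syntax. Silently ignoring a `del` of an unset value is a design choice here; raising `AttributeError` would mimic plain attributes more closely. Note also that with this version, class-level access (`Movie.budget`) raises `TypeError` in CPython 3: `__get__` receives `None`, and a `WeakKeyDictionary` cannot create a weak reference to `None`. A common remedy is to return the descriptor itself when `instance is None`.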
With this in mind, we can now look to see how the `NonNegative` class works. Each instance of `NonNegative` maintains a dictionary that maps owner instances to data values. When we access `m.budget`, the `__get__` method looks up the data associated with `m`, and returns the result (or a default value, if no such value exists). `__set__` uses the same approach, but includes the extra non-negativity check. We use a `WeakKeyDictionary` instead of a normal `dict` to prevent a memory leak: we don't want an instance to stay alive simply because it's a key in the descriptor's dictionary when it is otherwise unused.
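A small Python 3 check of that claim, using a copy of `NonNegative` (the `store` name is just for illustration): once the last ordinary reference to a `Movie` disappears, its entry disappears from the descriptor's dictionary too.

```python
import gc
from weakref import WeakKeyDictionary


class NonNegative(object):
    def __init__(self, default):
        self.default = default
        self.data = WeakKeyDictionary()

    def __get__(self, instance, owner):
        return self.data.get(instance, self.default)

    def __set__(self, instance, value):
        if value < 0:
            raise ValueError("Negative value not allowed: %s" % value)
        self.data[instance] = value


class Movie(object):
    budget = NonNegative(0)


store = Movie.__dict__['budget'].data
m = Movie()
m.budget = 964000
print(len(store))   # 1: the descriptor holds only a weak reference to m
del m               # drop the only strong reference
gc.collect()        # CPython already freed m; collect() makes the point explicit
print(len(store))   # 0: the entry vanished along with the instance
```

With a plain `dict` in place of `WeakKeyDictionary`, the second count would stay at 1, and the descriptor would keep every `Movie` it had ever stored a value for alive as long as the class exists.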
Working with descriptors is slightly awkward. Because they live at the class level, every instance shares the same descriptor. This means that descriptors have to manually manage separate state for each object instance, and must be passed the instance explicitly as the first argument of the `__get__`, `__set__`, and `__delete__` methods.
Hopefully, however, this example gives you an idea of what descriptors can be useful for -- they provide a way to organize property logic into isolated classes. If you find yourself repeating the same logic across several properties, that should be a clue to consider whether refactoring that code into a descriptor is worthwhile.
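Keep in mind that descriptors only work when they are defined at the class level. A descriptor stored on an instance gets no special treatment:

```python
class Broken(object):
    y = NonNegative(5)

    def __init__(self):
        self.x = NonNegative(0)  # NOT a good descriptor


b = Broken()
print "X is %s, Y is %s" % (b.x, b.y)
```

```
X is <__main__.NonNegative object at 0x10432c250>, Y is 5
```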
As you can see, accessing the class-level descriptor `y` automatically calls `__get__`. However, accessing the instance-level descriptor `x` returns the descriptor itself, sans magic.
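Why does the class-level location matter? When you read `obj.attr`, Python looks for `__get__` only on attributes found on `type(obj)` and its bases; a value found in the instance's `__dict__` is returned untouched. The simplified Python 3 model of `object.__getattribute__` below captures that rule and the precedence order. The names `simplified_getattr` and `Five` are hypothetical, and the model deliberately skips details such as `__getattr__` fallbacks and `__slots__`:

```python
_MISSING = object()


def simplified_getattr(obj, name):
    """Rough model of object.__getattribute__ (skips __getattr__, __slots__, etc.)."""
    # Descriptors are only looked for on the type (and its bases), never on obj itself
    cls_attr = _MISSING
    for klass in type(obj).__mro__:
        if name in vars(klass):
            cls_attr = vars(klass)[name]
            break

    kind = type(cls_attr)
    is_descriptor = cls_attr is not _MISSING and hasattr(kind, '__get__')
    is_data = is_descriptor and (hasattr(kind, '__set__') or hasattr(kind, '__delete__'))

    if is_data:                   # 1. data descriptors found on the class win
        return kind.__get__(cls_attr, obj, type(obj))
    instance_dict = getattr(obj, '__dict__', {})
    if name in instance_dict:     # 2. instance-dict values come back untouched
        return instance_dict[name]
    if is_descriptor:             # 3. then non-data descriptors (e.g. plain functions)
        return kind.__get__(cls_attr, obj, type(obj))
    if cls_attr is not _MISSING:  # 4. then ordinary class attributes
        return cls_attr
    raise AttributeError(name)


class Five(object):
    """Hypothetical data descriptor that always reads as 5."""

    def __get__(self, instance, owner):
        return 5

    def __set__(self, instance, value):
        raise AttributeError('read-only')


class Broken(object):
    y = Five()

    def __init__(self):
        self.x = Five()  # stored in b.__dict__, so no descriptor magic


b = Broken()
print(simplified_getattr(b, 'y'))                 # 5
print(type(simplified_getattr(b, 'x')).__name__)  # Five
```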
You might be tempted to write the `NonNegative` descriptor like this:
```python
class BrokenNonNegative(object):
    def __init__(self, default):
        self.value = default

    def __get__(self, instance, owner):
        return self.value

    def __set__(self, instance, value):
        if value < 0:
            raise ValueError("Negative value not allowed: %s" % value)
        self.value = value


class Foo(object):
    bar = BrokenNonNegative(5)


f = Foo()
try:
    f.bar = -1
except ValueError:
    print "Caught the invalid assignment"
```

```
Caught the invalid assignment
```
That seems to work fine. The problem here is that all instances of `Foo` share the same `bar` descriptor instance, and therefore the same stored value, leading to this flavor of sadness:
```python
class Foo(object):
    bar = BrokenNonNegative(5)


f = Foo()
g = Foo()

print "f.bar is %s\ng.bar is %s" % (f.bar, g.bar)
print "Setting f.bar to 10"
f.bar = 10
print "f.bar is %s\ng.bar is %s" % (f.bar, g.bar)  # ouch
```

```
f.bar is 5
g.bar is 5
Setting f.bar to 10
f.bar is 10
g.bar is 10
```
This is why we used the data dictionary in `NonNegative`. The first argument to `__get__` and `__set__` tells us which instance to consider. `NonNegative` uses this argument as a dictionary key, to keep data for each `Foo` instance separate.
```python
class Foo(object):
    bar = NonNegative(5)


f = Foo()
g = Foo()

print "f.bar is %s\ng.bar is %s" % (f.bar, g.bar)
print "Setting f.bar to 10"
f.bar = 10
print "f.bar is %s\ng.bar is %s" % (f.bar, g.bar)  # better
```

```
f.bar is 5
g.bar is 5
Setting f.bar to 10
f.bar is 10
g.bar is 5
```
This is the most awkward aspect of descriptors. (Full disclosure: I don't actually understand why Python doesn't let you define descriptors at the instance level and always dispatch to `__get__` and `__set__`. There must be some reason why this doesn't work. UPDATE: Thanks to Louie Dinh, who pointed me to the reason why; see this post if you're interested.)
`NonNegative` uses a dictionary to keep instance-specific data separate. This normally works fine, unless you want to use descriptors with unhashable objects:
```python
class MoProblems(list):  # you can't use lists as dictionary keys
    x = NonNegative(5)


m = MoProblems()
print m.x  # womp womp
```

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-dd73b177bd8d> in <module>()
      3
      4 m = MoProblems()
----> 5 print m.x  # womp womp

<ipython-input-3-6671804ce5d5> in __get__(self, instance, owner)
      9         # instance = x
     10         # owner = type(x)
---> 11         return self.data.get(instance, self.default)
     12
     13     def __set__(self, instance, value):

TypeError: unhashable type: 'MoProblems'
```
Because instances of `MoProblems` (which is a subclass of `list`) aren't hashable, they can't be used as keys in the data dictionary for `MoProblems.x`. There are a few ways around this, though none are perfect. The best approach is probably to "label" your descriptors:
```python
class Descriptor(object):

    def __init__(self, label):
        self.label = label

    def __get__(self, instance, owner):
        print '__get__', instance, owner
        return instance.__dict__.get(self.label)

    def __set__(self, instance, value):
        print '__set__'
        instance.__dict__[self.label] = value


class Foo(list):
    x = Descriptor('x')
    y = Descriptor('y')


f = Foo()
f.x = 5
print f.x
```

```
__set__
__get__ [] <class '__main__.Foo'>
5
```
This relies on a highly non-obvious detail of Python's attribute lookup order. We label each descriptor in `Foo` with the same name as the variable that we assign the descriptor to (for example, `x = Descriptor('x')`). The descriptor then stores instance-specific data in `f.__dict__['x']`. This dictionary entry would normally be what Python returns when we ask for `f.x`. However, because `Foo.x` is a *data descriptor* (it defines `__set__`), Python gives it priority over `f.__dict__['x']`, so the descriptor can safely store its data there. Just make sure you don't label the descriptor anything else:
```python
class Foo(object):
    x = Descriptor('y')


f = Foo()
f.x = 5
print f.x

f.y = 4  # oh no!
print f.x
```

```
__set__
__get__ <__main__.Foo object at 0x10432c810> <class '__main__.Foo'>
5
__get__ <__main__.Foo object at 0x10432c810> <class '__main__.Foo'>
4
```
I don't love this pattern, since it's fragile and subtle, but it's fairly common, and it works for unhashable owner classes. David Beazley uses it in his books.
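The labeling trick works because of a precedence rule in attribute lookup: a descriptor that defines `__set__` (or `__delete__`) is a *data descriptor* and takes priority over the instance `__dict__`, whereas one that defines only `__get__` is a *non-data descriptor* and loses to it. A small Python 3 experiment (all class names here are hypothetical) shows the difference:

```python
class DataDescriptor(object):
    """Defines __get__ and __set__, so it is a data descriptor."""

    def __get__(self, instance, owner):
        return 'from DataDescriptor.__get__'

    def __set__(self, instance, value):
        pass  # deliberately ignore assignments


class NonDataDescriptor(object):
    """Defines only __get__, so it is a non-data descriptor."""

    def __get__(self, instance, owner):
        return 'from NonDataDescriptor.__get__'


class Foo(object):
    x = DataDescriptor()
    y = NonDataDescriptor()


f = Foo()
f.__dict__['x'] = 'from f.__dict__'   # write straight into the instance dict
f.__dict__['y'] = 'from f.__dict__'
print(f.x)  # from DataDescriptor.__get__  (data descriptor beats the instance dict)
print(f.y)  # from f.__dict__             (instance dict beats a non-data descriptor)
```

So a `__get__`-only descriptor could not use the labeling trick: once a value lands in the instance dictionary under the descriptor's own name, reads bypass the descriptor entirely.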
Because descriptor labels match the variable name they are assigned to, some people use metaclasses to take care of this bookkeeping automatically:
```python
class Descriptor(object):
    def __init__(self):
        # notice we aren't setting the label here
        self.label = None

    def __get__(self, instance, owner):
        print '__get__. Label = %s' % self.label
        return instance.__dict__.get(self.label, None)

    def __set__(self, instance, value):
        print '__set__'
        instance.__dict__[self.label] = value


class DescriptorOwner(type):
    def __new__(cls, name, bases, attrs):
        # find all descriptors, auto-set their labels
        for n, v in attrs.items():
            if isinstance(v, Descriptor):
                v.label = n
        return super(DescriptorOwner, cls).__new__(cls, name, bases, attrs)


class Foo(object):
    # Python 2 syntax; Python 3 ignores this attribute and uses class Foo(metaclass=DescriptorOwner)
    __metaclass__ = DescriptorOwner
    x = Descriptor()


f = Foo()
f.x = 10
print f.x
```

```
__set__
__get__. Label = x
10
```
I won't explain the details of metaclasses -- David Beazley's tutorial at the bottom of this article covers them. The main point is that the metaclass auto-assigns descriptor labels, to match the variable name that each descriptor is assigned to.
While this solves the problem of mismatched descriptor labels and variable names, it does so by adding all the complexity of metaclasses. You can decide if this is worth the hassle, but I have my doubts.
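If you can rely on Python 3.6 or newer, there is a lighter-weight alternative: PEP 487 added a `__set_name__(self, owner, name)` hook, which Python calls on each descriptor in a class body when the class is created. Here is a sketch of the labeled descriptor rewritten that way, with no metaclass (Python 3.6+ only; the print statements are dropped for brevity):

```python
class Descriptor(object):
    """Labeled descriptor that learns its own name (Python 3.6+)."""

    def __set_name__(self, owner, name):
        # Called once, when the owning class is created
        self.label = name

    def __get__(self, instance, owner):
        if instance is None:
            return self  # class-level access returns the descriptor itself
        return instance.__dict__.get(self.label)

    def __set__(self, instance, value):
        instance.__dict__[self.label] = value


class Foo(list):  # an unhashable owner is fine: nothing is keyed on instances
    x = Descriptor()
    y = Descriptor()


f = Foo()
f.x = 10
print(f.x, f.y)                  # 10 None
print(Foo.x.label, Foo.y.label)  # x y
print(f.__dict__)                # {'x': 10}
```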
Descriptors are just classes, and you may want to add other methods to them. For example, descriptors are a great way to implement callback properties. Say we want a class to notify us whenever part of its state changes. Here's most of the code to do that:
```python
from weakref import WeakKeyDictionary


class CallbackProperty(object):
    """A property that will alert observers upon updates"""

    def __init__(self, default=None):
        self.data = WeakKeyDictionary()
        self.default = default
        self.callbacks = WeakKeyDictionary()

    def __get__(self, instance, owner):
        return self.data.get(instance, self.default)

    def __set__(self, instance, value):
        for callback in self.callbacks.get(instance, []):
            # alert callback function of new value
            callback(value)
        self.data[instance] = value

    def add_callback(self, instance, callback):
        """Add a new function to call every time the descriptor updates"""
        # but how do we get here?!?!
        if instance not in self.callbacks:
            self.callbacks[instance] = []
        self.callbacks[instance].append(callback)


class BankAccount(object):
    balance = CallbackProperty(0)


def low_balance_warning(value):
    if value < 100:
        print "You are poor"


ba = BankAccount()

# will not work -- try it
# ba.balance.add_callback(ba, low_balance_warning)
```
This is a promising pattern: we can attach custom callback functions to respond to state changes within a class, without having to modify the class code at all. That's a lovely separation of concerns. All we need to do now is call `ba.balance.add_callback(ba, low_balance_warning)`, so that `low_balance_warning` is called whenever `balance` changes.
But how do we do that? Descriptors always call `__get__` when we try to access them, so `ba.balance` gives us the stored value (here, a plain number) rather than the descriptor. It would seem that the `add_callback` method is unreachable! The trick is to take advantage of the special case that, when the descriptor is accessed from the class level, the first argument to `__get__` is `None`:
```python
class CallbackProperty(object):
    """A property that will alert observers upon updates"""

    def __init__(self, default=None):
        self.data = WeakKeyDictionary()
        self.default = default
        self.callbacks = WeakKeyDictionary()

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class: hand back the descriptor itself
        return self.data.get(instance, self.default)

    def __set__(self, instance, value):
        for callback in self.callbacks.get(instance, []):
            # alert callback function of new value
            callback(value)
        self.data[instance] = value

    def add_callback(self, instance, callback):
        """Add a new function to call every time the descriptor within instance updates"""
        if instance not in self.callbacks:
            self.callbacks[instance] = []
        self.callbacks[instance].append(callback)


class BankAccount(object):
    balance = CallbackProperty(0)


def low_balance_warning(value):
    if value < 100:
        print "You are now poor"


ba = BankAccount()
BankAccount.balance.add_callback(ba, low_balance_warning)

ba.balance = 5000
print "Balance is %s" % ba.balance

ba.balance = 99
print "Balance is %s" % ba.balance
```

```
Balance is 5000
You are now poor
Balance is 99
```
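The `instance is None` convention is not specific to `CallbackProperty`; the built-in `property` follows it too, which is why class-level access hands back the property object itself. A short Python 3 check (this `BankAccount` is a separate, property-based illustration):

```python
class BankAccount(object):
    def __init__(self):
        self._balance = 0

    @property
    def balance(self):
        return self._balance


prop = BankAccount.__dict__['balance']
print(BankAccount.balance is prop)   # True: property.__get__(None, BankAccount) returns the property
print(BankAccount().balance)         # 0
```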
Hopefully, you now have an understanding of what descriptors are, and when they are useful. Go forth and refactor.
There were some relevant talks and tutorials about descriptors and properties at PyCon 2013: