Reducing memory usage with the recordclass library

The simplest way to reduce the amount of memory required to store instances of a user-defined class is to use __slots__. In this case, instances have no __dict__ and, by default, no support for weak references. As a consequence, the attribute values of an instance are stored not in a dictionary but in a fixed-size memory area inside the instance itself. This reduces the memory footprint significantly and gives faster access to attributes. It also makes it impossible to use any attributes other than those listed in __slots__.
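The behaviour described above can be checked with the standard library alone. The following is a minimal sketch; the class names WithSlots and WithDict are hypothetical and serve only as an illustration.

```python
import weakref

class WithSlots:
    # Only these two attributes can ever exist on an instance.
    __slots__ = ('name', 'age')
    def __init__(self, name, age):
        self.name = name
        self.age = age

class WithDict:
    # Ordinary class: attributes live in a per-instance __dict__.
    def __init__(self, name, age):
        self.name = name
        self.age = age

slotted = WithSlots('Mike', 10)
regular = WithDict('Mike', 10)

has_dict_slotted = hasattr(slotted, '__dict__')
has_dict_regular = hasattr(regular, '__dict__')

# An ordinary instance accepts new attributes; a slotted one does not.
regular.hobby = 'drawing'
try:
    slotted.hobby = 'drawing'
    extra_allowed = True
except AttributeError:
    extra_allowed = False

# Without '__weakref__' in __slots__, weak references are not supported.
try:
    weakref.ref(slotted)
    weakref_allowed = True
except TypeError:
    weakref_allowed = False

print(has_dict_slotted, has_dict_regular, extra_allowed, weakref_allowed)
```

Adding '__weakref__' to __slots__ would restore weak reference support at the cost of one more pointer per instance.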

Let's consider a class with __slots__ that represents a simple data structure:

In [1]:
class DataItem:
    __slots__ = ('name', 'age', 'address')
    def __init__(self, name, age, address):
        self.name = name
        self.age = age
        self.address = address

Here is the size of an instance:

In [2]:
from sys import getsizeof as sizeof, version
print(version)
inst = DataItem('Mike', 10, 'Cherry Street 15')
print('The size of DataItem instance is %s bytes' % sizeof(inst))
3.7.2 (v3.7.2:9a3ffc0492, Dec 24 2018, 02:44:43) 
[Clang 6.0 (clang-600.0.57)]
The size of DataItem instance is 64 bytes

It includes 16 bytes for the object's header (PyObject_HEAD), 8*3=24 bytes for the three data slots (references to the objects that are the attribute values) and 8*3=24 bytes of additional data (PyGC_Head) for cyclic garbage collection support.
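The same breakdown can be derived at run time instead of being hard-coded. The sketch below assumes CPython and uses only the standard library. The names header, slots_bytes and gc_overhead are illustrative. The size of the GC header depends on the CPython version, so the total may differ from the 64 bytes shown for 3.7.

```python
import struct
from sys import getsizeof as sizeof

class DataItem:
    __slots__ = ('name', 'age', 'address')
    def __init__(self, name, age, address):
        self.name = name
        self.age = age
        self.address = address

inst = DataItem('Mike', 10, 'Cherry Street 15')

pointer_size = struct.calcsize('P')             # size of one object reference
header = object.__basicsize__                   # PyObject_HEAD
slots_bytes = DataItem.__basicsize__ - header   # space taken by the three slots
# getsizeof() adds the garbage collector overhead on top of __sizeof__().
gc_overhead = sizeof(inst) - inst.__sizeof__()

print('header: %d, slots: %d, gc: %d, total: %d'
      % (header, slots_bytes, gc_overhead, sizeof(inst)))
```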

It is easy to see that support for cyclic garbage collection is redundant for data structures that contain only values of simple types (for example, bool, int, float, str/unicode, datetime/date/time, etc.), because such values cannot take part in reference cycles.

For such cases, disabling support for cyclic garbage collection will reduce the amount of memory required to store an instance of the class by 8*3=24 bytes. Note that Python's own simple/atomic data types do not support cyclic garbage collection either: they are managed by the usual reference counting mechanism alone.
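The standard gc module can confirm this. The sketch below, with illustrative variable names, uses gc.is_tracked to compare a few atomic values with a container.

```python
import gc

# Atomic values cannot hold references to other objects,
# so the cyclic garbage collector does not track them.
atomic_values = [True, 10, 3.14, 'Mike', b'raw']
tracked_atomic = [gc.is_tracked(value) for value in atomic_values]

# A list can take part in a reference cycle, so it is tracked.
tracked_list = gc.is_tracked(['drawing', 'singing'])

print(tracked_atomic, tracked_list)
```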

The recordclass library makes it possible to create classes whose instances, by default, have no __dict__, no __weakref__, and no support for cyclic garbage collection. For small data structures, this can lead to significant memory savings, which can be essential when using Python in conditions of limited memory (for example, in the context of cloud services or on a Raspberry Pi).

For this purpose, the library has a factory function recordclass.structclass. To try it out, first install the recordclass library from the shell:

$ pip install recordclass

In [3]:
from recordclass import structclass
from sys import getsizeof as sizeof

DataItem2 = structclass('DataItem', 'name age address')
inst2 = DataItem2('Mike', 10, 'Cherry Street 15')
print(inst2)
print('The size of DataItem2 instance is %s bytes' % sizeof(inst2))
DataItem(name='Mike', age=10, address='Cherry Street 15')
The size of DataItem2 instance is 40 bytes

There is also an advantage over a __slots__-based class: you can add extra attributes when necessary:

In [4]:
DataItem3 = structclass('DataItem', 'name age address', usedict=True)
inst3 = DataItem3('Mike', 10, 'Cherry Street 15')
inst3.hobby = ['drawing', 'singing']
print(inst3)
print('sizeof:', sizeof(inst3), 'has dict:',  bool(inst3.__dict__))
DataItem(name='Mike', age=10, address='Cherry Street 15', **{'hobby': ['drawing', 'singing']})
sizeof: 48 has dict: True

If it is important, support for cyclic garbage collection can be turned on. In that case an instance has the same size as an instance of the __slots__-based class.

In [5]:
from recordclass import structclass
DataItem4 = structclass('DataItem', 'name age address', gc=True)
inst4 = DataItem4('Mike', 10, 'Cherry Street 15')
print(inst4)
print(sizeof(inst4))
DataItem(name='Mike', age=10, address='Cherry Street 15')
64
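Whether cyclic garbage collection is needed depends on whether instances can refer to each other. The sketch below illustrates this with a plain __slots__-based class from the standard library, not with recordclass. The Node class is hypothetical. It shows that a reference cycle survives reference counting and is reclaimed only by the cyclic collector.

```python
import gc
import weakref

class Node:
    # '__weakref__' is listed only so the demo can observe the object's lifetime.
    __slots__ = ('other', '__weakref__')
    def __init__(self):
        self.other = None

gc.disable()  # keep the automatic collector out of the way during the demo
try:
    a, b = Node(), Node()
    a.other, b.other = b, a      # a <-> b form a reference cycle
    probe = weakref.ref(a)
    del a, b                     # reference counts never reach zero
    alive_after_del = probe() is not None
    gc.collect()                 # the cyclic collector breaks the cycle
    alive_after_collect = probe() is not None
finally:
    gc.enable()

print(alive_after_del, alive_after_collect)
```

For records that hold only atomic values, no such cycle can arise, which is why the extra GC header can be dropped safely.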

Finally, the example below shows how the memory footprint is reduced for a collection of small data structures.

In [6]:
class Point1:
    __slots__ = ('x', 'y')
    def __init__(self, x, y):
        self.x = x
        self.y = y

Point2 = structclass('Point2', 'x y')

lst1 = tuple(Point1(i, i) for i in range(10000))
lst2 = tuple(Point2(i, i) for i in range(10000))

def calculate_size(lst):
    size1 = sizeof(lst)
    size2 = sum(sizeof(ob) for ob in lst)
    size = size1 + size2
    return size

size1 = calculate_size(lst1)
size2 = calculate_size(lst2)

print('__slots__: %s = 100%%' % size1, 'structclass: %s = %.0f%%' % (size2, 100*size2/size1))
__slots__: 640048 = 100% structclass: 400048 = 63%
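Summing getsizeof over the items counts only what each object reports about itself. As a cross-check, allocations can also be measured with the standard tracemalloc module. The sketch below is an assumption-laden illustration: the helper allocated_bytes and the classes PointDict and PointSlots are hypothetical. It compares an ordinary class with a __slots__-based one, so it does not require recordclass to be installed.

```python
import tracemalloc

class PointDict:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class PointSlots:
    __slots__ = ('x', 'y')
    def __init__(self, x, y):
        self.x = x
        self.y = y

def allocated_bytes(factory, n=10000):
    """Return the number of bytes allocated while building n instances."""
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        items = [factory(i, i) for i in range(n)]  # kept alive until measured
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before

dict_bytes = allocated_bytes(PointDict)
slots_bytes = allocated_bytes(PointSlots)

print('regular class: %d bytes, __slots__: %d bytes' % (dict_bytes, slots_bytes))
```

The same helper could be called with any factory that accepts two positional arguments, for example a class created by structclass, to compare it under identical conditions.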