Turning objects lazy in Python

zerotypic
4 min read · Jan 19, 2022

While working on wilhelm recently, I needed to create a large number of objects, one representing each function present in the module under analysis. Since there would be many function objects, they needed to be lightweight. But I also wanted the ability to automatically access richer features on the function objects when required.

A common way to solve this problem is to use laziness. Under lazy evaluation, a value is only computed when it is needed. A lazy object representing a function would be quick to create, and only populate itself with more information (obtained by decompiling the function) when that information is requested.

Python has some constructs that are lazy-like, such as iterators and generators, where each element of a sequence is produced only when it is needed. This wasn’t exactly what I needed, though. There didn’t seem to be any standard library that implemented laziness the way I wanted either, so I decided to roll my own.

My goal was to write a metaclass that, when applied to a class, would turn it into a lazily loaded version of that class (I term this lazifying the class). The most straightforward way to achieve this is by creating a proxy class. An instance of the proxy class creates an instance of the real class the first time any attribute is accessed, and from then on forwards all accesses to that real instance. This can be done in Python by overriding the special methods __getattribute__() and __setattr__().
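In outline, the proxy approach looks something like this (a simplified sketch, not wilhelm’s actual code; the LazyProxy name and its bookkeeping attributes are just for illustration):

```python
class LazyProxy:
    """Sketch of the proxy approach: the real object is built on first
    attribute access, and every access pays a small check."""

    def __init__(self, cls, *args, **kwargs):
        # Bypass our own __setattr__ while storing bookkeeping fields.
        object.__setattr__(self, "_spec", (cls, args, kwargs))
        object.__setattr__(self, "_real", None)

    def _realize(self):
        real = object.__getattribute__(self, "_real")
        if real is None:                     # the per-access check
            cls, args, kwargs = object.__getattribute__(self, "_spec")
            real = cls(*args, **kwargs)      # build the real object lazily
            object.__setattr__(self, "_real", real)
        return real

    def __getattribute__(self, name):
        realize = object.__getattribute__(self, "_realize")
        return getattr(realize(), name)

    def __setattr__(self, name, value):
        realize = object.__getattribute__(self, "_realize")
        setattr(realize(), name, value)


class Function:
    def __init__(self, addr):
        print("decompiling", hex(addr))      # stand-in for the expensive work
        self.addr = addr


f = LazyProxy(Function, 0x401000)   # cheap: nothing decompiled yet
print(f.addr)                       # first access builds the real Function
```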

One issue with this strategy is that it results in a slight overhead whenever any attribute of an instance is accessed: the proxy class instance needs to make a check to see if a real class instance was created yet. Honestly, on a modern computer, such a check is unlikely to be that expensive, even though it might take place hundreds of times for every object. So, it might not be worth trying to optimize it away. It made me wonder, though, if there was some way to avoid that check: a metaclass designed such that once the lazified object was evaluated (hereafter termed reified), the resultant object would be identical to an instance of a class that had never been lazified.

The most natural place to start looking was Python class instantiation. When an instance of a class is created, a two-step process takes place. First, memory for the object is allocated via __new__(), and next, the object is initialized by calling the __init__() constructor. My idea was to somehow separate the two steps, so allocation happens first (giving us a “lazy” object), and initialization happens later when attributes are accessed (giving us the “real” object).
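A trivial illustration of the two steps (not related to wilhelm or bummer):

```python
class Demo:
    def __new__(cls, *args, **kwargs):
        print("__new__: allocating the instance")
        return super().__new__(cls)

    def __init__(self, x):
        print("__init__: initializing the instance")
        self.x = x


d = Demo(42)   # prints the __new__ message, then the __init__ message
```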

However, the two steps happen together by default; Python will always first call __new__() and then call __init__(). How could I get one to run but not the other?

It turns out that one way of doing this is by changing the type (i.e. class) of the object returned by __new__(). Python only calls the __init__() constructor if the object returned by __new__() is an instance of the class being constructed, and it calls the __init__() defined for the object’s actual class. So if we change the object’s class to something unrelated, the original class’s __init__() will not get run at all. We can do this by modifying the __class__ attribute of the object returned by __new__(), swapping it to refer to some other class.
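Here’s a tiny demonstration of the mechanism (again, just an illustration):

```python
class _Lazy:
    pass


class Real:
    def __new__(cls, *args, **kwargs):
        obj = super().__new__(cls)
        obj.__class__ = _Lazy   # swap the class before Python decides to call __init__
        return obj

    def __init__(self, value):
        print("__init__ running")   # never printed
        self.value = value


r = Real(42)
print(type(r))              # <class '__main__._Lazy'>
print(hasattr(r, "value"))  # False: __init__() was skipped
```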

Swapping the class also gives us other advantages. We can define methods on the swapped class (hereafter referred to as Lazy) that would then be accessible on the lazy object. In particular, our implementations of __getattr__() and __setattr__() can live inside Lazy.

When it is time to reify the lazy object, all we need to do is change __class__ back to the real class, and then call the __init__() constructor. This completes the instantiation of the real object, and the result is an instance of the real class that behaves exactly as if it had never been lazified. No additional checks or proxying takes place when accessing attributes on that instance.

Here’s an example of what such a metaclass would look like. In the code, I implemented the metaclass’s __call__() method to return a new _Lazy instance instead of a real instance, so we don’t need to override __new__() in the real class at all. Instead, _Lazy.__new__() runs, and our magic takes place there.
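In outline (this is a simplified sketch rather than the exact code in bummer; the _lazy_info bookkeeping attribute and the _reify() name are just illustrative):

```python
class _Lazy:
    def __new__(cls, real_cls, args, kwargs):
        obj = super().__new__(cls)
        # Bypass our own __setattr__ while stashing the deferred constructor call.
        object.__setattr__(obj, "_lazy_info", (real_cls, args, kwargs))
        return obj

    def _reify(self):
        real_cls, args, kwargs = self._lazy_info
        del self._lazy_info            # leave a clean __dict__ behind
        # Swap back to the real class (object.__setattr__ avoids re-entering
        # our own __setattr__), then run the deferred __init__().
        object.__setattr__(self, "__class__", real_cls)
        self.__init__(*args, **kwargs)

    def __getattr__(self, name):
        self._reify()
        return getattr(self, name)

    def __setattr__(self, name, value):
        self._reify()
        setattr(self, name, value)


class LazyMeta(type):
    def __call__(cls, *args, **kwargs):
        # Return a _Lazy stand-in instead of a real instance; the real
        # class's __new__()/__init__() are not touched at construction time.
        return _Lazy(cls, args, kwargs)


class Function(metaclass=LazyMeta):
    def __init__(self, addr):
        print("decompiling", hex(addr))   # stand-in for the expensive work
        self.addr = addr


f = Function(0x401000)      # cheap: prints nothing, f is a _Lazy instance
print(f.addr)               # first access reifies: decompiles, then prints the address
print(type(f) is Function)  # True: f is now a plain Function instance
```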

Using this technique, I implemented a more complete metaclass, which I’ve put on GitHub as bummer (it turns classes into lazy bums). A separate _Lazy class is created per lazified class, which allows type-checking to still work, somewhat: for example, a lazy instance of a class will report itself to be an instance of that class (see __instancecheck__() and __subclasscheck__() on LazyMeta).

I also added an additional feature: classes can define a method __lazy_preinit__(), which will be called on the lazy instance when it is created. This allows values to be set on the lazy instance, and when these values are accessed, they do not trigger reification. For example, I can use it to set the address field of lazy function objects, so I can query the address of a lazy function object without triggering reification (and decompilation).
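Here’s a sketch of how such a hook could be wired into the simplified LazyMeta/_Lazy sketch above. I’m assuming for illustration that __lazy_preinit__() receives the constructor arguments; see bummer for the actual implementation:

```python
class LazyMeta(type):
    def __call__(cls, *args, **kwargs):
        obj = _Lazy(cls, args, kwargs)        # _Lazy from the sketch above
        preinit = getattr(cls, "__lazy_preinit__", None)
        if preinit is not None:
            # Anything set here lands in the lazy instance's __dict__, so
            # reading it later is an ordinary attribute hit: no reification.
            preinit(obj, *args, **kwargs)
        return obj


class Function(metaclass=LazyMeta):
    def __lazy_preinit__(self, addr):
        # Bypass _Lazy.__setattr__, which would otherwise reify immediately.
        object.__setattr__(self, "addr", addr)

    def __init__(self, addr):
        print("decompiling", hex(addr))
        self.addr = addr
        self.pseudocode = "..."


f = Function(0x401000)
print(f.addr)         # cheap: no decompilation triggered
print(f.pseudocode)   # this access reifies (and "decompiles")
```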

Finally, I added an additional factory metaclass, LazyFactoryMeta, which maintains a cache of objects and returns previously created objects based on the arguments supplied to the constructor (memoization, essentially).
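The actual implementation is in bummer, but the general shape of a memoizing metaclass is something like this (this sketch leaves out the lazy machinery and assumes hashable constructor arguments):

```python
class MemoMeta(type):
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls._instance_cache = {}              # one cache per class

    def __call__(cls, *args, **kwargs):
        # Cache key built from the constructor arguments (must be hashable).
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cls._instance_cache:
            cls._instance_cache[key] = super().__call__(*args, **kwargs)
        return cls._instance_cache[key]


class Function(metaclass=MemoMeta):
    def __init__(self, addr):
        self.addr = addr


assert Function(0x401000) is Function(0x401000)   # same object both times
```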

bummer is a single Python file, so if you want to use it, you can just drop it into your project.

Do you want to use it though? It’s kind of hacky, it relies on the internals of Python’s object instantiation process, and it might not really be that much faster than a proxying solution. I’ve also not tested it on more complicated kinds of classes; for example, it’ll probably break if __slots__ is used. But it was a fun exercise, and who knows, maybe it will be robust enough to be used in production code.

Note: I’ve only tested this on Python 3. No idea if it will work, at all, if ported to Python 2.
