This is a trivial, but nonetheless useful, example of why measuring code changes is relevant when we are adding complexity.
After reading about Chrome's new native image lazy-load attribute, I decided to implement it as a Pelican plugin (source code here) because, although I use Firefox, Chrome is currently the dominant browser.
I already had to iterate a few times over the original < 40 lines implementation, which simply added the attribute: I had to ignore base64-encoded inline images and raw markup that isn't really an image (e.g. code samples containing <img>), plus small clean-ups.
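A minimal sketch of that core idea, assuming a regex-based pass over the rendered HTML (the names and the regex are mine, not the plugin's actual code; note that a naive regex like this one does not skip <img> text inside code blocks the way the real plugin must):

```python
import re

# Hypothetical regex for illustration: matches <img> tags and captures src.
IMG_TAG_RE = re.compile(r'<img\s[^>]*src="(?P<src>[^"]+)"[^>]*>')

def add_lazy_loading(html):
    def add_attribute(match):
        tag = match.group(0)
        if match.group("src").startswith("data:"):  # skip base64-encoded inline images
            return tag
        if "loading=" in tag:  # attribute already present, leave the tag alone
            return tag
        return tag.replace("<img", '<img loading="lazy"', 1)
    return IMG_TAG_RE.sub(add_attribute, html)
```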
As none of the post images had width or height attributes, Chrome was reflowing and sometimes even requesting them twice due to CSS computations, so I built a small JSON-backed image dimensions cache that fetches post images, checks their size using PIL, and adds the appropriate markup attributes.
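Conceptually, the cache is little more than a dictionary persisted to disk; a sketch under my own naming assumptions (the actual plugin code differs):

```python
import io
import json
import os
from urllib.request import urlopen

from PIL import Image

CACHE_PATH = "image_dimensions.json"  # hypothetical cache file name

def load_cache():
    # Load the persisted cache, or start empty on the first run.
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH) as cache_file:
            return json.load(cache_file)
    return {}

def save_cache(cache):
    with open(CACHE_PATH, "w") as cache_file:
        json.dump(cache, cache_file)

def image_dimensions(url, cache):
    # Fetch and measure the image only on a cache miss.
    if url not in cache:
        with urlopen(url) as response:
            image = Image.open(io.BytesIO(response.read()))
        cache[url] = list(image.size)  # (width, height), stored JSON-friendly
    return cache[url]
```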
Afterwards, as the post images have a maximum width, big ones were being vertically stretched, so I had to add another check that restricts the width and, if it applies, recalculates the "resized height" too, keeping the original image proportions.
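The proportional recalculation is just a ratio; something like this hypothetical helper (MAX_WIDTH stands in for the blog's actual CSS limit):

```python
MAX_WIDTH = 720  # assumed maximum content width, not the blog's real value

def displayed_dimensions(width, height):
    # Clamp the width and scale the height to keep the original proportions.
    if width <= MAX_WIDTH:
        return width, height
    return MAX_WIDTH, round(height * MAX_WIDTH / width)
```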
Notice how what was a simple "add an attribute to each image" grew into a 120-line plugin? It is still quite small and manageable, but it has to handle error scenarios and nitpicks that most posts won't require, and the caching not only relies on external I/O (both local files and network requests) but also accounts for around 40% of the code.
So, when I started thinking today about how I could improve its speed, my first requirement was measuring the performance gain versus the complexity added. Each article (post/page) in Pelican triggers an execution of the plugin logic, which already loads and saves the cache only once per article (instead of once per image). This still means acquiring two file handles, reading/writing, and closing. With the existing code and this blog, this was the original metric:
```
Processed 500 articles, 0 drafts, 2 pages, 7 hidden pages and 0 draft pages in 8.20 seconds
```
Pelican plugins do not keep state by themselves, which is nice but hinders in-memory caching. A hack I came up with to overcome this limitation was storing the cache at instance.settings["IMAGE_CACHE"], because Python dictionaries are always passed by reference. This way I could load the cache from its file only once, while still saving it per article to play it safe.
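A sketch of the hack, reusing the hypothetical load_cache from above (again my naming, not the plugin's):

```python
def get_cache(instance):
    # Dictionaries are passed by reference, so mutating instance.settings
    # here survives across articles within the same Pelican run.
    if "IMAGE_CACHE" not in instance.settings:
        instance.settings["IMAGE_CACHE"] = load_cache()  # hits the disk only once
    return instance.settings["IMAGE_CACHE"]
```

A few lines of code added, and I measured how much faster it got: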
```
Processed 500 articles, 0 drafts, 2 pages, 7 hidden pages and 0 draft pages in 8.18 seconds
```
😐 Now, I don't know about you, but to me a 20 millisecond difference when regenerating the blog is not much. And if I kept the optimization, I would be reducing but not fully removing the disk I/O operations, while adding another crossed boundary to care about: runtime settings mutation. If it had gone down a few seconds it might have been worth it, but for a 0.24% gain on a personal project, I'd rather keep the plugin code simple.
And of course, if I hadn't measured it I would have just left the optimization in, because it's awesome to optimize, your own code never fails even when doing hacky things, squeezing the maximum is always great, and... The fact is that both my SSD and Linux are smarter than me: reading a JSON file 500 times so quickly and consecutively is surely buffered and cached by the drivers, and will only reach the physical storage at the end, so why bother in this scenario.
Update: What I actually implemented is a small ImageCache class with a simple dirty flag to avoid unnecessary writes when the cache hasn't changed. This took down 50% of the I/O operations and around 2 seconds, in exchange for a few extra lines of code and an actually cleaner logic separation.
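The dirty-flag idea fits in a few lines; a sketch under the same naming assumptions as before:

```python
import json
import os

class ImageCache:
    def __init__(self, path):
        self.path = path
        self.dirty = False
        self.data = {}
        if os.path.exists(path):
            with open(path) as cache_file:
                self.data = json.load(cache_file)

    def get(self, url):
        return self.data.get(url)

    def set(self, url, dimensions):
        if self.data.get(url) != dimensions:
            self.data[url] = dimensions
            self.dirty = True  # only mark dirty on an actual change

    def save(self):
        if not self.dirty:
            return  # nothing changed since loading: skip the write entirely
        with open(self.path, "w") as cache_file:
            json.dump(self.data, cache_file)
        self.dirty = False
```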
Tags: Development