Kartones Blog

Be the change you wanna see in this world

Encoding JPGs with Google's Guetzli

When I first read about Guetzli I thought it probably wouldn't live up to the claims, as it sounded too good to be true: a 35% size reduction in JPGs, backwards-compatible and with no visible quality loss sounded quite nice. Then, a few days ago we were talking at work about how to optimize our homepage's speed, I remembered the algorithm, and decided to give it a try over the weekend.

My results can be quickly summarized: it is as great at compressing as it is slow at doing it, but recommended if speed is not an issue for you.

Basically, it consumes around 200 MB of RAM and takes about one minute per source image megapixel. This means that a tiny 150x200 thumbnail takes more than 15 seconds to encode on my laptop, while ImageMagick encodes a Full HD PNG into a JPG in around a second. It is indeed extremely slow, making it unfeasible for encoding blog post images and the like on the fly. Nothing that cannot be solved with an asynchronous job that sweeps and optimizes afterwards, but it is extra work to be done.
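
For illustration, here is a minimal sketch of what such a sweep-and-optimize job could look like, driving guetzli from Python (the folder path, quality value and replace-on-success behaviour are assumptions, not a recommendation):

#!/usr/bin/env python3
# Minimal sketch of a "sweep and optimize afterwards" job: walk a folder,
# re-encode every JPG with guetzli into a temporary file, and replace the
# original only if the encode succeeded. Folder and quality are assumptions.
import os
import subprocess

IMAGES_ROOT = "/var/www/uploads"  # hypothetical folder to sweep
QUALITY = "85"

def sweep(root: str) -> None:
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".jpg"):
                continue
            source = os.path.join(dirpath, name)
            target = source + ".guetzli"
            result = subprocess.run(["guetzli", "--quality", QUALITY, source, target])
            if result.returncode == 0:
                os.replace(target, source)  # swap in the optimized file
            elif os.path.exists(target):
                os.remove(target)  # discard partial output on failure

if __name__ == "__main__":
    sweep(IMAGES_ROOT)

A real job would also need to remember which files it already processed, so it does not re-encode everything on every run.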

Regarding the size, I can confirm I got reductions between 35% and 50%, sometimes a bit more. Source images were all JPGs with quality between 85% and 95% and sizes between 150x150 and 1900x1280. Overall, the whole image "block" went from 290 MB to 110 MB. Some of the images probably were not optimally compressed to begin with, since back when I used Windows a PowerShell script did the resizing instead of ImageMagick, but it is still a huge win, and I can only imagine how well it will shrink images on blogs where non-technical people sometimes upload photos as they come, without any quality or resolution adjustment.

I ran the big re-encoding of full folders at 85% quality, with the following exact run params (available here too):

find . -type f -name "*.jpg" -exec guetzli --quality 85 {} {}.jpeg \;
find . -type f -name "*.jpg" -exec rm {} +
find . -type f -name "*.jpeg" | rename "s/\.jpeg$//"

As I was going through hundreds of images coming from years of different sources (phones, cameras, image manipulation applications...), I had a small ratio of failures, around 1.7% of almost 2900 JPGs (if I recall the number of failed images correctly). Guetzli as a command-line application is really basic, and its output is too scarce to be of use when batching (if you care about errors, that is), so I decided to add some dumb fprintfs to output the failed files, so that with some replacements and regular expressions I could easily grab all the failed images. My fork can be found here.
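
If you prefer not to patch the encoder, another option is a small wrapper that runs guetzli per file and collects the failures for you; a rough sketch, using the same .jpg to .jpeg naming as the commands above:

# Re-encode every JPG under the current folder and print the files guetzli
# rejected, so no fprintf patching or log scraping is needed.
import pathlib
import subprocess

failed = []
for source in pathlib.Path(".").rglob("*.jpg"):
    result = subprocess.run(
        ["guetzli", "--quality", "85", str(source), f"{source}.jpeg"],
        capture_output=True)
    if result.returncode != 0:
        failed.append(str(source))

print(f"{len(failed)} files failed to encode:")
print("\n".join(failed))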

And really that's all I had to say regarding it. Clearly the open-sourced tool is not the same thing Google uses internally, as there is a single test and I refuse to believe they would use that command-line tool in production, but the algorithm is there, it is as awesome as advertised, and I'll definitely optimize my blog post images with it from now on.


Book Review: Thinking Fast and Slow

Finishing this book took me some time, because I listened to it in audiobook format and my commute to my previous job was too noisy and distracting to focus on listening, but with my recent change I can now listen to spoken audio perfectly.

Review

Thinking Fast and Slow book cover

Title: Thinking Fast and Slow

Author: Daniel Kahneman

How we think is managed by two systems/fragments/"selves" of the brain: one is impulsive and quick-thinking but prone to biases and errors in judgement; the other is slower, recalls memories and experience, but is "lazy" and tends to delegate to "System 1". Throughout the book we're taught how we make lots of mistakes, misjudgements and wrong decisions, and in general kind of get tricked by ourselves. But we also learn valuable lessons to become better at handling these situations (spoiler: take your time and think).

There are many chapters, focusing on different aspects: fears, biases, mistakes, choice-making... full of examples, studies and tips to learn to avoid them. They are useful in most areas, from personal life, well-being and relationships to business-related decisions.

To mention something not so great, there are quite a few military-related remarks, for example illustrating loss aversion with "selling missiles", when there are countless less belligerent examples. I understand that for the author (Israeli and with a military past) it might feel normal, but to me it comes across as a sad choice of examples. Also, some chapter divisions feel a bit artificial; for example, gambling-related topics span multiple chapters.

Despite my small criticisms, this book is a must-read. The best way to combat biases is to know them, and with so many examples it becomes clear that we're a bit flawed and should not rush our decision making.

Notes

Sadly I didn't note down everything, as sometimes it is hard to take notes and I cannot highlight as with Kindle books, but I think I captured most of the important topics (for me, at least).

  • Two systems that drive the way we think. System 1 is fast, intuitive, and emotional; System 2 is slower, deliberative, logical
  • The most important mistake our brain makes and takes as plain truth: what you see is all there is
  • Law of small numbers
  • Law of averages: belief that the statistical distribution of outcomes among members of a small sample must reflect the distribution of outcomes across the population as a whole
  • Most of what happens in life is random (most facts in the world are random). Causal explanations are dangerous and many times wrong
  • Anchoring effect
  • Availability heuristic
  • Statistics' base rate (https://en.wikipedia.org/wiki/Base_rate)
  • Conjunction fallacy: The probability that two events will both occur can never be greater than the probability that each will occur individually
  • Success = skill + luck; greater success = more skill + a lot more luck
  • Regression to the mean
  • Intensity matching
  • When we change our view of the world, we lose or weaken our ability to recall the old view. It is a weakness of our mind
  • Hindsight bias
  • Errors of prediction are inevitable because the world is unpredictable
  • Subjective confidence shouldn't be treated as competence; a low-confidence judgement can be the better-founded one
  • Intuition adds value, but only after a disciplined, objective collection of information
  • Outcomes are gains and losses (utility of wealth): changes of wealth instead of states of wealth
  • Theory-induced blindness
  • Human brain gives priority to bad news
  • Good relationships involve avoiding bad moments more than having good moments
  • Quantification as concrete frequencies is much more powerful than bare numbers or percentages, e.g. "4 out of 101 ..." vs "40% ..."
  • Narrow framing vs broad framing
  • Sunk cost fallacy
  • Losses evoke stronger feelings than costs

On Elastic Beanstalk, Docker and CircleCI

I joined ticketea's engineering team last month, and apart from learning how things work and doing a few weeks of bugfixing (to get comfortable with the code and peek at some of the projects), I also got assigned to one of the new projects. There are three projects that we have started from scratch, allowing us to decide whether to keep or change the current platform (which could be more automated). In order to make decisions, we did some research and proofs of concept.

The main goal of the research was to set up a basic AWS Elastic Beanstalk orchestration system, to allow us to perform deploys, local runs, etc. without needing to manually handle EC2 instances and build the corresponding toolset, as we don't have any systems team.

Our results are mixed but still subject to change, as we haven't yet discarded or settled on a given route; we keep exploring multiple paths with the projects to decide later. Despite that, I'll leave here some notes and references. Don't expect great notes, as this is more of a cleanup of a worklog/checklist (actually, it was a simple GitHub issue).


CircleCI

We'll stick with CircleCI as our test runner, builder and probably continuous deployment tool for staging. Version 2.0 works nicely with containers and, despite the configuration being heavily changed from v1.0, the modifications were quick to perform.

Elastic Beanstalk

EB has been relegated to staging/production deployment. For that, the cluster features (load balancing, rolling deploys, etcetera) are great and very easy to use. For local development, on the other hand, it ranges from painful to outright impossible to make it work decently without hacks. The reasons are multiple, the main ones being:

  • You cannot use docker-compose as EB internally uses it and forces you to use their YML config files or rely on fully manual Makefiles + raw Docker
  • eb local works only on pretty much factory-default scenarios. As soon as you start working on real services, it just doesn't work
  • EB works using environments, but it is configured so that one "folder" is the equivalent of one environment. So having dev, staging, production, etc. means one of the two following hacks:
    • Have a single root dockerrun.aws.json with placeholder variables that you replace with the appropriate environment values (a minimal sketch of this substitution appears after this list)
    • Have multiple dockerrun.aws.json files in subfolders (one per environment) and move the right one to the root via Makefile or similar depending on where you run it
  • We've become more proficient at using "raw" docker, but in the end we decided to still use docker-compose, even if only for development. It saves you a lot of command-line typing and is quick to change.
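
As an example of the first hack, this is roughly how the placeholder substitution could work; the template name, placeholder syntax and per-environment values here are assumptions, not our actual setup:

# Render the root dockerrun.aws.json from a template before deploying,
# replacing ${PLACEHOLDER} markers with the values of the chosen environment.
# File names, placeholders and values are hypothetical.
import json
import sys

ENVIRONMENTS = {
    "dev": {"IMAGE_TAG": "latest", "MEMORY": "256"},
    "staging": {"IMAGE_TAG": "staging", "MEMORY": "512"},
    "production": {"IMAGE_TAG": "stable", "MEMORY": "1024"},
}

def render(environment: str) -> None:
    with open("dockerrun.aws.json.template") as handle:
        template = handle.read()
    for key, value in ENVIRONMENTS[environment].items():
        template = template.replace("${%s}" % key, value)
    json.loads(template)  # fail fast if the result is not valid JSON
    with open("dockerrun.aws.json", "w") as handle:
        handle.write(template)

if __name__ == "__main__":
    render(sys.argv[1] if len(sys.argv) > 1 else "dev")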

Resources:

EB Configuration files

Alternatives to EB

One of the teams, after asking friends and colleagues for feedback, is testing Terraform. It looks promising and is working fine for them, but it also needs maintenance, so there is no firm decision yet on whether to use it or stick with Elastic Beanstalk and Makefiles (at least for now).

ECS + ECR

We set up a registry and pushed both development and production images after successful builds. It works quite nicely, and the only reason we are not using it actively is to avoid the permissions hell you enter once you want to share images between different Amazon accounts (not just IAM users on the same account, but fully separate ones).

Redis

We are using Redis for our project: a docker image for development and ElastiCache for staging and production.
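
The application code stays identical in both cases by reading the Redis host from the environment; a small sketch with the redis-py client (variable names and defaults are assumptions):

import os
import redis

# Locally this points at the docker-compose Redis service; in staging and
# production the variable holds the ElastiCache endpoint instead.
redis_client = redis.Redis(
    host=os.environ.get("REDIS_HOST", "localhost"),
    port=int(os.environ.get("REDIS_PORT", "6379")),
    db=0,
)

redis_client.set("healthcheck", "ok")
print(redis_client.get("healthcheck"))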

Tools/extensions to check and add if interesting


Pelican Publisher Script

When kartones.net was a blogging community and not my current minimalistic personal landing page, one of the blogs that my friend Lobo666 and I maintained was Uncut. With the change to BlogEngine.Net it kept working easily, with a combination of a WYSIWYG editor (or Windows Live Writer) and uploading post images via FTP (a minor manual step). But when I recently moved everything to static sites, since Pelican not only doesn't provide any editor but also forces you to build the site to preview the changes, it became quite hard for my friend to keep posting to the blog.

On the other hand, I already had some post-processing scripts, to clean up some files that were always copied to the output folder (and thus uploaded to the site) and to do other tiny tasks like duplicating files (I want to maintain backwards compatibility with the original RSS feed addresses of the old blogs). They were ad-hoc, but after showing them to my friend he just asked me if I could make those scripts also upload the modified files automatically. And indeed, a few changes to optionally pass a post identifier by command line (I decided to use the slug) would help. It would also ease things to remove all the "full index pages" that Pelican builds (index<zero-to-almost-infinite>.html pages), leaving just 10 pages and a link to the full archives page:

Blog paging screenshot

This way, and by removing the tags, categories and authors subfolders since I don't use them, the number of modified files to upload for a mere new blog post is around a dozen, making it blazing fast to "deploy" with some Python code. In the end I generalized the script for the three blogs that I still write and/or maintain: with a few configuration parameters you can specify folders to create or delete, files to copy, remove or duplicate, truncate the index files... and of course upload a post or just build without uploading.
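
The real script lives in the repository linked below; just as an illustration of the kind of cleanup it performs, the index-page truncation and folder removal boil down to something like this (folder names and the page limit are hypothetical):

import re
import shutil
from pathlib import Path

OUTPUT_PATH = Path("output")  # Pelican's default output folder
UNUSED_FOLDERS = ["tag", "category", "author"]
MAX_INDEX_PAGES = 10

# Remove subfolders that are generated but never linked from the blog
for folder in UNUSED_FOLDERS:
    shutil.rmtree(OUTPUT_PATH / folder, ignore_errors=True)

# Keep index.html plus the first N paginated index pages, delete the rest
for index_file in OUTPUT_PATH.glob("index*.html"):
    match = re.match(r"index(\d+)\.html$", index_file.name)
    if match and int(match.group(1)) > MAX_INDEX_PAGES:
        index_file.unlink()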

I don't want to go on much longer, as the utility of this tool is limited and very specific, so getting to the point: I uploaded the script files to my assorted Python GitHub repo. The direct URL of the publisher files is: https://github.com/Kartones/PythonAssorted/tree/master/pelican/publisher.

Usage is quite simple:

python3 publisher.py your-great-post-slug

And to only build:

python3 publisher.py

And that's all. Until next time :)


Recommended Articles - 2017/04/01

As I recently switched jobs and took a few days of vacation in between, there is not much relevant to write about on the personal side, so here is another bunch of relevant articles I've read recently.

