Kartones Blog

Be the change you wanna see in this world

Recommended Articles - 2017/05/19

I got delayed, so more and more news piled up... ending up with a not-so-small list of relevant articles (in my opinion, at least!).

  • The Lazy Manifesto (+ its principles): Sounds like a joke but it has really good points.
  • Deadlines and commitments... the fallacy: Deadlines are often what I call "FUU": Fictitious, Unrealistic & Unilateral. I didn't know about Theory X and Theory Y.
  • Incoming Redis streams, which coming from Salvatore smells like really good stuff. Interesting also for custom data format geeks: how the "Listpack" format will work (currently in draft)
  • Learn Redis the hard way (in production): Speaking of Redis... interesting tips
  • [Spanish] Notas From The Trenches 2017: Donosti Edition: Really great notes about speed at a company, knowledge, learning...
  • Electron is flash for the desktop: More or less my opinion. While interesting for some scenarios, in general it is worth keeping the browser doing its thing instead of having 3-4 "capped browsers" running web apps.
  • SCUMM-8: Amazing use of the PICO-8 virtual console.
  • Scaling Unsplash with a small team: Nice advice:
    • Build boring, obvious solutions
    • Focus on solving user problems, not technology problems
    • Throw money at technical problems
  • @ethanschoonover: "Serverless" still feels to me like a restaurant saying they are "kitchenless" so they can focus on food instead of food preparation.
  • An Illustrated History of iOS: Half educational, half a source of fun, it presents a nice summary of how iOS and the "iDevices" went from a beautiful but heavily limited phone to today's fully featured high-end smartphone.
  • Machine Learning and Product Managers: Interesting resources to start reading about it. Also, if interested you should check Machine Box, a nice use of Docker to quickly and very easily have ML containers that tag images or recognize faces.
  • 35 programming habits that make your code smell: Some are very obvious, others interesting, in general a nice list.
  • 8 Lines of Code: Talk about hidden complexity and "code magic", about how going simple achieves better results and how often we make things harder than they need to be.
  • How to program independent games - CSUA Speech: Talk by Jonathan Blow, creator of Braid and The Witness, about keeping things simple, how being clever causes more problems, and how focusing instead on solving the problem (and even being empathetic with others' solutions) pays off.
  • Hard-won lessons: Five years with Node.js: I don't do (nor fancy much) Node.js, but this advice looks interesting and well thought out, even more considering how young and immature the platform still is.
  • How eBooks lost their shine: 'Kindles now look clunky and unhip': Interesting point of view, although as usual I fall outside the average, as I love my Kindle and I'm actually shrinking my physical book collection (by gifting books to friends and family). Balance and freedom of format choice are the best scenario we can seek.
  • Bias driven development - Mario Fusco: The subjective parts are not to my liking (very opinionated speaker), but all the examples of biases are good ones, and I didn't know about these two:
    • Bandwagon effect: "it worked for my friend" (or in general for others)
    • Law of triviality: aka bike-shedding: discussing trivial things at important meetings (like the details of the bike shed while discussing a nuclear plant).
  • Writing the CFP: Nice compilation of tips and advice to better prepare and give talks.
  • Anonymising images with Go and Machine Box: Small but helpful tutorial, mostly due to Machine Box being so easy to use. I personally have toyed with it (but with the photo tagging box) and it is great (and free!).
  • A cybersecurity researcher halts ransomware attack: It was "funny" how in Spain some big companies were sending all their employees home and telling them to shut down their computers, while this guy actually got to analyze the traffic (and source code?) and found how to stop, or at least delay, the spreading of the ransomware.
  • Why the "Google Docs" worm was so convincing: Previous malware was a reality check for people who thought patching and updating could wait; this one is indeed harder to spot.
  • Basecamp Employee Handbook: Great content and kudos to the transparency!
  • The art of destroying software - Greg Young: I loved this talk and the concept of simplification, of detachment from code (what matters is the product, what adds value to the client), of doing small sprints after which you can throw away and rewrite the piece of code if it didn't go well or you had to switch priorities... Really enlightening.
  • CQRS - Alistair Cockburn: A 5-minute talk in which you'll learn hexagonal architecture, CQRS, event sourcing... great how nicely explained it is for being so short.
  • Apple's New Campus: An Exclusive Look Inside the Mothership: I don't fancy the company much, but the building looks... WOW
  • How to Live on 24 Hours a Day: Really great advice:
    • "Time is money. [...] Time is a great deal more than money. If you have time, you can obtain money-usually. But you cannot buy yourself a minute more time."
    • "You cannot draw on the future. Impossible to get into debt! You can only waste the passing moment. You cannot waste tomorrow, it is kept from you."
    • "We shall never have more time. We have, and have always had, all the time there is."
    • "You can turn over a new leaf every hour if you choose."
    • "Beware of undertaking too much at the start. Be content with quite a little. Allow for accidents."
    • For starters, we can stop viewing our work as our lives and learn to distinguish the two or intertwine them.
  • @sknthla: "Amazon has reduced its total shipping cost by over 50% since 2006 - unnoticed innovation in logistics". Games Workshop, one of the biggest miniature and boardgame makers, did something similar years ago, switching almost all of their metal miniatures to resin: lower weight, lower shipping costs.

Encoding JPGs with Google's Guetzli

When I first read about Guetzli I thought it probably wouldn't amount to much; it sounded too good to be true: a 35% size reduction in JPGs, backwards-compatible and with no visible quality loss, sounded quite nice. Then, a few days ago we were talking at work about how to optimize our homepage's speed, I remembered the algorithm and decided to give it a try over the weekend.

My results can be quickly summarized: it is as great at compressing as it is slow at doing it. Still recommended if speed is not an issue for you.

Basically, it consumes around 200MB of RAM and takes one minute per source image megapixel. In practice, a tiny 150x200 thumbnail takes more than 15 seconds to encode on my laptop, while ImageMagick encodes a FullHD PNG into JPG in around a second. It is indeed extremely slow, making it unfeasible for encoding blog post images and the like on the fly. Nothing that cannot be solved with an asynchronous job that sweeps and optimizes afterwards, but extra work to be done.
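
If you want the savings despite the speed, that sweep job is simple enough to write. A minimal sketch of the idea in Python (the folder, quality value and the assumption that guetzli is on the PATH are mine, just for illustration):

import subprocess
from pathlib import Path

def optimize_folder(root):
    """Re-encode every JPG under `root` with Guetzli, keeping the original name."""
    for jpg in Path(root).rglob("*.jpg"):
        temp_output = jpg.parent / (jpg.name + ".jpeg")
        result = subprocess.run(["guetzli", "--quality", "85", str(jpg), str(temp_output)])
        if result.returncode == 0:
            temp_output.replace(jpg)  # overwrite the source with the optimized file
        elif temp_output.exists():
            temp_output.unlink()  # clean up partial output and leave the source untouched

if __name__ == "__main__":
    optimize_folder("content/images")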

Regarding the size, I can confirm I got size reductions between 35% and 50%, sometimes a bit more. Source images were all JPGs with quality between 85% and 95% and sizes between 150x150 and 1900x1280. Overall, the images "block" went from 290 MB to 110 MB. Some of the images were probably not optimally compressed to begin with (back when I used Windows, a PowerShell script did the resizing instead of ImageMagick), but it is still a huge win, and I cannot even imagine how well it will shrink images on blogs where non-tech people sometimes upload images as they come, without any quality or resolution adjustment.

I ran the big re-encoding of full folders at 85% quality, with the following exact run params (available here too):

find . -type f -name "*.jpg" -exec guetzli --quality 85 {} {}.jpeg \;
find . -type f -name "*.jpg" -exec rm {} +
find . -type f -name "*.jpeg" | rename "s/\.jpeg$//"

As I was going through hundreds of images coming from years of different sources (phones, cameras, image manipulation applications...), I had a small ratio of failures, around 1.7% (if I recall the number of failed images correctly) of almost 2900 JPGs. Guetzli as a command-line application is really basic, and its output is too scarce to be of use in batching (if you care about errors, that is), so I decided to add some dumb fprintfs to output the failed files, so that with some replacements and regular expressions I could easily grab all failed images. My fork can be found here.
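
For reference, filtering the batch output afterwards is trivial. A minimal sketch, assuming each failure is printed on its own line with a recognizable prefix (the "FAILED:" marker here is made up, not the actual output of my fork):

import re

# Hypothetical failure marker; adapt the pattern to whatever the fprintfs emit.
FAILED_PATTERN = re.compile(r"^FAILED: (?P<path>.+\.jpg)$")

def failed_images(log_path):
    """Collect the paths of images that Guetzli could not encode."""
    failed = []
    with open(log_path) as log_file:
        for line in log_file:
            match = FAILED_PATTERN.match(line.strip())
            if match:
                failed.append(match.group("path"))
    return failed

print("\n".join(failed_images("guetzli-batch.log")))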

And really that's all I had to say about it. Clearly the open-sourced tool is not the same thing Google uses (there is a single test, and I refuse to believe they would use that command-line tool in production), but the algorithm is there, it is as awesome as advertised, and I'll definitely optimize my blog post images with it from now on.


Book Review: Thinking Fast and Slow

Finishing this book took me some time, because I listened to it in audiobook format and my commute to my previous job was too noisy and distracting to focus on listening, but with my recent change I can now listen to voice audio perfectly.

Review

Thinking Fast and Slow book cover

Title: Thinking Fast and Slow

Author: Daniel Kahneman

How we think is managed by two systems/fragments/"selves" of the brain: one is impulsive and quick-thinking but prone to biases and errors in judgement; the other is slower, recalls memories and experience, but is "lazy" and tends to delegate to "System one". Along this book we're taught how we make lots of mistakes, misjudgements and wrong decisions, and in general kind of get tricked by ourselves. But we also learn valuable lessons to become better at handling these situations (spoiler: take your time and think).

There are many chapters, each focusing on different aspects: fears, biases, mistakes, choice-making... full of examples, studies and tips to learn to avoid them. They are useful in most areas, from personal life, well-being and relationships to business-related decisions.

To mention something not so great, there are quite a few military-related remarks, for example illustrating loss aversion with "selling missiles", when there are countless less belligerent examples. I understand that for the author (Israeli and with a military past) it might be normal, but to me it strikes as a sad choice of examples. Also, some chapter divisions feel a bit artificial; gambling-related topics, for example, span multiple chapters.

Despite my small criticisms, this book is a must read. The best way to combat biases is to know them, and with so many examples it becomes clear that we're a bit flawed and should not rush our decision making.

Notes

Sadly I didn't note down everything, as sometimes it is hard to take notes and I cannot highlight as with Kindle books, but I think I wrote down most of the important topics (for me, at least).

  • Two systems that drive the way we think. System 1 is fast, intuitive, and emotional; System 2 is slower, deliberative, logical
  • The most important mistake our brain makes and takes as common truth: what you see is all there is
  • Law of small numbers
  • Law of averages: belief that the statistical distribution of outcomes among members of a small sample must reflect the distribution of outcomes across the population as a whole
  • Most of what happens in life is random (most facts in the world are random). Causal explanations are dangerous and many times wrong
  • Anchoring effect
  • Availability heuristic
  • Statistics' base rate: https://en.wikipedia.org/wiki/Base_rate
  • Conjunction fallacy: The probability that two events will both occur can never be greater than the probability that each will occur individually
  • success = skill + luck; greater success = more skill + a lot more luck
  • Regression to the mean
  • Intensity matching
  • When we change our view of the world, we lose or weaken our ability to recall the old view. It is a weakness of our mind
  • Hindsight bias
  • Errors of prediction are inevitable because the world is unpredictable
  • Subjective confidence shouldn't be treated as competence; a low-confidence judgement can be the more trustworthy one
  • Intuition adds value, but only after disciplined, objective gathering of information
  • Outcomes are gains and losses (utility of wealth): changes of wealth instead of states of wealth
  • Theory-induced blindness
  • Human brain gives priority to bad news
  • Good relationships involve avoiding bad moments more than having good moments
  • Quantification is much more powerful than mere numbers or percentages, e.g. "4 out of 101..." vs "40%..."
  • Narrow framing vs broad framing
  • Sunk cost fallacy
  • Losses evoke stronger feelings than costs

On Elastic Beanstalk, Docker and CircleCI

I joined ticketea's engineering team last month, and apart from learning how things work and spending some weeks bugfixing (to get comfortable with the code and peek at some of the projects), I also got assigned to one of the new projects. There are three projects that we have started from scratch, allowing us to decide whether to keep or change the current platform (which could be more automated). In order to take decisions, we did some research and proofs of concept.

The main goal of the research was to set up a basic AWS Elastic Beanstalk orchestration system, to allow us to perform deploys, local runs, etc. without needing to manually handle EC2 instances and build the corresponding toolset, as we don't have any systems team.

Our results are mixed but still subject to change, as we haven't yet discarded or settled on a certain route; we keep exploring multiple paths with the projects to decide later. Despite that, I'll leave here some notes and references. Don't expect great notes, as this is more of a cleanup of a worklog/checklist (actually, it was a simple GitHub issue).


CircleCI

We'll stick with CircleCI as our test runner, builder and probably continuous deployment tool for staging. Version 2.0 works nicely with containers and, despite the configuration being heavily modified from v1.0, the modifications were quick to perform.

Elastic Beanstalk

EB has been relegated to staging/production deployment. For that, the cluster features (load balancing, rolling deploys, etcetera) are great and very easy to use. For local development, instead, it ranges from painful to outright impossible to get working decently without hacks. The reasons are multiple, the main ones being:

  • You cannot use docker-compose as EB internally uses it and forces you to use their YML config files or rely on fully manual Makefiles + raw Docker
  • eb local works only on pretty much factory-default scenarios. As soon as you start working on real services, it just doesn't work
  • EB works using environments, but it is configured so that one "folder" is the equivalent of one environment. So having dev, staging, production, etc. means one of the two following hacks:
    • Have a single root dockerrun.aws.json with placeholder variables that you replace with the appropriate environment values (see the sketch after this list)
    • Have multiple dockerrun.aws.json files in subfolders (one per environment) and move the right one to the root via a Makefile or similar, depending on where you run it
  • We've become more proficient at using "raw" Docker, but in the end we decided to still use docker-compose, even if only for development. It saves you a lot of command-line typing and is quick to change.
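
The first hack boils down to a tiny templating step. A minimal sketch in Python, assuming a dockerrun.aws.json.template file with $PLACEHOLDER-style variables (the file name, variable names and values here are made up for illustration):

import os
from string import Template

# Hypothetical per-environment values; replace with your own settings.
ENVIRONMENTS = {
    "dev": {"IMAGE_TAG": "latest", "CONTAINER_MEMORY": "512"},
    "staging": {"IMAGE_TAG": "stable", "CONTAINER_MEMORY": "1024"},
    "production": {"IMAGE_TAG": "stable", "CONTAINER_MEMORY": "2048"},
}

def render_dockerrun(environment):
    """Fill the template placeholders and write the root dockerrun.aws.json."""
    with open("dockerrun.aws.json.template") as template_file:
        template = Template(template_file.read())
    with open("dockerrun.aws.json", "w") as output_file:
        output_file.write(template.substitute(ENVIRONMENTS[environment]))

if __name__ == "__main__":
    render_dockerrun(os.environ.get("DEPLOY_ENV", "dev"))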

Resources:

EB Configuration files

Alternatives to EB

One of the teams, after asking friends and colleagues for feedback, is testing Terraform. It looks promising and is working fine for them, but it also needs maintenance, so there is no firm decision yet on whether to use it or stick to Elastic Beanstalk and Makefiles (at least for now).

ECS + ECR

We set up a registry and pushed both development and production images after successful builds. It works quite nicely, and the only reason we are not using them actively is to avoid the permissions hell you enter once you want to share images between different Amazon accounts (not just IAM users on the same account, but fully separate ones).

Redis

We are using Redis for our project: a Docker image for development and ElastiCache for staging and production.
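
Switching between both backends is just configuration. A minimal sketch with redis-py, assuming the ElastiCache endpoint gets injected through environment variables in staging/production (the variable names are mine):

import os

import redis

# Defaults target the local Docker container; staging/production inject
# the ElastiCache endpoint through environment variables.
redis_client = redis.Redis(
    host=os.environ.get("REDIS_HOST", "localhost"),
    port=int(os.environ.get("REDIS_PORT", "6379")),
    decode_responses=True,
)

redis_client.set("healthcheck", "ok")
print(redis_client.get("healthcheck"))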

Tools/extensions to check and add if interesting


Pelican Publisher Script

When kartones.net was a blogging community and not my current personal minimalistic landing page, one of the blogs that my friend Lobo666 and I maintained was Uncut. With the change to BlogEngine.NET it kept working easily, with a combination of a WYSIWYG editor (or Windows Live Writer) and uploading post images via FTP (a minor manual step). But when I recently moved everything to static sites, as Pelican not only doesn't provide any editor but also forces you to build the site to preview changes, my friend was pretty much blocked from keeping on posting at the blog.

On the other hand, I already had some post-processing scripts to clean up files that were always copied to the output folder (and thus uploaded to the site) and to do other tiny tasks like duplicating files (I want to maintain backwards compatibility with the original RSS feed addresses of the old blogs). They were ad-hoc, but after showing them to my friend he just asked me "if I could make those scripts also upload the modified files automatically". And indeed, a few changes to optionally pass a post identifier by command line (I decided to use the slug) would help. It would also ease things to remove all the "full index pages" that Pelican builds (index<zero-to-almost-infinite>.html), leaving just 10 pages and a link to the full archives page:

Blog paging screenshot

This way, and after removing the tags, categories and authors subfolders as I don't use them, the number of modified files to upload for a mere new blog post is around a dozen, making it blazing fast to "deploy" with some Python code. In the end I generalized the script for the three blogs that I still write and/or maintain: with a few configuration parameters you can specify folders to create or delete, files to copy, remove or duplicate, how to truncate the index files... and of course upload a post or just build without uploading.
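
To give an idea of the shape of that configuration-driven post-processing, here is a minimal sketch (parameter names and paths are made up; the real script is in the repository linked below):

import shutil
from pathlib import Path

# Hypothetical configuration values; the actual script uses its own parameters.
OUTPUT_PATH = Path("output")
FOLDERS_TO_DELETE = ["tags", "categories", "authors"]
FILES_TO_DUPLICATE = [("feeds/all.rss.xml", "syndication.axd")]  # example old feed path
MAX_INDEX_PAGES = 10

def post_process():
    """Trim Pelican's output so only around a dozen files change per new post."""
    for folder in FOLDERS_TO_DELETE:
        shutil.rmtree(OUTPUT_PATH / folder, ignore_errors=True)
    for source, destination in FILES_TO_DUPLICATE:
        shutil.copyfile(OUTPUT_PATH / source, OUTPUT_PATH / destination)
    # Delete index11.html, index12.html, etc., keeping only the first pages.
    for index_file in OUTPUT_PATH.glob("index*.html"):
        page_number = index_file.stem.replace("index", "")
        if page_number.isdigit() and int(page_number) > MAX_INDEX_PAGES:
            index_file.unlink()

if __name__ == "__main__":
    post_process()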

I don't want to go on much longer, as the utility of this tool is limited and very specific, so getting to the point: I uploaded the script files to my PythonAssorted GitHub repo. The direct URL of the publisher files is: https://github.com/Kartones/PythonAssorted/tree/master/pelican/publisher.

Usage is quite simple:

python3 publisher.py your-great-post-slug

And to only build:

python3 publisher.py

And that's all. Until next time :)

