Kartones Blog

Be the change you wanna see in this world

On Python 3, Flake8 and mypy

This is a small post just to write about what we use at work and what I'm starting to use at home too for personal experiments. I thought it would be interesting to share, as at least two friends showed interest in the topic.

First, we're using Python 3.6. I've only been using Python for around a year and a half, and I have almost fully skipped Python 2.x, so I am not biased by the "2 is better, don't migrate to 3!" war. I just really like all the new features and the much better encoding handling, so in my ignorance I don't understand why people would stick with an old and worse version... ¯\_(ツ)_/¯

Then, we use flake8 as our linter, with every rule enabled and only the line length limit relaxed (we've raised it to a more reasonable 120 characters). But as some people have a tendency to drift away from coding standards, to make sure everybody follows them my colleagues have set up an integration test that uses flake8.api.legacy to run the checks and make sure there are no violations. It can look silly sometimes, but it helps a lot to maintain a uniform codebase.
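A linter test along those lines could look like the sketch below. It is my guess at the general shape, not my colleagues' actual code: the function names and the checked path are made up for illustration, while get_style_guide, check_files and total_errors come from flake8's legacy API.

```python
from typing import List

MAX_LINE_LENGTH = 120  # the only rule we relax
CHECKED_PATHS = ["."]  # hypothetical: whatever folders hold your code


def count_flake8_violations(paths: List[str], max_line_length: int = MAX_LINE_LENGTH) -> int:
    """Run flake8 over the given paths and return the number of violations."""
    # Imported lazily so this module still loads where flake8 is not installed.
    from flake8.api import legacy as flake8

    style_guide = flake8.get_style_guide(max_line_length=max_line_length)
    report = style_guide.check_files(paths)
    return report.total_errors


def assert_codebase_is_clean() -> None:
    """Fail loudly if flake8 reports anything; call this from your test suite."""
    violations = count_flake8_violations(CHECKED_PATHS)
    assert violations == 0, "flake8 found {} violation(s)".format(violations)
```

Calling assert_codebase_is_clean() from a regular test is enough to make the build fail whenever somebody drifts from the standard.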

And finally, again thanks to my colleagues, we're doing typed Python using mypy. It adds optional type checking to both Python 2 and 3 and provides a linter-style check that reports any error (so it can actually be made mandatory). Added to the same battery as the normal linter test, it means all new code must be fully typed or you won't be able to push a build. It is quite robust and you have type hints for everything, from basic types to optional parameters, callable function handlers, generics and multiple return types (the Union type, where you still need to specify which types are allowed). It stings a bit when you begin using it, but after a while it's so nice to forget about weak typing errors (one of my main complaints about scripting languages).
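To give an idea of what those hints look like, here is a small sketch that touches each of the cases mentioned: optional parameters, callables, generics and Union return types. All the names are invented for the example; only the typing module constructs are real.

```python
from typing import Callable, Generic, List, Optional, TypeVar, Union

T = TypeVar("T")


class Stack(Generic[T]):
    """A generic container: a Stack[int] only accepts ints."""

    def __init__(self) -> None:
        self._items = []  # type: List[T]

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> Optional[T]:
        # Optional[T] means "a T or None", so callers must handle None.
        return self._items.pop() if self._items else None


def parse_port(raw: str) -> Union[int, str]:
    """Return an int for numeric input, or the original string otherwise."""
    return int(raw) if raw.isdigit() else raw


def apply_twice(handler: Callable[[int], int], value: int) -> int:
    """The handler must take an int and return an int."""
    return handler(handler(value))


def greet(name: str, greeting: Optional[str] = None) -> str:
    """An optional parameter: greeting may be omitted or None."""
    return "{}, {}!".format(greeting or "Hello", name)
```

Everything still runs as plain Python; the difference is that mypy would complain about something like apply_twice(str.upper, 1) before the code ever executes.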

I highly recommend watching this PyCon 2017 talk by the creators of mypy to both see it in action and learn about its internals:

All of this, combined with some decent tests (kind-of-TDD-without-being-always-strict), means I can make non-trivial changes and refactors without worrying about breaking unexpected things. If you do Python, you should try it too; the absence of fear feels really nice.

The truth

Oh, and just in case you want to check it out, I keep a small gist where from time to time I write notes and miscellaneous things about Python that I learn and wish to keep for the future.

As a tiny sample, I wrote a Python implementation of a double linked list that you can check on my GitHub. It has both flake8 and mypy "linter tests" that check the code for errors or missing type hints. Sadly, variable annotations are only available from Python 3.6 onwards, so I've used comment annotations in the two places where I needed them, as I'm using Python 3.5 for now.
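For reference, this is roughly what comment annotations look like next to the Python 3.6 syntax. It is a trimmed-down sketch written for this post, not the code from the repository, and the class and attribute names are only illustrative.

```python
from typing import Optional


class Node:
    def __init__(self, value: int) -> None:
        self.value = value
        # Python 3.5: the variable's type goes in a comment that mypy reads.
        self.previous = None  # type: Optional[Node]
        self.next = None  # type: Optional[Node]
        # Python 3.6+ equivalent (a syntax error on 3.5):
        #   self.next: Optional["Node"] = None


class DoubleLinkedList:
    def __init__(self) -> None:
        self.head = None  # type: Optional[Node]
        self.tail = None  # type: Optional[Node]
        self.length = 0

    def append(self, value: int) -> None:
        """Add a node at the end, wiring both directions."""
        node = Node(value)
        if self.tail is None:
            self.head = node
        else:
            node.previous = self.tail
            self.tail.next = node
        self.tail = node
        self.length += 1
```

Without the comments mypy would infer the attributes as always None and reject the assignments in append, which is why the annotations are needed in those spots.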

And finally, if your editor is Sublime Text, as mine is, I wrote a post about installing a few linters in it so you can directly write clean, best-practices-approved Python.

UPDATE #1: Added YouTube talk video.

UPDATE #2: Added my double linked list example.

UPDATE #3: Added link to my post about Sublime Text Python linters.


Recommended Articles - 2017/06/12

Latest bunch of posts, articles and links I found interesting.

And, while not news, I also wanted to recommend to anybody (as I'm already doing with my friends) the following free Coursera course: Learning How to Learn: Powerful mental tools to help you master tough subjects. It is from the University of California and, despite being a 4-week course, you can do it in more or less half the time (if you have enough spare time). It not only teaches you how your brain works but also provides really good tips to improve the way you learn. I really enjoyed it and highly recommend it. I'll probably mention other online courses when I finish them.


Recommended Articles - 2017/05/19

Got delayed so more and more news piled up... ending up with a not-so-small list of relevant articles (in my opinion at least!).

  • The Lazy Manifesto (+ its principles): Sounds like a joke but it has really good points.
  • Deadlines and commitments... the fallacy: Deadlines are often what I call "FUU": Fictitious, Unrealistic & Unilateral. I didn't know about Theory X and Theory Y.
  • Incoming Redis streams, which coming from Salvatore smells like really good stuff. Also interesting for custom data format geeks: how the "Listpack" format will work (currently in draft)
  • Learn Redis the hard way (in production): Speaking of Redis... interesting tips
  • [Spanish] Notas From The Trenches 2017: Donosti Edition: Really great notes about speed at a company, knowledge, learning...
  • Electron is flash for the desktop: More or less my opinion. While interesting for some scenarios, in general it is worth letting the browser do its thing instead of having 3-4 "capped browsers" running web apps.
  • SCUMM-8: Amazing use of the PICO-8 virtual console.
  • Scaling Unsplash with a small team: Nice advice:
    • Build boring, obvious solutions
    • Focus on solving user problems, not technology problems
    • Throw money at technical problems
  • @ethanschoonover: "Serverless" still feels to me like a restaurant saying they are "kitchenless" so they can focus on food instead of food preparation.
  • An Illustrated History of iOS: Half educational, half fun, it presents a nice summary of how iOS and the "iDevices" got from a beautiful but heavily limited phone to today's fully featured high-end smartphone.
  • Machine Learning and Product Managers: Interesting resources to start reading about it. Also, if you are interested you should check out Machine Box, a nice use of Docker to quickly and very easily have ML containers that tag images or recognize faces.
  • 35 programming habits that make your code smell: Some are very obvious, others interesting, in general a nice list.
  • 8 Lines of Code: Talk about hidden complexity and "code magic", about how going simple achieves better results and how often we make things harder by making them complex.
  • How to program independent games - CSUA Speech: Talk by Jonathan Blow, creator of Braid and The Witness, about keeping things simple, how being clever causes more problems, and how it is better to focus on solving the problem and even to be empathetic towards others' solutions.
  • Hard-won lessons: Five years with Node.js: I don't do (nor much fancy) Node.js, but this advice looks interesting and well thought out, even more so considering how young and immature the platform still is.
  • How eBooks lost their shine: 'Kindles now look clunky and unhip': Interesting point of view, although as usual I'm out of the averages, as I love my Kindle and I'm actually shrinking my physical books collection (by gifting them to friends and family). Balance and freedom of format choice are the best scenario we can seek.
  • Bias driven development - Mario Fusco: The subjective parts are not to my liking (very opinionated speaker), but all the examples of biases are good ones, and I didn't know about these two:
    • Bandwagon effect: "it worked for my friend" (or in general for others)
    • Law of triviality, aka bike shedding: discussing trivial things at important meetings (like the details of the bike shed when the topic is a nuclear plant).
  • Writing the CFP: Nice compilation of tips and advice to better prepare and give talks.
  • Anonymising images with Go and Machine Box: Small but helpful tutorial, mostly due to Machine Box being so easy to use. I have personally toyed with it (but with the photo tagging box) and it is great (and free!).
  • A cybersecurity researcher halts ransomware attack: It was "funny" how in Spain some big companies were sending all their employees home and telling them to shut down their computers, while this guy actually got to analyze the traffic (and source code?) and found how to stop, or at least delay, the spread of the ransomware.
  • Why the "Google Docs" worm was so convincing: The previous malware was a reality check for people who thought patching and updating can wait; this one is indeed harder to spot.
  • Basecamp Employee Handbook: Great content and kudos to the transparency!
  • The art of destroying software - Greg Young: I loved this talk and the concept of simplification, detachment from code (what matters is the product, what adds value to the client), of doing small sprints after which you can throw away and rewrite the piece of code if it didn't go well or you had to switch priorities... Really enlightening.
  • CQRS - Alistair Cockburn: 5-minute talk in which you'll learn about hexagonal architecture, CQRS, event sourcing... It's great how nicely explained it is for being so short.
  • Apple’s New Campus: An Exclusive Look Inside the Mothership: I don't fancy much the company but the building looks... WOW
  • How to Live on 24 Hours a Day: Really great advice:
    • "Time is money. [...] Time is a great deal more than money. If you have time, you can obtain money - usually. But you cannot buy yourself a minute more time."
    • "You cannot draw on the future. Impossible to get into debt! You can only waste the passing moment. You cannot waste tomorrow, it is kept from you."
    • "We shall never have more time. We have, and have always had, all the time there is."
    • "You can turn over a new leaf every hour if you choose."
    • "Beware of undertaking too much at the start. Be content with quite a little. Allow for accidents."
    • For starters, we can stop viewing our work as our lives and learn to distinguish the two or intertwine them.
  • @sknthla: "Amazon has reduced its total shipping cost by over 50% since 2006 - unnoticed innovation in logistics". Games Workshop, one of the biggest miniature and boardgame makers, did something similar years ago, switching almost all of their metal miniatures to resin: lower weight, lower shipping costs.

Encoding JPGs with Google's Guetzli

When I first read about Guetzli I thought that it probably wouldn't be that great, as it sounded too good to be true: a 35% size reduction in JPGs, backwards-compatible and with no visible quality loss. Then, a few days ago we were talking at work about how to optimize our homepage's speed, I remembered the algorithm and decided to give it a try over the weekend.

My results can be quickly summarized: it is as great at compressing as it is slow at doing it. But I recommend it if speed is not an issue for you.

Basically, it consumes around 200MB of RAM and takes one minute per source image megapixel. In practice, even a tiny 150x200 thumbnail takes more than 15 seconds to encode on my laptop, while ImageMagick encodes a FullHD PNG into JPG in around a second. It is indeed extremely slow, which makes it unfeasible for encoding blog post images and the like on the fly. Nothing that cannot be solved with an asynchronous job that sweeps and optimizes afterwards, but it is extra work to be done.
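To put the "one minute per megapixel" rule of thumb into numbers, here is a tiny estimator. The function name is made up and the only input is the pixel count, so take it as a rough approximation rather than a model of what Guetzli really does.

```python
SECONDS_PER_MEGAPIXEL = 60.0  # the "one minute per megapixel" rule of thumb


def estimated_encode_seconds(width: int, height: int) -> float:
    """Estimate encoding time from the image's pixel count alone."""
    megapixels = (width * height) / 1000000.0
    return megapixels * SECONDS_PER_MEGAPIXEL


full_hd = estimated_encode_seconds(1920, 1080)  # a bit over two minutes
thumbnail = estimated_encode_seconds(150, 200)  # under two seconds
```

For small images the rule clearly falls short of what I measured on my laptop, which hints at a fixed cost per image that this estimate ignores, so I would treat it as a lower bound.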

Regarding the size, I can confirm I got size reductions between 35% and 50%, sometimes a bit more. Source images were all JPGs with quality between 85% and 95% and sizes between 150x150 and 1900x1280. Overall, the images "block" went from 290 MB to 110 MB. Some of the images were probably not optimally compressed, because when I used Windows I had a PowerShell script doing the resize instead of ImageMagick, but it is still a huge win, and I cannot even imagine how much it will shrink the images of blogs where non-tech people sometimes upload images as they come, without any quality or resolution adjustment.
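As a quick check of the overall figure, the reduction of the whole folder can be computed like this (the helper name is just for illustration):

```python
def size_reduction_percent(before_mb: float, after_mb: float) -> float:
    """Percentage of the original size that was saved."""
    return (before_mb - after_mb) / before_mb * 100


overall = size_reduction_percent(290, 110)  # roughly 62
```

That is above the per-image range I mentioned, which would fit with part of the collection not being optimally compressed to begin with.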

I ran the big re-encoding of full folders at 85% quality, with the following exact run params (available here too):

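# Encode every .jpg at quality 85 into a sibling file named <name>.jpg.jpeg
find . -type f -name "*.jpg" -exec guetzli --quality 85 {} {}.jpeg \;
# Delete the originals (careful: this includes any image that failed to encode)
find . -type f -name "*.jpg" -exec rm {} +
# Strip the extra .jpeg suffix so the new files take the original names (Perl rename)
find . -type f -name "*.jpeg" | rename "s/\.jpeg$//"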

As I was going through hundreds of images coming from years of different sources (phones, cameras, image manipulation applications...), I had a small ratio of failures, around 1.7% (if I recall the number of failed images correctly) of almost 2900 JPGs. Guetzli as a command-line application is really basic, and its output is too sparse to be of use when batching (if you care about errors, that is), so I decided to add some dumb fprintfs that output the failed files, so that with some replacements and regular expressions I could easily grab all failed images. My fork can be found here.
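An alternative that needs no changes to Guetzli itself is to drive the batch from a small script and keep the list of failures there. The following is only a sketch of that idea, not what I actually ran: the function name is invented, and it assumes the guetzli binary exits with a non-zero code when it cannot encode a file.

```python
import os
import subprocess
from pathlib import Path
from typing import List, Sequence


def reencode_folder(root: Path, encoder: Sequence[str] = ("guetzli", "--quality", "85")) -> List[Path]:
    """Re-encode every .jpg under root in place and return the files that failed."""
    failed = []  # type: List[Path]
    for source in sorted(root.rglob("*.jpg")):
        target = source.with_name(source.name + ".jpeg")
        # Assumption: a non-zero exit code means the encoding failed.
        result = subprocess.run(list(encoder) + [str(source), str(target)])
        if result.returncode == 0 and target.exists():
            os.replace(str(target), str(source))  # swap in the optimized file
        else:
            if target.exists():
                target.unlink()  # drop any partial output
            failed.append(source)  # the original is left untouched
    return failed
```

Unlike the three find commands, this only replaces an original once its optimized version exists, and the returned list can be printed or retried later.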

And really that's all I had to say about it. Clearly the open-sourced tool is not the same thing Google uses, as there is a single test and I refuse to believe they would use that command-line tool in production, but the algorithm is there, it is as awesome as advertised and I'll definitely optimize my blog post images with it from now on.


Book Review: Thinking Fast and Slow

Finishing this book took me some time, because I listened to it in audiobook format and the commute to my previous job was too noisy and distracting to focus on listening, but with my recent change I can now listen to spoken audio perfectly.

Review

Thinking Fast and Slow book cover

Title: Thinking Fast and Slow

Author: Daniel Kahneman

How we think is managed by two systems/fragments/"selves" of the brain: one is impulsive and quick-thinking but prone to biases and errors in judgement; the other is slower, recalls memories and experience, but is "lazy" and tends to delegate to "System 1". Throughout this book we're taught how we make lots of mistakes, misjudgements and wrong decisions, and in general kind of get tricked by ourselves. But we also learn valuable lessons to become better at handling these situations (spoiler: take your time and think).

There are many chapters, focusing on different aspects, fears, biases, mistakes, choice-making... Full of examples, studies and tips to learn to avoid them. They are useful in most areas, from personal life, well-being and relations to business-related decisions.

To mention something not so great, there are quite a few military-related remarks, for example illustrating loss aversion with "selling missiles", when there are countless less warlike examples. I understand that for the author (Israeli and with a military past) it might be normal, but it strikes me as a sad way of exemplifying. Also, some chapters feel like a bit of an artificial separation. For example, gambling-related topics take multiple chapters.

Despite my small criticisms, this book is a must-read. The best way to combat biases is to know them, and with so many examples it becomes clear we're a bit flawed and should not rush our decisions.

Notes

Sadly I didn't note everything, as sometimes it is hard to take notes and I cannot highlight as with Kindle books, but I think I wrote down most of the important topics (for me at least).

  • Two systems that drive the way we think. System 1 is fast, intuitive, and emotional; System 2 is slower, deliberative, logical
  • Most important mistake our brain takes as the common truth: what you see is all there is
  • Law of small numbers
  • Law of averages: belief that the statistical distribution of outcomes among members of a small sample must reflect the distribution of outcomes across the population as a whole
  • Most of what happens in life is random (most facts in the world are random). Causal explanations are dangerous and many times wrong
  • Anchoring effect
  • Availability heuristic
  • Statistics' base rate (https://en.wikipedia.org/wiki/Base_rate)
  • Conjunction fallacy: The probability that two events will both occur can never be greater than the probability that each will occur individually
  • Success = skill + luck. Greater success = more skill + a lot more luck
  • Regression to the mean
  • Intensity matching
  • When we change our view of the world, we lose or weaken our ability to recall the old view. It is a weakness of our mind
  • Hindsight bias
  • Errors of prediction are inevitable because the world is unpredictable
  • Subjective confidence shouldn't be treated as competence. low confidence can be more confident
  • Intuition adds but after disciplined objective recollection
  • Outcomes are gains and losses (utility of wealth): changes of wealth instead of states of wealth
  • Theory-induced blindness
  • Human brain gives priority to bad news
  • Good relations involve more avoiding bad moments than having good moments
  • Quantification is much more powerful than mere numbers or percentages. eg. 4 out of 101 ... vs 40% ...
  • Narrow framing vs broad framing
  • Sunk cost fallacy
  • Losses evoke stronger feelings than costs
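
The conjunction rule from the notes above is easy to verify with a couple of fractions. The probabilities below are made up purely for illustration, and the function name is mine.

```python
from fractions import Fraction


def probability_of_both(p_a: Fraction, p_b_given_a: Fraction) -> Fraction:
    """P(A and B) = P(A) * P(B given A)."""
    return p_a * p_b_given_a


# Made-up numbers, for illustration only.
p_a = Fraction(1, 20)
p_b_given_a = Fraction(1, 2)
p_both = probability_of_both(p_a, p_b_given_a)
```

Since P(B given A) can never exceed 1, the product can never be larger than P(A) alone, no matter how plausible the combined story sounds.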
