Kartones Blog

Be the change you want to see in this world

An interesting Evolutionary Design talk

I'm finally getting up to date with a non-trivial backlog of pending talks to watch, and today I watched one that resonated so much with me that I wanted to a) express how interesting it is, and b) keep some notes about it on the blog, because it is so nicely explained.

If you like concepts like Test-Driven Development, the old Extreme Programming, or simply like to do true lean development and real iterative software building, I'm confident you will enjoy it. But I encourage everybody to watch it, even if you don't fully share the principles [1].

The talk is Evolutionary Design Animated, by James Shore, presented at YOW! 2019. It consists of two parts, each around 25 minutes: Part I, Part II

Notes:

Evolutionary design:

  1. Simple Design
  2. Continuous Design
  3. Reflective Design

All of them enabled by fast & reliable automated tests

Simple Design

  • Start with a walking skeleton
  • Do the simplest thing that could possibly work
  • You Aren't Gonna Need It
  • Simple, not sloppy

Rules for simple design:

"When, not if, I need to change this decision in the future, how hard will it be?"

  • Every concept once
  • ...And only once (don't repeat yourself)
  • Design intent clear and obvious
  • Concrete, not speculative
  • Cohesive: code that changes together, stays together
  • Decoupled: if it's out of sight, it's safely out of mind
  • Isolated: if it's widely used, it's abstracted by an interface

Continuous Design

  • Constantly review and improve the design
  • Merciless refactoring
  • Collective ownership
  • Pairing and/or mobbing
  • Continuous Integration
  • Camp site rule: Don't make it perfect; just make it better

Reflective Design

  • Review the code you're about to work on
  • Identify flaws ("code smells", difficulty understanding)
  • Reverse engineer design of code, if necessary
  • Imagine how to improve the design of the code
  • Incrementally refactor the code to reach desired design


[1] I disagree with the speaker's point of view regarding using existing open-source solutions vs building your own, as a DIY approach can also carry heavy maintenance burdens, and not all open-source software has high maintenance costs.


Perfect is the enemy of good

People often disregard a technology, language, or service just because it has one or two flaws, usually followed by either "X is better" or "they should have done it better". To me, it comes down to two angles.

Everything has trade-offs

It is easy to find counter-arguments if you know about the alternative proposed, because the truth is that there is no perfect technology, no flawless language, and no universal service. Read, learn, and use enough, and you will always find details that could be improved everywhere.

But let's see three trendy examples:

As of 2023, everyone uses Git to store their source code, except maybe Facebook and their highly adapted Mercurial fork (or perhaps they have also switched by now?). But let's be sincere, Git is as powerful as it is messy: From time to time, you'll find an article or two mentioning some inconsistency or some limitation (sample). Where it really fails to deliver is in ease of use: There are so many commands with similar or opposite actions but totally different names, flags and/or parameters, that it's no wonder there are so many books and cheatsheets online to learn about the tool. Mercurial, instead, was simpler and more consistent, and even offered a smooth transition for people coming from Subversion.

Something more common than Git is Javascript, according to some now the most used language in the world. More than two decades old, and while vastly improved since it originally appeared, it is still a flawed language, as any everyday developer will likely notice, in aspects ranging from Array.includes vs Element.classList.contains, to the long-lasting battle of CommonJS vs ES Modules, or its performance and memory consumption problems. Not to mention the increasingly tricky tooling, the attempts to fix the language (CoffeeScript, Dart, TypeScript...), or the projects to take it outside of the web confines (NodeJS, now also Deno [1]).

Lastly, the transition from XML to JSON is fully complete, and if you attempt to use the former, you'll get raised eyebrows and a lot of questioning. And yet, JSON as a data transport format is suboptimal! From poor streaming capabilities, to a minimal set of types (it doesn't even know about timestamps!), to almost no mechanisms for data validation (thankfully, projects like JSON Schema exist), it is almost as if its only advantage versus XML was extreme simplicity. And it probably is, as now most service communications and APIs use it as the de-facto messaging format despite all the limitations.
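
As a small illustration of the missing timestamp type, here is a minimal sketch in Go (the Event type is a hypothetical example of mine): the standard encoding/json package can only emit a time.Time as a plain string, so the receiving side needs out-of-band knowledge to parse it back into a date.

package main

import (
  "encoding/json"
  "fmt"
  "time"
)

// Event is a hypothetical payload; JSON has no timestamp type,
//  so `At` gets serialized as a plain RFC 3339 string
type Event struct {
  Name string    `json:"name"`
  At   time.Time `json:"at"`
}

func main() {
  data, _ := json.Marshal(Event{Name: "deploy", At: time.Now().UTC()})
  // Prints something like: {"name":"deploy","at":"2023-02-06T10:30:00Z"}
  // Nothing in the JSON itself says that "at" is a date and not just text
  fmt.Println(string(data))
}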

So, was Git better than Mercurial? Javascript better than Java applets? JSON better than XML? VHS better than Betamax? Yes, and no. In some sense they are inferior, but they won and conquered their territories, with their flaws and mistakes, and keep evolving.

Seeking perfection is wrong

Seeking perfection can blind you. A perfectionist can spend too much time on small details; meanwhile, a pragmatist will advance way more and way faster. I used to try to be a perfectionist, and I would never finish any personal project. Now I plan small incremental iterations, and feel way happier to have quite a few things, and scripts, and small web projects, here and there; most if not all in a decent state to fulfil their original purpose.

And at your job, if you are an engineer, often you are not paid to craft the single best and most flawless piece of software; you get paid to generate value, for the business and often for its users. If you get pride boosts along the way because you do something awesome, great, and congratulations! But you'd best prioritize delivering new or better things, not perfect things. Note that this does not mean "shipping fast" if doing so would hurt quality (or waste resources needlessly), but that's maybe a topic for another post.

More than a few times, I've seen teams waste time building "their better XXXXX" instead of using the stock alternative (often open-source). Or, even better, using that open-source system and trying to contribute some of the improvements back (it's not always possible, though). If you factor in time, I bet you could often steer projects away from a "Do It Yourself" approach towards a "let's use what's available, as long as it is good enough" one.


Don't let perfect be the enemy of good


[1]: Fun fact: Deno is, to my knowledge, from one of the creators of NodeJS, and mainly originated to fix many of NodeJS's mistakes, which apparently are pretty hard to solve nowadays due to its decision-making process. I found this podcast episode quite interesting to learn about the whys.


CURL, Git, and more cheatsheets

I just finished moving some of my GitHub gists here, as pages accessible from the Archives, mostly some cheatsheets about:

  • CURL: pretty basic, I should spend some time adding more non-trivial examples
  • Git: I'm actively improving this one (as I'm currently levelling up my git skills)
  • Image & Video: Mostly some ImageMagick, FFMPEG and WebP examples

For now I'll leave my PostgreSQL cheatsheet as a gist, because it seems to be quite popular there.


I don't care much about SEO

My posts are shorter than most because I like to go directly to the point. If I give some context, it is because I feel it adds something; otherwise, I skip it. I don't count how many words I've written, and I'll never add filler content to try to cover additional topics and be more appealing for SEO reasons. And I hate it when I detect one of those artificially inflated articles, often skipping them even if there is some relevant information inside.

But not caring about SEO does not imply not applying good practices. I've now seen decades of slow but steady URL tweaks and changes, and I happily embraced ideas like removing format extensions or post dates; I feel sad when I click on a link and it 404s, so from time to time I run a little CLI crawler that checks for broken links on my sites (and then I manually go and fix all of them, often redirecting to archive.org); I try to describe things in my own words and provide my own code examples, but when needed I always cite the source of an excerpt. And when I think a post needs a minor update to reflect some relevant discovery, I go and check if a simple update is ok or if it's best to write a new article and reference the old one.

I think Search Engine Optimization is something everybody in web development should learn a bit about, but just "the good parts". I don't feel like almost blindly fighting against legions of skilled engineers building search engines. I'd rather produce engaging content on my own terms and rules, and let Google, Bing and others do whatever they want with it, whether ignoring or indexing it, and however well or badly they rank it.

But you might think: why this sudden rant about the topic? The answer is the recent Yandex source code leak. Without judging the morality of the action itself, I read an article or two analyzing the source code (my favourite is the searchengineland.com analysis), because, from an engineering point of view, this is a unique opportunity: an inside view of how a vast and mature search engine works; a peek at how and where so many brilliant minds have put a lot of effort. The Holy Grail might still be Google, but it is still fascinating and revealing. I wish search engines were less opaque about their algorithms, but I also understand that if you reveal the whole formula, bad actors will abuse the system.

As I said, I give little attention to SEO, but I encourage you to read about the leak.


Gazelle (Bazel): Loading other BUILD files

While I'm still a newbie regarding Bazel, one of the main caveats of the system is that it still lacks documentation about many topics; or at least I find myself ending up digging into the source code to learn how things work, for lack of better alternatives. So I'll try to write from time to time about my findings with Bazel and its tooling. As in this case, with the official BUILD file generator, Gazelle.

The Context

Gazelle traverses project folders and generates rules. Both actions are done in depth-first post-order. Setting up the configuration, on the other hand, is done from the root to the leaves: First the root configuration is calculated, then it gets propagated down (inherited by the children nodes), and potentially modified along the way via Gazelle directives (configuration in the form of special annotations).

Among other things, this brief summary means that, when working with Gazelle extensions, you can roughly guess at which point of the traversal a certain BUILD file will be visited, but in general you should be very careful not to rely on that ordering. At the same time, you can rest assured that, no matter which GenerateRules call you are in, your Config will always have passed through, at minimum, the root node's Configure step.
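
As a small illustration of that configuration propagation, here is a minimal sketch of an extension's Configure implementation; note that myLang, myConfig, "my_ext", and the my_directive key are hypothetical names of mine, not part of Gazelle:

import (
  "github.com/bazelbuild/bazel-gazelle/config"
  "github.com/bazelbuild/bazel-gazelle/rule"
)

func (*myLang) Configure(c *config.Config, rel string, f *rule.File) {
  // `c.Exts` holds per-extension state; this entry was inherited
  //  from the parent directory's configuration
  cfg := c.Exts["my_ext"].(*myConfig)

  if f == nil {
    return // this directory has no BUILD file
  }
  for _, d := range f.Directives {
    // React to a `# gazelle:my_directive some-value` annotation,
    //  overriding the inherited value for this node and its children
    if d.Key == "my_directive" {
      cfg.someValue = d.Value
    }
  }
}

(For Gazelle to accept the annotation, the extension also has to list my_directive in its KnownDirectives.)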

The Problem

Recently, I was in a situation where I wanted to read certain rules from one BUILD file and use them from other places. An example:

/folder-a/BUILD
/folder-b/BUILD   <- our target
/folder-c/BUILD
/folder-d/subfolder-i/BUILD
BUILD
WORKSPACE

"I want to use the rules from folder-b/BUILD from folder-a, folder-c, folder-d/subfolder-i, and the like"

And I knew the following:

  • That specific BUILD file is in a known path, and won't move
  • That specific BUILD file is manually maintained
  • That specific BUILD file is not on the root directory

So we shouldn't rely on Gazelle's tree walker, because depending on where we want to use those rules, the file might not have been read yet... Or maybe you calculate the traversal path and assume that, as of today, it will be read beforehand, but then tomorrow the Gazelle folks implement a multi-threaded parallel traversal and everything breaks again...

All of this sums up to the fact that, although the Config is the most common vessel for state transfer, capturing the rules of that specific BUILD file while Gazelle walks past it and storing them in the Config is not a viable approach.

The Solution

As we know exactly where the file resides, and we know that it will always exist, one approach that we can take is to read and parse the file ourselves. And Gazelle already knows how to read BUILD files and parse them as ASTs, so we can have some code like the following:

import (
  // ...
  "os"
  "path"

  "github.com/bazelbuild/bazel-gazelle/rule"
)

// ...

loadFolderBRules := func() {
  // `c` is the "master" `Config`, only available at a few methods
  filePath := path.Join(c.RepoRoot, "folder-b", "BUILD")
  fileContent, fileErr := os.ReadFile(filePath)
  // fileErr error handling should go here

  // os.ReadFile already returns a byte slice, so no conversion is needed
  targetData, dataErr := rule.LoadData(filePath, "", fileContent)
  // dataErr error handling should go here

  // For example, let's go through the rules
  // (the loop variable is named `r` to avoid shadowing the `rule` package)
  for _, r := range targetData.Rules {
      // Now we can store relevant info from the rule
      // `config` is the config passed to this specific node
      // You should also change your extension's logic to store `myProperty`
      //  and propagate it to its children
      config.myProperty[r.Name()] = r.Kind()
  }
}

In the example, we can see that we obtain a nice rule.File struct named targetData, populated with the parsed contents of the BUILD file.

Now, if we place the previous loadFolderBRules function inside the Configure implementation:

loadFolderBRules := func() {
  // previous code here
}

// Do not place the `rel` check inside the `if f != nil` block,
//  in case for some reason there is no root BUILD file
if f != nil {
  // ...
}

// This will happen before any `GenerateRules` call, 
//  because `rel` will equal `""` when traversing the root node
if rel == "" {
  loadFolderBRules()
}

We have loaded that special BUILD file from folder-b only once, stored our desired data in the configuration, and that configuration will be automatically propagated to all the descendants. To use it, you just need to access the configuration parameter that GenerateRules will receive, and use the myProperty map.
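
For completeness, a minimal sketch of that last step; again, myLang, myConfig, "my_ext" and myProperty are the hypothetical names used above, not Gazelle's:

import (
  "github.com/bazelbuild/bazel-gazelle/language"
)

func (*myLang) GenerateRules(args language.GenerateArgs) language.GenerateResult {
  // The node config, with `myProperty` already propagated down from the root
  cfg := args.Config.Exts["my_ext"].(*myConfig)

  for name, kind := range cfg.myProperty {
    // `name` and `kind` belong to the rules of folder-b/BUILD,
    //  loaded once during the root node's Configure step
    _ = name
    _ = kind
  }
  return language.GenerateResult{}
}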

Remarks

The rel == "" trick might not be the cleanest approach, but it is the best I could come up with, as Gazelle in its current version lacks any kind of extension hook for when it begins its work.

Storing the data in the node config might also sound like a lot of unnecessary copying, but as mentioned in one of the code comments, the "master config" is only available to a few methods (Configure being one of them). It would be a better destination for data that you read once and never mutate, but as of now it can't be accessed from GenerateRules, so we can't use it.


Previous entries