
Gazelle (Bazel): Loading other BUILD files

While I'm still a newbie regarding Bazel, one of the main caveats of the system is that it still lacks documentation about many topics; at least, I find myself digging into the source code to learn how things work, for lack of better alternatives. So I'll try to write from time to time about my findings with Bazel and its tooling. As in this case, with the official BUILD file generator, Gazelle.

The Context

Gazelle traverses project folders and generates rules. Both actions are done in depth-first post-order. Setting up the configuration, on the other hand, is done from the root to the leaves: first the root configuration is calculated, then it gets propagated down (inherited by child nodes), and potentially modified via Gazelle directives (configuration in the form of special annotations).

What the previous brief summary means, among other things, is that when working with Gazelle extensions you can roughly guess at which point of the traversal a certain BUILD file will be visited, but in general you should be very careful not to rely on that ordering. At the same time, you can rest assured that, no matter which GenerateRules call you are in, your Config will always have passed through at least the root node's Configure step.
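
To make this more tangible, here is a minimal sketch of how that inheritance pattern usually looks in an extension's Configure implementation. Take it as illustrative only: myLang, myConfig, the "myExt" key and the myProperty map are made-up names, not part of Gazelle's API; c.Exts, however, is Gazelle's standard map for extension data:

import (
  "github.com/bazelbuild/bazel-gazelle/config"
  "github.com/bazelbuild/bazel-gazelle/rule"
)

// myLang would be your extension's language type
type myLang struct{}

// myConfig is a hypothetical per-directory extension configuration
type myConfig struct {
  myProperty map[string]string
}

func (mc *myConfig) clone() *myConfig {
  cloned := &myConfig{myProperty: map[string]string{}}
  for k, v := range mc.myProperty {
    cloned.myProperty[k] = v
  }
  return cloned
}

// Configure is called once per directory, root first
func (*myLang) Configure(c *config.Config, rel string, f *rule.File) {
  parent, ok := c.Exts["myExt"].(*myConfig)
  if !ok {
    // root node: start from the defaults
    parent = &myConfig{myProperty: map[string]string{}}
  }
  // each directory inherits a copy of its parent's configuration...
  cfg := parent.clone()
  if f != nil {
    for _, directive := range f.Directives {
      // ...and its BUILD file directives may then override that copy
      _ = directive
    }
  }
  c.Exts["myExt"] = cfg
}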

The Problem

Recently, I was in a situation where I wanted to read certain rules from one BUILD file while processing other ones. An example:

/folder-a/BUILD
/folder-b/BUILD   <- our target
/folder-c/BUILD
/folder-d/subfolder-i/BUILD
BUILD
WORKSPACE

"I want to use the rules from folder-b/BUILD from folder-a, folder-c, folder-d/subfolder-i, and the like"

And I knew the following:

  • That specific BUILD file is in a known path, and won't move
  • That specific BUILD file is manually maintained
  • That specific BUILD file is not in the root directory

So we shouldn't rely on Gazelle's tree walker, because depending on where we want to use those rules, the file might not have been read yet... Or maybe you calculate the traversal path and assume that, as of today, it will be read before, but tomorrow the Gazelle folks implement a multi-threaded parallel traversal and then everything breaks again...

All of this sums up to the fact that, despite the Config being the most common vessel for state transfer, relying on the tree walker to have stored that specific BUILD file's rules in the Config by the time we need them is not a viable approach.

The Solution

As we know exactly where the file resides, and we know that it will always exist, one approach that we can take is to read and parse the file ourselves. And Gazelle already knows how to read BUILD files and parse them as ASTs, so we can have some code like the following:

import (
  // ...
  "os"
  "path"

  "github.com/bazelbuild/bazel-gazelle/rule"
)

// ...

loadFolderBRules := func() {
  // `c` is the "master" `Config`, only available at a few methods
  filePath := path.Join(c.RepoRoot, "folder-b", "BUILD")
  fileContent, fileErr := os.ReadFile(filePath)
  // fileErr error handling should go here

  // os.ReadFile already returns a []byte, ready to pass to the parser
  targetData, dataErr := rule.LoadData(filePath, "", fileContent)
  // dataErr error handling should go here

  // For example, let's go through the rules
  // (named `r` so it doesn't shadow the imported `rule` package)
  for _, r := range targetData.Rules {
      // Now we can store relevant info from the rule
      // `config` is the config passed to this specific node
      // You should also change your extension's logic to store `myProperty`
      //  and propagate it to its children
      config.myProperty[r.Name()] = r.Kind()
  }
}

In the example we can see that we obtain a nice rule.File struct named targetData, populated with the parsed contents of the BUILD file.

Now, if we place the previous loadFolderBRules function inside the Configure implementation:

loadFolderBRules := func() {
  // previous code here
}

// Do not place the `rel` check inside the `f != nil` block,
//  in case for some reason there is no root BUILD file
if f != nil {
  // ...
}

// This will happen before any `GenerateRules` call, 
//  because `rel` will equal `""` when traversing the root node
if rel == "" {
  loadFolderBRules()
}

We have loaded that special BUILD file from folder-b only once, stored our desired data in the configuration, and that configuration will be automatically propagated to all the descendants. To use it, you just need to access the configuration parameter that GenerateRules receives, and read the myProperty map.
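
Keeping the same hypothetical myLang/myConfig naming from the earlier sketch, the reading side could look like this:

import (
  "github.com/bazelbuild/bazel-gazelle/language"
)

func (*myLang) GenerateRules(args language.GenerateArgs) language.GenerateResult {
  // the node's config, already populated during the root's Configure step
  cfg := args.Config.Exts["myExt"].(*myConfig)
  for name, kind := range cfg.myProperty {
    // the name/kind of each rule read from folder-b/BUILD is usable here
    _, _ = name, kind
  }
  return language.GenerateResult{}
}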

Remarks

The rel == "" trick might not be the cleanest approach, but it is the best I could come up with, as Gazelle in its current version lacks any kind of extension hook for when it begins a run.

Storing the data in the node config might also sound like a lot of unnecessary copying, but as mentioned in one of the code comments, the "master config" is only available to a few methods (Configure is one of them). It would be a better destination for data that you read once and never mutate, but as of now it can't be accessed from GenerateRules, so we can't use it.


Emulation, Virtualization & Compatibility Layers

I'm fascinated with the topic of emulation as a way to preserve old things and still be able to use them in the present. Sometimes, it feels kind of like doing digital archaeology. I've used it at work, to play old videogames, to sign documents with a Windows-only Java application (an amazing feat to achieve!), and even to keep using my scanner when it wasn't compatible with Linux.

I consider that there are four broad categories of emulating or replicating something (more, really, if you include "simulation"), and I thought it'd be nice to write a summary and leave some links with additional details.

Emulator

Let's begin with the simplest and probably most well-known term, emulation. Quoting Wikipedia, "an emulator is hardware or software that enables one computer system (called the host) to behave like another computer system (called the guest)". Whether you're emulating old hardware (e.g. an old 8086), a videogame console (e.g. the great GameBoy), or an API emulating a service, the emulator always tries to reproduce the source system in as much detail as possible.

The main goal of emulation is accuracy, but depending on whether an emulator aims for high-level or low-level emulation, this might not be totally true, and we might get into compatibility-layer territory (explained later). Note also that unless an emulator is complete, it can be missing sub-systems or certain fragments.

If you want a deeper but still introductory dive into the topic, I recommend reading the article from Retro Reversing on how emulators work.

Virtual Machine

A virtual machine or VM is a system (commonly, but not only, software) that implements the capability of running a certain computer machine (guest) inside another (host). While an emulator can target just a certain piece (e.g. a file system emulator), a Virtual Machine always represents a complete machine. Also, a VM does not need to be a real machine. For example, Another World is an old videogame whose creator decided to implement a VM running its own bytecode, making it easier to port the title to many different systems by having that common layer (more details, fascinating reading!).

Virtual machines are very mature and have evolved a lot. Today, concepts like hypervisors and kernel shared memory, and proprietary technologies like Intel VT, all allow for very efficient management of multiple VMs running on a single physical machine. And for quite some time now you have been able to perfectly do your daily work on a VM; for example, I did it long ago with Microsoft Virtual PC. Let's be honest: compared with the difficulty of setting up containers for non-trivial scenarios, it's still one of the best options.

OS-Level Virtualization

Operating-System-Level Virtualization became known after Docker brought the concept of containers to the masses, but the concept of resource groups and restrictions existed before. It basically consists of providing mechanisms to isolate resources, so that each container has restricted access to them, while you can have multiple containers running at once to better utilize all the available hardware. It is not as safe as virtual machines (where you can fully control almost all boundaries) nor as emulators (where you can decide not to emulate certain features, or to provide means to disable them), but it is very lightweight.

Note that when using containers, if a guest container targets a different operating system than the host (e.g. Windows or MacOS running Linux containers), it wouldn't initially be able to run, because there is no emulation involved. So in practice, what platforms like Docker do is actually boot up a small Virtual Machine with access to all your configured resources, and then run the containers inside it. There is a small and sometimes noticeable performance penalty, but it is still faster and more convenient than running a fully virtualized OS via a VM.

Compatibility Layer

Last comes the least-known category. Despite having used Wine for a while, I had never stopped to understand how it worked if it wasn't an emulator. That's when I learned about software compatibility layers, which provide mainly two things: a runtime environment that translates calls from the source system to the destination system, and a set of reimplemented libraries that keep the original interface while adapting the implementation to the new host.

Keeping with the Wine example, it neither emulates nor virtualizes Windows; instead, it provides a runtime that converts Windows API calls into POSIX calls, and re-implements libraries like DirectX and the Windows filesystem. It is mind-blowing how little you really need to change when compared with emulation 🤯.

A compatibility layer's goal is, as the name implies, compatibility. It might not work or look exactly the same as the original system, but as long as it works in the destination, it serves its purpose.

Note that there are also hardware compatibility layers, but those seem to be related to hardware emulation and I have almost no knowledge of them.


Dependency Injection in Javascript and Testing

The conventional way of writing code in Javascript, and in many other languages that offer easy library patching/mocking (like Python), is to just import the module and then call its functions directly. Let's see it in practice with a trivial example of writing a file with NodeJS:

import fs from "fs";

export const write = (content) => {
  fs.writeFileSync("test.txt", content);
};


// usage
// -----
import { write } from "somewhere.mjs";
write("something something");

This is easy, quick to code, and conveniently ready to export.

If we want to test the behaviour with Jest, a simple jest.mock("fs"); sets everything up.
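
For illustration, a minimal sketch of such a test; this assumes Jest's default Babel transform (jest.mock calls are hoisted above the imports, which does not happen under native ESM):

import fs from "fs";
import { write } from "somewhere.mjs";

// auto-mock: every fs function becomes a jest.fn() returning undefined
jest.mock("fs");

test("write() delegates to fs.writeFileSync", () => {
  write("something something");
  expect(fs.writeFileSync).toHaveBeenCalledWith("test.txt", "something something");
});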


Now let's do the same with the simplest form of dependency injection:

export const write = (ioModule, content) => {
  ioModule.writeFileSync("test.txt", content);
};


// usage
// -----
import fs from "fs";
import { write } from "somewhere.mjs";

write(fs, "something something");

With this implementation we need to be quite explicit about the module we're using for I/O, which makes testing trivial: you no longer need Jest's mocking capabilities. But it is true that we now need an extra import, potentially in many places. While there are more techniques, let's refactor the code to provide an exported function that injects fs, and, applying the testables named-export pattern, provide a way to test everything:

import fs from "fs";

let ioModule = fs;

const writeDI = (ioModule, content) => {
  ioModule.writeFileSync("test.txt", content);
};

const setIOModule = (newIOModule) => {
  ioModule = newIOModule;
};

export const write = (content) => {
  return writeDI(ioModule, content);
};

export const testables = {
  writeDI,
  setIOModule,
};


// usage (normal code)
// -------------------
// note how `fs` no longer needs to be imported here
import { write } from "somewhere.mjs";

write("something else");


// usage (tests)
// -------------------
import { testables, write } from "somewhere.mjs";

const ioModuleMock = {
  writeFileSync: (filename, content) => {
    console.log("ioModuleMock.writeFileSync():", filename, content);
  },
};

// to test the `writeDI` method:
testables.writeDI(ioModuleMock, "mocked something");

// to test the `write` method, and for tests where we don't want real I/O:
testables.setIOModule(ioModuleMock);
// until changed again, everything will use the mock from now on
write("mocked something else");

Cool, so there we have something as usable as the classic implementation, while being able to manually mock it without frameworks. And if we want semi-complex scenarios, we can plug in some in-memory implementation like memfs, still without having to patch modules.
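
For example, a small sketch assuming the memfs package is installed:

import { fs as inMemoryFs, vol } from "memfs";
import { testables, write } from "somewhere.mjs";

testables.setIOModule(inMemoryFs);
write("kept in memory");

// the file now lives only in the in-memory volume
console.log(vol.toJSON());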


At this point, you might be wondering why all these changes when Jest does the same with a single line. The answer is speed.

Nothing is free. Jest is quite complete, covers many complex scenarios and is highly configurable. But like all supercharged frameworks it is opinionated, and the complexity needs to be "paid" somewhere, so you get all these nice features at the expense of using Jest the way Jest wants to be used. Meaning:

  • Module mocking features need some bootstrapping
  • Jest tries to be smart regarding test discovery

The previous points are compensated for by letting Jest be the runner/handler for all your tests, because it has a cache to do test avoidance, and you pay the mock bootstrapping cost just once.

But what happens if you want to have isolated and hermetic tests, with potentially individual, per-test runs? What if you want to use an external system to decide which tests to run, instead of Jest deciding for you? Then the framework gets in the way, because all the extra features become a burden, and you either can't drop them, or, even after dropping many, what remains still weighs a lot.

For normal projects a 1- or 2-second bootstrap might not sound like much, but at scale, with a big project, you simply can't have thousands of test files each requiring 2s to boot up (plus whatever they take to run).

So, reading and exploring ways and alternatives, one question I wanted to reflect upon was: "can it be done without frameworks?".

After all, by building your classes and modules with dependency injection in mind, and using the existing NodeJS asserts or the new native mocks (added in v18.13.0), you can go a long way in many cases. And for those complex tests where the framework is a clear advantage (or directly a requirement), then do use Jest and similar solutions. It's simply about not making the framework the baseline.
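
As a minimal sketch of that idea, using only built-in modules (node:test and node:assert) and reusing the testables pattern from above:

import test from "node:test";
import assert from "node:assert/strict";
import { testables, write } from "somewhere.mjs";

test("write() delegates to the injected I/O module", () => {
  const calls = [];
  testables.setIOModule({
    writeFileSync: (filename, content) => calls.push([filename, content]),
  });

  write("mocked something else");

  assert.deepEqual(calls, [["test.txt", "mocked something else"]]);
});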

A potential issue you might run into is ending up building your own tiny testing framework, with assert helpers and whatnot. I've seen it happen with Python and unittest instead of using Pytest. I don't think this is bad if you keep it small and simple. And I bet it will still be faster than a big, opinionated framework.

Sidenote: I've focused mostly on Javascript here, but in the past I saw the same scenario with Ruby on Rails: rspec tests using Rails for a non-trivial project would take a whopping ~30 seconds of initial bootstrapping even to run a trivial unit test (RoR preloads all objects, at least it did back around 2015). So an engineer thought about migrating all the tests to minitest, because it was orders of magnitude faster. It didn't work out because of other, unrelated reasons, but the intention was good.


Always use linters and auto-formatters

Let's begin with a small story: at a previous job (a big company with a far-from-trivial number of engineers), there were quite a few services written in Python. Some of them were moderate or big in size and, most importantly, some had existed for quite a few years. While the code was well structured, with a few comments here and there, and in general easy to read, you could notice that it was not homogeneous: single quotes here and double quotes there, tightly packed methods here and triple-line-break-separated methods there, a few classes with UpperCamelCase method names out of place in a Python codebase (maybe they were written by a Java developer?)...

Those were some of the symptoms, and to be honest you can live with them. What really began to worry me as time passed by, and which I saw happen multiple times, was pull requests being stopped by engineers (often in a different timezone) because of "critical" reasons like:

  • "imports are not alphabetically sorted"
  • "you must leave two new lines between imports and the class name"
  • "comment needs to be triple-double quotes because it's multi-line"

We love to talk about "getting into the flow" and about increasing developer productivity, and yet we ruthlessly impede others, sometimes until the next day, because of such trivialities. In most cases the code was otherwise perfect.

But the real problem wasn't bike-shedding; the real issue was that nobody had stopped and thought "why are we doing this manual process over and over?", "why are we spending neurons on the form and not on the content of the code?". If you look at the examples above, they already feel like they were written by an automaton.

Thankfully, we have had both linters and formatters for most languages for quite some time. Combined, you'll get consistent and uniform code, and people will focus on the important points of code reviews. Oh, and many linters also warn you about unused variables, methods or imports, so they are also nice for housekeeping.

In the case of Python I already wrote about black, flake8 and isort, so check that out if interested.

As for the Javascript ecosystem, while it sometimes feels to me overcomplicated or a bit astray, one thing that it does really well is having a long history of using linters, even as CI pipeline steps. So it felt good to come last year to a big JS/TS codebase and see zero lint warnings and a lot of formatting and linting rules. And adding your own ESLint rules is not hard, yet very powerful!

About the work tale at the beginning of this blog post: I got approval to add the Python linter and formatters to the biggest repository as pre-commit hooks, and we ran them over all the existing code. Putting up a few pull requests that modify more than 40k files in total is "interesting": despite splitting the changes into a few PRs, Github couldn't even render the list of collapsed files 😅. There was some pushback at first, but after a week or two the anger dissipated, and since then there hasn't been a single discussion over formatting. It even helped us decide fundamental questions like spaces vs. tabs or single vs. double quotes, as a nice thing about black is that it is opinionated and almost non-configurable, so it's either This Way or No Way.


Cyberpunk 2077 New Game+ Hack

Cyberpunk 2077 is, despite its flaws, a great game with an amazing setting. On my first playthrough, I spent more than 70 hours in Night City, and after watching the Edgerunners anime I wanted to go back to it.

Sadly, the game does not feature New Game+ functionality, which is kind of common in RPGs and games with progression, and which basically lets you start the single-player story over, keeping your character level, abilities and perks. But I'm stubborn, so I decided to check for some tools and hack my way in somehow.

I found a solution via the CyberCAT-SimpleGUI tool (direct link to the GitHub repo, if you do not wish to register at NexusMods): a savegame editor that lets you alter the character's appearance and all experience levels. This allows me to:

  • Clone my original character's appearance, as once you have started a game, you can modify some but not all of the face parts
  • Carry over the character level and street cred
  • Carry over the attribute and perk points count (although not how perks are distributed)
  • Carry over the skill levels

I also noticed a few limitations with this approach:

  • I couldn't migrate my appearance because my old savegame was from too old a version ("version 10 not supported" is the error I got), but the editor looks solid and the numbers were correct with the new savegame (once I manually reproduced the looks)
  • It is very important to note the attribute points and perk points totals, because if you modify attributes, perks and skills from the savegame tool, you won't get the extra perk points you would have gotten by levelling up one level at a time. You will get the discounts, bonuses and the like, though
  • Remember that a skill can be at most level 20, and must always be lower than or equal to the corresponding attribute level. I haven't tried breaking this rule, but nasty things might happen

Despite the caveats, the hack works, and I am gaining back my remaining XP and skill levels without any problem.

