Gazelle: Workspace traversal and main extension handlers

Extension Handlers

Extensions in Gazelle have multiple entry points, but exactly when they are called is barely documented (mere docstrings in the handlers). While there are more, I am going to focus on the main ones:

Configure(): Reads, inherits and/or updates the configuration for each package (folder). This is where Gazelle Directives are read from a BUILD file, if any.
GenerateRules(): Where you place all the logic for generating rules and imports (because each generated rule needs a companion "imports list", even if the latter is empty). This usually means reading source files, and it is probably where most changes and additions will happen. [1]
Imports(): Should return a list of ImportSpecs that define under which import paths a rule can be imported. For example, in the case of JavaScript, assuming a module: all the files in the module, plus the barrel file if present.
Resolve(): Converts imports into Bazel dependency labels. Your magic of transforming JavaScript import statements into Bazel deps entries goes here.
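To make the split between these handlers more concrete, here is a minimal, self-contained Go sketch. It does not use the real Gazelle packages: `ImportSpec`, `Rule`, `GenerateResult`, `generateRules`, `importsOf` and `resolve` are simplified, hypothetical stand-ins that only mirror the shape of the handlers above (in real Gazelle, `GenerateRules` returns a `language.GenerateResult` whose `Gen` and `Imports` slices are index-aligned, and `resolve.ImportSpec` pairs a language name with an import path). The JavaScript details, such as package-relative import paths and an `index.js` barrel file, are assumptions for illustration.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

// ImportSpec is a simplified stand-in for Gazelle's resolve.ImportSpec:
// a language name plus an import path that a rule can satisfy.
type ImportSpec struct {
	Lang string
	Imp  string
}

// Rule is a hypothetical, minimal rule: a label, its sources and its deps.
type Rule struct {
	Label string
	Srcs  []string
	Deps  []string
}

// GenerateResult pairs each generated rule with its imports list at the
// same index (the list may be empty).
type GenerateResult struct {
	Gen     []*Rule
	Imports [][]string
}

// generateRules (GenerateRules) builds one rule for a package. fileImports
// maps each source file to the import statements found in it; actually
// parsing the files is out of scope here.
func generateRules(pkg string, fileImports map[string][]string) GenerateResult {
	var srcs, imps []string
	for file, found := range fileImports {
		srcs = append(srcs, file)
		imps = append(imps, found...)
	}
	sort.Strings(srcs)
	sort.Strings(imps)
	r := &Rule{Label: "//" + pkg + ":" + path.Base(pkg), Srcs: srcs}
	return GenerateResult{Gen: []*Rule{r}, Imports: [][]string{imps}}
}

// importsOf (Imports) lists the specs under which other rules may import r:
// every source file without its extension and, if there is an index.js
// barrel file, the package path itself.
func importsOf(pkg string, r *Rule) []ImportSpec {
	var specs []ImportSpec
	for _, src := range r.Srcs {
		base := strings.TrimSuffix(src, path.Ext(src))
		specs = append(specs, ImportSpec{Lang: "js", Imp: path.Join(pkg, base)})
		if src == "index.js" {
			specs = append(specs, ImportSpec{Lang: "js", Imp: pkg})
		}
	}
	return specs
}

// resolve (Resolve) turns import strings into Bazel labels using an index
// built from every importsOf result. Unknown imports (e.g. npm packages) and
// self-references are skipped in this sketch.
func resolve(r *Rule, imps []string, index map[ImportSpec]string) {
	seen := map[string]bool{}
	for _, imp := range imps {
		lbl, ok := index[ImportSpec{Lang: "js", Imp: imp}]
		if !ok || lbl == r.Label || seen[lbl] {
			continue
		}
		seen[lbl] = true
		r.Deps = append(r.Deps, lbl)
	}
	sort.Strings(r.Deps)
}

func main() {
	// Generation: "lib" has a barrel file, "app" imports it through it.
	results := map[string]GenerateResult{
		"lib": generateRules("lib", map[string][]string{"index.js": {"lib/math"}, "math.js": nil}),
		"app": generateRules("app", map[string][]string{"main.js": {"lib", "react"}}),
	}

	// Indexing: register every rule under each ImportSpec it provides.
	index := map[ImportSpec]string{}
	for pkg, res := range results {
		for _, spec := range importsOf(pkg, res.Gen[0]) {
			index[spec] = res.Gen[0].Label
		}
	}

	// Resolution: only now can imports be converted into labels.
	for _, res := range results {
		resolve(res.Gen[0], res.Imports[0], index)
	}
	fmt.Println(results["app"].Gen[0].Label, results["app"].Gen[0].Deps) // prints: //app:app [//lib:lib]
}
```

Note how `resolve` only runs once every rule has been indexed: an import can only be mapped to a label after the rule that provides it has been generated, which is why resolution is left for the very end.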

So, how does Gazelle execute those handlers? Well, the logic is not terribly complex, but in my experience it is much clearer with some diagrams, so first let's see an overview:

Gazelle: Order of extension handlers

For each and every package, Gazelle will first invoke Configure, then GenerateRules & Imports; when there are no more packages, the resolution phase happens and Resolve is called for each generated rule, with the imports collected for it. But things are not that simple, as the order in which packages are "configured" is different from the order in which their rules & imports are generated.

Workspace Traversal Order

Bazel builds a graph where edges represent dependencies between targets. If we stop and think about it, what this means is that if A depends on B, A cannot be built until B has been built; similarly, Gazelle cannot guarantee that it is able to generate any rule from package A until the rules from package B have been generated (because if A depends on B, then there is at least one target in B that will be a dependency in A).

At the same time, as configuration starts at the root node and is inherited (and potentially modified) by every child node, by the time we want to generate rules for a leaf node we must have computed the whole configuration chain from the root down to it.
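A minimal sketch of that inheritance chain, assuming a hypothetical per-extension `Config` struct and a hypothetical `directives` map standing in for the `# gazelle:` directives parsed from each BUILD file. Real Gazelle hands each child a clone of its parent's configuration before calling Configure; the sketch only mimics that idea with plain struct copies.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Config is a hypothetical per-extension configuration.
type Config struct {
	Enabled bool
	Suffix  string
}

// directives stands in for "# gazelle:" directives found in BUILD files,
// keyed by package path ("" is the workspace root).
var directives = map[string]map[string]string{
	"":    {"suffix": "_lib"},
	"a/d": {"enabled": "false"},
}

// configure returns the configuration for rel: a copy of the parent's
// configuration, updated with rel's own directives, if any.
func configure(parent Config, rel string) Config {
	c := parent // struct copy: changes below never leak back to the parent
	for key, value := range directives[rel] {
		switch key {
		case "enabled":
			c.Enabled = value == "true"
		case "suffix":
			c.Suffix = value
		}
	}
	return c
}

// configFor applies configure from the root down to rel: the chain that
// must be complete before rules can be generated for rel.
func configFor(rel string) Config {
	c := configure(Config{Enabled: true}, "")
	cur := ""
	for _, part := range strings.Split(rel, "/") {
		if part == "" {
			continue
		}
		cur = path.Join(cur, part)
		c = configure(c, cur)
	}
	return c
}

func main() {
	fmt.Println(configFor("a/c/f")) // {true _lib}: only the root directive applies
	fmt.Println(configFor("a/d/g")) // {false _lib}: inherits a/d's override
}
```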

In practice, this means that Gazelle crawls your project calling the handlers at different steps, keeping all the data in memory until it can complete the generation (and then write the new BUILD files to their folders).

But how does it traverse the folders? Like this:

It begins at the project root, generating the root-most configuration (the extension defaults plus any directives in the root BUILD file)
It crawls subfolders in depth-first post-order
If a folder hasn't been configured yet, it calls Configure() on it (parents are always configured before their descendants)
If the folder contains subfolders, it keeps descending into them
When it reaches a leaf node/package (a folder without subfolders), it calls GenerateRules() and Imports() on it
It then backtracks, either generating the parent node's rules or moving on to the next sibling subfolder
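The steps above can be modelled as a small recursive function. This is not Gazelle's actual walk implementation, just a toy model under a few assumptions: the folder tree is the same example used in the diagrams and logs of this post, every package except the root generates a single rule named `file`, and Imports() is invoked once per generated rule. Configure() runs on the way down, GenerateRules() and Imports() on the way back up, and Resolve() only after the whole walk.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// tree maps each folder to its subfolders ("" is the workspace root).
var tree = map[string][]string{
	"":    {"a", "b"},
	"a":   {"c", "d"},
	"a/c": {"f"},
	"a/d": {"g", "h"},
	"b":   {"e"},
}

// visit configures dir before any of its children (parents first), and
// generates rules for dir only after all its subfolders are done (post-order).
func visit(dir string, log, rules *[]string) {
	*log = append(*log, "Configure()     : "+dir)
	children := append([]string(nil), tree[dir]...)
	sort.Strings(children)
	for _, child := range children {
		visit(path.Join(dir, child), log, rules)
	}
	*log = append(*log, "GenerateRules() : "+dir)
	if dir == "" {
		return // the root package generates no rules, hence no Imports() call
	}
	rule := path.Join(dir, "file")
	*rules = append(*rules, rule)
	*log = append(*log, "Imports()       : "+rule) // once per generated rule
}

// run walks the whole workspace first, then resolves every generated rule.
func run() []string {
	var log, rules []string
	visit("", &log, &rules)
	for _, r := range rules {
		log = append(log, "Resolve()       : "+r)
	}
	return log
}

func main() {
	for _, line := range run() {
		fmt.Println(line)
	}
}
```

Running it prints the same sequence as the combined log shown further down in this post.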

Except for the first step, the remaining ones repeat until the whole project/workspace has been visited. At that point, all the rules for the language that the extension provides will have been generated (but remember, their imports are not yet resolved to Bazel labels).

In the following animated diagram, we can see how Gazelle runs the extension's Configure handler folder by folder:

Gazelle: Workspace traversal of Configure function

Text log version:

Configure()     : 
Configure()     : a
Configure()     : a/c
Configure()     : a/c/f
Configure()     : a/d
Configure()     : a/d/g
Configure()     : a/d/h
Configure()     : b
Configure()     : b/e

And in this other animated diagram, we can see how the depth-first post-order traversal generates rules and imports, starting with the leaf nodes and moving up to the root:

Gazelle: Workspace traversal of GenerateRules and Imports

Log of the GenerateRules calls:

GenerateRules() : a/c/f
GenerateRules() : a/c
GenerateRules() : a/d/g
GenerateRules() : a/d/h
GenerateRules() : a/d
GenerateRules() : a
GenerateRules() : b/e
GenerateRules() : b
GenerateRules() :

And, for completeness, the log of Imports calls:

Imports()       : a/c/f/file
Imports()       : a/c/file
Imports()       : a/d/g/file
Imports()       : a/d/h/file
Imports()       : a/d/file
Imports()       : a/file
Imports()       : b/e/file
Imports()       : b/file

Note that there is no Imports() call for the root, because the root package in my example does not generate any rule (Imports() is called once per generated rule).

You can try it yourself, and move things around if you want, as I implemented this exact example in this PR (the logs come from running a test with Verbose mode activated). Also, here is the combined log of all the entry points (sorry, no diagram):

Configure()     :
Configure()     : a
Configure()     : a/c
Configure()     : a/c/f
GenerateRules() : a/c/f
Imports()       : a/c/f/file
GenerateRules() : a/c
Imports()       : a/c/file
Configure()     : a/d
Configure()     : a/d/g
GenerateRules() : a/d/g
Imports()       : a/d/g/file
Configure()     : a/d/h
GenerateRules() : a/d/h
Imports()       : a/d/h/file
GenerateRules() : a/d
Imports()       : a/d/file
GenerateRules() : a
Imports()       : a/file
Configure()     : b
Configure()     : b/e
GenerateRules() : b/e
Imports()       : b/e/file
GenerateRules() : b
Imports()       : b/file 
GenerateRules() : 
Resolve()       : a/c/f/file
Resolve()       : a/c/file
Resolve()       : a/d/g/file
Resolve()       : a/d/h/file
Resolve()       : a/d/file
Resolve()       : a/file
Resolve()       : b/e/file
Resolve()       : b/file

So there you have it: a hopefully simple enough explanation of some of Gazelle's internals. In the future I plan to cover other obscure technical details and features, and most of the time (though I can't promise it will always be the case) I'll try to provide an example via my bazel-gazelle-sample-web-extension sample repository.

[1] That's why I decided to implement a very simplified version of the Chain-of-responsibility pattern in my sample web extension (relevant highlights: I & II), to try to keep the code easily testable and extensible even if it grows to contain many rules.
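For illustration only, here is a very small chain of responsibility in Go. It is not the code from the sample repository: the `RuleGenerator` interface, the `generators` chain, the suffix matching and the `css_library` kind are hypothetical. Each link decides whether it is responsible for a file; the first link that accepts it handles it and the rest of the chain is not consulted.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// RuleGenerator is one link of the chain (hypothetical interface).
type RuleGenerator interface {
	// Handles reports whether this link is responsible for file.
	Handles(file string) bool
	// Kind is the rule kind this link emits for the files it handles.
	Kind() string
}

// suffixGenerator claims files by extension.
type suffixGenerator struct {
	suffix string
	kind   string
}

func (g suffixGenerator) Handles(file string) bool { return strings.HasSuffix(file, g.suffix) }

func (g suffixGenerator) Kind() string { return g.kind }

// generators is the chain, ordered from most to least specific: a ".test.js"
// file must be claimed before the generic ".js" link sees it.
var generators = []RuleGenerator{
	suffixGenerator{".test.js", "js_test"},
	suffixGenerator{".js", "js_library"},
	suffixGenerator{".css", "css_library"},
}

// assign passes each file along the chain until a link handles it, grouping
// files by rule kind. Files that no link handles are ignored.
func assign(files []string) map[string][]string {
	byKind := map[string][]string{}
	for _, f := range files {
		for _, g := range generators {
			if g.Handles(f) {
				byKind[g.Kind()] = append(byKind[g.Kind()], f)
				break // first matching link wins
			}
		}
	}
	return byKind
}

func main() {
	byKind := assign([]string{"app.js", "app.test.js", "styles.css", "README.md"})
	kinds := make([]string, 0, len(byKind))
	for k := range byKind {
		kinds = append(kinds, k)
	}
	sort.Strings(kinds)
	for _, k := range kinds {
		fmt.Println(k, byKind[k])
	}
	// Prints:
	// css_library [styles.css]
	// js_library [app.js]
	// js_test [app.test.js]
}
```

With this shape, supporting a new kind of rule means adding one more link to the chain, and each link can be unit-tested on its own.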

Tags: Bazel Development Gazelle Go Tools

Gazelle: Workspace traversal and main extension handlers article, written by Kartones. Publication date: 2024-04-21