One thing I incorrectly assumed was that compiling the same code always generates the exact same output (the same bytes). This is not always the case.
I know about compilation modes, optimization flags and the like, so we're not talking here about those basic scenarios. We're talking about cases where compiling with exactly the same flags, in the same environment, and against the same source code (let's say C++), generates a binary that works exactly the same, and yet has a different checksum (and differs in some bytes).
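To check whether two builds really differ, and where, a small script is enough. The following is a minimal sketch, not a full diffing tool; the function names are made up for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def differing_offsets(a: bytes, b: bytes, limit: int = 20) -> list[int]:
    """Return up to `limit` byte offsets at which two blobs differ."""
    offsets = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    # If the lengths differ, the first offset past the shorter blob counts too
    if len(a) != len(b):
        offsets.append(min(len(a), len(b)))
    return offsets[:limit]


# Tiny in-memory demonstration: two "binaries" that differ in a single byte
print(differing_offsets(b"code-1000", b"code-1005"))  # [8]
```

With real artifacts you would pass the output of two separate builds, for example `sha256_of(Path("build_a/app.exe"))` against `sha256_of(Path("build_b/app.exe"))` (hypothetical paths), and then look at what lives at the differing offsets.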
There are two big issues with this lack of output determinism, or as it is better known, lack of reproducible builds:
- Security: How can you ensure that a binary has not been tampered with, if its signature changes?
- Build Caching: You will always get a cache miss if every build produces a different binary.
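The caching problem can be sketched as follows, assuming a simplified content-addressed cache where the key of each step is a hash of its command and of the bytes of its inputs. The `action_key` function and the fake object files are hypothetical and only illustrate the idea; real build systems use more elaborate keys.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of a blob."""
    return hashlib.sha256(data).hexdigest()


def action_key(command: str, input_digests: list[str]) -> str:
    """Hypothetical cache key: hash of the command plus the sorted input digests."""
    h = hashlib.sha256()
    h.update(command.encode())
    for d in sorted(input_digests):
        h.update(d.encode())
    return h.hexdigest()


# Two compilations of the same source that differ only in an embedded timestamp
obj_run1 = b"machine code" + b"\x00built at 10:00"
obj_run2 = b"machine code" + b"\x00built at 10:05"

# The link step consumes the object file, so its key depends on the object's bytes
link_key_run1 = action_key("link app", [digest(obj_run1)])
link_key_run2 = action_key("link app", [digest(obj_run2)])

print(link_key_run1 == link_key_run2)  # False: the downstream step misses the cache
```

Under this assumption a single non-deterministic step is enough to invalidate every step that depends on its output.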
My tiny research originated from discussing with a colleague why some Windows Visual C++ builds were not cached, when there were no apparent code changes [1] and the flags were not changing.
The first thing that he found was that indeed, Microsoft Visual C++ (MSVC) is not deterministic by default. The following article will help you achieve it: https://nikhilism.com/post/2020/windows-deterministic-builds/.
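One well-known place where build time can leak into Windows binaries is the `TimeDateStamp` field of the COFF header in PE files (.exe, .dll). The sketch below reads that field so you can compare it across two builds. It is only an illustration of one possible source of differences, not a description of what my colleague found, and the function name is made up.

```python
import struct


def pe_timestamp(image: bytes) -> int:
    """Return the COFF header TimeDateStamp field of a PE image."""
    if image[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ signature")
    # The 32-bit little-endian offset of the PE header is stored at 0x3C
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE image: missing PE signature")
    # COFF header layout: Machine (2 bytes), NumberOfSections (2), TimeDateStamp (4)
    (timestamp,) = struct.unpack_from("<I", image, pe_offset + 8)
    return timestamp
```

If two builds of the same source report different values here, the binaries cannot be byte-identical, even when every instruction is the same.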
Coincidentally, I read that Go has had a perfectly reproducible build toolchain since version 1.21.0. It was explained in a blog post.
At this point, I decided to check if there were more resources, and found that there is a Reproducible Builds organization, with great general tips on what to look for in compilers, project files and the like, to achieve determinism.
And I was quite surprised to find that Java is also not deterministic by default, but with a twist: the bytecode is, but the jar files aren't. So here are some articles around that:
- generic reproducibility: https://reproducible-builds.org/docs/jvm/
- with Maven: https://maven.apache.org/guides/mini/guide-reproducible-builds.html
- with Gradle: https://dzone.com/articles/reproducible-builds-in-java
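Since a jar is a zip archive, the effect is easy to reproduce with any zip library. The sketch below uses Python's `zipfile` module to show that identical contents produce different archive bytes when the entry timestamps differ, and identical bytes once the timestamp and entry order are pinned. The class name and its contents are placeholders, and this is not how Maven or Gradle implement it; it only illustrates the principle.

```python
import io
import zipfile


def make_archive(files: dict[str, bytes], date_time: tuple) -> bytes:
    """Build a zip archive in memory, stamping every entry with `date_time`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Sort names so entry order does not depend on dict or filesystem order
        for name in sorted(files):
            info = zipfile.ZipInfo(name, date_time=date_time)
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, files[name])
    return buf.getvalue()


# Placeholder contents standing in for compiled classes
classes = {"com/example/Main.class": b"\xca\xfe\xba\xbe fake bytecode"}

# Same contents, packaged five minutes apart: the archives differ
built_at_10_00 = make_archive(classes, (2024, 1, 1, 10, 0, 0))
built_at_10_05 = make_archive(classes, (2024, 1, 1, 10, 5, 0))
print(built_at_10_00 == built_at_10_05)  # False

# Same contents with a pinned timestamp: the archives are identical
fixed = (1980, 1, 1, 0, 0, 0)
print(make_archive(classes, fixed) == make_archive(classes, fixed))  # True
```

Pinning the timestamp and sorting the entries are the two knobs this sketch turns; the articles above cover the equivalent settings for real Java builds.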
As an extra note, this kind of non-deterministic behaviour can also happen in scripting languages like TypeScript/JavaScript, not when transpiling code, but when doing tree-shaking and/or bundling; see for example the multiple places where you can set flags to deterministic in Webpack.
[1] Build avoidance is a different topic, so let's assume that there was some reason to trigger a compilation, instead of skipping the whole step and reusing a previously compiled artifact.
Tags: Bazel Development Resources Tools