Build files are the best tool to represent software architecture (blogsystem5.substack.com)
49 points by pykello 1 day ago | 40 comments




At Google, you can just run "buildifier" to auto-generate the build files from the source code import statements, so one might conclude the build files are redundant. That's not the point. Bazel is supposed to be fast, and keeping the analysis graph loaded in memory is part of that. I always thought of the build files as a sort of cache: it would usually take a few seconds for buildifier to "revalidate" that cache, which can be done infrequently, and then the actual build itself, which happens much more frequently while iterating, is faster. You really start to notice the performance difference in larger repos.

One of the worst things a developer accustomed to Bazel (and its relatives) can do with a modern language (say Go or Rust) is to model and organize the code through the Bazel concept of a build target (https://bazel.build/concepts/build-ref) first and only then map it onto the language's local organization concepts, rather than the other way around. One should preferentially model the code with the language-local organizing concept in an idiomatic way (e.g., a Go package — https://go.dev/ref/spec#Package_clause) and THEN map that unit of organization to a build target (e.g., go_library).
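
For illustration, a minimal sketch of that mapping (package name, file names, and deps are all hypothetical; rules_go's go_library assumed):

    load("@rules_go//go:def.bzl", "go_library")

    # The target mirrors an already-idiomatic Go package;
    # it does not dictate how the package is organized.
    go_library(
        name = "metrics",
        srcs = [
            "counter.go",
            "gauge.go",
        ],
        importpath = "example.com/project/metrics",
        deps = ["//internal/clock"],  # hypothetical sibling package
    )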

When you do this in the wrong order, you end up with very poorly laid out concepts from a code organization standpoint, which is why guidance like this needed to be written:

* https://google.github.io/styleguide/go/best-practices.html#p...

* https://matttproud.com/blog/posts/go-package-centricity.html

In languages that operate on a flat namespace of compilable units (e.g., C++ or Java), build target sizing and grouping in Bazel (and its relatives) largely doesn't matter (from the perspective of namespace naming, findability, and ergonomics). But the moment Bazel starts interfacing with a language that has strict organization and namespacing concepts, this can get rather hairy. The flat-namespace practice with Bazel has (IMO) led to code organization brain-rot:

> Oh, I created another small feature; here, let me place it in another (microscopic) build target (without thinking about how my users will access the symbols, locate the namespace, or have an easy way of finding it).

— — —

Note: The above is not a critique of Bazel and such. It is more of a meta-comment on common bad practices I have seen in the wild. The build system can be very powerful for certain types of things (e.g., FFI dependency preparation and using Aspects as a form of meta-building and meta-programming).


> One of the worst things a developer accustomed to Bazel (and its relatives) can do with a modern language (say Go or Rust) is to model code and organize it through the Bazel concept of a build target (https://bazel.build/concepts/build-ref) first

And that's exactly what I was arguing against in the article! I've seen this happen a few times already (in Java and TypeScript specifically) where Bazel's fine-grained target definitions are pushed as "best practice" and everybody ends up hating the results, for good reasons.

There are _different_ ways in which one can organize the Bazel build rules that go against those best practices (like the 1:1:1 rule for Java), and I think you can end up with something that better maps to first principles or to what native build tooling does.
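
For example, a hypothetical sketch of the coarser-grained alternative: one target per logical module (roughly what a Maven or Gradle module would be) instead of one target per Java package per directory:

    # One java_library for the whole "billing" module rather than
    # dozens of per-package targets. All names are made up.
    java_library(
        name = "billing",
        srcs = glob(["src/main/java/**/*.java"]),
        deps = [
            "//modules/accounts",
            "@maven//:com_google_guava_guava",
        ],
    )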


> where Bazel's fine-grained target definitions are pushed as "best practice" and everybody ends up hating the results, for good reasons.

What are some of those good reasons (assuming they differ from GP's)?

I don't have much experience with Bazel aside from setting up a simple local workspace and following the tutorial.


You tend to end up with way too many targets that don't actually "mean anything" to a human. In one codebase I have to deal with, the Bazel build has ~10k targets whereas the previous non-Bazel build had ~400. Too many targets have an impact across various dimensions. Some examples:

* The build files are unreadable. If targets don't mean anything to a human, updates to build files become pure toil (and this is when devs start asking for build files to be auto-generated from source).

* IDE integrations (particularly via the IntelliJ Bazel plugin) become slower because generating metadata for those targets takes time.

* Binary debugging is slower because the C/C++ rules generate one intermediate .so file per target and GDB/LLDB take a long time to load those dependencies vs. a smaller set of deps.

* Certain Java operations can be slower. In the case of Java, the rules generate one intermediate JAR file per target, which has a direct impact on CLASSPATH length and that may matter when you do introspection. This tends to matter for tests (not so much for prod where you use a deploy JAR which collapses all intermediate JARs into just one).


Ah, that makes a lot of sense. Thanks!

My intuition was wrong; my naive understanding was that:

* Non-human intermediate targets would either be namespaced and available only in that namespace, or could be marked as hidden, and not clutter auto-completion

* IDE integrations would benefit, since they only have to deal with Bazel and not Bazel + cargo/go/Makefile/CMake/etc

* I thought C/C++ rules would generate .o files, and only the final cc_shared_library would produce an .so file

* Similar for .jar files

I guess my ideal build system has yet to be built. :(


> * Non-human intermediate targets would either be namespaced and available only in that namespace, or could be marked as hidden, and not clutter auto-completion

This is actually possible, but you need the new JetBrains-owned Bazel plugin _and_ you need to leverage visibility rules. The latter are unique to Bazel (none of the other language-specific package managers I've touched upon in these replies offers them) and are even harder to explain to people somehow... because they only start making sense once you pass a certain codebase size / complexity.
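
A small sketch of the visibility part (paths hypothetical):

    # Only targets under //server/... may depend on this helper.
    # Everyone else gets a build error, not an autocomplete entry.
    java_library(
        name = "internal_cache",
        srcs = glob(["*.java"]),
        visibility = ["//server:__subpackages__"],
    )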

> * I thought C/C++ rules would generate .o files, and only the final cc_shared_library would produce an .so file

> * Similar for .jar files

These are possible too! Modern Bazel has finally pushed all language-specific logic out of the core and into Starlark rules (and Buck2 has been doing this from the ground up). There is nothing preventing you from crafting your own build rules that behave in these specific ways.

In any case... as for dynamic libraries per target, I do not think what I described earlier is the default behavior in Bazel (we explicitly enable dynamic libraries to make remote caching more efficient), so maybe you can get what you want already by being careful with cc_shared_library and/or by tagging individual cc_libraries as static or dynamic.
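
Something roughly like this, sketched from the documented cc_library/cc_shared_library attributes (not verified against any particular Bazel version):

    # Intermediate code stays in static archives of .o files...
    cc_library(
        name = "codec",
        srcs = ["codec.cc"],
        hdrs = ["codec.h"],
        linkstatic = True,
    )

    # ...and a single .so is linked only at the very end.
    cc_shared_library(
        name = "codec_shared",
        deps = [":codec"],
    )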

For Java, I've been tempted to write custom rules that do _not_ generate intermediate JARs at all. It's quite a bit of work though, so I haven't, but it could be done. BTW, I'll actually be describing this problem in a BazelCon 2025 lightning talk :)


Not having used Bazel in anger (or barely at all) I think I might understand what you mean, but this topic cries out for a blog post.

Indeed, no idea why so many folks seem to think `bazel` is some co-equal alternative to language-native build tooling/processes. It's a fine tool for certain (niche) use cases, but it is in no way ubiquitous or anything approaching a common standard.

This is only true when the dependency structure is not already apparent. Almost all modern languages solve for this in their import statements and/or via their own package manager, at which point pushing everything up into Bazel is indeed redundant.

If anything this highlights the failure of languages solving for this themselves. I'm looking at you, C++.

It's no surprise Bazel is a hard sell for Rust, Go, Node, etc. because for those languages/ecosystems Bazel BUILD files are not the best tool to represent software architecture.


The problem is that anything that's _apparent_ and not _enforced_ will be messed up over time. Maybe not in a project with few people where everyone is an expert on how "things are supposed to be", but it will inevitably happen when you add more and more people.

And the whole point of the article is to say that import statements actually do _not_ solve this issue, because import statements are at the file level, not at the module level (whatever module means in your mind).

In any case: as I mentioned in the article in passing, other languages _do_ provide features similar to Bazel's build files, and I explicitly called out Rust as one of them. When you are defining crates and expressing dependencies via Cargo, you are _essentially doing the same_ as what I was describing in the article. Same with Go, if you are breaking your code apart into multiple modules.

But then we all know that there are some huge repos out there that are just "one module" and you can't make anything out of their internal structure. Hence you start breaking them apart into crates, Go modules, NPM packages, you name it... or, you know, you add Bazel and build files. They are the same tool -- and that's why I didn't write Bazel in the title: I intended "build files" more generically. I guess I needed to be clearer there.


> The problem is that anything that's _apparent_ and not _enforced_ will be messed up over time

We already have the tools to enforce these things in many mainstream languages.

Breaking things apart into crates/modules certainly makes sense sometimes, but other times it does not. If you have a monorepo, do you really need multiple modules? And if you don't, does that mean your architecture is difficult to understand? I don't think that tracks at all, so I don't really agree with where you're headed.

> But then we all know that there are some huge repos out there that are just "one module" and you can't make anything out of their internal structure.

There's always some shitty code out there, sure. But I don't like the suggestion that "one module" can't be coherent. It's orthogonal to the architecture. Not everything needs to be made generic and reusable.

> And the whole point of the article is to say that import statements do actually _not_ solve this issue, because import statements are at the file level, not at the module level (whatever module means in your mind).

This is not true for Go, for example. Import statements absolutely do solve this problem in Go. I rarely need to ever look at module files which are in some ways a byproduct of the import statements.


> This is not true for Go, for example. Import statements absolutely do solve this problem in Go. I rarely need to ever look at module files which are in some ways a byproduct of the import statements.

Go imports still work at the Go package level. If you have multiple .go source files in one package, you have the exact same issue I described for Java.

    .../pkg1/foo.go -> import .../pkg2
    .../pkg1/bar.go -> import .../pkg3
If I'm editing / reviewing a change to pkg1/foo.go, I cannot tell that pkg1 _already_ depends on pkg3. Can I?
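
A per-package build target would surface exactly that aggregate; a hypothetical sketch:

    # The union of pkg1's file-level imports appears in one place,
    # so a reviewer sees the pkg3 edge without opening bar.go.
    go_library(
        name = "pkg1",
        srcs = ["foo.go", "bar.go"],
        importpath = "example.com/repo/pkg1",
        deps = [
            "//pkg2",
            "//pkg3",
        ],
    )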

go list can tell you that pkg1 imports pkg2.

At work, go list was too slow and depended on a git checkout, so we wrote our own import graph parser using the Go standard library parser; it operates on byte slices of the files we read directly from git. It's speed-of-light fast, and we can compute Go import graphs in parallel from multiple commits to determine what has changed in the graph, so we can reduce the scope of what is tested.


Adding more and more people is often the thing to avoid.

I'm not going to say it can be avoided in all cases, but modularity, team structure, and architecture (both system and organisational) can avoid this in a lot of cases.


On top of that, the software world has changed dramatically since Bazel was first released. In practice, a git hash and a compile command for a command runner are more than enough for almost everyone.

What has changed in the past ~15 years? Many libraries and plugins have their own compilers nowadays. This increases the difficulty of successfully integrating with Bazel. Even projects that feel like they should be able to properly integrate Bazel (like Kubernetes) have removed it from the project as a nuisance.

Back when it was first designed, even compiling code within the same language could be a struggle; I remember going through many iterations of DLL hell back when I was a C++ programmer. This was the "it works on my machine" era. Bazel was nice because you could just say "Download this version of this thing, and give me a BUILD file path where I can reference it." Sometimes you needed to write some Starlark, but mostly not.
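
That workflow had roughly this shape (a sketch of an http_archive stanza; the name, URL, and checksum are placeholders):

    load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

    http_archive(
        name = "zlib",
        urls = ["https://example.com/zlib-1.2.13.tar.gz"],
        sha256 = "0000...",  # placeholder checksum
        build_file = "//third_party:zlib.BUILD",  # the referenceable BUILD file
    )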

But now, many projects have grown in scale and complexity, and they want to have their own automated passes. Just as C++ libraries needed special library wrappers for autotools within Bazel, now you often need to write multiple library compiler/automation wrappers yourself in any context. And then you'll find that Bazel's assumptions don't match the underlying code's. For example, my work's Go codebase compiles just fine with a standard Go compiler, but gazelle pukes because (IIRC) one of our third-party codegen tools outputs files with multiple packages to the same directory. When Etsy moved its Java codebase to Bazel, they needed to do some heavy refactoring because Bazel identified dependency loops and refused to compile the project, even though it worked just fine with javac. You can always push up your monocle and derisively say "you shouldn't have multiple packages per directory! you shouldn't have dependency loops!", but a build system should be able to compile your code just as the underlying language's toolchain does, without needing to influence the code at all.

That's why most engineers just need command runners. All of these languages and libraries are already designed to successfully run in their own contexts. You just need something to kick off the build with repeatable arguments across machines.


But there is a lot of C, C++, and Java in the world.

It also helps in a monorepo to control access to packages. Bazel makes it so you can't import packages that aren't visible to your package.


Bazel is a hard sell overall.

I must live in some alternate reality where this just isn't a problem, or maybe the author has not described the problem well. But reading the article, it just feels like some software form of bureaucracy.

If you work on a well-modularized codebase, with small and individual Go modules, or Rust crates or NPM packages or whatever you call them... you do not live in an alternate reality. You are doing essentially the same as I was describing because the package managers for those tools force you to express cross-dependencies explicitly in a separate "metadata file". You are just doing so via a different, language-specific tool.

The problem appears when: 1. you have a significant number of people working on the project, and 2. one of these "modules" has become so big internally that it's hard to make sense of its internal structure. Having a monorepo makes the problem even more likely. At that point, you'll probably want to start breaking up that gnarly module into pieces so that you can see its structure again, right?


Then you might have titled the article "BUILD files help reveal architecture on poorly organized code bases".

Was gonna write that it pays dividends only from a certain project size onwards, but in fact it could be true from a certain org size instead. So yes, it has some component of bureaucracy-as-code, and I agree with the author that that can be a good thing.

A great thing to put in your codebase is tooling-informed (indeed, tooling-enforced) protection of architectural constraints.

Preventing your backend’s web-route handler functions from directly instantiating a database client, forcing the code to instead access the db via a logic or service layer, preserves separation of concerns as the codebase grows. This sort of rule is obvious to human software engineers and usually lives as institutional knowledge, and everyone hates having to tell someone in a code review that they broke a rule that “everyone” knows.

Instead, use tooling to enforce these separations. That lets both new employees and agents work autonomously without making a mess and without other humans “in the loop” telling them not to break rules that aren’t written down anywhere, because when you make tools, the rules are now written down.

LLMs can quickly create scripts, in your language of choice, that walk the AST of your code before you commit. They can check for violations of rules as arbitrary as you like. Put that little script in your codebase, run it on every CI run, and it’ll keep paying dividends.

It’s like linting for architecture.


> BUILD files give you a chance to encode the high-level architecture of your software project as a graph of dependencies that lives outside of the code.

This is wild; the code literally defines dependency relationships at the mechanical level. It's the source of truth.

Of course you can't derive dep graphs from source code via simple file-based grep analysis, but that's obvious? You need to use language-specific, semantically aware tools to do that. If the author believes otherwise, that's their mistake.


I like to take artifacts people are guaranteed to need and use them to represent architecture via introspection. Package files and manifests are enough to get you 80% of the way there, you can add annotations to package files and docstrings/doc comments to go as deep as you need. That keeps your architecture in Git and coupled to the code.

Many words and few to the point. tl;dr: this person likes build tools (Google's Bazel) that can constrain dependencies to be used only by certain packages, and says this documents the project's architecture.

I've always thought of "architecture" as a high-level description of run-time behavior, not a set of compile-time dependency constraints.


Exactly. I thought that in modern languages and frameworks there are better tools to do that, like 'ProjectReference' in .NET. Oh well...

I have worked with ProjectReference before. How is it different from expressing a cross-module dependency in a Bazel BUILD file?

But as I already said in two other comments in this discussion, ProjectReference would be equivalent to what I'm describing in the article, just using language-specific tooling. If you are breaking your solution into various projects and keeping them separate with cross-references among them, you are doing exactly what I was describing already.


Doubt it. The problem detailed at the start of the post can be solved by static analysis/linting. And no amount of new strange files will give you the same benefit as using some stock standard static analysis tools.

Please elaborate on what those "some stock standard static analysis tools" are.


I don't think the article talks about unused dependencies anywhere, does it?

The very first complaint is that one can't tell from the imports what is being used or not. If you use a linter that removes unused imports, then you can tell.

No. The very first complaint is that if you don't know what is being used or not, you cannot tell whether a new "import" will add _another_ cross-module dependency that did not yet exist.

As I already mentioned in another comment, you have:

    .../module1/__init__.py -> import module3
    .../module1/foobar.py -> import module3
    .../module1/baz.py -> import module2
Now you are editing or reviewing a change to foobar.py. How can you tell whether depending on module2 is conceptually OK? You need to look at baz.py to prove to yourself that the dependency already exists. Or you need to know a priori that it's an OK thing to do.
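
A build file states that aggregate once; a hypothetical sketch:

    # module1's full dependency set, independent of which source
    # file happens to contain which import statement.
    py_library(
        name = "module1",
        srcs = ["__init__.py", "foobar.py", "baz.py"],
        deps = [
            "//module2",
            "//module3",
        ],
    )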

Another fine day to be using Brazil

Ah, not-so-good memories.

So this incomprehensible file at the end of the article, which is supposed to be "lean", is what the author is fighting for?

And it's supposed to show "architecture"?

Wow. I am happy that I never started working with Java. That is terrible.


This article has nothing to do with Java, and the author explicitly states that.

But somehow all the examples involve it.

In the article, yes. Look at the other comment threads in this discussion though: they touch upon many other languages.


