March 25, 2018

Bazel: A high-level build tool for your polyglot codebase

At my company, we are moving our medium-sized C++ and Python codebase from CMake to Bazel. Moving to a new build system is a lot of work, so naturally, it has to be justified. Here are the benefits most important to us.

First-class build system for C++

Our primary language today is C++, which doesn’t have a standard build system. Bazel’s C++ support is very good, covering everything from code coverage reports to profile-guided optimization.

Having a supported, “blessed” way of doing these things is a breath of fresh air after the DIY approach of CMake. It feels a lot more like the “batteries included” approach of modern languages that have their own build systems, like Python and Rust.

Opinionated, declarative language

I’m a strong believer that when you are developing code, you should not have to think about your build system. In an ideal world there should be one, and exactly one, way to do every task you might want to accomplish.

Bazel comes very close to accomplishing this ideal. Here’s a sample BUILD file for a simple C++ project with one library and one binary:

cc_library(
    name = "hello-greet",
    srcs = ["hello-greet.cc"],
    hdrs = ["hello-greet.h"],
)

cc_binary(
    name = "hello-world",
    srcs = ["hello-world.cc"],
    deps = [
        ":hello-greet",
    ],
)

That’s pretty cut-and-dried, and very high-level. You could copy this example and adapt it to your own project with minimal effort.

Note that some projects like to use “automagic” build systems with standard naming schemes that scan for source files, header files, and tests. This works great in some cases, but makes it hard to customize anything in your build. Having a BUILD file gives you a natural extension point to place, say, external libraries to link to, or to tweak how your compiler optimizes floating point operations.

These are real-world needs in just about any codebase that uses a compiler. By putting in the small up-front effort of writing BUILD rules, we preserve the “one way to do it” mentality while minimizing the incremental effort needed to customize later.
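To make that concrete, here is a minimal sketch (not taken from our codebase) of the hello-greet library from above with both kinds of customization added. The specific flags are illustrative assumptions: -ffast-math is a GCC/Clang option that relaxes floating point semantics, and -lm links the system math library.

```starlark
cc_library(
    name = "hello-greet",
    srcs = ["hello-greet.cc"],
    hdrs = ["hello-greet.h"],
    # Extra compiler flags, applied only when compiling this target's sources
    # (GCC/Clang syntax; chosen here purely as an illustration).
    copts = ["-ffast-math"],
    # Extra linker flags, passed along when a binary links this library;
    # here, the system math library.
    linkopts = ["-lm"],
)
```

Nothing else in the BUILD file changes, and hello-world picks up the library’s link flags through its existing deps entry.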

A coherent story for cross-language development

Different languages are good for different things. We want to support trying different languages, and writing modules in one language that talk to modules in another language. But maintaining a heterogeneous codebase can be really time-consuming.

We use C++ for heavy lifting, but Python for visualization and lighter-weight analysis tasks. We use Cython to integrate the two and generate native Python modules from our C++ code. Today, though, this feels pretty haphazard: there’s a whole dance of building from one directory, running Python from another, and searching around for what we think is the right Cython output. With Bazel, the output of a cc_library can be brought into a py_binary target, and the whole thing is cohesive and deterministic.
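Here is a rough sketch of what that can look like, under a few assumptions: greet_ext.cc stands in for a Python extension module (for example, the C++ that Cython generates from a .pyx file; the step that runs Cython is left out), and :python_headers is a hypothetical target that provides Python.h. Building the extension as a cc_binary with linkshared = True yields a shared object, which the py_binary then carries in its runfiles.

```starlark
# Hypothetical sketch: greet_ext.cc is assumed to define a Python extension
# module named greet_ext, and ":python_headers" is an assumed target
# exposing Python.h. Neither exists in the example above.
cc_binary(
    name = "greet_ext.so",
    srcs = ["greet_ext.cc"],
    deps = [
        ":hello-greet",
        ":python_headers",
    ],
    # Produce a shared object instead of an executable.
    linkshared = True,
)

py_binary(
    name = "plot_results",
    srcs = ["plot_results.py"],
    # Place the extension in plot_results' runfiles so Python can import it.
    data = [":greet_ext.so"],
)
```

If plot_results.py sits in the same package, it should be able to run import greet_ext directly, since the .so lands next to the script in the runfiles tree.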

I should note that while the Python story works very well for Google’s use cases (and I think it will work for ours), it doesn’t feel as complete as some other parts of Bazel. Broadening Python to support a wider range of use cases is still an active area of development within Bazel, so do some research to make sure it will work well for you.

Python-C++ integration is only one part of the story, though. We’ve been looking at Rust to ease some of the pain points of using C++ in a production environment. Bazel already has what appears to be very good support for Rust, and cargo-raze can generate BUILD files from a Cargo.toml. More broadly, many languages have Bazel rules maintained by Google. Knowing that we aren’t locked into a handful of languages is a huge selling point for Bazel.
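For comparison with the C++ example earlier, a Rust library and binary look much the same. This is a sketch with an assumed file layout; the load path shown is the one used by recent rules_rust releases and has changed over time (older releases used a repository named io_bazel_rules_rust), so check it against the version you use.

```starlark
# Load path assumed from recent rules_rust releases.
load("@rules_rust//rust:defs.bzl", "rust_binary", "rust_library")

# A Rust library crate, analogous to the hello-greet cc_library.
rust_library(
    name = "greeter",
    srcs = ["src/lib.rs"],
)

# A Rust binary that depends on the library, analogous to hello-world.
rust_binary(
    name = "hello",
    srcs = ["src/main.rs"],
    deps = [":greeter"],
)
```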

Reproducible builds mean reproducible research

We do a lot of research, and research needs to be reproducible. The code we run, and binaries we produce from that code, are one major aspect of this.

We should be able to document a research result with a commit hash, then go back and reproduce the exact binary we used to produce that result. This removes a lot of questions and speeds up the research process dramatically, especially when we want to revisit something from more than a few weeks ago. Bazel’s reproducible builds solve this problem.
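One ingredient of this, sketched here with a hypothetical dependency: external code fetched in the WORKSPACE file can be pinned to an exact archive by checksum, so a commit hash also identifies the third-party sources the binary was built against. The name, URL, and hash below are placeholders. This pins sources, not the compiler; by default Bazel detects the C++ toolchain from the host machine.

```starlark
# WORKSPACE (the dependency name, URL, and hash are placeholders)
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "some_dependency",
    urls = ["https://example.com/some_dependency-1.2.3.tar.gz"],
    # Bazel fails the fetch if the downloaded archive doesn't match this hash.
    sha256 = "<sha256 of the archive>",
    strip_prefix = "some_dependency-1.2.3",
)
```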

Extensibility

The Bazel team has worked very hard to make extensibility first-class. Support for most languages lives separately from the Bazel codebase itself, meaning anyone could have implemented it, and many people have: many of the language rules under the bazelbuild GitHub organization were originally contributed by the community. There are also third-party rules for things like generating protobuf code, or packaging binaries with the files they need to run.

That said, extending Bazel is not for the faint of heart. Bazel is a build tool meant for codebases with tens of thousands of sub-projects and a very strong need for reproducible builds. This meant the Bazel team had to develop their own language, Skylark, which looks and feels like Python but is limited in its access to the outside world.
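To give a feel for the language, here is a minimal sketch of a custom rule; its name (concat) and purpose are hypothetical. The implementation function never reads or writes files itself. It only declares an action with explicit inputs and outputs, and Bazel decides later whether that action needs to run. That restriction is part of what lets Bazel cache and reproduce results.

```starlark
# concat.bzl: a hypothetical rule that concatenates its source files.
def _concat_impl(ctx):
    # Declare the single output file this target produces.
    out = ctx.actions.declare_file(ctx.label.name + ".txt")

    # Bazel only knows about the inputs and outputs declared here.
    ctx.actions.run_shell(
        inputs = ctx.files.srcs,
        outputs = [out],
        command = "cat {} > {}".format(
            " ".join([f.path for f in ctx.files.srcs]),
            out.path,
        ),
    )

    # Expose the output as this target's default output.
    return [DefaultInfo(files = depset([out]))]

concat = rule(
    implementation = _concat_impl,
    attrs = {
        "srcs": attr.label_list(allow_files = True),
    },
)

# Usage in a BUILD file (the //tools package path is hypothetical):
#   load("//tools:concat.bzl", "concat")
#   concat(name = "all_notes", srcs = ["a.txt", "b.txt"])
```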

Skylark has its own learning curve, and while it’s a small language, the end result is that there will probably be only a few people on your team who learn it well enough to add new build extensions. On the other hand, you should only need a few people on your team doing this. Once you set up your own rules, everyone else on the team can use them without thinking about how they’re implemented. The tradeoff is that if you need to extend Bazel, you pay a little more effort upfront in exchange for better maintainability over time.
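The simplest form of this is a macro: a plain Skylark function that wraps an existing rule. The sketch below is hypothetical (the file path, macro name, and warning flags are all assumptions). It bakes a team’s standard compiler flags into cc_library, so the rest of the team only ever writes the short form shown in the trailing comment.

```starlark
# tools/cc.bzl: a hypothetical team-wide wrapper around cc_library.
def company_cc_library(name, copts = [], **kwargs):
    """cc_library with the team's standard warning flags prepended."""
    native.cc_library(
        name = name,
        # Team defaults first, then any per-target flags the caller passes.
        copts = ["-Wall", "-Werror"] + copts,
        **kwargs
    )

# Usage in a BUILD file:
#   load("//tools:cc.bzl", "company_cc_library")
#   company_cc_library(
#       name = "hello-greet",
#       srcs = ["hello-greet.cc"],
#       hdrs = ["hello-greet.h"],
#   )
```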

Reasons not to use Bazel

If you’re working in only one language that already has a decent “default” build tool with dependency management, I’d strongly encourage you to look at how you can accomplish your goals with only that build tool. You’ll inevitably find things that aren’t supported for your language in Bazel, and the focus of effort is always going to be on your language’s default build tool.

However, if you develop in C++, or develop in a heterogeneous codebase with a handful of languages that you want to interoperate, look at Bazel. There’s a good chance it’s the build tool you’ve been waiting for!

© Tyler Mandry 2018
