Building Rust Code - Current Issues
December 11, 2013
As rustpkg is still in its infancy, most Rust code tends to be built with make, other tools, or by hand. I’ve been working on updating Servo’s build system to something a bit more reliable and fast, and so I’ve been giving a lot of thought to build tooling with regards to Rust.
In this post, I want to cover what the current issues are with building Rust code, especially with regards to external tooling. I’ll also describe some recent work I did to address these issues. In the future, I want to cover specific ways to integrate Rust with a few different build tools.
Building Rust with existing build tools is a little difficult at the moment. The main issues are related to Rust’s attempt to be a better systems language than the existing options.
For example, Rust uses a larger compilation unit than C and C++ compilers, and existing build tools are designed around single file compilation. Rust libraries are output with unpredictable names. And dependency information must be maintained by hand.
Many programming languages compile one source file to one output file and then collect the results into some final product. In C, you compile .c files to .o files, then archive or link them into .a, .so, .dylib, and so on depending on the platform and whether you are building an executable, static library, or shared library. Even Java compiles .java inputs to one or more .class outputs, which are then normally packaged into a .jar file.
In Rust, the unit of compilation is the crate, which is a collection of modules and items. A crate may consist of a single source file or an arbitrary number of them in some directory hierarchy, but its output is a single executable or library.
Using crates as the compilation unit makes sense from a compiler point of view, as it has more knowledge during compilation to work from. It also makes sense from a versioning point of view, as all of the crate’s contents go together. Using crates as the compilation unit allows for cyclic dependencies between modules in the same crate, which is useful for expressing some things. It also means that separate declaration and implementation pieces, such as the header files in C and C++, are not needed.
Most build tools assume a model similar to that of a typical C compiler. For example, make has pattern rules that map an input to an output based on filename transformations. These work great if one input produces one output, but they don’t work well in other cases.
Rust still has a main input file, the one you pass to the compiler, so this difference doesn’t have a lot of ramifications when using existing build tools.
Compilers generally have an option for what to name their output files, or else they derive the output name with some simple formula. C compilers use the -o option to name the output; Java just names the files after the classes they contain. Rust also has a -o option, which works like you expect, except in the case of libraries where it is ignored.
Libraries in Rust are special in order to avoid naming collisions. Since libraries often end up stored centrally, only one library can have a given name. If I create a library called libgeom it will conflict with someone else’s libgeom. Operating systems and distributions end up resolving these conflicts by changing the names slightly, but it’s a huge annoyance. To avoid collisions, Rust includes a unique identifier called the crate hash in the name. Now my Rust library libgeom-f32ab99 doesn’t conflict with libgeom-00a9edc.
Unfortunately, the current Rust compiler computes the crate hash by hashing the link metadata, such as name and version, along with the link metadata of its dependencies. This results in a crate hash that only the Rust compiler is realistically able to compute, making it seem pseudo-random. This causes a huge problem for build tooling, as the output filename for libraries is unknown.
To work around this problem when using make, the Rust and Servo build systems use a dummy target called libfoo.dummy for a library called foo, and after running rustc to build the library, they create the libfoo.dummy file so that make has some well-known output to reason about. This workaround is a bit messy and pollutes the build files.
Here is an example of what a Makefile looks like with this workaround:
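RUSTC ?= rustc
# Approximate the crate's inputs as every .rs file under this directory.
SOURCES = $(shell find . -name '*.rs')

.PHONY: all clean

all: librust-geom.dummy

# The real library name is unknown, so a dummy file stands in as the target.
librust-geom.dummy: lib.rs $(SOURCES)
	@$(RUSTC) --lib $<
	@touch $@

clean:
	@rm -f *.dummy *.so *.dylib *.dll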
While this works, it also has some drawbacks. For example, if you edit a file during a long compile, the libfoo.dummy will get updated after the compile is finished, and rerunning the build won’t detect any changes. The timestamp of the input file will be older than the final output file that the build tool is checking. If the build system knew the real output file name, it could compare the correct timestamps, but that information has been locked inside the Rust compiler.
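To make the failure concrete, here is a minimal Python sketch of the timestamp comparison a make-like tool performs, followed by a simulation of the edit-during-compile case. The helper names (`is_stale`, `touch`, `demo_dummy_race`) and the file `point.rs` are hypothetical, and the timestamps are set by hand rather than taken from a real compile.

```python
import os
import tempfile


def is_stale(output, inputs):
    """Return True if output is missing or older than any input."""
    if not os.path.exists(output):
        return True
    out_mtime = os.path.getmtime(output)
    return any(os.path.getmtime(src) > out_mtime for src in inputs)


def touch(path, mtime):
    """Create the file if needed and set its modification time explicitly."""
    with open(path, "a"):
        pass
    os.utime(path, (mtime, mtime))


def demo_dummy_race():
    """Simulate editing a source file while a long compile is running."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "point.rs")
        dummy = os.path.join(tmp, "libfoo.dummy")
        touch(src, 100)    # source as the compiler read it at t=100
        touch(src, 150)    # edited at t=150, while the compile is still running
        touch(dummy, 200)  # compile finishes and the dummy is touched at t=200
        # The edit made at t=150 is not in the library, yet the dummy looks newer.
        return is_stale(dummy, [src])
```

In this simulation `demo_dummy_race()` reports that nothing needs rebuilding, even though the library was built from the older contents of the source file.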
Build systems need to be reliable. When you edit a file, it should trigger the correct things to get rebuilt. If nothing changes, nothing should get rebuilt. It’s extremely frustrating if you edit a file, rebuild the library, and find that your code changes aren’t reflected in the new output for some reason or that the library is not rebuilt at all. Reliable builds need accurate dependency information in order to accomplish this.
There’s currently no way for external build tools to get dependency information about Rust crates. This means that developers tend to list dependencies by hand, which is pretty fragile.
One quick way to approximate dependency info is just to recursively find every *.rs in the crate’s source directory. This can be wrong for multiple reasons; for example, include_str! macros are used to pull in files that are not *.rs, or conditional compilation may omit several files.
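A small Python sketch shows both ways the approximation goes wrong. The crate layout here is made up for illustration: `shader.glsl` stands for a file pulled in by include_str!, and `platform/windows.rs` stands for a module that conditional compilation leaves out on the current platform.

```python
import tempfile
from pathlib import Path


def approximate_deps(crate_dir):
    """Approximate a crate's inputs as every *.rs file under crate_dir."""
    root = Path(crate_dir)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.rs"))


def demo():
    """Build a throwaway crate directory and list its approximate inputs."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "lib.rs").write_text(
            'static SHADER: &str = include_str!("shader.glsl");\n'
        )
        # A real input of the crate, but invisible to a *.rs search.
        (root / "shader.glsl").write_text("// pulled in by include_str!\n")
        # Not an input on this platform, but a *.rs search lists it anyway.
        (root / "platform").mkdir()
        (root / "platform" / "windows.rs").write_text("// cfg'd out here\n")
        return approximate_deps(root)
```

The result lists `lib.rs` and `platform/windows.rs` and misses `shader.glsl`, so an edit to the shader would not trigger a rebuild while an edit to the unused module would.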
This is similar to dealing with header dependencies by hand when working with C and C++ code. C compilers have options to generate dependency info to deal with this, which are used by tools like CMake.
The price of inaccurate or missing dependency info is an unreliable build and a frustrated developer. If you find yourself reaching for make clean, you’re probably suffering from this.
Making It Better
It’s possible to solve these problems without sacrificing the things we want and falling back to doing exactly what C compilers do. By making the output file knowable and handling dependencies automatically, we make build tool integration easy and the resulting builds reliable. This is exactly what I’ve been working on the last few weeks.
Stable and Computable Hashes
The first thing we need is to make the crate hash stable and easily computable by external tools. Internally, the Rust compiler uses SipHash to compute the crate hash, and takes into account arbitrary link metadata as well as the link metadata of its dependencies. SipHash is not something easily computed from a Makefile, and the link metadata is not so easy to slurp and normalize from some dependency graph.
I’ve just landed a pull request that replaces the link metadata with a package identifier, which is a crate level attribute called pkgid. You declare it like #[pkgid="github.com/mozilla-servo/rust-geom#0.1"]; at the top of your lib.rs. The first part, github.com/mozilla-servo, is a path, which serves as both a namespace for your crate and a location hint as to where it can be obtained (for use by rustpkg for example). Then comes the crate’s name, rust-geom. Following that is the version identifier 0.1. If no pkgid attribute is provided, one is inferred with an empty path, a 0.0 version, and a name based on the name of the input file.
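As an illustration of how little work this takes on the tooling side, here is a Python sketch that splits a pkgid into its three parts. The names `PkgId`, `parse_pkgid`, and `infer_pkgid` are hypothetical. Two details are my assumptions rather than documented behavior: a pkgid without a `#version` suffix is given version 0.0, and the inferred name is the input file name with its extension removed.

```python
import os
from collections import namedtuple

PkgId = namedtuple("PkgId", ["path", "name", "version"])


def parse_pkgid(pkgid):
    """Split 'path/name#version' into path, name, and version."""
    location, sep, version = pkgid.partition("#")
    path, _, name = location.rpartition("/")
    # Assumption: a missing '#version' suffix means version 0.0.
    return PkgId(path, name, version if sep else "0.0")


def infer_pkgid(input_file):
    """Infer a pkgid when no attribute is present: empty path, version 0.0,
    and (by assumption) the input file name without its extension."""
    name = os.path.splitext(os.path.basename(input_file))[0]
    return PkgId("", name, "0.0")
```

For the attribute above, `parse_pkgid("github.com/mozilla-servo/rust-geom#0.1")` yields the path `github.com/mozilla-servo`, the name `rust-geom`, and the version `0.1`.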
To generate a crate hash, we take the SHA256 digest of the pkgid attribute. SHA256 is readily available in most languages or on the command line, and the pkgid attribute is very easy to find by running a regular expression over the main input file. The first eight digits of this hash are used for the filename, but the full hash is stored in the crate metadata and used as part of the symbol hashes.
Since the crate hash no longer depends on the crate’s dependencies, it is stable so long as the pkgid attribute doesn’t change. The attribute should change very infrequently, for instance when the library changes versions.
This makes the crate hash computable by pretty much any build tool you can find, and means rustc generates predictable output filenames for libraries.
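Here is a rough Python sketch of what an external tool could do: find the pkgid with a regular expression, hash it, and assemble a library filename. Treat it as a sketch under assumptions. I am assuming the digest is taken over exactly the attribute's string value encoded as UTF-8, so the result may not match what rustc really produces, and the `lib<name>-<hash>-<version>.<ext>` layout is simply read off the example filenames in this post. The function names are hypothetical.

```python
import hashlib
import re

# Matches a crate-level attribute such as #[pkgid="path/name#version"];
PKGID_RE = re.compile(r'#\[\s*pkgid\s*=\s*"([^"]+)"\s*\]')


def find_pkgid(source_text):
    """Return the pkgid string from the main input file, or None if absent."""
    match = PKGID_RE.search(source_text)
    return match.group(1) if match else None


def crate_hash(pkgid):
    """Full SHA256 hex digest of the pkgid string.
    Assumption: the hashed bytes are exactly the UTF-8 attribute value."""
    return hashlib.sha256(pkgid.encode("utf-8")).hexdigest()


def library_filename(pkgid, ext="dylib"):
    """Build a filename of the form lib<name>-<first 8 hex digits>-<version>.<ext>."""
    location, _, version = pkgid.partition("#")
    name = location.rpartition("/")[2]
    return "lib{}-{}-{}.{}".format(name, crate_hash(pkgid)[:8], version, ext)
```

Because the filename depends only on the pkgid string, a build tool can run this once over lib.rs and use the result as an ordinary target name.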
I’ve also got a pull request, which should land soon, to enable rustc to output make-compatible dependency information similar to the -MMD flag of gcc. To use it, you give rustc the --dep-info option and for an input file of lib.rs it will create a lib.d file, which can be used by make or other tools to learn the true dependencies. The lib.d file will look something like this:
librust-geom-da91df73-0.0.dylib: lib.rs matrix.rs matrix2d.rs point.rs rect.rs side_offsets.rs size.rs
Note that this list of dependencies will include code pulled in via the include_str! macros as well.
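Tools other than make can read this format too. The following Python sketch parses make-style rules into a dictionary; `parse_dep_info` is a hypothetical helper, and it deliberately ignores harder cases such as paths containing spaces or colons.

```python
def parse_dep_info(text):
    """Parse 'target: dep1 dep2 ...' rules into a {target: [deps]} dict."""
    rules = {}
    # Join backslash-continued lines the way make would.
    for line in text.replace("\\\n", " ").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        target, sep, deps = line.partition(":")
        if sep:
            rules.setdefault(target.strip(), []).extend(deps.split())
    return rules
```

Fed the example lib.d contents above, it returns a single entry keyed by the library filename, with the seven .rs files as its dependency list.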
Here is an example of a handwritten Makefile using dependency info. Note that this uses a hard-coded output file name, which works because the crate hash is stable unless the pkgid attribute is changed:
RUSTC ?= rustc

all: librust-geom-851fed20-0.1.dylib

librust-geom-851fed20-0.1.dylib: lib.rs
	@$(RUSTC) --dep-info --lib $<

-include lib.d
Now it will notice when you change any of the .rs files without needing to explicitly list them, and the dependency list will be updated automatically as your code changes. A little Makefile abstraction on top of this can make it quite nice and portable.
In the next few posts, I’ll show examples of integrating the improved Rust compiler with some existing build systems like make, CMake, and tup.
(Update: the next post covers building Rust with Make.)