Building Rust Code - Current Issues
December 11, 2013
As rustpkg is still in its infancy, most Rust code tends to be built with make, other tools, or by hand. I’ve been working on updating Servo’s build system to something a bit more reliable and fast, and so I’ve been giving a lot of thought to build tooling with regards to Rust.
In this post, I want to cover what the current issues are with building Rust code, especially with regards to external tooling. I’ll also describe some recent work I did to address these issues. In the future, I want to cover specific ways to integrate Rust with a few different build tools.
Current Issues
Building Rust with existing build tools is a little difficult at the moment. The main issues are related to Rust’s attempt to be a better systems language than the existing options.
For example, Rust uses a larger compilation unit than C and C++ compilers, and existing build tools are designed around single file compilation. Rust libraries are output with unpredictable names. And dependency information must be tracked by hand.
Compilation Unit
Many programming languages compile one source file to one output file and then collect the results into some final product. In C, you compile .c files to .o files, then archive or link them into .lib, .a, .dylib, and so on depending on the platform and whether you are building an executable, static library, or shared library. Even Java compiles .java inputs to one or more .class outputs, which are then normally packaged into a .jar.
In Rust, the unit of compilation is the crate, which is a collection of modules and items. A crate may consist of a single source file or an arbitrary number of them in some directory hierarchy, but its output is a single executable or library.
Using crates as the compilation unit makes sense from a compiler point of view, as it has more knowledge during compilation to work from. It also makes sense from a versioning point of view as all of the crate’s contents go together. Using crates as the compilation unit allows for cyclic dependencies between modules in the same crate, which is useful to express some things. It also means that separate declaration and implementation pieces are not needed, such as the header files in C and C++.
Most build tools assume a model similar to that of a typical C compiler. For example, make has pattern rules that can map an input to an output based on filename transformations. These work great if one input produces one output, but they don’t work well in other cases.
Rust still has a main input file, the one you pass to the compiler, so this difference doesn’t have a lot of ramifications when using existing build tools.
Output Names
Compilers generally have an option for what to name their output files, or else they derive the output name with some simple formula. C compilers use the -o option to name the output; Java just names the files after the classes they contain. Rust also has a -o option, which works like you expect, except in the case of libraries where it is ignored.
Libraries in Rust are special in order to avoid naming collisions. Since libraries often end up stored centrally, only one library can have a given name. If I create a library called libgeom it will conflict with someone else’s libgeom. Operating systems and distributions end up resolving these conflicts by changing the names slightly, but it’s a huge annoyance. To avoid collisions, Rust includes a unique identifier called the crate hash in the name. Now my Rust library libgeom-f32ab99 doesn’t conflict with libgeom-00a9edc.
Unfortunately, the current Rust compiler computes the crate hash by hashing the link metadata, such as name and version, along with the link metadata of its dependencies. This results in a crate hash that only the Rust compiler is realistically able to compute, making it seem pseudo-random. This causes a huge problem for build tooling as the output filename for libraries is unknown.
To work around this problem when using make, the Rust and Servo build systems use a dummy target called libfoo.dummy for a library called foo, and after running rustc to build the library, it creates the libfoo.dummy file so that make has some well known output to reason about. This workaround is a bit messy and pollutes the build files.
Here’s an example of what a Makefile looks like with this .dummy workaround:
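RUSTC ?= rustc
# Approximate the crate's inputs by collecting every .rs file below this directory.
SOURCES = $(shell find . -name '*.rs')
all: librust-geom.dummy
# The .dummy file stands in for the library, whose real name make cannot predict.
librust-geom.dummy: lib.rs $(SOURCES)
	@$(RUSTC) --lib $<
	@touch $@
clean:
	@rm -f *.dummy *.so *.dylib *.dll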
While this works, it also has some drawbacks. For example, if you edit a file during a long compile, the libfoo.dummy will get updated after the compile is finished, and rerunning the build won’t detect any changes. The timestamp of the input file will be older than the final output file that the build tool is checking. If the build system knew the real output file name, it could compare the correct timestamps, but that information has been locked inside the Rust compiler.
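The race described above can be sketched with a few numbers. This is only an illustration: the timeline values and the is_up_to_date helper are made up here, and the helper mimics make's basic rule that a target is fresh when no prerequisite is newer than it.

```python
def is_up_to_date(target_mtime, prerequisite_mtimes):
    """Mimic make's check: fresh if no prerequisite is newer than the target."""
    return all(m <= target_mtime for m in prerequisite_mtimes)


# Hypothetical timeline in seconds, for illustration only.
compile_start = 100  # rustc starts and reads point.rs as it was saved earlier
edit_time = 105      # point.rs is edited while rustc is still running
compile_end = 110    # rustc finishes, having compiled the pre-edit contents

# With the workaround, libfoo.dummy is touched only after rustc exits.
dummy_mtime = compile_end + 1

# make compares the edited source with the .dummy file and sees nothing to do,
# even though the edit never made it into the library.
stale_build_missed = is_up_to_date(dummy_mtime, [edit_time])
print(stale_build_missed)
```

The edit at 105 is older than the .dummy timestamp at 111, so the next build is skipped.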
Dependency Information
Build systems need to be reliable. When you edit a file, it should trigger the correct things to get rebuilt. If nothing changes, nothing should get rebuilt. It’s extremely frustrating if you edit a file, rebuild the library, and find that your code changes aren’t reflected in the new output for some reason or that the library is not rebuilt at all. Reliable builds need accurate dependency information in order to accomplish this.
There’s currently no way for external build tools to get dependency information about Rust crates. This means that developers tend to list dependencies by hand which is pretty fragile.
One quick way to approximate dependency info is just to recursively find every *.rs in the crate’s source directory. This can be wrong for multiple reasons; perhaps the include! or include_str! macros are used to pull in files that aren’t named *.rs, or conditional compilation may omit several files.
This is similar to dealing with header dependencies by hand when working with C and C++ code. C compilers have options to generate dependency info to deal with this, which are used by tools like CMake.
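For reference, the recursive *.rs approximation mentioned above might look like this in a small external script. The function name approximate_crate_sources is made up for this sketch, and it has exactly the weakness described: anything pulled in with include! or include_str! under another extension is missed.

```python
from pathlib import Path


def approximate_crate_sources(crate_dir):
    """Return every *.rs file below crate_dir, sorted for stable output.

    This is only an approximation of the crate's real inputs: files with
    other extensions are never seen, and files excluded by conditional
    compilation are listed anyway.
    """
    return sorted(p for p in Path(crate_dir).rglob("*.rs") if p.is_file())


if __name__ == "__main__":
    for source in approximate_crate_sources("."):
        print(source)
```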
The price of inaccurate or missing dependency info is an unreliable build and a frustrated developer. If you find yourself reaching for make clean, you’re probably suffering from this.
Making It Better
It’s possible to solve these problems without sacrificing the things we want and falling back to doing exactly what C compilers do. By making the output file knowable and handling dependencies automatically, we make build tool integration easy and the resulting builds reliable. This is exactly what I’ve been working on the last few weeks.
Stable and Computable Hashes
The first thing we need is to make the crate hash stable and easily computable by external tools. Internally, the Rust compiler uses SipHash to compute the crate hash, and takes into account arbitrary link metadata as well as the link metadata of its dependencies. SipHash is not something easily computed from a Makefile and the link metadata is not so easy to slurp and normalize from some dependency graph.
I’ve just landed a pull request that replaces the link metadata with a package identifier, which is a crate level attribute called pkgid. You declare it like #[pkgid="github.com/mozilla-servo/rust-geom#0.1"]; at the top of your lib.rs. The first part, github.com/mozilla-servo, is a path, which serves as both a namespace for your crate and a location hint as to where it can be obtained (for use by rustpkg for example). Then comes the crate’s name, rust-geom. Following that is the version identifier 0.1. If no pkgid attribute is provided, one is inferred with an empty path, a 0.0 version, and a name based on the name of the input file.
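To make the three parts concrete, here is a minimal sketch of splitting a pkgid string into path, name, and version from an external tool. The parse_pkgid helper is hypothetical and covers only the path/name#version shape shown above; falling back to 0.0 when the string carries no version is an assumption borrowed from the inferred default, and rustc's own parser may accept other forms.

```python
def parse_pkgid(pkgid):
    """Split 'path/name#version' into a (path, name, version) tuple.

    Simplified reading of the format described in this post; not a copy of
    rustc's parser.
    """
    location, has_version, version = pkgid.partition("#")
    # Everything before the last slash is the path; the rest is the crate name.
    path, _, name = location.rpartition("/")
    # Assumption: a missing version falls back to 0.0.
    return (path, name, version if has_version else "0.0")


print(parse_pkgid("github.com/mozilla-servo/rust-geom#0.1"))
```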
To generate a crate hash, we take the SHA256 digest of the pkgid attribute. SHA256 is readily available in most languages or on the command line, and the pkgid attribute is very easy to find by running a regular expression over the main input file. The first eight digits of this hash are used for the filename, but the full hash is stored in the crate metadata and used as part of the symbol hashes.
Since the crate hash no longer depends on the crate’s dependencies, it is stable so long as the pkgid attribute doesn’t change. Such changes should be very infrequent, for instance when the library changes versions.
This makes the crate hash computable by pretty much any build tool you can find, and means rustc generates predictable output filenames for libraries.
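As a rough sketch of what an external tool could do, the snippet below finds the pkgid with a regular expression, hashes it with SHA256, and builds a filename shaped like the ones in this post. The function names and the regular expression are made up for illustration. It also assumes the UTF-8 bytes of the pkgid string are hashed exactly as written, which this post does not spell out, so compare the result against a real rustc build before relying on it.

```python
import hashlib
import re

# Assumed pattern for the attribute syntax shown above: #[pkgid="..."];
PKGID_RE = re.compile(r'#\[\s*pkgid\s*=\s*"([^"]+)"\s*\]')


def find_pkgid(source_text):
    """Return the pkgid string from a crate's main source file, or None."""
    match = PKGID_RE.search(source_text)
    return match.group(1) if match else None


def short_crate_hash(pkgid):
    """First eight hex digits of the SHA256 digest of the pkgid string.

    Assumption: the pkgid's UTF-8 bytes are hashed as-is; rustc may feed
    the digest differently.
    """
    return hashlib.sha256(pkgid.encode("utf-8")).hexdigest()[:8]


def library_filename(name, pkgid, version, extension="dylib"):
    """Build lib<name>-<hash>-<version>.<extension>, as in the post's examples."""
    return "lib{}-{}-{}.{}".format(name, short_crate_hash(pkgid), version, extension)


source = '#[pkgid="github.com/mozilla-servo/rust-geom#0.1"];\nmod point;\n'
pkgid = find_pkgid(source)
print(library_filename("rust-geom", pkgid, "0.1"))
```

The extension is a parameter because it depends on the platform, as the clean rule in the earlier Makefile suggests.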
Dependency Management
I’ve also got a pull request, which should land soon, to enable rustc to output make-compatible dependency information similar to the -MMD flag of gcc. To use it, you give rustc the --dep-info option and for an input file of lib.rs it will create a lib.d which can be used by make or other tools to learn the true dependencies.
The lib.d file will look something like this:
librust-geom-da91df73-0.0.dylib: lib.rs matrix.rs matrix2d.rs point.rs rect.rs side_offsets.rs size.rs
Note that this list of dependencies will include code pulled in via the include! and include_str! macros as well.
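Tools other than make can read this file too. Here is a minimal sketch of parsing a make-style rule into a target and its prerequisites. The parse_dep_info helper is hypothetical and deliberately simple: it joins backslash-continued lines in case they occur and splits on the first colon, so it does not handle escaped spaces or Windows drive letters.

```python
def parse_dep_info(text):
    """Parse make-style rules ('target: dep dep ...') into a dict.

    Maps each target to the list of its prerequisites. Lines without a
    colon are ignored.
    """
    rules = {}
    # Join continuation lines so each rule sits on one logical line.
    joined = text.replace("\\\n", " ")
    for line in joined.splitlines():
        if ":" not in line:
            continue
        target, _, deps = line.partition(":")
        rules[target.strip()] = deps.split()
    return rules


example = (
    "librust-geom-da91df73-0.0.dylib: lib.rs matrix.rs matrix2d.rs "
    "point.rs rect.rs side_offsets.rs size.rs\n"
)
deps = parse_dep_info(example)
print(deps["librust-geom-da91df73-0.0.dylib"])
```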
Here’s an example of a handwritten Makefile using dependency info. Note that this uses a hard-coded output file name, which works because the crate hash is stable unless the pkgid attribute is changed:
RUSTC ?= rustc
all: librust-geom-851fed20-0.1.dylib
librust-geom-851fed20-0.1.dylib: lib.rs
	@$(RUSTC) --dep-info --lib $<
-include lib.d
Now make will notice when you change any of the .rs files without needing to list them explicitly, and the dependency list will be updated automatically as your code changes. A little Makefile abstraction on top of this can make it quite nice and portable.
Next Up
In the next few posts, I’ll show examples of integrating the improved Rust compiler with some existing build systems like make, CMake, and tup.
(Update: the next post covers building Rust with Make.)