After landing the pkgid attribute work there was much community discussion about how that feature could be improved. The net result was:

- pkgid was renamed to crate_id, since it’s being used to identify a crate and not a package, which is a grouping of crates. (A package is still a pretty fluid concept in Rust right now.)

- The crate_id attribute can now override the inferred name of the crate with new syntax. A crate_id of github.com/foo/rust-bar#bar:1.0 names the crate bar, which can be found at github.com/foo/rust-bar. Previously the crate name was inferred to be the last component of the path, rust-bar.

- New compiler flags --crate-id, --crate-name, and --crate-file-name print the value of the crate_id attribute, the crate’s name, and the output filenames the compiler will produce.

These changes made a good thing even better.
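For example, the new flags can be queried straight from the shell. A sketch, assuming a lib.rs whose crate_id is the github.com/foo/rust-bar#bar:1.0 from above and an OS X build (the hash in the filename is made up):

$ rustc --crate-name lib.rs
bar
$ rustc --crate-file-name --lib lib.rs
libbar-8516c845-1.0.dylib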
The Makefile hasn’t changed much, but here is a much simpler rust.mk that the new compiler flags enable:
define RUST_CRATE
# $(1) is the path to the crate directory containing lib.rs (and optionally test.rs).
_rust_crate_dir = $(dir $(1))
_rust_crate_lib = $$(_rust_crate_dir)lib.rs
_rust_crate_test = $$(_rust_crate_dir)test.rs
# Ask the compiler for the crate name and the library's output filename.
_rust_crate_name = $$(shell $(RUSTC) --crate-name $$(_rust_crate_lib))
_rust_crate_dylib = $$(shell $(RUSTC) --crate-file-name --lib $$(_rust_crate_lib))
.PHONY : $$(_rust_crate_name)
$$(_rust_crate_name) : $$(_rust_crate_dylib)
$$(_rust_crate_dylib) : $$(_rust_crate_lib)
	$$(RUSTC) $$(RUSTFLAGS) --dep-info --lib $$<
# Pull in the dependency info generated by --dep-info, if present.
-include $$(patsubst %.rs,%.d,$$(_rust_crate_lib))
# Only generate test targets if the crate has a test.rs.
# (Note: $(wildcard) expands to the empty string, not "", when the file is missing.)
ifneq ($$(wildcard $$(_rust_crate_test)),)
.PHONY : check-$$(_rust_crate_name)
check-$$(_rust_crate_name): $$(_rust_crate_name)-test
	./$$(_rust_crate_name)-test
$$(_rust_crate_name)-test : $$(_rust_crate_test)
	$$(RUSTC) $$(RUSTFLAGS) --dep-info --test $$< -o $$@
-include $$(patsubst %.rs,%.d,$$(_rust_crate_test))
endif
endef
No more nasty sed scripts necessary, but of course, the crate hash is still computable if you want to do it yourself for some reason.
For this post, I’m going to use the rust-geom library as an example. It is a simple Rust library used by Servo to handle common geometric tasks like dealing with points, rectangles, and matrices. It is pure Rust code, has no dependencies, and includes some unit tests.
We want to build a dynamic library and the test suite, and the Makefile should be able to run the test suite by using make check. As much as possible, we’ll use the same crate structure that rustpkg uses so that once rustpkg is ready for real use, the transition to it will be painless.
Did you know that Makefiles can define functions? It’s a little clumsy, but it works and you can abstract a bunch of the tedium away. I’d never really noticed them before dealing with the Rust and Servo build systems, which use them heavily.
By using shell commands like shasum and sed, we can compute crate hashes, and by using Make’s eval function, we can dynamically define new targets. I’ve created a rust.mk which can be included in a Makefile that makes it really easy to build Rust crates.
Let’s look at a Makefile for rust-geom which uses rust.mk.
include rust.mk
RUSTC ?= rustc
RUSTFLAGS ?=
.PHONY : all
all: rust-geom
.PHONY : check
check: check-rust-geom
$(eval $(call RUST_CRATE, .))
It includes rust.mk, sets up some basic variables that control the compiler and flags, and then defines the top level targets. The magic bit is the call to RUST_CRATE, which takes a path to where a crate’s lib.rs and test.rs are located. In this case the path is the current directory, .
RUST_CRATE finds the pkgid attribute in the crate and uses this to compute the crate’s name, hash, and the output filename for the library. It then creates a target with the same name as the crate name, in this case rust-geom, and a target for the output file for the library. It uses the Rust compiler’s support for dependency information so that it will know exactly when it needs to recompile things.
If the crate contains a test.rs file, it will also create a target that compiles the tests for the crate into an executable, as well as a target to run the tests. The executable will be named after the crate; for rust-geom it will be named rust-geom-test. The check target is also named after the crate, check-rust-geom.
The files lib.rs and test.rs are the files rustpkg itself uses by default. This Makefile does not support the pkg.rs custom build logic, but if you need custom logic, it is easy enough to modify this example. One benefit of following in rustpkg’s footsteps here is that this same crate should be buildable with rustpkg without modification.
rust.mk is a little ugly, but not too bad. It defines a few helper functions like RUST_CRATE_PKGID and RUST_CRATE_HASH which are used by the main RUST_CRATE function. The syntax is a bit silly because of the use of eval and the need to escape $s, but it shouldn’t be too hard to follow if you’re already familiar with Make syntax.
# Extract the pkgid attribute value from the crate's main source file.
RUST_CRATE_PKGID = $(shell sed -ne 's/^\#\[ *pkgid *= *"\(.*\)" *];$$/\1/p' $(firstword $(1)))
# Split a pkgid like github.com/mozilla-servo/rust-geom#0.1 into path, name, and version.
RUST_CRATE_PATH = $(shell printf $(1) | sed -ne 's/^\([^\#]*\)\/.*$$/\1/p')
RUST_CRATE_NAME = $(shell printf $(1) | sed -ne 's/^\([^\#]*\/\)\{0,1\}\([^\#]*\).*$$/\2/p')
RUST_CRATE_VERSION = $(shell printf $(1) | sed -ne 's/^[^\#]*\#\(.*\)$$/\1/p')
# The crate hash is the first 8 hex digits of the SHA256 digest of the pkgid.
RUST_CRATE_HASH = $(shell printf $(strip $(1)) | shasum -a 256 | sed -ne 's/^\(.\{8\}\).*$$/\1/p')
ifeq ($(shell uname),Darwin)
RUST_DYLIB_EXT=dylib
else
RUST_DYLIB_EXT=so
endif
define RUST_CRATE
# $(1) is the path to the crate directory containing lib.rs (and optionally test.rs).
_rust_crate_dir = $(dir $(1))
_rust_crate_lib = $$(_rust_crate_dir)lib.rs
_rust_crate_test = $$(_rust_crate_dir)test.rs
# Compute the crate's name, version, and hash from its pkgid attribute.
_rust_crate_pkgid = $$(call RUST_CRATE_PKGID, $$(_rust_crate_lib))
_rust_crate_name = $$(call RUST_CRATE_NAME, $$(_rust_crate_pkgid))
_rust_crate_version = $$(call RUST_CRATE_VERSION, $$(_rust_crate_pkgid))
_rust_crate_hash = $$(call RUST_CRATE_HASH, $$(_rust_crate_pkgid))
_rust_crate_dylib = lib$$(_rust_crate_name)-$$(_rust_crate_hash)-$$(_rust_crate_version).$(RUST_DYLIB_EXT)
.PHONY : $$(_rust_crate_name)
$$(_rust_crate_name) : $$(_rust_crate_dylib)
$$(_rust_crate_dylib) : $$(_rust_crate_lib)
	$$(RUSTC) $$(RUSTFLAGS) --dep-info --lib $$<
# Pull in the dependency info generated by --dep-info, if present.
-include $$(patsubst %.rs,%.d,$$(_rust_crate_lib))
# Only generate test targets if the crate has a test.rs.
# (Note: $(wildcard) expands to the empty string, not "", when the file is missing.)
ifneq ($$(wildcard $$(_rust_crate_test)),)
.PHONY : check-$$(_rust_crate_name)
check-$$(_rust_crate_name): $$(_rust_crate_name)-test
	./$$(_rust_crate_name)-test
$$(_rust_crate_name)-test : $$(_rust_crate_test)
	$$(RUSTC) $$(RUSTFLAGS) --dep-info --test $$< -o $$@
-include $$(patsubst %.rs,%.d,$$(_rust_crate_test))
endif
endef
If you wanted, you could add the crate’s target and the check target to the all and check targets within this function, simplifying the main Makefile. You could also have it generate an appropriate clean-rust-geom target, as shown in the sketch below.
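A minimal sketch of such a clean target, written inside RUST_CRATE with the same escaping and reusing the variables it already defines (for rust-geom this expands to clean-rust-geom):

.PHONY : clean-$$(_rust_crate_name)
clean-$$(_rust_crate_name) :
	rm -f $$(_rust_crate_dylib) $$(_rust_crate_name)-test $$(patsubst %.rs,%.d,$$(_rust_crate_lib) $$(_rust_crate_test))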
It’s not going to win a beauty contest, but it will get the job done nicely.
In the next post, I plan to show the same example, but using CMake.
In this post, I want to cover what the current issues are with building Rust code, especially with regards to external tooling. I’ll also describe some recent work I did to address these issues. In the future, I want to cover specific ways to integrate Rust with a few different build tools.
Building Rust with existing build tools is a little difficult at the moment. The main issues are related to Rust’s attempt to be a better systems language than the existing options.
For example, Rust uses a larger compilation unit than C and C++ compilers, and existing build tools are designed around single file compilation. Rust libraries are output with unpredictable names. And dependency information must be maintained manually.
Many programming languages compile one source file to one output file and then collect the results into some final product. In C, you compile .c files to .o files, then archive or link them into .lib, .a, .dylib, and so on depending on the platform and whether you are building an executable, static library, or shared library. Even Java compiles .java inputs to one or more .class outputs, which are then normally packaged into a .jar.
In Rust, the unit of compilation is the crate, which is a collection of modules and items. A crate may consist of a single source file or an arbitrary number of them in some directory hierarchy, but its output is a single executable or library.
Using crates as the compilation unit makes sense from a compiler point of view, as it has more knowledge during compilation to work from. It also makes sense from a versioning point of view, as all of the crate’s contents go together. Using crates as the compilation unit allows for cyclic dependencies between modules in the same crate, which is useful for expressing some designs. It also means that separate declaration and implementation pieces are not needed, such as the header files in C and C++.
Most build tools assume a model similar to that of a typical C compiler. For example, make has pattern rules that can map an input to an output based on filename transformations. These work great if one input produces one output, but they don’t work well in other cases.
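The canonical example is C compilation, where a pattern rule maps each .c to the matching .o:

# one input, one output, and the output name is a simple transformation of the input name
%.o : %.c
	$(CC) -c $< -o $@

There’s no comparable way to write a rule whose single invocation consumes many source files and produces an output whose name is not a simple transformation of an input name.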
Rust still has a main input file, the one you pass to the compiler, so this difference doesn’t have a lot of ramifications when using existing build tools.
Compilers generally have an option for what to name their output files, or else they derive the output name with some simple formula. C compilers use the -o option to name the output; Java just names the files after the classes they contain. Rust also has a -o option, which works like you expect, except in the case of libraries where it is ignored.
Libraries in Rust are special in order to avoid naming collisions. Since libraries often end up stored centrally, only one library can have a given name. If I create a library called libgeom it will conflict with someone else’s libgeom. Operating systems and distributions end up resolving these conflicts by changing the names slightly, but it’s a huge annoyance. To avoid collisions, Rust includes a unique identifier called the crate hash in the name. Now my Rust library libgeom-f32ab99 doesn’t conflict with libgeom-00a9edc.
Unfortunately, the current Rust compiler computes the crate hash by hashing the link metadata, such as name and version, along with the link metadata of its dependencies. This results in a crate hash that only the Rust compiler is realistically able to compute, making it seem pseudo-random. This causes a huge problem for build tooling, as the output filename for libraries is unknown.
To work around this problem when using make, the Rust and Servo build systems use a dummy target called libfoo.dummy for a library called foo, and after running rustc to build the library, it creates the libfoo.dummy file so that make has some well known output to reason about. This workaround is a bit messy and pollutes the build files.
Here’s an example of what a Makefile looks like with this .dummy workaround:
RUSTC ?= rustc
SOURCES = $(shell find . -name '*.rs')
all: librust-geom.dummy
librust-geom.dummy: lib.rs $(SOURCES)
	@$(RUSTC) --lib $<
	@touch $@
clean:
	@rm -f *.dummy *.so *.dylib *.dll
While this works, it also has some drawbacks. For example, if you edit a file during a long compile, the libfoo.dummy will get updated after the compile is finished, and rerunning the build won’t detect any changes. The timestamp of the input file will be older than the final output file that the build tool is checking. If the build system knew the real output file name, it could compare the correct timestamps, but that information has been locked inside the Rust compiler.
Build systems need to be reliable. When you edit a file, it should trigger the correct things to get rebuilt. If nothing changes, nothing should get rebuilt. It’s extremely frustrating if you edit a file, rebuild the library, and find that your code changes aren’t reflected in the new output for some reason or that the library is not rebuilt at all. Reliable builds need accurate dependency information in order to accomplish this.
There’s currently no way for external build tools to get dependency information about Rust crates. This means that developers tend to list dependencies by hand, which is pretty fragile; a typical hand-maintained rule is sketched below.
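Something like this, spelling out every source file by hand (using rust-geom’s files as the example; forget to add a new file here and edits to it quietly stop triggering rebuilds):

# every prerequisite listed manually -- easy to let this list rot
librust-geom.dummy: lib.rs matrix.rs matrix2d.rs point.rs rect.rs side_offsets.rs size.rs
	@$(RUSTC) --lib $<
	@touch $@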
One quick way to approximate dependency info is just to recursively find every *.rs in the crate’s source directory. This can be wrong for multiple reasons: perhaps the include! or include_str! macros are used to pull in files that aren’t named *.rs, or conditional compilation may omit several files.
This is similar to dealing with header dependencies by hand when working with C and C++ code. C compilers have options to generate dependency info to deal with this, which are used by tools like CMake.
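For example, gcc’s -MMD flag writes a make-compatible foo.d next to the object file, listing foo.c and every non-system header it pulls in:

# produces foo.o plus foo.d, which make can -include
gcc -MMD -c foo.c -o foo.o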
The price of inaccurate or missing dependency info is an unreliable build and a frustrated developer. If you find yourself reaching for make clean, you’re probably suffering from this.
It’s possible to solve these problems without sacrificing the things we want and without falling back to doing exactly what C compilers do. By making the output file knowable and handling dependencies automatically, we make build tool integration easy and the resulting builds reliable. This is exactly what I’ve been working on for the last few weeks.
The first thing we need is to make the crate hash stable and easily computable by external tools. Internally, the Rust compiler uses SipHash to compute the crate hash, and takes into account arbitrary link metadata as well as the link metadata of its dependencies. SipHash is not something easily computed from a Makefile, and the link metadata is not so easy to slurp and normalize from some dependency graph.
I’ve just landed a pull request that replaces the link metadata with a package identifier, which is a crate level attribute called pkgid. You declare it like #[pkgid="github.com/mozilla-servo/rust-geom#0.1"]; at the top of your lib.rs. The first part, github.com/mozilla-servo, is a path, which serves as both a namespace for your crate and a location hint as to where it can be obtained (for use by rustpkg, for example). Then comes the crate’s name, rust-geom. Following that is the version identifier, 0.1. If no pkgid attribute is provided, one is inferred with an empty path, a 0.0 version, and a name based on the name of the input file.
To generate a crate hash, we take the SHA256 digest of the pkgid attribute. SHA256 is readily available in most languages or on the command line, and the pkgid attribute is very easy to find by running a regular expression over the main input file. The first eight digits of this hash are used for the filename, but the full hash is stored in the crate metadata and used as part of the symbol hashes.
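In shell terms the whole computation is roughly this, with cut standing in for the sed one-liner that the RUST_CRATE_HASH helper in the earlier rust.mk uses:

# first 8 hex digits of the SHA256 of the pkgid string
printf 'github.com/mozilla-servo/rust-geom#0.1' | shasum -a 256 | cut -c1-8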
Since the crate hash no longer depends on the crate’s dependencies, it is stable so long as the pkgid attribute doesn’t change. This should happen very infrequently, for instance when the library changes versions.
This makes the crate hash computable by pretty much any build tool you can find, and means rustc generates predictable output filenames for libraries.
I’ve also got a pull request, which should land soon, to enable rustc to output make-compatible dependency information similar to the -MMD flag of gcc. To use it, you give rustc the --dep-info option, and for an input file of lib.rs it will create a lib.d which can be used by make or other tools to learn the true dependencies. The lib.d file will look something like this:
librust-geom-da91df73-0.0.dylib: lib.rs matrix.rs matrix2d.rs point.rs rect.rs side_offsets.rs size.rs
Note that this list of dependencies will include code pulled in via the include! and include_str! macros as well.
Here’s an example of a handwritten Makefile using dependency info. Note that this uses a hard-coded output file name, which works because the crate hash is stable unless the pkgid attribute is changed:
RUSTC ?= rustc
all: librust-geom-851fed20-0.1.dylib
librust-geom-851fed20-0.1.dylib: lib.rs
	@$(RUSTC) --dep-info --lib $<
-include lib.d
Now it will notice when you change any of the .rs files without needing to explicitly list them, and the dependency info will be updated automatically as your code changes. A little Makefile abstraction on top of this can make it quite nice and portable.
In the next few posts, I’ll show examples of integrating the improved Rust compiler with some existing build systems like make, CMake, and tup.
(Update: the next post covers building Rust with Make.)
I was recently a guest on The Changelog, episode 228, talking about Servo and Rust. I hope you enjoy it!
The Servo team welcomes and encourages new contributors and I’ll note particular projects where new contributors can easily get involved below. These aren’t the only places you can help, of course, but I thought it might be useful to know a few good places to start.
Patrick Walton has landed the beginnings of navigation and scrolling support. The rust-alert library provides simple popup dialog support, and using this, you can now hit Ctrl-L to bring up a dialog to enter a new URL. rust-glut also got keyboard handler support. Note that this only works on Mac OS X right now due to missing support for Linux in rust-alert.
Scrolling is another important UI feature, and you can now pan the content in the window. Servo does not yet redraw parts of the content that were previously hidden, but that should be simple to add.
For new contributors: If you’re looking to get started hacking on Servo or just want to learn more about Rust, adding popup dialogs on Linux to rust-alert would be a good project. Adding drawing of previously hidden areas to the scrolling code should also be an easy project for someone.
Eric Atkinson, one of Servo’s new interns, has just landed his first pull request, adding the first bits of CSS’s text-decoration support for underline.
For new contributors: Eric didn’t know any Rust or anything about Servo internals before he started last week. It doesn’t take much to get started, and there is lots of low hanging fruit to pick on the Servo tree. For example, based on Eric’s underline work, it should be fairly easy to add line-through (strike-through) support.
Tim Kuehn, another of Servo’s new interns, has also been busy his first week. He started overhauling how performance data is collected in Servo. Instead of simply timing bits of code and outputting the results to the console, there is now a separate task that handles performance data.
For new contributors: We’re not doing anything with this data yet, but we should be. It should be a pretty easy project to start outputting it more systematically and doing something with the results. Another idea would be to report numbers for different platforms and compare them to similar numbers from other browsers so we know where we should improve.
The first parts of GPU accelerated rendering have started to land in Servo, specifically updates to Skia and Azure to support framebuffer-backed draw targets. These framebuffers render to textures which are shared with the GPU-based compositor. This avoids needing to render to CPU memory and then upload textures to the GPU. There is still a bug or two to work out with tiling support, but I expect GPU rendering to land in the tree pretty soon.
Servo now has continuous integration via Bors, the wonderful CI bot that the Rust team has already been using for some time. Not only that, but Servo’s Bors is now running on Mozilla’s release engineering infrastructure instead of being hosted by the Rust team. This should keep the tree building cleanly from now on. If you’ve previously had trouble compiling Servo, now would be a good time to try again.
Patrick Walton has been heavily refactoring Servo’s directory layout and many of its subsystems. The util and net libraries were split out from the gfx library, and compositing was made quite a bit simpler. He has also refactored layout and is working on splitting Servo into more libraries, which makes it both easier to understand and faster to build. Much documentation has been added in these refactorings.
Samsung continues to work on Android support, improving the Rust compiler along the way. That work should land in the tree in the near future.
Give all these new things a try and report any issues you find. The team hangs out in #servo on irc.mozilla.org and is happy to answer questions or help you get started hacking on Servo.
When I arrived, Servo no longer built at all, at least not on OS X. Servo often requires bleeding edge versions of Rust, and backwards incompatible changes to Rust are still happening on a regular basis. Since all of the contributors to Rust work on different platforms, when porting to a new Rust compiler, some platforms have gotten left behind. This was particularly acute this time because Rust 0.6 contained a lot of syntax changes, mostly things that got removed from the language, and many pieces of Servo were using syntax that was deprecated in Rust 0.5, and was finally deleted entirely in Rust 0.6.
Rust 0.6 removed a lot of keywords and syntax from the language. Porting Servo required modifying all the constants, many function declarations, many import statements, etc. These changes were largely mechanical. There were a few changes that weren’t so easy.
Mutable fields are being removed from the language, and mutability will be controlled by the mutability of the struct itself. Not all of these had to be removed in Servo, but many of them did, and removing them often required slightly changing the data structures and their type signatures. In some cases this was trivial, but in a few cases these changes needed more care. In particular, lots of these changes bumped up against the Rust borrow checker, which ensures it’s safe to hand out pointers to memory. There are still some bugs in the borrow check, and workarounds are not always straightforward.
It took me about a week and a half to work my way through all the dependent libraries and Servo itself, at which point I had a build. By the end of that second week I had landed the language upgrade to Servo as well as some Rust library changes that were needed. The end result is that Servo is now using Rust 0.6 syntax, but it requires a post-0.6 version of Rust, due to the Rust changes not landing quite in time for the 0.6 release.
Servo uses many forms of parallelism, but one bit of low hanging fruit is to move to a fully GPU rendering path. Currently compositing is done on the GPU, but rendering to the various layers is done on the CPU. This is how most current browsers operate as well.
We’re moving to rendering on the GPU as well which should speed up some things a bit. Instead of rendering in parallel to several layers, Servo will render directly into textures on the GPU which the compositor can use without doing CPU to GPU memory transfers.
This required upgrading the rendering stack to a newer version of Azure (Mozilla’s drawing library) and a new version of Skia (the specific backend that Azure uses on OS X, Linux, and Android). Now that this part is done, we’ll be adding texture layers to the renderer and switching drawing to those.
We’re setting up build and testing automation for Servo now, which should help ensure Servo remains buildable on all platforms. Rust has an amazing set of tools for this already, which we are hoping to reuse fully. Buildbot machines run builds and tests, and a GitHub bot called Bors handles dispatching builds for patches that have been reviewed and merging pull requests that have passed tests.
For now this work will be on Linux, but we hope to expand it to cover OS X and Android as well in the near future. Once Servo is a little farther along, we plan to put up nightly snapshots so more people can follow along with our progress.
There’s tons of other work in progress on both Servo and Rust. The DOM bindings are getting improved, a new Rust scheduler that will make performance and I/O better is in progress, a more optimized C FFI in Rust should also land soon, and the rustpkg package manager is shaping up which we’ll be switching to for more and more of Servo as it matures.
We need more help in lots of areas. Please join us in IRC in #servo or on the mailing list. We’ll be trying to mark bugs and projects that are well suited for new contributors. If you want to work on Servo and write Rust code all the time, we’re hiring.
This will also be the first time in over a decade that I’m not working in a small company or a startup (usually both). I’ve been thinking for a while that it would be nice to work for a company that has real resources to solve problems, as opposed to being at the mercy of venture capitalists or the whims of users. Mozilla’s mission statement is one that is easy for me to get behind, and they are doing very interesting things at Mozilla Research.
I enjoy working on difficult and important projects, and it’s hard for me to imagine much that is more difficult or important than web browsers. It’s an added bonus to be working in and (hopefully) contributing to a new programming language. I also love working with smart people, and Mozilla seems to have those in abundance.
This is going to be awesome.
Watch it on Xiph.org. There is also a detailed write up.
Instant answers on DuckDuckGo are really nice, in that they highlight a specific result in a context sensitive way. For example, Stack Overflow questions that match will show up with the highest rated answer at the top of the page, and Wikipedia articles will be presented as a title and abstract.
I decided to play with DuckDuckHack and added a plugin for XMPP Extension Proposal (XEP) lookups. I do XEP lookups often when I’m answering people’s XMPP-related questions, and this plugin makes the XEP’s title and abstract appear as an instant answer. The plugin was recently merged into the tree, and it is now live on DuckDuckGo itself.
Try it by searching for XEP 45. View the code on GitHub.
“Google crawls us at a rate of 1300 hits per second… They’ve indexed 3 billion of our pages,” Costolo said. “They have all the data they need.”
There’s no doubt that 1,300 hits per second is a large number, but let’s put that in perspective:
For part of 2010, Google was perhaps able to keep up with the stream at 1,300 requests per second. Somewhere between February and June, the average volume of tweets outpaced them.
Let’s assume that they kept pace until June 2011, and that on June 1, Twitter went from somewhere in the range of 1,300 tweets per second to their reported 2,300 tweets per second. Google is 1,000 tweets behind per second.
By the end of the year, Google missed 15.5 billion tweets; at 1,000 missed tweets per second, the backlog grows by 86.4 million per day. They are two months behind if they didn’t skip any, and the tweet volume did not increase. But it did increase by 25% or so by October, and surely it has grown more since then.
If Google has only indexed 3 billion pages so far, they have approximately 12 days of tweets at current volume. It’s pretty hard to rationalize the 3 billion pages number against the 1,300 per second number. Was Google indexing at a much slower rate before? Did they not start until a few months ago?
Of course Google may be getting multiple tweets per request, perhaps by crawling the timelines of important users. But this means that they probably get a lot of requests that don’t give them any new tweets, or else the timeliness of the data is poor.
No matter how you slice it, it appears Google would be unable to keep up. Even if they were keeping up now, Twitter’s growth probably sets a time limit for which keeping up remains possible.
Perhaps Google is super clever, and can index only the right tweets. I think that it’s more probable they have “enough” data to surface results for the super popular topics, and miss nearly everything in the long tail of the distribution. I expect that this adversely affects search quality, which one suspects is a high priority for the world’s best search engine.
Google is no saint. They are guilty of the same data hoarding. If you ran these numbers for YouTube indexing, I think you would find the situation is much worse. I imagine that most of these data silo companies purposefully set their crawl rates too low for anyone to achieve high quality search results.
In the case of Twitter, the end result for users is even worse because Twitter’s own attempts at search are terrible and are getting worse over time. At least Google makes a decent YouTube search, even if no one else can.
Even if Google could get all the tweets, they still would have very little to no Facebook data. I still think the best strategy in this situation for them is to create their own social data and use that instead. It’s a tough road, but they seem to have little choice.
In the end, it’s not about Google or Twitter or Facebook, but the stifling of innovation and competition around data. We can only hope that some federated solution or some data-liberal company wins out in the end.
I recently saw a recommendation for Soul of a New Machine, which tells the story of a team of engineers at Data General who built a new 32-bit computer in the late 1970s. The book is fascinating. Thirty years later, many of its descriptions of the project and the way the team worked and was treated could apply to any modern project.
The plot summary will no doubt sound familiar to you: A team of mostly young, mostly male engineers works grueling hours to build something amazing in too short an amount of time. They succeed, albeit a bit over their original schedule. Despite the project’s commercial success, the team is denied both recognition and financial rewards and many end up leaving the company. Almost all of them ultimately enjoyed it and would (and did) do it again.
There were many pieces of this story that resonated with me.
On overworking, Tom West, the manager of the team in the book, says:
That’s the bear trap, the greatest vice. Your job. You can justify just about any behavior with it. Maybe that’s why you do it, so you don’t have to deal with all those other problems.
Why deal with the unpredictable world, when the controllable world of creation is available? It’s code as escapist drug, and I love to get high on it. Mundane things like cleaning my house, and more serious ones like taking care of my health, are all easy to avoid while fixing bugs or starting a new project.
It’s both possible and important to find a balance.
The team’s secretary, who was much more than her title suggests, suffered and succeeded with the rest of the team. Even she says:
I would do it again. I would be very grateful to do it again. I think I would take a cut in pay to do it again.
Even as I recover from projects that burned me out, I am constantly thinking about how to do new ones. In fact, while I’m doing any project, I’m already thinking about doing another. This sounds like drugs again. But they are good drugs.
The book describes how some team members tormented the lone female engineer. This is something that still happens today, and it’s terrible. And people then wonder why there are so few women in our industry.
In addition to that, at the end when they handed out the peer awards, their award to the woman was for putting up with them, not for any of her actual accomplishments.
Betty Shanahan was that lone woman, and it looks to me that she deserved more than just an award for thick skin. She’s the CEO of the Society of Women Engineers, and she was “a member of the design team for the first parallel processing minicomputer and manager of hardware design for subsequent systems.” She later moved to the business side of technology, and I wonder if that had anything to do with her having to put up with the Eagle team’s harassment.
Often we judge things by their properties, but one can also rightly judge something by how it is made. Shoes made from child labor are less good than those made in other ways.
Kidder, the book’s author, discusses this:
In The Nature of the Gothic, John Ruskin decries the tendency of the industrial age to fragment work into tasks so trivial that they are fit to be performed only by the equivalent of slave labor. Writing in the nineteenth century, Ruskin was one of the first, with Marx, to have raised this now-familiar complaint. In the Gothic cathedrals of Europe, Ruskin believed, you can see the glorious fruits of free labor given freely. What is usually meant by the term craftsmanship is the production of things of high quality; Ruskin makes the crucial point that a thing may also be judged according to the conditions under which it was built.
By this kind of measure, is the work many teams do good? Is the Eagle computer that Tom West’s team built really a success since the team worked much overtime, suffered divorces and other problems, and in the end received little to no reward?
I think it’s time for entrepreneurs and workers in our industry to demand better. Our outputs will be better if they are made sustainably, and not just by the measure above. In retrospect, maybe the reviewers of LA Noire should have taken into account the trials of its developers; it certainly would not have fared well.
I want to hire resourceful people. I want to describe a general outline of a design and not have to describe it in intricate detail in order for them to build it.
It turns out that this is critical for happiness. If we’re told exactly how to do something, it takes much of the creativity and fun out of the work.
Engineers are supposed to stand among the privileged members of industrial enterprises, but several studies suggest that a fairly large percentage of engineers in America are not content with their jobs. Among the reasons cited are the nature of the jobs themselves and the restrictive ways in which they are managed. Among the terms used to describe their malaise are declining technical challenge; misutilization; limited freedom of action; tight control of working conditions.
You must trust those you work with to be resourceful. If you don’t trust them, you will end up micromanaging them into unhappiness, and you will also remove their valuable creative input from your product.
There is a balance to be struck with feedback. The Eagle engineers thought that the managers didn’t appreciate their efforts, but in reality, some of this was the managers trying to stay out of the way. Kidder asked Tom West’s boss:
Had the Eagle project always interested him or had it grown in importance gradually?
“From the start it was a very important project.”
Was he pleased with the work of the Eclipse group?
“Absolutely!” His voice falls. “They did a hell of a job.”
But some members of the team felt that they had been rather neglected by the company.
“That doesn’t surprise me,” he says. “That’s frequently the case. There’s often a conflict in people’s minds. How much direction do they want?”
I’ve had this same issue with investors as well. You don’t want them to meddle with your company or your product, but you also want their advice and guidance. It’s possible to go too far in either direction, but mostly you hear about stories where investors meddle too much. I personally think it’s probably better to err on the side of too little help than to end up with too much meddling.
Even thirty years ago, the VCs had a bad rap. Tom West was asked in a Wired article years after the book’s publishing why he stayed at Data General until he retired:
“You could do new products and companies within the company, rather than shag some venture capitalist and kill yourself for five years.” To be an entrepreneur, he says, “you have to be interested in networking, even with fools.”
This is another reason why I would prefer to bootstrap companies if at all possible.
Tom West ended up working on many interesting projects at Data General, but ultimately, none of them got the support or recognition they deserved. The other members of the Eagle team spread out and started or worked for new companies, and in general seemed much happier.
In the end, it’s both a fascinating tale of heroism and creativity and a saddening tale of undervalued and underpaid engineers. I am both emboldened to keep following my passions and more mindful of its dangers. My troubles are not unique - not even modern. Thirty years after this book was written, I feel like it could have been written yesterday.
Twitter’s argument:
We’re concerned that as a result of Google’s changes, finding this information will be much harder for everyone. We think that’s bad for people, publishers, news organizations and Twitter users.
Google’s response was:
We are a bit surprised by Twitter’s comments about Search plus Your World, because they chose not to renew their agreement with us last summer (http://goo.gl/chKwi), and since then we have observed their rel=nofollow instructions.
People have been digging into the semantics of nofollow (see Danny Sullivan and Luigi Montanez), but there is a much bigger issue.
Google and other established and up-and-coming search engines have no real way to include lots of data in their index. It’s easy to imagine that the lack of access to Twitter and Facebook data was a motivator for Google+ in the first place.
Lots of sites now generate enough data that it is unrealistic to crawl them. For example, YouTube has more new content every day than they allow anyone to crawl. Twitter is essentially the same. This means there is no way to index this data without special arrangements with the provider. Twitter has closely guarded their firehose of data, but at least they have some mechanism to obtain it. YouTube, as far as I am aware, has no such mechanism.
My team and I ran into this problem head on trying to build Collecta, a real-time search engine. Access to the data was a primary blocker for many features and product ideas, and over the too short life of that company, access became significantly more difficult, not easier.
Google can build an effective search, even a real-time one, for YouTube, but no one else can. Twitter can build search for their data, but few others can, and their data access policies can and do change on a whim.
If Google believes that microblogging data will improve their search product, then a reasonable strategy to obtain that data is to try and build their own microblogging service to generate it. I can’t fault Google for trying. If I thought Collecta could have effectively competed against Twitter for their audience, I would certainly have attempted that as well.
Google, Twitter, Facebook and others are hoarding silos of otherwise public data. Not only is this artificially limiting the features of their products, but it squashes the potential for new and exciting search applications. The search services that have sprung up are limited to your own data, aggregate results from service-specific search APIs, exist at the mercy of data providers, or make do with a tiny subset of the data. I don’t think Google could have built their own search engine if the Web were similarly hostile.
One could argue for requiring these bits of data to be openly available, but unlike the data of the past, this data is expensive to publish and consume. Most of these services may not even have a mechanism to publish the data, even internally. Simply receiving the YouTube or Twitter firehoses (and not counting video or image media) would require significant engineering effort, and the rate of data generation is only accelerating.
I think we must push for open access to data, even if it is costly. These data wars benefit very few. If things don’t change, the future of search is dark.
Please consider upgrading as soon as possible, as a security problem was found in Strophe.js 1.0.1. The DIGEST-MD5 SASL method used a constant client nonce due to a bug in Strophe’s use of the underlying MD5 library. I don’t know of any exploits for this bug, but it could compromise your site’s security.
Much of the credit for this release goes to the many contributions and pull requests that people have sent in the last year. The community’s effort continues to make Strophe.js better and better.