After landing the pkgid attribute work there was much community discussion about how that feature could be improved. The net result was:

- pkgid was renamed to crate_id, since it’s being used to identify a crate and not a package, which is a grouping of crates. (A package is still a pretty fluid concept in Rust right now.)

- The crate_id attribute can now override the inferred name of the crate with new syntax. A crate_id of github.com/foo/rust-bar#bar:1.0 names the crate bar, which can be found at github.com/foo/rust-bar. Previously the crate name was inferred to be the last component of the path, rust-bar.

- New compiler flags --crate-id, --crate-name, and --crate-file-name print the value of the crate_id attribute, the crate’s name, and the output filenames the compiler will produce.

These changes made a good thing even better.
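For example, the new flags can be queried straight from the shell. A sketch, assuming a lib.rs whose crate_id is the github.com/foo/rust-bar#bar:1.0 from above and an OS X build (the hash in the filename is made up):

$ rustc --crate-name lib.rs
bar
$ rustc --crate-file-name --lib lib.rs
libbar-8516c845-1.0.dylib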
The Makefile hasn’t changed much, but here is a much simpler rust.mk that the new compiler flags enable:
define RUST_CRATE
# $(1) is the path to the crate directory containing lib.rs (and optionally test.rs).
_rust_crate_dir = $(dir $(1))
_rust_crate_lib = $$(_rust_crate_dir)lib.rs
_rust_crate_test = $$(_rust_crate_dir)test.rs
# Ask the compiler for the crate name and the library's output filename.
_rust_crate_name = $$(shell $(RUSTC) --crate-name $$(_rust_crate_lib))
_rust_crate_dylib = $$(shell $(RUSTC) --crate-file-name --lib $$(_rust_crate_lib))
.PHONY : $$(_rust_crate_name)
$$(_rust_crate_name) : $$(_rust_crate_dylib)
$$(_rust_crate_dylib) : $$(_rust_crate_lib)
	$$(RUSTC) $$(RUSTFLAGS) --dep-info --lib $$<
# Pull in the dependency info generated by --dep-info, if present.
-include $$(patsubst %.rs,%.d,$$(_rust_crate_lib))
# Only generate test targets if the crate has a test.rs.
# (Note: $(wildcard) expands to the empty string, not "", when the file is missing.)
ifneq ($$(wildcard $$(_rust_crate_test)),)
.PHONY : check-$$(_rust_crate_name)
check-$$(_rust_crate_name): $$(_rust_crate_name)-test
	./$$(_rust_crate_name)-test
$$(_rust_crate_name)-test : $$(_rust_crate_test)
	$$(RUSTC) $$(RUSTFLAGS) --dep-info --test $$< -o $$@
-include $$(patsubst %.rs,%.d,$$(_rust_crate_test))
endif
endef
No more nasty sed scripts necessary, but of course, the crate hash is still computable if you want to do it yourself for some reason.
For this post, I’m going to use the rust-geom library as an example. It is a simple Rust library used by Servo to handle common geometric tasks like dealing with points, rectangles, and matrices. It is pure Rust code, has no dependencies, and includes some unit tests.
We want to build a dynamic library and the test suite, and the Makefile should be able to run the test suite by using make check. As much as possible, we’ll use the same crate structure that rustpkg uses so that once rustpkg is ready for real use, the transition to it will be painless.
Did you know that Makefiles can define functions? It’s a little clumsy, but it works and you can abstract a bunch of the tedium away. I’d never really noticed them before dealing with the Rust and Servo build systems, which use them heavily.
By using shell commands like shasum and sed, we can compute crate hashes, and by using Make’s eval function, we can dynamically define new targets. I’ve created a rust.mk which can be included in a Makefile that makes it really easy to build Rust crates.
Let’s look at a Makefile for rust-geom which uses rust.mk.
include rust.mk
RUSTC ?= rustc
RUSTFLAGS ?=
.PHONY : all
all: rust-geom
.PHONY : check
check: check-rust-geom
$(eval $(call RUST_CRATE, .))
It includes rust.mk, sets up some basic variables that control the compiler and flags, and then defines the top level targets. The magic bit is the call to RUST_CRATE, which takes a path to where a crate’s lib.rs and test.rs are located. In this case the path is the current directory, .
RUST_CRATE finds the pkgid attribute in the crate and uses this to compute the crate’s name, hash, and the output filename for the library. It then creates a target with the same name as the crate name, in this case rust-geom, and a target for the output file for the library. It uses the Rust compiler’s support for dependency information so that it will know exactly when it needs to recompile things.
If the crate contains a test.rs file, it will also create a target that compiles the tests for the crate into an executable, as well as a target to run the tests. The executable will be named after the crate; for rust-geom it will be named rust-geom-test. The check target is also named after the crate, check-rust-geom.
The files lib.rs and test.rs are the files rustpkg itself uses by default. This Makefile does not support the pkg.rs custom build logic, but if you need custom logic, it is easy enough to modify this example. One benefit of following in rustpkg’s footsteps here is that this same crate should be buildable with rustpkg without modification.
rust.mk is a little ugly, but not too bad. It defines a few helper functions like RUST_CRATE_PKGID and RUST_CRATE_HASH which are used by the main RUST_CRATE function. The syntax is a bit silly because of the use of eval and the need to escape $s, but it shouldn’t be too hard to follow if you’re already familiar with Make syntax.
# Extract the pkgid attribute value from the crate's main source file.
RUST_CRATE_PKGID = $(shell sed -ne 's/^\#\[ *pkgid *= *"\(.*\)" *];$$/\1/p' $(firstword $(1)))
# Split a pkgid like github.com/mozilla-servo/rust-geom#0.1 into path, name, and version.
RUST_CRATE_PATH = $(shell printf $(1) | sed -ne 's/^\([^\#]*\)\/.*$$/\1/p')
RUST_CRATE_NAME = $(shell printf $(1) | sed -ne 's/^\([^\#]*\/\)\{0,1\}\([^\#]*\).*$$/\2/p')
RUST_CRATE_VERSION = $(shell printf $(1) | sed -ne 's/^[^\#]*\#\(.*\)$$/\1/p')
# The crate hash is the first 8 hex digits of the SHA256 digest of the pkgid.
RUST_CRATE_HASH = $(shell printf $(strip $(1)) | shasum -a 256 | sed -ne 's/^\(.\{8\}\).*$$/\1/p')
ifeq ($(shell uname),Darwin)
RUST_DYLIB_EXT=dylib
else
RUST_DYLIB_EXT=so
endif
define RUST_CRATE
# $(1) is the path to the crate directory containing lib.rs (and optionally test.rs).
_rust_crate_dir = $(dir $(1))
_rust_crate_lib = $$(_rust_crate_dir)lib.rs
_rust_crate_test = $$(_rust_crate_dir)test.rs
# Compute the crate's name, version, and hash from its pkgid attribute.
_rust_crate_pkgid = $$(call RUST_CRATE_PKGID, $$(_rust_crate_lib))
_rust_crate_name = $$(call RUST_CRATE_NAME, $$(_rust_crate_pkgid))
_rust_crate_version = $$(call RUST_CRATE_VERSION, $$(_rust_crate_pkgid))
_rust_crate_hash = $$(call RUST_CRATE_HASH, $$(_rust_crate_pkgid))
_rust_crate_dylib = lib$$(_rust_crate_name)-$$(_rust_crate_hash)-$$(_rust_crate_version).$(RUST_DYLIB_EXT)
.PHONY : $$(_rust_crate_name)
$$(_rust_crate_name) : $$(_rust_crate_dylib)
$$(_rust_crate_dylib) : $$(_rust_crate_lib)
	$$(RUSTC) $$(RUSTFLAGS) --dep-info --lib $$<
# Pull in the dependency info generated by --dep-info, if present.
-include $$(patsubst %.rs,%.d,$$(_rust_crate_lib))
# Only generate test targets if the crate has a test.rs.
# (Note: $(wildcard) expands to the empty string, not "", when the file is missing.)
ifneq ($$(wildcard $$(_rust_crate_test)),)
.PHONY : check-$$(_rust_crate_name)
check-$$(_rust_crate_name): $$(_rust_crate_name)-test
	./$$(_rust_crate_name)-test
$$(_rust_crate_name)-test : $$(_rust_crate_test)
	$$(RUSTC) $$(RUSTFLAGS) --dep-info --test $$< -o $$@
-include $$(patsubst %.rs,%.d,$$(_rust_crate_test))
endif
endef
If you wanted, you could add the crate’s target and the check target to the all and check targets within this function, simplifying the main Makefile. You could also have it generate an appropriate clean-rust-geom target, as shown in the sketch below.
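A minimal sketch of such a clean target, written inside RUST_CRATE with the same escaping and reusing the variables it already defines (for rust-geom this expands to clean-rust-geom):

.PHONY : clean-$$(_rust_crate_name)
clean-$$(_rust_crate_name) :
	rm -f $$(_rust_crate_dylib) $$(_rust_crate_name)-test $$(patsubst %.rs,%.d,$$(_rust_crate_lib) $$(_rust_crate_test))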
It’s not going to win a beauty contest, but it will get the job done nicely.
In the next post, I plan to show the same example, but using CMake.
In this post, I want to cover what the current issues are with building Rust code, especially with regards to external tooling. I’ll also describe some recent work I did to address these issues. In the future, I want to cover specific ways to integrate Rust with a few different build tools.
Building Rust with existing build tools is a little difficult at the moment. The main issues are related to Rust’s attempt to be a better systems language than the existing options.
For example, Rust uses a larger compilation unit than C and C++ compilers, and existing build tools are designed around single file compilation. Rust libraries are output with unpredictable names. And dependency information must be maintained manually.
Many programming languages compile one source file to one output file and then collect the results into some final product. In C, you compile .c files to .o files, then archive or link them into .lib, .a, .dylib, and so on depending on the platform and whether you are building an executable, static library, or shared library. Even Java compiles .java inputs to one or more .class outputs, which are then normally packaged into a .jar.
In Rust, the unit of compilation is the crate, which is a collection of modules and items. A crate may consist of a single source file or an arbitrary number of them in some directory hierarchy, but its output is a single executable or library.
Using crates as the compilation unit makes sense from a compiler point of view, as it has more knowledge during compilation to work from. It also makes sense from a versioning point of view, as all of the crate’s contents go together. Using crates as the compilation unit allows for cyclic dependencies between modules in the same crate, which is useful for expressing some designs. It also means that separate declaration and implementation pieces are not needed, such as the header files in C and C++.
Most build tools assume a model similar to that of a typical C compiler. For example, make has pattern rules that can map an input to an output based on filename transformations. These work great if one input produces one output, but they don’t work well in other cases.
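The canonical example is C compilation, where a pattern rule maps each .c to the matching .o:

# one input, one output, and the output name is a simple transformation of the input name
%.o : %.c
	$(CC) -c $< -o $@

There’s no comparable way to write a rule whose single invocation consumes many source files and produces an output whose name is not a simple transformation of an input name.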
Rust still has a main input file, the one you pass to the compiler, so this difference doesn’t have a lot of ramifications when using existing build tools.
Compilers generally have an option for what to name their output files, or else they derive the output name with some simple formula. C compilers use the -o option to name the output; Java just names the files after the classes they contain. Rust also has a -o option, which works like you expect, except in the case of libraries where it is ignored.
Libraries in Rust are special in order to avoid naming collisions. Since libraries often end up stored centrally, only one library can have a given name. If I create a library called libgeom it will conflict with someone else’s libgeom. Operating systems and distributions end up resolving these conflicts by changing the names slightly, but it’s a huge annoyance. To avoid collisions, Rust includes a unique identifier called the crate hash in the name. Now my Rust library libgeom-f32ab99 doesn’t conflict with libgeom-00a9edc.
Unfortunately, the current Rust compiler computes the crate hash by hashing the link metadata, such as name and version, along with the link metadata of its dependencies. This results in a crate hash that only the Rust compiler is realistically able to compute, making it seem pseudo-random. This causes a huge problem for build tooling, as the output filename for libraries is unknown.
To work around this problem when using make, the Rust and Servo build systems use a dummy target called libfoo.dummy for a library called foo, and after running rustc to build the library, it creates the libfoo.dummy file so that make has some well known output to reason about. This workaround is a bit messy and pollutes the build files.
Here’s an example of what a Makefile looks like with this .dummy workaround:
RUSTC ?= rustc
SOURCES = $(shell find . -name '*.rs')
all: librust-geom.dummy
librust-geom.dummy: lib.rs $(SOURCES)
	@$(RUSTC) --lib $<
	@touch $@
clean:
	@rm -f *.dummy *.so *.dylib *.dll
While this works, it also has some drawbacks. For example, if you edit a file during a long compile, the libfoo.dummy will get updated after the compile is finished, and rerunning the build won’t detect any changes. The timestamp of the input file will be older than the final output file that the build tool is checking. If the build system knew the real output file name, it could compare the correct timestamps, but that information has been locked inside the Rust compiler.
Build systems need to be reliable. When you edit a file, it should trigger the correct things to get rebuilt. If nothing changes, nothing should get rebuilt. It’s extremely frustrating if you edit a file, rebuild the library, and find that your code changes aren’t reflected in the new output for some reason or that the library is not rebuilt at all. Reliable builds need accurate dependency information in order to accomplish this.
There’s currently no way for external build tools to get dependency information about Rust crates. This means that developers tend to list dependencies by hand, which is pretty fragile; a typical hand-maintained rule is sketched below.
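Something like this, spelling out every source file by hand (using rust-geom’s files as the example; forget to add a new file here and edits to it quietly stop triggering rebuilds):

# every prerequisite listed manually -- easy to let this list rot
librust-geom.dummy: lib.rs matrix.rs matrix2d.rs point.rs rect.rs side_offsets.rs size.rs
	@$(RUSTC) --lib $<
	@touch $@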
One quick way to approximate dependency info is just to recursively find every *.rs in the crate’s source directory. This can be wrong for multiple reasons: perhaps the include! or include_str! macros are used to pull in files that aren’t named *.rs, or conditional compilation may omit several files.
This is similar to dealing with header dependencies by hand when working with C and C++ code. C compilers have options to generate dependency info to deal with this, which are used by tools like CMake.
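For example, gcc’s -MMD flag writes a make-compatible foo.d next to the object file, listing foo.c and every non-system header it pulls in:

# produces foo.o plus foo.d, which make can -include
gcc -MMD -c foo.c -o foo.o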
The price of inaccurate or missing dependency info is an unreliable build and a frustrated developer. If you find yourself reaching for make clean, you’re probably suffering from this.
It’s possible to solve these problems without sacrificing the things we want and without falling back to doing exactly what C compilers do. By making the output file knowable and handling dependencies automatically, we make build tool integration easy and the resulting builds reliable. This is exactly what I’ve been working on for the last few weeks.
The first thing we need is to make the crate hash stable and easily computable by external tools. Internally, the Rust compiler uses SipHash to compute the crate hash, and takes into account arbitrary link metadata as well as the link metadata of its dependencies. SipHash is not something easily computed from a Makefile, and the link metadata is not so easy to slurp and normalize from some dependency graph.
I’ve just landed a pull request that replaces the link metadata with a package identifier, which is a crate level attribute called pkgid. You declare it like #[pkgid="github.com/mozilla-servo/rust-geom#0.1"]; at the top of your lib.rs. The first part, github.com/mozilla-servo, is a path, which serves as both a namespace for your crate and a location hint as to where it can be obtained (for use by rustpkg, for example). Then comes the crate’s name, rust-geom. Following that is the version identifier, 0.1. If no pkgid attribute is provided, one is inferred with an empty path, a 0.0 version, and a name based on the name of the input file.
To generate a crate hash, we take the SHA256 digest of the pkgid attribute. SHA256 is readily available in most languages or on the command line, and the pkgid attribute is very easy to find by running a regular expression over the main input file. The first eight digits of this hash are used for the filename, but the full hash is stored in the crate metadata and used as part of the symbol hashes.
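In shell terms the whole computation is roughly this, with cut standing in for the sed one-liner that the RUST_CRATE_HASH helper in the earlier rust.mk uses:

# first 8 hex digits of the SHA256 of the pkgid string
printf 'github.com/mozilla-servo/rust-geom#0.1' | shasum -a 256 | cut -c1-8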
Since the crate hash no longer depends on the crate’s dependencies, it is stable so long as the pkgid attribute doesn’t change. This should happen very infrequently, for instance when the library changes versions.
This makes the crate hash computable by pretty much any build tool you can find, and means rustc generates predictable output filenames for libraries.
I’ve also got a pull request, which should land soon, to enable rustc to output make-compatible dependency information similar to the -MMD flag of gcc. To use it, you give rustc the --dep-info option, and for an input file of lib.rs it will create a lib.d which can be used by make or other tools to learn the true dependencies. The lib.d file will look something like this:
librust-geom-da91df73-0.0.dylib: lib.rs matrix.rs matrix2d.rs point.rs rect.rs side_offsets.rs size.rs
Note that this list of dependencies will include code pulled in via the include! and include_str! macros as well.
Here’s an example of a handwritten Makefile using dependency info. Note that this uses a hard-coded output file name, which works because the crate hash is stable unless the pkgid attribute is changed:
RUSTC ?= rustc
all: librust-geom-851fed20-0.1.dylib
librust-geom-851fed20-0.1.dylib: lib.rs
	@$(RUSTC) --dep-info --lib $<
-include lib.d
Now it will notice when you change any of the .rs files without needing to explicitly list them, and the dependency info will be updated automatically as your code changes. A little Makefile abstraction on top of this can make it quite nice and portable.
In the next few posts, I’ll show examples of integrating the improved Rust compiler with some existing build systems like make, CMake, and tup.
(Update: the next post covers building Rust with Make.)
I was recently a guest on The Changelog, episode 228, talking about Servo and Rust. I hope you enjoy it!
The Servo team welcomes and encourages new contributors and I’ll note particular projects where new contributors can easily get involved below. These aren’t the only places you can help, of course, but I thought it might be useful to know a few good places to start.
Patrick Walton has landed the beginnings of navigation and scrolling support. The rust-alert library provides simple popup dialog support, and using this, you can now hit Ctrl-L to bring up a dialog to enter a new URL. rust-glut also got keyboard handler support. Note that this only works on Mac OS X right now due to missing support for Linux in rust-alert.
Scrolling is another important UI feature, and you can now pan the content in the window. Servo does not yet redraw parts of the content that were previously hidden, but that should be simple to add.
For new contributors: If you’re looking to get started hacking on Servo or just want to learn more about Rust, adding popup dialogs on Linux to rust-alert would be a good project. Adding drawing of previously hidden areas to the scrolling code should also be an easy project for someone.
Eric Atkinson, one of Servo’s new interns, has just landed his first pull request, adding the first bits of CSS’s text-decoration support for underline.
For new contributors: Eric didn’t know any Rust or anything about Servo internals before he started last week. It doesn’t take much to get started, and there is lots of low hanging fruit to pick on the Servo tree. For example, based on Eric’s underline work, it should be fairly easy to add line-through (strike-through) support.
Tim Kuehn, another of Servo’s new interns, has also been busy his first week. He started overhauling how performance data is collected in Servo. Instead of simply timing bits of code and outputting the results to the console, there is now a separate task that handles performance data.
For new contributors: We’re not doing anything with this data yet, but we should be. It should be a pretty easy project to start outputting it more systematically and doing something with the results. Another idea would be to report numbers for different platforms and compare them to similar numbers from other browsers so we know where we should improve.
The first parts of GPU accelerated rendering have started to land in Servo, specifically updates to Skia and Azure to support framebuffer-backed draw targets. These framebuffers render to textures which are shared with the GPU-based compositor. This avoids needing to render to CPU memory and then upload textures to the GPU. There is still a bug or two to work out with tiling support, but I expect GPU rendering to land in the tree pretty soon.
Servo now has continuous integration via Bors, the wonderful CI bot that the Rust team has already been using for some time. Not only that, but Servo’s Bors is now running on Mozilla’s release engineering infrastructure instead of being hosted by the Rust team. This should keep the tree building cleanly from now on. If you’ve previously had trouble compiling Servo, now would be a good time to try again.
Patrick Walton has been heavily refactoring Servo’s directory layout and many of its subsystems. The util and net libraries were split out from the gfx library, and compositing was made quite a bit simpler. He has also refactored layout and is working on splitting Servo into more libraries, which makes it both easier to understand and faster to build. Much documentation has been added in these refactorings.
Samsung continues to work on Android support, improving the Rust compiler along the way. That work should land in the tree in the near future.
Give all these new things a try and report any issues you find. The team hangs out in #servo on irc.mozilla.org and is happy to answer questions or help you get started hacking on Servo.
When I arrived, Servo no longer built at all, at least not on OS X. Servo often requires bleeding edge versions of Rust, and backwards incompatible changes to Rust are still happening on a regular basis. Since all of the contributors to Rust work on different platforms, when porting to a new Rust compiler, some platforms have gotten left behind. This was particularly acute this time because Rust 0.6 contained a lot of syntax changes, mostly things that got removed from the language, and many pieces of Servo were using syntax that was deprecated in Rust 0.5, and was finally deleted entirely in Rust 0.6.
Rust 0.6 removed a lot of keywords and syntax from the language. Porting Servo required modifying all the constants, many function declarations, many import statements, etc. These changes were largely mechanical. There were a few changes that weren’t so easy.
Mutable fields are being removed from the language, and mutability will be controlled by the mutability of the struct itself. Not all of these had to be removed in Servo, but many of them did, and removing them often required slightly changing the data structures and their type signatures. In some cases this was trivial, but in a few cases these changes needed more care. In particular, lots of these changes bumped up against the Rust borrow checker, which ensures it’s safe to hand out pointers to memory. There are still some bugs in the borrow check, and workarounds are not always straightforward.
It took me about a week and a half to work my way through all the dependent libraries and Servo itself, at which point I had a build. By the end of that second week I had landed the language upgrade to Servo as well as some Rust library changes that were needed. The end result is that Servo is now using Rust 0.6 syntax, but it requires a post-0.6 version of Rust, due to the Rust changes not landing quite in time for the 0.6 release.
Servo uses many forms of parallelism, but one bit of low hanging fruit is to move to a fully GPU rendering path. Currently compositing is done on the GPU, but rendering to the various layers is done on the CPU. This is how most current browsers operate as well.
We’re moving to rendering on the GPU as well which should speed up some things a bit. Instead of rendering in parallel to several layers, Servo will render directly into textures on the GPU which the compositor can use without doing CPU to GPU memory transfers.
This required upgrading the rendering stack to a newer version of Azure (Mozilla’s drawing library) and a new version of Skia (the specific backend that Azure uses on OS X, Linux, and Android). Now that this part is done, we’ll be adding texture layers to the renderer and switching drawing to those.
We’re setting up build and testing automation for Servo now, which should help ensure Servo remains buildable on all platforms. Rust has an amazing set of tools for this already, which we are hoping to reuse fully. Buildbot machines run builds and tests, and a GitHub bot called Bors handles dispatching builds for patches that have been reviewed and merging pull requests that have passed tests.
For now this work will be on Linux, but we hope to expand it to cover OS X and Android as well in the near future. Once Servo is a little farther along, we plan to put up nightly snapshots so more people can follow along with our progress.
There’s tons of other work in progress on both Servo and Rust. The DOM bindings are getting improved, a new Rust scheduler that will make performance and I/O better is in progress, a more optimized C FFI in Rust should also land soon, and the rustpkg package manager is shaping up which we’ll be switching to for more and more of Servo as it matures.
We need more help in lots of areas. Please join us in IRC in #servo or on the mailing list. We’ll be trying to mark bugs and projects that are well suited for new contributors. If you want to work on Servo and write Rust code all the time, we’re hiring.
This will also be the first time in over a decade that I’m not working in a small company or a startup (usually both). I’ve been thinking for a while that it would be nice to work for a company that has real resources to solve problems, as opposed to being at the mercy of venture capitalists or the whims of users. Mozilla’s mission statement is one that is easy for me to get behind, and they are doing very interesting things at Mozilla Research.
I enjoy working on difficult and important projects, and it’s hard for me to imagine much that is more difficult or important than web browsers. It’s an added bonus to be working in and (hopefully) contributing to a new programming language. I also love working with smart people, and Mozilla seems to have those in abundance.
This is going to be awesome.
Watch it on Xiph.org. There is also a detailed write up.
Instant answers on DuckDuckGo are really nice, in that they highlight a specific result in a context sensitive way. For example, Stack Overflow questions that match will show up with the highest rated answer at the top of the page, and Wikipedia articles will be presented as a title and abstract.
I decided to play with DuckDuckHack and added a plugin for XMPP Extension Proposal (XEP) lookups. I do XEP lookups often when I’m answering people’s XMPP-related questions, and this plugin makes the XEP’s title and abstract appear as an instant answer. The plugin was recently merged into the tree, and it is now live on DuckDuckGo itself.
Try it by searching for XEP 45. View the code on GitHub.
“Google crawls us at a rate of 1300 hits per second… They’ve indexed 3 billion of our pages,” Costolo said. “They have all the data they need.”
There’s no doubt that 1,300 hits per second is a large number, but let’s put that in perspective:
For part of 2010, Google was perhaps able to keep up with the stream at 1,300 requests per second. Somewhere between February and June, the average volume of tweets outpaced them.
Let’s assume that they kept pace until June 2011, and that on June 1, Twitter went from somewhere in the range of 1,300 tweets per second to their reported 2,300 tweets per second. Google is 1,000 tweets behind per second.
By the end of the year, Google missed 15.5 billion tweets; at 1,000 missed tweets per second, the backlog grows by 86.4 million per day. They are two months behind if they didn’t skip any, and the tweet volume did not increase. But it did increase by 25% or so by October, and surely it has grown more since then.
If Google has only indexed 3 billion pages so far, they have approximately 12 days of tweets at current volume. It’s pretty hard to rationalize the 3 billion pages number against the 1,300 per second number. Was Google indexing at a much slower rate before? Did they not start until a few months ago?
Of course Google may be getting multiple tweets per request, perhaps by crawling the timelines of important users. But this means that they probably get a lot of requests that don’t give them any new tweets, or else the timeliness of the data is poor.
No matter how you slice it, it appears Google would be unable to keep up. Even if they were keeping up now, Twitter’s growth probably sets a time limit for which keeping up remains possible.
Perhaps Google is super clever, and can index only the right tweets. I think that it’s more probable they have “enough” data to surface results for the super popular topics, and miss nearly everything in the long tail of the distribution. I expect that this adversely affects search quality, which one suspects is a high priority for the world’s best search engine.
Google is no saint. They are guilty of the same data hoarding. If you ran these numbers for YouTube indexing, I think you would find the situation is much worse. I imagine that most of these data silo companies purposefully set their crawl rates too low for anyone to achieve high quality search results.
In the case of Twitter, the end result for users is even worse because Twitter’s own attempts at search are terrible and are getting worse over time. At least Google makes a decent YouTube search, even if no one else can.
Even if Google could get all the tweets, they still would have very little to no Facebook data. I still think the best strategy in this situation for them is to create their own social data and use that instead. It’s a tough road, but they seem to have little choice.
In the end, it’s not about Google or Twitter or Facebook, but the stifling of innovation and competition around data. We can only hope that some federated solution or some data-liberal company wins out in the end.
I recently saw a recommendation for Soul of a New Machine, which tells the story of a team of engineers at Data General who built a new 32-bit computer in the late 1970s. The book is fascinating. Thirty years later, many of its descriptions of the project and the way the team worked and was treated could apply to any modern project.
The plot summary will no doubt sound familiar to you: A team of mostly young, mostly male engineers works grueling hours to build something amazing in too short an amount of time. They succeed, albeit a bit over their original schedule. Despite the project’s commercial success, the team is denied both recognition and financial rewards and many end up leaving the company. Almost all of them ultimately enjoyed it and would (and did) do it again.
There were many pieces of this story that resonated with me.
On overworking, Tom West, the manager of the team in the book, says:
That’s the bear trap, the greatest vice. Your job. You can justify just about any behavior with it. Maybe that’s why you do it, so you don’t have to deal with all those other problems.
Why deal with the unpredictable world, when the controllable world of creation is available? It’s code as escapist drug, and I love to get high on it. Mundane things like cleaning my house, and more serious ones like taking care of my health, are all easy to avoid while fixing bugs or starting a new project.
It’s both possible and important to find a balance.
The team’s secretary, who was much more than her title suggests, suffered and succeeded with the rest of the team. Even she says:
I would do it again. I would be very grateful to do it again. I think I would take a cut in pay to do it again.
Even as I recover from projects that burned me out, I am constantly thinking about how to do new ones. In fact, while I’m doing any project, I’m already thinking about doing another. This sounds like drugs again. But they are good drugs.
The book describes how some team members tormented the lone female engineer. This is something that still happens today, and it’s terrible. And people then wonder why there are so few women in our industry.
In addition to that, at the end when they handed out the peer awards, their award to the woman was for putting up with them, not for any of her actual accomplishments.
Betty Shanahan was that lone woman, and it looks to me that she deserved more than just an award for thick skin. She’s the CEO of the Society of Women Engineers, and she was “a member of the design team for the first parallel processing minicomputer and manager of hardware design for subsequent systems.” She later moved to the business side of technology, and I wonder if that had anything to do with her having to put up with the Eagle team’s harassment.
Often we judge things by their properties, but one can also rightly judge something by how it is made. Shoes made from child labor are less good than those made in other ways.
Kidder, the book’s author, discusses this:
In The Nature of the Gothic, John Ruskin decries the tendency of the industrial age to fragment work into tasks so trivial that they are fit to be performed only by the equivalent of slave labor. Writing in the nineteenth century, Ruskin was one of the first, with Marx, to have raised this now-familiar complaint. In the Gothic cathedrals of Europe, Ruskin believed, you can see the glorious fruits of free labor given freely. What is usually meant by the term craftsmanship is the production of things of high quality; Ruskin makes the crucial point that a thing may also be judged according to the conditions under which it was built.
By this kind of measure, is the work many teams do good? Is the Eagle computer that Tom West’s team built really a success since the team worked much overtime, suffered divorces and other problems, and in the end received little to no reward?
I think it’s time for entrepreneurs and workers in our industry to demand better. Our outputs will be better if they are made sustainably, and not just by the measure above. In retrospect, maybe the reviewers of LA Noire should have taken into account the trials of its developers; it certainly would not have fared well.
I want to hire resourceful people. I want to describe a general outline of a design and not have to describe it in intricate detail in order for them to build it.
It turns out that this is critical for happiness. If we’re told exactly how to do something, it takes much of the creativity and fun out of the work.
Engineers are supposed to stand among the privileged members of industrial enterprises, but several studies suggest that a fairly large percentage of engineers in America are not content with their jobs. Among the reasons cited are the nature of the jobs themselves and the restrictive ways in which they are managed. Among the terms used to describe their malaise are declining technical challenge; misutilization; limited freedom of action; tight control of working conditions.
You must trust those you work with to be resourceful. If you don’t trust them, you will end up micromanaging them into unhappiness, and you will also remove their valuable creative input from your product.
There is a balance to be struck with feedback. The Eagle engineers thought that the managers didn’t appreciate their efforts, but in reality, some of this was the managers trying to stay out of the way. Kidder asked Tom West’s boss:
Had the Eagle project always interested him or had it grown in importance gradually?
“From the start it was a very important project.”
Was he pleased with the work of the Eclipse group?
“Absolutely!” His voice falls. “They did a hell of a job.”
But some members of the team felt that they had been rather neglected by the company.
“That doesn’t surprise me,” he says. “That’s frequently the case. There’s often a conflict in people’s minds. How much direction do they want?”
I’ve had this same issue with investors as well. You don’t want them to meddle with your company or your product, but you also want their advice and guidance. It’s possible to go too far in either direction, but mostly you hear about stories where investors meddle too much. I personally think it’s probably better to err on the side of too little help than to end up with too much meddling.
Even thirty years ago, the VCs had a bad rap. Tom West was asked in a Wired article years after the book’s publishing why he stayed at Data General until he retired:
“You could do new products and companies within the company, rather than shag some venture capitalist and kill yourself for five years.” To be an entrepreneur, he says, “you have to be interested in networking, even with fools.”
This is another reason why I would prefer to bootstrap companies if at all possible.
Tom West ended up working on many interesting projects at Data General, but ultimately, none of them got the support or recognition they deserved. The other members of the Eagle team spread out and started or worked for new companies, and in general seemed much happier.
In the end, it’s both a fascinating tale of heroism and creativity and a saddening tale of undervalued and underpaid engineers. I am both emboldened to keep following my passions and more mindful of its dangers. My troubles are not unique - not even modern. Thirty years after this book was written, I feel like it could have been written yesterday.
Twitter’s argument:
We’re concerned that as a result of Google’s changes, finding this information will be much harder for everyone. We think that’s bad for people, publishers, news organizations and Twitter users.
Google’s response was:
We are a bit surprised by Twitter’s comments about Search plus Your World, because they chose not to renew their agreement with us last summer (http://goo.gl/chKwi), and since then we have observed their rel=nofollow instructions.
People have been digging into the semantics of nofollow (see Danny Sullivan and Luigi Montanez), but there is a much bigger issue.
Google and other established and up-and-coming search engines have no real way to include lots of data in their index. It’s easy to imagine that the lack of access to Twitter and Facebook data was a motivator for Google+ in the first place.
Lots of sites now generate enough data that it is unrealistic to crawl them. For example, YouTube has more new content every day than they allow anyone to crawl. Twitter is essentially the same. This means there is no way to index this data without special arrangements with the provider. Twitter has closely guarded their firehose of data, but at least they have some mechanism to obtain it. YouTube, as far as I am aware, has no such mechanism.
My team and I ran into this problem head on trying to build Collecta, a real-time search engine. Access to the data was a primary blocker for many features and product ideas, and over the too short life of that company, access became significantly more difficult, not easier.
Google can build an effective search, even a real-time one, for YouTube, but no one else can. Twitter can build search for their data, but few others can, and their data access policies can and do change on a whim.
If Google believes that microblogging data will improve their search product, then a reasonable strategy to obtain that data is to try and build their own microblogging service to generate it. I can’t fault Google for trying. If I thought Collecta could have effectively competed against Twitter for their audience, I would certainly have attempted that as well.
Google, Twitter, Facebook and others are hoarding silos of otherwise public data. Not only is this artificially limiting the features of their products, but it squashes the potential for new and exciting search applications. The search services that have sprung up are limited to your own data, aggregate results from service-specific search APIs, exist at the mercy of data providers, or make do with a tiny subset of the data. I don’t think Google could have built their own search engine if the Web were similarly hostile.
One could argue for requiring these bits of data to be openly available, but unlike the data of the past, this data is expensive to publish and consume. Most of these services may not even have a mechanism to publish the data, even internally. Simply receiving the YouTube or Twitter firehoses (and not counting video or image media) would require significant engineering effort, and the rate of data generation is only accelerating.
I think we must push for open access to data, even if it is costly. These data wars benefit very few. If things don’t change, the future of search is dark.
Please consider upgrading as soon as possible, as a security problem was found in Strophe.js 1.0.1. The DIGEST-MD5 SASL method used a constant client nonce due to a bug in Strophe’s use of the underlying MD5 library. I don’t know of any exploits for this bug, but it could compromise your site’s security.
Much of the credit for this release goes to the many contributions and pull requests that people have sent in the last year. The community’s effort continues to make Strophe.js better and better.