pcwalton

Occasional notes on Rust, Firefox, etc.

Revamped Parallel Layout in Servo


Over the past week I’ve submitted a series of pull requests that significantly revamp the way parallel layout works in Servo. Originally I did this work to improve performance, but it’s also turned out to be necessary to implement more advanced CSS 2.1 features. As it’s a fairly novel algorithm (as far as I’m aware) I’d like to take some time to explain it. I’ll start with where we are in Servo head and explain how it evolved into what’s in my branch. This post assumes a little knowledge about how browser engines work, but little else.

Overview

Servo’s layout operates on a flow tree, which is similar to the render tree in WebKit or Blink and the frame tree in Gecko. We call it a flow tree rather than a render tree because it consists of two separate data types: flows, which are organized in a tree, and boxes, which belong to flows and are organized in a flat list. Roughly speaking, a flow is an object that can be laid out in parallel with other flows, while a box is an object that must be laid out sequentially with other boxes in the same flow. If you’re familiar with WebKit, you can think of a box as a RenderObject, and if you’re familiar with Gecko, you can think of a box as an nsFrame. We want to lay out boxes in parallel as much as possible in Servo, so we group boxes into flows that can be laid out in parallel with one another.

Here’s a simple example. Suppose we have the following document:

<html>
<body>
    <p>Four score and seven years ago our <b>fathers</b> brought forth on this
    continent, a new nation, conceived in Liberty, and dedicated to the
    proposition that all men are created equal.</p>
    <p>Now we are engaged in a great civil war, testing whether that nation,
    or any nation so conceived and so dedicated, can long endure. We are met
    on a great battle-field of that war. We have come to <i>dedicate</i> a
    portion of that field, as a final resting place for those who here gave
    their lives that that nation might live. It is altogether fitting and
    proper that we should do this.</p>
</body>
</html>

This would result in the following flow tree:

(Flow tree)

Notice that there are three inline boxes under each InlineFlow. We have multiple boxes for each because each contiguous sequence of text in the same style—known as a text run—needs a box. During layout, the structure of the flow tree remains immutable, but the boxes get cut up into separate lines, so there will probably be many more boxes after layout (depending on the width of the window).

One neat thing about this two-level approach is that boxes end up flattened into a flat list instead of a linked data structure, improving cache locality and memory usage and making style recalculation faster because less needs to be allocated. Another benefit (and in fact the original impetus for this data structure) is that the line breaking code need not traverse trees in order to do its work—it only needs to traverse a single array, making the code simpler and improving cache locality.
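
To make the shape of this data structure concrete, here is a minimal sketch of the two-level arrangement described above. The type and field names are hypothetical and much simpler than Servo’s actual definitions; the point is only that flows nest in a tree while boxes sit in a flat list owned by their flow.

struct LayoutBox {
    text: String,   // e.g. one contiguous run of same-styled text
    width: f32,
    height: f32,
}

struct Flow {
    children: Vec<Flow>,     // flows form a tree and can be laid out in parallel
    boxes: Vec<LayoutBox>,   // boxes are laid out sequentially within their flow
}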

Now that we know how the flow tree looks, let’s look at how Servo currently performs layout to figure out where boxes should go.

The current algorithm

The current algorithm for parallel layout in Servo (i.e. what’s in the master branch before my changes) consists of three separate passes over the flow tree.

  1. Intrinsic width calculation or bubble_widths (bottom-up). This computes the minimum width and preferred width for each flow. There are no sequential hazards here and this can always be computed in parallel. Note that this information is not always needed during layout, and eventually we will probably want to implement optimizations to avoid computing it for subtrees in which it is not needed.

  2. Actual width calculation or assign_widths (top-down). This computes the width of each flow, along with horizontal margins and padding values.

  3. Height calculation or assign_heights (bottom-up). This computes the height of each flow. Along the way, line breaking is performed, and floats are laid out. We also compute vertical margins and padding, including margin collapse.

Within each flow, boxes are laid out sequentially—this is necessary because, in normal text, where to place the next line break depends on the lines before it. (However, we may be able to lay boxes out in parallel for white-space: nowrap or white-space: pre.)

For simple documents that consist of blocks and inline flows, Servo achieves excellent parallel wins, in line with Leo Meyerovich’s “Fast and Parallel Webpage Layout”, which implemented this simple model.
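
To give a flavor of how a bottom-up pass such as bubble_widths can run in parallel, here is a minimal sketch of the hand-off logic, under the assumption that each flow stores an atomic count of unprocessed children. The types and function names are hypothetical, not Servo’s actual code; in a real engine each leaf would be fed to a work-stealing thread pool, and this only shows how the last child to finish takes responsibility for its parent.

use std::sync::atomic::{AtomicUsize, Ordering};

struct FlowNode {
    parent: Option<usize>,            // index of the parent flow, if any
    children_remaining: AtomicUsize,  // children not yet processed
}

fn bubble_up(flows: &[FlowNode], leaf: usize, process: &(dyn Fn(usize) + Sync)) {
    let mut current = leaf;
    loop {
        process(current); // e.g. compute minimum and preferred widths
        let Some(parent) = flows[current].parent else { break };
        // fetch_sub returns the previous count, so seeing 1 means we were the
        // last child to finish; only that child continues upward.
        if flows[parent].children_remaining.fetch_sub(1, Ordering::AcqRel) == 1 {
            current = parent;
        } else {
            break;
        }
    }
}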

The problem with floats

Unfortunately, in the real world there is one significant complication: floats. Here’s an example of a document involving floats that illustrates the problem:

<div style="float: right">
    I shot the sheriff.
    But I swear it was in self-defense.
    I shot the sheriff.
    And they say it is a capital offense.
</div>
<div>
    I shot the sheriff
    But I didn't shoot no deputy.
</div>
<div>
    All around in my home town,
    They're tryin' to track me down;
    They say they want to bring me in guilty
    For the killing of a deputy,
    For the life of a deputy.
</div>

Rendered with a little extra formatting added, it looks like this:

I shot the sheriff. But I swear it was in self-defense. I shot the sheriff. And they say it is a capital offense.
I shot the sheriff But I didn’t shoot no deputy.
All around in my home town, They’re tryin’ to track me down; They say they want to bring me in guilty For the killing of a deputy, For the life of a deputy.

The flow tree for this document might look like this:

(Flow tree)

Notice how the float in green (“I shot the sheriff…”) affects where the line breaks go in the two blocks to its left and below it (“I shot the sheriff…” and “All around…”). Line breaking is performed during the height assignment phase in Servo, because where the line breaks go determines the height of each flow.

This has important implications for the parallel algorithm described above. We don’t know how tall the float is until we’ve laid it out, and its height determines where to place the line breaks in the blocks next to it, so we have to lay out the float before laying out any blocks adjacent to it. But, more subtly, floats prevent us from laying out all the blocks that they impact in parallel as well. The reason is that we don’t know how many floats “stick out” of a block until we know its height, and in order to perform line breaking for a block we have to know how many floats “stuck out” of all the blocks before it. Consider the difference between the preceding document and this one:

I shot the sheriff. But I swear it was in self-defense. I shot the sheriff. And they say it is a capital offense.
I shot the sheriff But I didn’t shoot no deputy. I shot the sheriff But I didn’t shoot no deputy.
All around in my home town, They’re tryin’ to track me down; They say they want to bring me in guilty For the killing of a deputy, For the life of a deputy.

The only difference between the first document and this one is that the first unfloated block (“I shot the sheriff…”) is taller. But this impacts the height of the second block (“All around…”), by affecting where the lines get broken. So the key thing to note here is that, in general, floats force us to sequentialize the processing of the blocks next to them.

The way this was implemented in Servo before my pull requests is that any floats in the document caused all unfloated blocks to be laid out sequentially. (The floats themselves could still be laid out in parallel, but all other blocks in the page were laid out in order.) Unsurprisingly, this caused our parallel gains to evaporate on most real-world Web pages. The vast majority of modern Web pages do use floats in some capacity, as they’re one of the most popular ways to create typical layouts. So losing our parallel gains is quite unfortunate.

Can we do better? It turns out we can.

Clearing floats

As most designers know, the float property comes with a very useful companion property: clear. The clear property causes blocks to be shifted down in order to avoid impacting floats in one or both directions. For example, the document above with clear: right added to the second unfloated block looks like this:

I shot the sheriff. But I swear it was in self-defense. I shot the sheriff. And they say it is a capital offense.
I shot the sheriff But I didn’t shoot no deputy.
All around in my home town, They’re tryin’ to track me down; They say they want to bring me in guilty For the killing of a deputy, For the life of a deputy.

This property is widely used on the Web to control where floats can appear, and we can take advantage of this to gain back parallelism. If we know that no floats can impact a block due to the use of the clear property, then we can lay it out in parallel with the blocks before it. In the document above, the second block (“All around…”) can be laid out at the same time as the float and the first block.

My second pull request implements this optimization in this way: During flow construction, which is a bottom-up traversal, we keep track of a flag, has_floated_descendants, and set it on each flow if it or any of its descendants are FloatFlow instances. (Actually, there are two such flags—has_left_floated_descendants and has_right_floated_descendants—but for the purposes of this explanation I’ll just treat it as one flag.) During width computation, we iterate over our children and set a second flag, impacted_by_floats, on each. (Again, there are actually two such flags—impacted_by_left_floats and impacted_by_right_floats.) impacted_by_floats is true for a flow if and only if any of the following is true:

  1. The parent flow is impacted by floats.

  2. The flow has floated descendants.

  3. Any previous sibling flow is impacted by floats, unless the appropriate clear property has been set between this flow and that sibling.

Only subtrees that have impacted_by_floats set to true are laid out sequentially, in order. The remaining subtrees can be laid out in parallel.
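
In code, the rule looks roughly like the following sketch. The field names here are hypothetical and collapse the left/right distinction, but they capture the propagation: from the parent, from floated descendants, and from earlier siblings unless a clear intervenes.

struct ChildFlow {
    has_floated_descendants: bool,
    clears_floats: bool,
    impacted_by_floats: bool,
}

fn mark_impacted_children(parent_impacted: bool, children: &mut [ChildFlow]) {
    let mut previous_impacted = false;
    for child in children.iter_mut() {
        child.impacted_by_floats = parent_impacted
            || child.has_floated_descendants
            || (previous_impacted && !child.clears_floats);
        previous_impacted = child.impacted_by_floats;
    }
}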

With this optimization implemented, documents like the ones above can be laid out in parallel as much as possible. It helps many real-world Web pages, as clear is a very commonly-used property.

At this point, two questions arise: “Can we do even more?” and “Is this algorithm enough to properly handle CSS?” As you might expect, the answer to the first is “yes”, and the answer to the second is “no”. To understand why, we need to dive into the world of block formatting contexts.

Block formatting contexts

The behavior of overflow: hidden is subtle. Consider this document, which is identical to the document we’ve been using but with overflow: hidden specified on the blocks adjacent to the float:

<div style="float: right">
    I shot the sheriff.
    But I swear it was in self-defense.
    I shot the sheriff.
    And they say it is a capital offense.
</div>
<div style="overflow: hidden">
    I shot the sheriff
    But I didn't shoot no deputy.
</div>
<div style="overflow: hidden">
    All around in my home town,
    They're tryin' to track me down;
    They say they want to bring me in guilty
    For the killing of a deputy,
    For the life of a deputy.
</div>

Rendered, it looks like this:

I shot the sheriff. But I swear it was in self-defense. I shot the sheriff. And they say it is a capital offense.
I shot the sheriff But I didn’t shoot no deputy.
All around in my home town, They’re tryin’ to track me down; They say they want to bring me in guilty For the killing of a deputy, For the life of a deputy.

Notice that, with overflow: hidden specified, the float makes the entire width of the block next to it smaller: all the lines have been wrapped, not just those that impact the float.

What’s going on here is that overflow: hidden establishes what’s known as a block formatting context in CSS jargon. In Servo, block formatting contexts make our layout algorithm significantly more complex, because they require width assignment and height assignment to be intertwined, and for height assignment to be interruptible. To see why this is, recall that the flow tree for this document looks like this:

(Flow tree)

Remember that Servo’s layout algorithm performs width calculation top-down, then height calculation bottom-up—this works under the assumption that widths never depend on heights. But with block formatting contexts adjacent to floats, this is no longer true: the width of a block formatting context depends on the height of floats next to it. This is because we don’t know whether a float, such as the green float above, is tall enough to impact a block formatting context, like those established by the “I shot the sheriff…” and “All around…” blocks above, until we lay out all blocks prior to the context and the float itself. And without knowing that, we cannot assign the width of the block formatting contexts.

To handle this case, my patches change Servo’s layout in several ways:

  1. When we see a block formatting context during width calculation, we check the value of the impacted_by_floats flag. If it is on, then we don’t calculate widths for that flow or any of its descendants. Instead, we set a flag called width_assignment_delayed.

  2. When we encounter a block formatting context child of a flow while calculating heights, if that block formatting context has the flag width_assignment_delayed set, we suspend the calculation of heights for that node, calculate the width of the block formatting context, and begin calculating widths and heights for that node and all of its descendants (in parallel, if possible).

  3. After calculating the height of a block formatting context, we resume calculation of heights for its parent.
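
Before walking through an example, here is a rough sketch of how these changes fit into the two traversals. The types are hypothetical and grossly simplified (the real computations are stubbed out with placeholder values), but the control flow mirrors the three points above: impacted block formatting contexts are skipped during width assignment and picked back up during height assignment, once the floats around them have been placed.

struct LayoutFlow {
    children: Vec<LayoutFlow>,
    is_block_formatting_context: bool,
    impacted_by_floats: bool,
    width_assignment_delayed: bool,
    width: f32,
    height: f32,
}

fn assign_widths(flow: &mut LayoutFlow) {
    if flow.is_block_formatting_context && flow.impacted_by_floats {
        flow.width_assignment_delayed = true;  // point 1: defer this subtree
        return;
    }
    flow.width = 800.0;  // placeholder for the real width computation
    for child in &mut flow.children {
        assign_widths(child);
    }
}

fn assign_heights(flow: &mut LayoutFlow) {
    for child in &mut flow.children {
        if child.width_assignment_delayed {
            // Point 2: the floats are placed by now, so the available width is
            // known; resume width assignment for the delayed subtree (in
            // parallel where possible), then carry on with heights.
            child.width_assignment_delayed = false;
            child.width = 400.0;  // placeholder: the width next to the float
            for grandchild in &mut child.children {
                assign_widths(grandchild);
            }
        }
        assign_heights(child);
    }
    flow.height = 100.0;  // placeholder: place floats, break lines, sum heights
    // Point 3: with this flow's height known, the parent's traversal resumes.
}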

Let’s look at the precise series of steps that we’ll follow for the document above:

  1. Calculate the width of the root flow.

  2. Calculate the width of the float flow.

  3. Don’t calculate the widths of the two block flows; instead, set the width_assignment_delayed flag.

  4. Calculate the width of the float flow’s inline flow child. The main width assignment phase is now complete.

  5. Begin height calculation. First, calculate the height of the float flow and its inline child.

  6. Start calculating the height of the root flow by placing the float.

  7. We see that we’ve hit a block formatting context that has its width assignment delayed, so we clear that flag, determine its width, and start width calculation for its descendants.

  8. Calculate width for the block flow’s inline child. Now width calculation is done for that subtree.

  9. Calculate the height of the block flow’s inline child, and the block flow itself. Now height calculation is done for this subtree.

  10. Resume calculating the height of the root flow. We see that the next block formatting context has its width assignment delayed, so we assign its width and repeat steps 8 and 9.

  11. We’ve now calculated the height of the root flow, so we’re done.

Now this particular page didn’t result in any parallel speedups. However, block formatting contexts can result in additional parallelism in some cases. For example, consider this document:

<div id=sidebar style="float: left">
    <div>Coupons</div>
    <div>Freebies</div>
    <div>Great Deals</div>
</div>
<div id=main style="overflow: hidden">
    <div>Deals in your area:</div>
    <ul>
    <li>Buy 1 lawyer, get 1 free</li>
    <li>Free dental fillings</li>
    </ul>
</div>

Rendered, it looks like this:

Coupons
Freebies
Great Deals

Deals in your area:
  • Buy 1 lawyer, get 1 free
  • Free dental fillings

In this document, after we’ve laid out the sidebar, we can continue on and lay out the main part of the page entirely in parallel. We can lay out the block “Deals in your area” in parallel with the two list items “Buy 1…” and “Free dental fillings”. It turns out that this pattern is an extremely common way to create sidebars in real Web pages, so the ability to lay out the insides of block formatting contexts in parallel is a crucial optimization in practice. The upshot of all this is that block formatting contexts are a double-edged sword: they add an unfortunate dependency between heights and widths, but they enable us to recover parallelism even when blocks are impacted by floats, since we can lay out their interior in parallel.

Conclusion

No doubt about it, CSS 2.1 is tricky—floats perhaps more than anything else. But in spite of their difficulties, we’re finding that there are unexpected places where we can take advantage of parallelism to make a faster engine. I’m cautiously optimistic that Servo’s approaching the right design here—not only to make new content faster but also to accelerate the old.

Removing Garbage Collection From the Rust Language


I’ve been floating ways to simplify the memory management story in Rust around the core team lately. Memory management is a contentious topic, since we’ve worked hard to get to the current state of things, and with the push toward stability lately, there is a (quite reasonable!) resistance to any changes at this state. Still, I think the current memory management story in Rust is worth revisiting, as the current state of things may cause us problems down the line. Working with Dave Herman and Niko Matsakis, I’ve formulated a fairly concrete proposal at this point. The basic idea is to remove garbage collection from the core language and relegate it to the standard library, with a minimal set of language hooks in place to allow for flexible, pluggable automatic memory management.

This post is designed to explain the “why”, not the “how”—I’m leaving the concrete details of the proposed system to a future blog post or mailing list discussion. Rather, this explains the issue that I see with the current system. I think that the garbage collection story as it stands in Rust is not quite ideal, for three reasons: familiarity, simplicity, and flexibility. I’ll cover each in turn.

Familiarity

One of the most common questions almost every Rust beginner asks is “when do I use managed pointers, and when do I use owned pointers?” Or, more simply, “what are all these ~ and @ symbols everywhere?” Having worked on Rust for many years now, I’ve seen several reasons for the difficulty. Chief among them are:

  1. The difference between the stack and the heap is a difficult concept to grasp for many programmers used to languages like Java that don’t make such a distinction. This is, unfortunately, a fundamental difficulty of working in a systems language. There’s little that can be done about this without taking control of allocation out of the hands of the programmer. Doing that, however, would compromise the goals of the language—in low-level, performance-critical programming, being able to precisely control whether allocations occur on the stack or on the heap is crucial.

  2. The sigils make the code unfamiliar before the concepts are learned. Unlike the rest of the punctuation in Rust, ~ and @ are not part of the standard repertoire of punctuation in C-like languages, and as a result the language can seem intimidating. One of the benefits of keywords is that they are self-documenting in a way that punctuation is not. This could be fixed by switching to keywords, which I prefer for this reason; however, syntactic beauty is in the eye of the beholder and so I won’t lose sleep over this not changing if the community prefers the current syntax.

  3. There are two heaps, not just one, so beginners are confused as to which one to allocate into. This is a result of the “minimize sharing by default” philosophy of the concurrency system. However, the concurrency system has been part of the library rather than the language for several years now, so this seems somewhat out of place.

  4. Programmers don’t know which to use, since some operations are available with ~ and some operations are available with @. Actually, we were confused on this point for a long time as well—it wasn’t clear whether ~ or @ would become dominant. We debated for a long time which to present first, ~ or @. However, as the language and community evolved, and coding standards became more settled, a clear winner emerged: the owning pointer ~. In practice, the rule has been that programmers should use ~ to allocate in all circumstances except when they have no way of knowing precisely when the object in question should be freed.

Point (4), to me, is the most critical. The rule that emerged—~ over @—should not be surprising, in retrospect, as it is how systems software has been developed for decades. The key insight that was missing is that the owning pointer ~ is just the Rust equivalent of malloc and free. For many, probably most C programs, malloc and free are just fine (assuming you use them correctly, of course); each heap allocation is allocated in just one place and destroyed in just one place. Only when the lifetimes of objects become very complex do C and C++ programmers resort to manual reference counting to determine when an object should be freed (and many, perhaps most, C programs never get there). This is the role that has emerged for @ in Rust programs: @ is a replacement for manual reference counting in C programs. The kobject system in the Linux kernel, the GObject system in glib, and so forth, are the C equivalents of @ in Rust.

The key point here is that these are very specialized use cases in C, and @ has been relegated to a similarly marginal role in idiomatic Rust code. We thought for a while that many Rust programs would use @ extensively and that it would ease the learning curve for those not used to destructor-based memory management and references. This has not, however, been the case in practice. In reality, since the libraries all use owning pointers (~), Rust programmers have to learn them quickly anyhow. And once Rust programmers learn how to use ~ effectively, they quickly find @ relegated to a marginal role, if it’s used at all. ~ has so many advantages: deterministic allocation and destruction, interaction with the standard library, freedom from GC marking pauses, simpler semantics, appendability where vectors and strings are concerned, and sendability across tasks.

I think we’re better off teaching ~ as the go-to solution for most programs and relegating @ to a specialized role. @ has its use cases, to be sure; large, event-driven C++ programs use reference counting for a reason. But those use cases are specialized. Beginners should not be asking “should I use ~ or @?” The answer is almost always ~.

In this regard relegating @ to a library is just the natural conclusion of this approach. I feel that what beginners should be taught is that ~ is the way to allocate in Rust, and letting an ~ owning pointer go out of scope is the way you free in Rust. This is what we should be teaching in the language tutorial. As beginners become more comfortable with this and explore the libraries, they will learn about ways to achieve more dynamic memory management: tracing garbage collection with the Gc type, reference counting with the Rc type, and thread-safe reference counting with the Arc type. But by building only ~ into the language, we can reduce confusion by, in effect, making the language more opinionated.
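
For illustration, here is roughly what the library side of that story looks like: a minimal sketch using reference counting with Rc and thread-safe reference counting with Arc, written in modern syntax and leaving the tracing Gc type aside.

use std::rc::Rc;
use std::sync::Arc;
use std::thread;

fn main() {
    // Reference counting within a single thread: cheap, non-atomic counts.
    let shared = Rc::new(vec![1, 2, 3]);
    let alias = Rc::clone(&shared);  // bumps the count; no deep copy
    println!("{} owners of {:?}", Rc::strong_count(&shared), alias);

    // Thread-safe reference counting: atomic counts, shareable across threads.
    let config = Arc::new(String::from("shared, immutable data"));
    let worker = {
        let config = Arc::clone(&config);
        thread::spawn(move || println!("worker sees: {}", config))
    };
    worker.join().unwrap();
}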

Simplicity

Although Rust didn’t start out that way, one of the most interesting applications of Rust has been very low-level programming, even down to the level of kernels. The interest in this application of Rust was something of a surprise to us, but in hindsight it makes perfect sense. Low-level control over memory management isn’t something that most applications software, especially on the server side, wants; most of that software has migrated over to languages like Java, Ruby, and JavaScript that trade control and performance for convenience by leaving memory management to be handled automatically, and dynamically, by the runtime. The remaining class of software, most of which is written in C and C++, is software that must manage memory manually in order to achieve some combination of performance, simplicity, and/or the ability to self-host. The prospect of using a new language for this class of software, which includes OS kernels, game engines, and browser engines among others, is what is fueling the growth of the nascent Rust community.

It might be possible to create a language that presents only a simple, fully automatic memory management system at first, and which surfaces the machinery of safe manual memory management* only when the programmer requires it for maximum performance. This would ease the learning curve, as programmers would be able to write many, perhaps most programs without ever learning how to manage memory at all. However, at this point I don’t think that this language exists yet, and in particular I don’t think Rust is that language. There are basically two problems here: (1) ~ owning pointers are everywhere in Rust, from the standard library to the built-in macros, making learning about them a necessity from the get-go; and (2) it is basically impossible to program Rust without at least a cursory understanding of references (a.k.a. & pointers) and their lifetime semantics; even vec::each() uses references.

Despite the fact that this might seem like a negative result, I actually think it’s quite positive for the project. It helps to define the project’s scope. I don’t think automatic memory management in Rust is ever going to be as convenient as memory management in, say, Ruby or Java, and that’s OK! The same level of control that adds cognitive overhead to memory management in Rust compared to other languages also makes Rust able to go where few other industry languages have. This space, I think, is where Rust can really shine.

In short, I think that Rust as a language should focus on roughly the same application domain as C++ does.†

Important to this effort is having as small a runtime as possible, just as C++ does, leaving higher-level abstractions to libraries. And, in fact, we are almost there already. The only runtime support that compiled Rust programs require is a small set of “language items”, which are magic functions or traits written in Rust that are known to the compiler. Looking at the set of language items, and disqualifying legacy items that will be removed soon such as annihilate and log_type, there are just a few categories:

  1. Operator traits, like Add and Sub. These are analogous to operator+, operator-, and so forth in C++.

  2. Memory primitives, like str_eq. These are somewhat legacy at this point and probably could be converted to LLVM intrinsics like memcmp without much trouble, especially after dynamically sized types land. In any case, in most C++ compilers memcmp and friends are builtins.

  3. Failure: fail and fail_bounds_check. This is analogous to throw in C++, although a Rust program that doesn’t want to use stack unwinding might want to use abort instead (which would be like -fno-exceptions) or do something more elaborate like the Linux kernel’s “oops” functionality.

  4. Allocation primitives malloc and free. These have direct C++ equivalents: operator new and operator delete.

  5. Garbage collection primitives.

Of these, the only language items that don’t have direct C++ equivalents are the garbage collection primitives. If those were eliminated, then Rust as a language would be every bit as freestanding as C++ is. In terms of suitability for kernel and embedded development, Rust would be on truly equal footing.

In summary: (1) all Rust programmers have to know how ~ and & work, despite the presence of @; (2) the only additional runtime primitives that Rust exposes and C++ doesn’t are those related to @.

Flexibility

When it comes to memory management, there are obviously many different strategies: stack allocation, heap allocation with malloc and free, arena allocation, and garbage collection. What’s less well known is that even among garbage collectors, there are many different approaches, each with advantages and disadvantages. There’s thread-local GC, thread-safe GC, incremental GC, generational GC, reference counting, thread-safe reference counting, deferred reference counting, ulterior reference counting—the list goes on and on. (For a good survey of automatic memory management techniques and how they relate to one another, check out “A Unified Theory of Garbage Collection” by Bacon et al.) A program that wants to maximize performance along some axis (latency versus throughput) and remain safe with objects with complex lifetimes may have reasons to choose one or the other.

Specifically, there’s the perennial debate between reference counting and tracing garbage collection. Many applications are better with tracing GC because of the increased throughput it provides and straightforward handling of cycles, and many applications are better with reference counting because of implementation simplicity, cache behavior, mostly-incremental operation, and promptness of deallocation. It makes sense for applications to be able to choose between the two. Even more important is the tradeoff between thread-safe and thread-local garbage collection: concurrent garbage collection is practically always more expensive than thread-local garbage collection, so it makes sense for programs to restrict concurrent GC (including atomic reference counting) to be used only when needed.

Integrating multiple tracing garbage collectors or cycle collectors into one system is a hard problem, and I don’t think Rust is going to really solve it. However, integrating reference counting into a garbage collected system is straightforward, as long as cycles are not created (and in Rust we can forbid the creation of such cycles through clever use of the type system). In practice this seems to work well: we typically use thread-local tracing GC for data with complex lifetimes within one task, and we use thread-safe reference counting for data that must be shared between tasks.

Equally important is the ability to integrate with external garbage collection systems (usually reference counted ones). This is a problem that is often overlooked, but is terribly important for client software such as mobile apps and browser engines. On Windows, apps must integrate with the reference-counted COM system in order to use DirectX and other APIs. On the Mac and on iOS, apps have to integrate with Objective-C and the closely-related Core Foundation, also reference-counted systems. On Linux, GNOME apps have to integrate with GObject, again a reference-counted system. On Android, apps have to integrate with the garbage-collected Dalvik subsystem via the JNI. All of this requires that the memory management system in the language be deeply flexible.

Because of this, I’m wary of blessing any particular form of automatic memory management in the core language. In Rust, the @ type is not only blessed with special syntax, but is eligible for borrowing and other operations in a way that user-defined types aren’t. Although Rust provides the facilities needed to build practically all the other forms of garbage collection, as well as those needed to integrate with external GC systems in a safe way, the resulting smart pointers feel second-class compared to @. A systems language designed to work in a diverse set of environments should have the flexibility to create memory management abstractions that feel first-class.

Conclusion

For these three reasons—familiarity, simplicity, and flexibility—I’d like to propose removing @ pointers from the language and replacing them with a small set of hooks allowing the same functionality to be implemented as a library and on user-defined types. We would ship tracing GC as part of the standard library and make it just as powerful and convenient as it is today (except for the @ syntax). We’d gain a flexible set of abstractions, make the language easier to learn, and make Rust into a truly freestanding language environment.

* Note that the safe qualifier here disqualifies manually-built free lists in garbage-collected languages, as these manually-built free lists provide no protection against errors like double “frees”, leaks, and dangling pointers. (They’re significantly worse than true manual memory management anyhow; the GC still has to trace through objects in arenas at mark time, copy the objects within out into the tenured generation when they survive a minor collection, write barrier the objects, and so forth.)

† Note that I don’t mean you shouldn’t write Web frameworks and Web sites in Rust: in fact, I think Rust would be a fantastic language for many classes of Web server software, especially that which must scale to the highest loads and squeeze every ounce of performance out of the servers on the racks.

Safe Manual Memory Management


If there’s one feature of Rust that is probably the most unique among languages in industry, it’s safe manual memory management.

It’s easiest to explain safe manual memory management by explaining how it differs from the memory models of other languages. There are a few different models in common use in industry languages:

  • Unsafe manual memory management—These languages provide very fine-grained control over memory allocation; heap memory can be explicitly allocated and deallocated. The most important examples here are C and C++. The well-known downside of this approach is that memory safety violations can only be detected at runtime with a memory safety checker such as Valgrind or Address Sanitizer. Memory safety violations that go unchecked can lead to crashes at best and exploitable security vulnerabilities at worst.

  • Full garbage collection—The majority of modern languages expose a memory model that falls into this category—the space is very diverse, ranging from Java to Go to JavaScript to Ruby to Haskell. In general, such languages place all allocations into the heap instead of the stack, although escape analysis and value types may be used to reduce the number of heap allocations. Periodically, a garbage collector scans all pointers on the stack and in the heap, judges unreachable objects dead, and reclaims them. This approach has the advantage of memory safety—the language arranges for there to be no dangling pointers, wild pointers, and so forth. The downsides, however, are:

    1. The garbage collector may run at an inconvenient time. This can be mitigated by explicit control over when the GC runs, although if the garbage collector must collect multiple threads’ heaps at the same time, this may be difficult to synchronize. This can also be mitigated by using manual memory pooling and free lists, although pooling has undesirable safety properties—much like unsafe manual memory management, there is no static guarantee that objects allocated from a pool are returned properly or that an object is not reachable when returned to the pool. Incremental and concurrent garbage collectors help here, but they are not free, as they typically require write and/or read barriers, reducing throughput.

    2. When it runs, the garbage collector must mark all pointers to discover which ones are live, reducing throughput of the application. Essentially, the GC must discover at runtime what a C++ (say) programmer knows at compile time. Not much can typically be done about this cost in fully garbage-collected languages, short of falling back to unsafe manual memory management. Pools don’t help much here, because the GC must still trace the pointers into the pool. Even pointers into the stack generally must be traced.

  • Garbage collection with value types and references—This category includes languages like C#. (I believe D falls into this category as well, although I may be mistaken.) These languages are essentially garbage-collected, but they include value types which are guaranteed to be stack-allocated if in local variables. Additionally, and most importantly, they include reference parameters (and sometimes reference locals), which allow stack-allocated values to be temporarily aliased when calling another function. Effective use of value types can reduce marking and sweeping time. In general, this system is an effective addition to a garbage-collected system, allowing a good measure of additional control without much cost in complexity and no cost in memory safety. It is not, however, typically sufficient to write programs without using the garbage collector at all; the system is too simple to statically encode anything other than the most basic memory management patterns.

Where does Rust fit in? Actually, it fits into a category all to itself among industry languages (although one shared by various research languages, like Cyclone). Rust offers safe manual memory management (although some have objected to the term “manual” here). It extends the system described above as “garbage collection with value types and references” in two important ways:

  1. You can allocate memory that will not be traced by the garbage collector, and free it manually if you choose. This is the feature of Rust known as “unique pointers”. Rust will automatically free memory that is uniquely owned when its owning pointer goes out of scope. It’s also easy to write a function that acts exactly as free does, so you can precisely choose when your objects die. Unique pointers are not traced by the GC (unless they point to a type that transitively contains a garbage-collected pointer), so they are an effective way to cut down marking times.

  2. You can return references and place references into data structures. Like other references, these references are not traced by the garbage collector. As long as the references follow a stack discipline, meaning that they point to memory that was allocated by one of the callers of the current function, the compiler allows them to be placed anywhere. This adds a great deal of expressiveness over the reference parameter approach, and it enables a large number of programs to be written without using the garbage collector at all.
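
Here is a small sketch of both points in current Rust syntax (Box plays the role that this post’s unique pointers do). Nothing in it is traced or counted at runtime: the heap allocation is freed when its owner goes out of scope, and the reference is checked by the compiler to follow a stack discipline.

struct Entry {
    label: String,
}

// Point (2): a reference can be returned and stored as long as it points into
// memory owned by a caller; the compiler rejects any use that outlives it.
fn first_label(entries: &[Entry]) -> &str {
    &entries[0].label
}

fn main() {
    // Point (1): a heap allocation that no garbage collector ever traces.
    let owned: Box<Entry> = Box::new(Entry { label: String::from("hello") });
    drop(owned); // freed right here, deterministically

    let entries = vec![Entry { label: String::from("winner") }];
    println!("{}", first_label(&entries));
}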

In terms of safety and performance, safe manual memory management is having your cake and eating it too. You get memory safety like a garbage-collected language, but control like that of unsafe manual memory management. But this system has its downsides as well—most importantly, complexity of implementation and interface. Learning to use references and unique pointers poses a significant learning curve. But, once the system is learned, it’s remarkably flexible, with an attractive combination of performance and safety.

Performance of Sequential Rust Programs


Although Rust is designed for parallel programs, it is important that the performance of single-threaded, sequential programs not suffer in its design. As far as Servo is concerned, sequential performance is still important in many domains that a Web browser engine must compete in.

Below are some selected single-threaded benchmarks from the Computer Language Benchmarks Game (formerly, and still informally, called the “shootout”). This is far from an ideal set. These benchmarks are showing their age quite heavily, they are too small and simplistic to extrapolate to real-world use cases, and many of them are too I/O-bound.

It is perfectly legal per the rules of the benchmarks game to use unsafe code (or to call libraries written in C, which is equivalent), and I believe it’s very difficult to precisely match C’s performance without resorting to unsafe code. (Practically speaking, one would need an extremely smart JIT, or a research language with a complex dependent type system.) As my colleague Niko pointed out, a more interesting benchmark would not allow any languages to use unsafe code and would exclude C and C++ from competing at all, except as a point of comparison—such a benchmark would be interesting to determine how much performance one has to trade for type safety in mainstream languages. But the shootout is what it is, and so the Rust versions of these benchmarks heavily use unsafe code. Over time, I hope to be able to reduce the amount of unsafe code present in the Rust versions of these benchmarks, but a couple of benchmarks will likely always remain unsafe.

Neither the C nor the Rust versions of these benchmarks use SIMD or threads. This is by design, as the goal of this test is to measure Rust’s sequential performance. Over time, as Rust gains SIMD support and the scheduler improves (both of which are active areas of development), the benchmarks will be updated to use multiple threads. But keep in mind that the C implementation tested against is not usually the top one on the shootout site; rather, I selected the fastest implementation that did not use SIMD or threads for comparison. As the Rust benchmarks are updated to use SIMD and threads, equivalent C versions will be used for comparison.

For all these reasons and more, it is important to not read too much into these benchmark results. It would be a mistake to conclude that “Rust is faster than C” because of the performance on the k-nucleotide benchmark. Likewise, it would be a mistake to conclude that “C is faster than Rust” because of the fasta-redux benchmark. The goal here is simply to demonstrate that sequential Rust can be written in a way that approaches competitive parity with equivalent C code.

Note that the benchmarks include clang because GCC 4.2 is a very old version. The purpose of this benchmark is not to benchmark C compilers, but rather to perform cross-implementation comparisons between two languages.

Enough disclaimers; on to the results:

Results

These programs were tested on a 2.53 GHz Intel Core 2 Duo MacBook Pro with 4 GB of RAM, running Mac OS X 10.6 Snow Leopard. “GCC 4.2” is GCC 4.2.1, Apple build 5666; “clang 1.7” is Apple clang 1.7, based on LLVM 2.9svn; “clang 3.1” is LLVM 3.1, trunk 149587. GCC and clang were run with -O2, and Rust was run with -O (which is like -O2). Three runs were averaged together to produce each result. Results are normalized to GCC 4.2. Lower numbers are better.

As mentioned before, this is a selected set of benchmarks. The benchmarks that were not included are:

  • fasta is omitted because it is similar to fasta-redux.

  • regexp-dna is omitted because it consists of an uninteresting binding to PCRE.

  • binary-trees is omitted because it is a garbage collection benchmark and the C version uses an arena, defeating the purpose (although I suspect a Rust version that did the same would do well).

  • chameneos-redux and threadring are omitted because they are threading benchmarks.

You can see the changes to the Rust compiler that were made to optimize these tests, as well as the benchmark sources, on my branch of the compiler on GitHub. The goal will be to land these changes over the next few days.

A Hard Case for Memory Safety


Quick quiz: In this C++ program, is the definition of munge guaranteed to be memory safe? (Assume that the definition of increment_counter uses only modern C++ idioms and doesn’t do anything like dereference an invalid pointer.)

#include <iostream>
#include <vector>
#include <cassert>

class foo {
public:
    std::vector<int> indices;
    int counter;

    foo() : indices(), counter(0) {
        indices.push_back(1);
        indices.push_back(2);
        indices.push_back(3);
    }

    void increment_counter();

    int &get_first_index() {
        assert(indices.size() > 0);
        return indices[0];
    }

    void munge() {
        int &first = get_first_index();
        increment_counter();
        std::cout << first << std::endl;
        first = 20;
    }
};

int main() {
    foo foo;
    foo.munge();
    return 0;
}

The answer: Even with this caveat, we can’t tell! It depends on the definition of increment_counter.

If increment_counter has this definition, the code is memory safe:

void foo::increment_counter() {
    counter++;
}

But if increment_counter has this definition, for example, then it isn’t:

void foo::increment_counter() {
    indices.clear();
    counter++;
}

This definition would cause the first reference in munge to become a dangling reference, and both the use of first in the std::cout statement and the subsequent assignment to first would have undefined behavior. If first were not an int but were instead an instance of a class, and munge attempted to perform a virtual method call on it, then this would constitute a critical security vulnerability.

The point here is that determining memory safety in C++ requires non-local reasoning. Any analysis that tries to determine safety of C++ code, whether performed by a machine or performed by a human auditor, has to analyze many functions all at once, rather than one function at a time, to determine whether the code is memory safe. As this example illustrates, sticking to modern C++ coding styles, even with bounds checks, is not enough to prevent this.

There are a few ways around this:

  • For each function call, analyze the source to the called function to determine whether it’s memory safe in the context of the caller. This doesn’t always work, though: it’s hard or impossible when function pointers or virtual methods are involved (which function ends up being called?), and it’s hard with separately compiled code (what if the called function is in a DLL that you don’t have source for?)

  • Change the type of indices to std::vector<std::shared_ptr<int>>; i.e. use reference counting to keep the pointer alive. This has a runtime cost.

  • Inline the body of increment_counter, so that the memory safety of munge is immediately clear.

  • Make increment_counter a class method (or just a function) instead of an instance method, and have it take counter by reference. The idea here is to prevent the possibility that increment_counter could mess with indices in any way by shutting off its access to it.

What does this have to do with Rust? In fact, this error corresponds to a borrow check error that Brian Anderson hit when working on the scheduler. In Rust, the corresponding code looks something like this:

impl Foo {
    fn get_first_index(&'a mut self) -> &'a mut int {
        assert!(self.indices.len() > 0);
        return &mut self.indices[0];
    }

    fn munge(&mut self) {
        let first = self.get_first_index();
        self.increment_counter(); // ERROR
        println(first.to_str());
        *first = 20;
    }
}

This causes a borrow check error because the first reference conflicts with the call to increment_counter. The reason the borrow check complains is that the borrow check only checks one function at a time, and it could tell (quite rightly!) that the call to increment_counter might be unsafe. The solution is to make increment_counter a static method that only has access to counter; i.e. to rewrite the self.increment_counter() line as follows:

Foo::increment_counter(&mut self.counter);

Since the borrow check now sees that increment_counter couldn’t possibly destroy the first reference, it now accepts the code.

Fortunately, such borrow check errors are not as common anymore, with the new simpler borrow check rules. But it’s interesting to see that, when they do come up, they’re warning about real problems that affect any language with manual memory management. In the C++ code above, most programmers probably wouldn’t notice the fact that the memory safety of munge depends on the definition of increment_counter. The challenge in Rust, then, will be to make the error messages comprehensible enough to allow programmers to understand what the borrow checker is warning about and how to fix any problems that arise.

An Overview of Memory Management in Rust


One of the key features of Rust that sets it apart from other new languages is that its memory management is manual—the programmer has explicit control over where and how memory is allocated and deallocated. In this regard, Rust is much more like C++ than like Java, Python, or Go, to name a few. This is an important design decision that makes Rust able to function in performance-critical domains that safe languages previously haven’t been able to—top-of-the line games and Web browsers, for example—but it adds a nontrivial learning curve to the language.

For programmers familiar with modern C++, this learning curve is much shallower, but for those who are used to other languages, Rust’s smart pointers can seem confusing and complex. In keeping with the systems-oriented nature of Rust, this post is designed to explain how Rust’s memory management works and how to effectively use it.

Smart pointers

In many languages with manual memory management, like C, you directly allocate and free memory with calls to special functions. For example:

#include <stdio.h>   /* for printf */
#include <stdlib.h>  /* for malloc and free */

void f() {
    int *x = malloc(sizeof(int));  /* allocates space for an int on the heap */
    *x = 1024;                     /* initialize the value */
    printf("%d\n", *x);            /* print it on the screen */
    free(x);                       /* free the memory, returning it to the heap */
}

C gives you a great deal of control over where memory is allocated and deallocated. Memory is allocated with a special function malloc, and it is freed with a special function free. After the call to free, it is an error to attempt to use x, as it is a dangling pointer. A dangling pointer points to invalid memory, but the C compiler makes no attempt to prevent you from using it; it’s your responsibility to avoid touching it after freeing the memory it points to.

Rust gives you the same level of control over memory, but it works somewhat differently. Let’s see how the same piece of code looks in Rust:

fn f() {
    let x: ~int = ~1024;          // allocate space and initialize an int
                                  // on the heap
    println(fmt!("%d", *x));      // print it on the screen
} // <-- the memory that x pointed at is automatically freed here

There are three main differences to notice here:

  1. In C, you allocate memory first (with the call to malloc), and then you initialize it (in the example above, with the *x = 1024 assignment). Rust fuses the two operations together into the ~ allocation operator, so that you don’t accidentally forget to initialize memory before you use it.

  2. In C, the call to malloc returns a plain pointer, int *. In Rust, the ~ operator, which allocates memory, returns a special smart pointer to an int. Because this type of smart pointer is so common, its name is just a single character, ~—thus the type of this smart pointer is written as ~int.

  3. You don’t call free manually in Rust. Rather, the compiler automatically frees the memory for you when a smart pointer goes out of scope.

As it turns out, points (2) and (3) are very intertwined, and together they form the cornerstone of Rust’s memory management system. Here’s the idea: Unlike C, allocation functions in Rust don’t return a raw pointer to the space they allocate. Instead, they return a smart pointer to the space. A smart pointer is a special kind of value that controls when the object is freed. Like a raw pointer in C, you can access the data that a smart pointer refers to with *. But unlike a raw pointer, when the smart pointer to an allocation goes out of scope, that allocation is automatically freed. In this way, smart pointers are “smart” because they not only track where an object is but also track how to clean it up.

Unlike C, in Rust you never call free directly. Instead, you rely on smart pointers to free all allocations. The most basic reason for this is that smart pointers make it harder to forget to free memory. In C, if you forget to call free, you have a memory leak, which means that the memory will not be cleaned up until the program exits. However, in Rust, the compiler will automatically insert the code necessary to free the memory for you when the smart pointer pointing to your data goes out of scope.

Rust has multiple types of smart pointers, corresponding to the different strategies that programs use to reclaim memory. Some smart pointers, namely ~ and @ (which we will cover shortly), have special names known to the compiler, because they’re so common. (Having to type long names like unique_ptr all the time would be a burden.) Other smart pointers, such as ARC (which allows you to share read-only data between threads), are in the standard library and are not built into the compiler.

The pointer covered above is known as the unique smart pointer ~. We call it “unique” because there is always only one smart pointer pointing to each allocation. The other type of smart pointer built into the language is the managed smart pointer, which allows multiple smart pointers to point to the same allocation and uses garbage collection to determine when to free it. Here’s an example of a managed smart pointer in use:

fn foo() {
    let x: @int = @1024;     // allocate space and initialize an int
                             // on the heap
    bar(x);                  // pass it to `bar`
    println(fmt!("%d", *x)); // print it on the screen
} // <-- the memory can be freed here

fn bar(x: @int) {
    let y: @int = x;         // make a new smart pointer to `x`
} // <-- despite `y` going out of scope, the memory is *not* freed here

The key difference between ~ and @ is that @ allows multiple smart pointers to point to the same data, and the data is cleaned up only after the last such smart pointer disappears. Notice that, in this example, the memory pointed at by y (which is the same as the memory pointed at by x) is not freed at the end of the function bar, because x is still in use and also points to the same data. The fact that @ allows multiple smart pointers to the same data, as well as the fact that the allocation is freed only when all of those pointers go out of scope, make managed smart pointers very useful. However, they can be less efficient than unique smart pointers, as they require garbage collection at runtime.

References

Recall that a smart pointer is a pointer that automatically frees the memory that it points to when it goes out of scope. Perhaps surprisingly, it often turns out that it’s useful to have a kind of pointer that doesn’t free the memory that it points to. Consider this code:

struct Dog {
    name: ~str    // a unique smart pointer to a string
}

fn dogshow() {
    let dogs: [~Dog * 3] = [        // create an array of Dog objects
        ~Dog { name: ~"Spot"   },   // use unique smart pointers to
                                    // allocate
        ~Dog { name: ~"Fido"   },
        ~Dog { name: ~"Snoopy" },
    ];
    for dogs.each |dog| {
        println(fmt!("Say hello to %s", dog.name));
    }
} // <-- all dogs destroyed here

Suppose that we wanted to single Fido out as the winner of the dog show. We might try this code:

fn dogshow() {
    let dogs: [~Dog * 3] = [
        ~Dog { name: ~"Spot"   },
        ~Dog { name: ~"Fido"   },
        ~Dog { name: ~"Snoopy" },
    ];
    let winner: ~Dog = dogs[1];
    for dogs.each |dog| {
        println(fmt!("Say hello to %s", dog.name));
    }
    println(fmt!("And the winner is: %s!", winner.name));
} // <-- all dogs, and `winner`, destroyed here

But this code won’t compile. The reason is that, if it did, Fido would be destroyed twice. Remember that unique smart pointers free the allocations they point to when they go out of scope. The code attempts to make a second smart pointer to Fido at the time it executes the line let winner: ~Dog = dogs[1]. If the compiler allowed this to proceed, then at the end of the block, the program would attempt to free Fido twice—once when it frees the original smart pointer embedded within the dogs array, and once when it frees winner.

What we really want is for winner to be a pointer that doesn’t free the allocation that it points to. In fact, what we want isn’t a smart pointer at all; we want a reference. Here’s the code rewritten to use one:

fn dogshow() {
    let dogs: [~Dog * 3] = [
        ~Dog { name: ~"Spot"   },
        ~Dog { name: ~"Fido"   },
        ~Dog { name: ~"Snoopy" },
    ];
    let winner: &Dog = dogs[1];  // note use of `&` to form a reference
    for dogs.each |dog| {
        println(fmt!("Say hello to %s", dog.name));
    }
    println(fmt!("And the winner is: %s!", winner.name));
} // <-- all dogs destroyed here

This code will now compile. Here, we convert winner into a reference, notated in Rust with &. You can take a reference to any smart pointer type in Rust by simply assigning it to a value with a reference type, as the let winner: &Dog = dogs[1] line does.

References (also known as borrowed pointers) don’t cause the compiler to free the data they refer to. However, they don’t prevent the compiler from freeing anything either. They have no effect on what smart pointers will do; regardless of how many references you have, a unique smart pointer will always free the data that it points to when it goes out of scope, and a managed smart pointer will always free its data when all managed smart pointers to the same allocation go out of scope.
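
For contrast with the failing example below, here’s a quick sketch (mine, not taken from the examples above) in which the reference is used only while the smart pointer it borrows from is still alive:

fn ok() {
    let x: ~int = ~1024;         // `x` owns the allocation
    let y: &int = x;             // take a reference to it
    println(fmt!("%d", *y));     // fine: `x` is still in scope
} // <-- the allocation is freed here, after we're done with the reference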

This is important to keep in mind. Code like this will not compile:

fn foo() {
    let y: &int;
    {
        let x: ~int = ~2048;
        y = x;
    } // <-- x freed here
    println(fmt!("Your lucky number is: %d", *y)); // ERROR: accesses freed data!
}

In languages like C++, code like this could cause faults from attempting to access invalid memory. As it turns out, however, this piece of code won’t compile—the Rust compiler can and does prevent you from writing code like this at compile time. Essentially, the Rust compiler tracks where each reference came from and reports an error if a reference persists longer than the allocation it points into. This means that, generally speaking, you can use references all you like and have the confidence that they won’t result in hard-to-diagnose errors at runtime.

Conclusion

These ideas—smart pointers and references—form the basis of memory management in Rust. If you’re a C++ programmer, most of this will (hopefully!) simply have been an exercise in learning different syntax. For other programmers, these concepts are likely more foreign. But using these tools, you can write code with fine-grained control over memory, with improved safety over languages like C.

Which Pointer Should I Use?

| Comments

Deciding whether to use a managed @ pointer or an owned ~ pointer to allocate memory is one of the most frequent sources of confusion for newcomers to Rust. There are two main angles to consider: memory management and concurrency. I’ll cover each in turn.

Note that this tutorial only presents the basic system. There are many extensions to the system—borrowing, library smart pointers, cells, and so on—that allow the various limitations described here to be overcome. But this is the core system that needs to be understood first.

Memory management

One of the most important features of Rust from a systems programming perspective is that garbage collection is optional. What this means is that there are safe ways to allocate memory that do not require bookkeeping at runtime to determine when it is safe to free that memory.

What makes it possible for Rust programs to avoid runtime garbage collection is the notion of ownership of a particular allocation. Under this scheme, when the single owner of an allocation goes out of scope, the allocation is freed. Owned pointers in Rust are notated with ~. Here’s an example of their use:

struct Point {
    x: int,
    y: int,
}

fn f() {
    let x: ~Point = ~Point { x: 10, y: 20 };  // allocate a Point on the heap
}  // <-- x is freed here

Here, x is the single owner of the Point on the heap. Because there is only a single owner, Rust can throw away the memory pointed to by x at the end of the function.

The compiler enforces that there is only a single owner. Assigning the pointer to a new variable transfers ownership (this is known as a move). Consider this program:

fn g() {
    let a: ~Point = ~Point { x: 10, y: 20 }; // allocate a Point on the heap
    let b = a;                               // now b is the owner
    println(b.x.to_str());                   // OK
    println(a.x.to_str());                   // ERROR: use of moved value
} // <-- b is freed here

When compiling this program, the compiler produces the error “use of moved value”. This is because assigning an owned pointer transfers ownership, making the old variable dead. Because the compiler knows precisely which variables are dead at all times, it can avoid having to determine at runtime whether to free the memory that a variable points to, and it can prevent you from accidentally accessing dead variables. However, this comes at a price: you are limited to using a single variable to refer to an ~ allocation.
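
One way to work within the single-owner limitation is to pass ownership into a function and have the function hand it back by returning it. Here’s a rough sketch (the function name bump is mine):

fn bump(p: ~Point) -> ~Point {
    // `bump` now owns the Point; build a new one and hand ownership back
    ~Point { x: p.x + 1, y: p.y }
}

fn i() {
    let a: ~Point = ~Point { x: 10, y: 20 };
    let b = bump(a);                 // ownership moves into `bump` and back out
    println(b.x.to_str());           // OK: `b` owns the returned Point
}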

By contrast, @ pointers do not have this limitation. We think of memory that is allocated with @ as owned by the garbage collector. You can make as many pointers to @ memory as you would like. There is a cost in runtime performance, but this cost comes with a great deal of flexibility. For example, the code above will compile with an @ pointer:

fn h() {
    let a: @Point = @Point { x: 10, y: 20 }; // allocate a Point on the heap
    let b = a;                               // a and b share a reference
    println(b.x.to_str());                   // OK
    println(a.x.to_str());                   // also OK
}

So, in short: @ pointers require garbage collection, but allow multiple pointers to the same location. ~ pointers avoid this GC overhead, but they don’t allow multiple pointers to the same location.

Concurrency

Another equally important aspect of the distinction between @ and ~ is that it allows Rust to ensure that concurrent tasks don’t race on shared memory. To illustrate this, here’s an example of broken code that doesn’t compile:

struct Counter {
    count: int
}

fn f() {
    // Allocate a mutable counter.
    let counter: @mut Counter = @mut Counter { count: 0 };
    do spawn {               // spawn a new thread
        // Increment the counter.
        counter.count += 1;  // ERROR: attempt to capture an `@` value
    }
    println(counter.count.to_str()); // print the value
}

This code contains a classic race—if it compiled, the value printed would be either 0 or 1, depending on whether the counter.count += 1 line or the println executed first. The key here is that two threads—the spawned thread and the main thread—are both simultaneously attempting to access the counter object. To rule out these errors, Rust prevents multiple threads from accessing the same memory at the same time.

Recall from the previous section that there can be any number of pointers to memory allocated with @. But there can be only one pointer to memory allocated with ~. This suggests a way to forbid multiple threads from accessing the same data: restrict the types of pointers that can be sent between threads to ~ pointers. And this is exactly what Rust does.

For any piece of ~-allocated memory, there is only one pointer to it, and that pointer is owned by exactly one thread. So there can be no races, since any other threads simply don’t have access to that memory. Let’s rewrite our example above using ~ to illustrate this:

fn g() {
    // Allocate a mutable counter.
    let mut counter: ~Counter = ~Counter { count: 0 };
    do spawn {               // spawn a new thread
        counter.count += 1;  // increment the counter
    }
    println(counter.count.to_str()); // ERROR: use of moved value
}

What’s going on here is that, by referring to counter inside the spawn block, the new thread takes ownership of the counter variable, and the counter variable becomes dead everywhere outside that block. Essentially, the main thread loses access to counter by giving it away to the thread it spawns. So the attempt to print the value on the screen from the main thread fails to compile. By contrast, this code will work:

fn h() {
    // Allocate a mutable counter.
    let mut counter: ~Counter = ~Counter { count: 0 };
    do spawn {               // spawn a new thread
        counter.count += 1;  // increment the counter
        println(counter.count.to_str()); // OK: `counter` is owned by this thread
    }
}

Notice that the data race is gone: this code always prints 1, because the printing happens in the thread that owns the Counter object.

The resulting rule is pretty simple. In short: @ pointers may not be sent from thread to thread. ~ pointers may be sent, and are owned by exactly one thread at a time. Therefore, if you need data to be sent, do not allocate it with @.

Conclusion (TL;DR)

So the distinction between @ and ~ is often confusing to newcomers, but it’s really quite simple. There are two main rules to remember:

  1. ~ only supports one pointer to each allocation, so if you need multiple pointers to the same data, use @. But @ requires garbage collection at runtime, so if avoiding that overhead is important to your application, use ~ wherever possible.

  2. Don’t use @ pointers if you need to send data between multiple threads. Use ~ instead.

Finally, I should note again that, if these rules are too restrictive for you (for example, if you need multiple pointers but can’t tolerate garbage collection pauses), there are more advanced solutions: borrowing, safe smart pointers, and unsafe code. But this simple system works well for many programs and forms the foundation of Rust’s approach to memory management.

The New Borrow Check in a Nutshell

| Comments

If you’ve used Rust for any period of time, you’ve probably been bitten by the mysterious borrow check—the compiler pass responsible for preventing iterator invalidation, as well as a few other dangling pointer scenarios. The current iteration of the borrow check enforces a fairly complex set of rules. Because the rules were hard to understand and ruled out too many valid programs, we were never really satisfied with the analysis; without a simple set of rules to follow, programmers will get frustrated and give up. To remedy this, Niko has proposed a revamp of the borrow checker known as “Imagine Never Hearing the Phrase ‘Aliasable, Mutable’ Again”. This has mostly been implemented in a pull request now, so I’d like to take the opportunity to explain the new rules. I’m particularly excited about this change because now the entire set of borrow check rules is simple enough to boil down to one principle.

Here’s the rule that the new borrow check is in charge of enforcing: Whenever you take a pointer to an object, you may not modify that object as long as that pointer exists, except through that pointer.

(Strictly speaking, this is not all the new borrow check enforces, but the other errors the pass can produce are generally straightforward and simple dangling pointer errors. Also, I’m omitting the rules related to &const, as this rarely-used type of pointer is likely to be removed.)

For unique pointers (~) and borrowed pointers (&), this rule is enforced at compile time, without any runtime overhead. Here’s an example:

let mut the_magic_word = Some(~"zap");
match the_magic_word {
    None => {}
    Some(ref word) => {
        the_magic_word = None; // ERROR
        io::println(*word);
    }
}

Here, the line marked ERROR produces the error “assigning to mutable local variable prohibited due to outstanding loan”. This happens because we violated the rule above—the line the_magic_word = None mutates the value the_magic_word while there exists a pointer to it (word).
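
The other half of the rule, “except through that pointer”, means that mutating the object through the outstanding pointer itself is fine. A minimal sketch (the function name set_to_ten is mine):

fn set_to_ten(x: &mut int) {
    *x = 10;                          // OK: the mutation goes through `x` itself
}

fn main() {
    let mut n = 0;
    set_to_ten(&mut n);
    io::println(n.to_str());          // prints 10
}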

Another example:

struct Foo {
    array: ~[int]
}

impl Foo {
    fn bar(&mut self) {
        for self.array.each |i| {
            self.array = ~[];  // ERROR
            io::println(i.to_str());
        }
    }
}

Again, the error is “assigning to mutable field prohibited due to outstanding loan”. As before, it’s traceable to a violation of the mutation rule: the line self.array = ~[] mutates the self.array field while a pointer (i) into it exists.

This example is interesting for a couple of reasons. First of all, it illustrates the way the Rust compiler can catch iterator invalidation issues without runtime overhead in many cases: here the compiler is able to detect that the i iterator, which has type &int, was invalidated, and rejects the program instead of permitting undefined behavior at runtime. Second, this example illustrates something not possible under the current borrow check regime that the new borrow check allows: namely, taking an immutable pointer to a field accessible through a &mut pointer. (An immutable pointer is needed to call the each method to prevent iterator invalidation.) This restriction probably led to more borrow check errors in practice than any other, since it prevented iterating over any collection reachable from an &mut pointer.
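
For instance, a method along these lines (a sketch of the now-permitted pattern, not code from the pull request) is accepted, because the loop only reads self.array through the immutable borrow:

impl Foo {
    fn print_all(&mut self) {
        for self.array.each |i| {     // immutable pointer to a field reachable
            io::println(i.to_str());  // through an `&mut` pointer: now allowed
        }
    }
}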

Now all of this works fine for & and ~ pointers, but what about managed boxes (@)? It turns out that immutable @ boxes are easy to deal with; since they can’t be mutated at all, the borrow checker doesn’t have to do anything to enforce the no-mutation rule. However, for @mut boxes, the situation is more complicated. For @mut boxes, the new borrow checker inserts runtime checks to enforce the pointer rules. Attempting to mutate an @mut box while a pointer to its contents exists results in task failure at runtime, unless the mutation is done through that pointer.

Interestingly, this is similar to the way various debug or safe STL implementations (for example, Microsoft’s) guard against iterator invalidation. The differences are: (1) in Rust, the checks are automatically inserted by the compiler instead of built into each collection by hand; and (2) the checks are only needed for garbage collected data, as the compiler can perform the checks at compile time for other types of data.

There is one gotcha here, however. As implemented, if any pointer exists to any part of an @mut box, then the entire box cannot be mutated while that pointer exists. This means that this example will fail:

struct Dungeon {
    monsters: ~[Monster],
    total_gold: int
}

impl Dungeon {
    fn count_gold(@mut self) { // note `@mut self`, not `&mut self`
        self.total_gold = 0;
        for self.monsters.each |monster| { // pointer created here
            self.total_gold += monster.gold;
        }
    }
}

Note that the iterator variable monster has type &Monster. This is a pointer to the inside of Dungeon, so the assignment to self.total_gold violates the mutation rule. Unfortunately, the compiler does not currently catch this, so the program will fail at runtime.

There are a couple of workarounds. The simplest way is to change @mut self to &mut self. Since there is no need to give out the @mut pointer for this operation, this is safe. Roughly speaking, the compile-time checks operate on a per-field basis, while the runtime checks operate on a per-box basis. So this change makes the operation succeed. Another possibility is to make total_gold into a local variable and assign to the field after the for loop.
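
Here’s a rough sketch of that second workaround: accumulate into a local variable and assign to the field only after the loop, when no pointer into the box is outstanding.

impl Dungeon {
    fn count_gold(@mut self) {
        let mut total = 0;
        for self.monsters.each |monster| { // a pointer into the box exists here,
            total += monster.gold;         // but we only mutate the local variable
        }
        self.total_gold = total;           // no outstanding pointer; succeeds at runtime
    }
}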

Despite the fact that this error is easy to fix, I’m concerned about the fact that the compiler won’t catch this kind of thing at compile time. So I think we should introduce a set of warnings that looks for common violations of this rule. It’s impossible to make the warnings catch all failures—that’s the reason why the check is done at runtime in the first place. (In general, trying to make the compiler reason about @ boxes is hard, since the compiler has no idea how many references to them exist.) But I suspect that we could make the analysis good enough to catch the majority of these errors in practice.

In any case, the take-away from all of this is that the borrow checker should be much easier and more transparent with this change. There’s essentially just one straightforward rule to remember.

The Two Meanings of “Impl”

| Comments

impl declarations in Rust have two forms. The subtle distinction between the two can be confusing at first, so I’ll briefly explain the difference here.

The first form of impl is a type implementation. (Earlier I was calling this an “anonymous trait”, but I think that this terminology is probably more confusing than it’s worth.) This form allows you to define new functions associated with a type. For example:

struct Dog {
    name: ~str
}

impl Dog {
    static fn new(name: ~str) -> Dog {
        return Dog { name: name };
    }

    fn speak(&self) {
        io::println("woof");
    }
}

This example defines new functions new and speak under the Dog namespace. Here’s an example of their use:

let dog = Dog::new("Snoopy");
Dog::speak(&dog); // note: doesn't work today, see note below

(The explicit call of the form Dog::speak(&dog) doesn’t work today, but I wrote it out to emphasize the fact that speak lives in the Dog namespace. It’s likely to work in the future, though. Today, you need to write dog.speak().)

The second form of impl, on the other hand, is a trait implementation. It’s distinguished from the first form by the presence of a : followed by the name of a trait. This form allows you to provide an implementation for one or more existing functions belonging to a trait. It doesn’t define any new functions. For instance, suppose I defined this trait:

trait Animal {
    static fn species(&self) -> ~str;
}

Then I can supply an implementation of species() for my Dog structure like this:

impl Dog : Animal {
    static fn species(&self) -> ~str {
        return ~"Canis lupus familiaris";
    }
}

The key point to notice here is that this form doesn’t define any new names. This code won’t compile:

let dog = Dog::new("Fido");
io::println(Dog::species(&dog)); // unresolved name: `species`

But this code will:

let dog = Dog::new("Spot");
io::println(Animal::species(&dog));

The reason is that a trait implementation only provides the implementation of one or more existing functions rather than defining new functions. The function species is part of the Animal trait; it’s not part of Dog.

(You might reasonably ask: Why not duplicate the name species into Dog, for convenience? The reason is because of name collisions: it should be possible to implement Animal and later implement another trait with a different function called species without breaking existing code.)

So the upshot of this is that there are two forms of implementations in Rust: the type implementation, which defines new functions, and the trait implementation, which attaches functionality to existing functions. Both use the impl keyword, but they’re different forms with different meanings.

A Tour of Vector Representations

| Comments

One aspect of Rust that’s often confusing to newcomers is its treatment of strings and vectors (also known as arrays or lists). As a result of its focus on systems programming, Rust has a somewhat lower-level concept of a vector than most other languages do. As part of an overall goal to make Rust easy to understand, I thought I’d write up a quick tour of the way other languages’ vectors work from the perspective of the machine in order to make it easier to map these concepts into Rust.

There are three common models that I’ve observed in use—for lack of better terminology, I’ll call them the Java model, the Python model, and the C++ STL model. (For brevity, I’ve omitted fixed-size, stack-allocated arrays, since these are very limited.) Most languages build upon one of these three. In a subsequent blog post, I’ll explain how Rust’s system differs from these and how the programmer can build the equivalents of each of these models in Rust.

We’ll start with the Java model. Java’s basic array type has a fixed size when created and cannot be changed afterward. Arrays in Java are always allocated on the Java heap. For example, consider the following line of code:

int[] a = { 1, 2, 3, 4, 5 };

After this code is executed, the memory of the running program looks like this:

(Diagram: a, on the stack, referencing the array storage on the Java heap)

The cell highlighted in red is the value of type int[]. It’s a reference type, which means that it represents a reference to the data rather than the data itself. This is important when assigning one array value to another. For instance, we execute this code:

int[] b = a;

And now the memory looks like this:

(Diagram: a and b both referencing the same array storage)

Both values are pointing at the same underlying storage. We call this aliasing the array buffer. In Java, any number of values can point to the same underlying array storage. Because of this, the language has no idea how many pointers point to the storage at compile time; therefore, to determine when to clean up the storage, Java uses garbage collection. Periodically, the entire heap is scanned to determine whether any references to the array storage remain, and if there are none, the buffer is freed.

Now this model is simple and fast, but, since the arrays have a fixed size, the programmer can’t add new elements to them once they’re created. This is a very common thing to want, so Java provides another type, java.util.ArrayList, for this. As it turns out, the model used by Java’s ArrayList is essentially the same model that Python uses for all of its lists.

Let’s look at this model more closely. Consider this statement in Python:

a = [ 1, 2, 3, 4, 5 ]

Once this is executed, the memory looks like this:

(Diagram: a referencing a PyListObject, which references the element storage)

As in Java, the cell highlighted in red (a) is the value that actually has the Python type list. We can see this if we assign a to b:

b = a

(Diagram: a and b both referencing the same PyListObject)

Obviously, the disadvantage of this model is that it requires two allocations instead of one. The advantage of this model is that new elements can be added to the end of the vector, and all outstanding references to the vector will see the new elements. Suppose that the vector had capacity 5 when initially created, so that no room exists to add new elements onto the end of the existing storage. Then when we execute the following line:

b.append(6)

The memory looks like this:

(Diagram: the PyListObject now referencing a new, larger element storage; the old storage freed)

Here, Python has created a new and larger allocation for the storage, copied the existing elements over, and freed the old allocation (indicated in gray). Because a and b both point to the PyListObject allocation, which has not changed, they both see the new elements:

>>> a
[1, 2, 3, 4, 5, 6]
>>> b
[1, 2, 3, 4, 5, 6]

In summary, Python’s model sacrifices some efficiency at runtime because it requires both garbage collection and two allocations, but it gains flexibility by permitting both aliasing and append operations.

Turning our attention to the C++ STL, we find that it has a different model from both Python and Java: it sacrifices aliasing but retains the ability for vectors to grow. For instance, after this C++ STL code executes:

std::vector<int> a;
a.push_back(1);
a.push_back(2);
a.push_back(3);
a.push_back(4);
a.push_back(5);

The memory looks like this:

(Diagram: the std::vector value on the stack, referencing element storage on the heap)

As before, the red box indicates the value of type std::vector. It is stored directly on the stack. It is still fundamentally a reference type, just as vectors in Python and Java are; note that the underlying storage does not have the type std::vector<int> but instead has the type int[] (a plain old C array).

Like Python vectors, STL vectors can grow. After executing this line:

a.push_back(6);

The STL does this (assuming that there isn’t enough space to grow the vector in-place):

(Diagram: new, larger element storage allocated; the old storage freed)

Just as the Python list did, the STL vector allocated new storage, copied the elements over, and deleted the old storage.

Unlike Java arrays, however, STL vectors do not support aliasing the contents of the vector (at least, not without some unsafe code). Instead, assignment of a value of type std::vector copies the contents of the vector. Consider this line:

std::vector<int> b = a;

This code results in the following memory layout:

(Diagram: b with its own separate copy of the element storage)

The entire contents of the vector were copied into a new allocation. This is, as you might expect, a quite expensive operation, and represents the downside of the C++ STL approach. However, the STL approach comes with significant upsides as well: no garbage collection (via tracing GC or reference counting) is required, there is one less allocation to manage, and the vectors are allowed to grow just as Python lists are.

This covers the three main vector representations in use by most languages. They’re fairly standard and representative; if I didn’t mention a language here, it’s likely that its implementation uses one of these three techniques. It’s important to note that none of these are right or wrong per se—they all have advantages and disadvantages. In a future post, I’ll explain the way Rust’s vector model allows the programmer to choose the model appropriate for the task at hand.