Over the past week I’ve submitted a series of pull requests that significantly revamp the way parallel layout works in Servo. Originally I did this work to improve performance, but it’s also turned out to be necessary to implement more advanced CSS 2.1 features. As it’s a fairly novel algorithm (as far as I’m aware) I’d like to take some time to explain it. I’ll start with where we are in Servo head and explain how it evolved into what’s in my branch. This post assumes a little knowledge about how browser engines work, but little else.
Servo’s layout operates on a flow tree, which is similar to the render tree in WebKit or Blink and the frame tree in Gecko. We call it a flow tree rather than a render tree because it consists of two separate data types: flows, which are organized in a tree, and boxes, which belong to flows and are organized in a flat list. Roughly speaking, a flow is an object that can be laid out in parallel with other flows, while a box is a box that must be laid out sequentially with other boxes in the same flow. If you’re familiar with WebKit, you can think of a box as a
RenderObject, and if you’re familiar with Gecko, you can think of a box as a
nsFrame. We want to lay out boxes in parallel as much as possible in Servo, so we group boxes into flows that can be laid out in parallel with one another.
Here’s a simple example. Suppose we have the following document:
<html> <body> <p>Four score and seven years ago our <b>fathers</b> brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.</p> <p>Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met on a great battle-field of that war. We have come to <i>dedicate</i> a portion of that field, as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this.</p> </body> </html>
This would result in the following flow tree:
Notice that there are three inline boxes under each
InlineFlow. We have multiple boxes for each because each contiguous sequence of text in the same style—known as a text run—needs a box. During layout, the structure of the flow tree remains immutable, but the boxes get cut up into separate lines, so there will probably be many more boxes after layout (depending on the width of the window).
One neat thing about this two-level approach is that boxes end up flattened into a flat list instead of a linked data structure, improving cache locality and memory usage and making style recalculation faster because less needs to be allocated. Another benefit (and in fact the original impetus for this data structure) is that the line breaking code need not traverse trees in order to do its work—it only needs to traverse a single array, making the code simpler and improving cache locality.
Now that we know how the flow tree looks, let’s look at how Servo currently performs layout to figure out where boxes should go.
The current algorithm
The current algorithm for parallel layout in Servo (i.e. what’s in the master branch before my changes) consists of three separate passes over the flow tree.
Intrinsic width calculation or
bubble_widths(bottom-up). This computes the minimum width and preferred width for each flow. There are no sequential hazards here and this can always be computed in parallel. Note that this is information is not always needed during layout, and eventually we will probably want to implement optimizations to avoid computation of this information for subtrees in which it is not needed.
Actual width calculation or
assign_widths(top-down). This computes the width of each flow, along with horizontal margins and padding values.
Height calculation or
assign_heights(bottom-up). This computes the height of each flow. Along the way, line breaking is performed, and floats are laid out. We also compute vertical margins and padding, including margin collapse.
Within each flow, boxes are laid out sequentially—this is necessary because, in normal text, where to place the next line break depends on the lines before it. (However, we may be able to lay boxes out in parallel for
white-space: nowrap or
For simple documents that consist of blocks and inline flows, Servo achieves excellent parallel wins, in line with Leo Meyerovich’s “Fast and Parallel Webpage Layout”, which implemented this simple model.
The problem with floats
Unfortunately, in the real world there is one significant complication: floats. Here’s an example of a document involving floats that illustrates the problem:
<div style="float: right"> I shot the sheriff. But I swear it was in self-defense. I shot the sheriff. And they say it is a capital offense. </div> <div> I shot the sheriff But I didn't shoot no deputy. </div> <div> All around in my home town, They're tryin' to track me down; They say they want to bring me in guilty For the killing of a deputy, For the life of a deputy. </div>
Rendered with a little extra formatting added, it looks like this:
The flow tree for this document might look like this:
Notice how the float in green (“I shot the sheriff…”) affects where the line breaks go in the two blocks to its left and below it (“I shot the sheriff…” and “All around…”). Line breaking is performed during the height assignment phase in Servo, because where the line breaks go determines the height of each flow.
This has important implications for the parallel algorithm described above. We don’t know how tall the float is until we’ve laid it out, and its height determines where to place the line breaks in the blocks next to it, so we have to lay out the float before laying out the blocks next to it. This means that we have to lay out the float before laying out any blocks that it’s adjacent to. But, more subtly, floats prevent us from laying out all the blocks that they impact in parallel as well. The reason is that we don’t know how many floats “stick out” of a block until we know its height, and in order to perform line breaking for a block we have to know how many floats “stuck out” of all the blocks before it. Consider the difference between the preceding document and this one:
The only difference between the first document and the this one is that the first unfloated block (“I shot the sheriff…”) is taller. But this impacts the height of the second block (“All around…”), by affecting where the lines get broken. So the key thing to note here is that, in general, floats force us to sequentialize the processing of the blocks next to them.
The way this was implemented in Servo before my pull requests is that any floats in the document caused all unfloated blocks to be laid out sequentially. (The floats themselves could still be laid out in parallel, but all other blocks in the page were laid out in order.) Unsurprisingly, this caused our parallel gains to evaporate on most real-world Web pages. The vast majority of modern Web pages do use floats in some capacity, as they’re one of the most popular ways to create typical layouts. So losing our parallel gains is quite unfortunate.
Can we do better? It turns out we can.
As most designers know, the
float property comes with a very useful companion property:
clear property causes blocks to be shifted down in order to avoid impacting floats in one or both directions. For example, the document above with
clear: right added to the second block looks like this:
This property is widely used on the Web to control where floats can appear, and we can take advantage of this to gain back parallelism. If we know that no floats can impact a block due to the use of the
clear property, then we can lay it out in parallel with the blocks before it. In the document above, the second block (“All around…”) can be laid out at the same time as the float and the first block.
My second pull request implements this optimization in this way: During flow construction, which is a bottom-up traversal, we keep track of a flag,
has_floated_descendants, and set it on each flow if it or any of its descendants are
FloatFlow instances. (Actually, there are two such flags—
has_right_floated_descendants—but for the purposes of this explanation I’ll just treat it as one flag.) During width computation, we iterate over our children and set two flags:
impacted_by_floats. (Again, there are actually two such flags—
impacted_by_floats is true for a flow if and only if any of the following is true:
The parent flow is impacted by floats.
The flow has floated descendants.
Any previous sibling flow is impacted by floats, unless the appropriate
clearproperty has been set between this flow and that sibling.
Only subtrees that have
impacted_by_floats set to true are laid out sequentially, in order. The remaining subtrees can be laid out in parallel.
With this optimization implemented, documents above can be laid out in parallel as much as possible. It helps many real-world Web pages, as
clear is a very commonly-used property.
At this point, two questions arise: “Can we do even more?” and “Is this algorithm enough to properly handle CSS?” As you might expect, the answer to the first is “yes”, and the answer to the second is “no”. To understand why, we need dive into the world of block formatting contexts.
Block formatting contexts
The behavior of
overflow: hidden is subtle. Consider this document, which is identical to the document we’ve been using but with
overflow: hidden specified on the blocks adjacent to the float:
<div style="float: right"> I shot the sheriff. But I swear it was in self-defense. I shot the sheriff. And they say it is a capital offense. </div> <div style="overflow: hidden"> I shot the sheriff But I didn't shoot no deputy. </div> <div style="overflow: hidden"> All around in my home town, They're tryin' to track me down; They say they want to bring me in guilty For the killing of a deputy, For the life of a deputy. </div> </div>
Rendered, it looks like this: