Recently I’ve been describing how Cairo Graphics renders 2D vector graphics for, amongst other things, GTK & WebkitGTK. Today I think I’ll summarise and publish what I’ve studied so far, and maybe move onto the “compositor”.
The API focuses on a cairo_t (Cairo.Context in language bindings) object, which is constructed by a given Surface object. And typically the Context objects which translates the caller’s method calls into datastructures it passes back to the Surface.
The GraphicsState backs any getters/setters. Implementing .save() & .restore() using a linked-list stack, and a dedicated per-context freelist for efficiency.
The path is encoded as two “segmented arrays”, one listing opcodes and the other listing the control points for those opcodes. These “segmented arrays” are a combination between linked-lists and arrays, where whenever an array overflows a new array twice is allocated and added to the linked-list to store the subsequent opcodes/points.
I’ve now studied the next component: Cairo Image Surface is the backend which translates the vector graphics method calls into a bitmap image.
This stores some basic metadata and dispatches methods to “Compositors”, falling back upon error or missing methods to underlying “delegates”.
Around this it’s get logic for tracking the “damage” rectangles the window manager might want to know in order to efficiently composite it onscreen.
This dirtying logic estimates the damage from each operation, before letting the dynamically-dispatched methods finetune it. After which it adds the damage to the Surface’s “damage” property, which itself is an array of rectangles.
That estimation pass also doubles to determine whether the rendering will actually have any effort, and as such isn’t worth computing further.
The implementation is split into three files: cairo-composite-rectangles, cairo-damage, & cairo-rectangle.
The topmost “compositor”, the “spans compositor”, handles “painting” (filling whole rectangles) and “masking” (setting the clip polygon) the same, starting by accessing the configured clip and using normal dirtying routines to apply that clip. This clip is provided as a path, but is converted into a “polgon” and then (via a “scan converter”) a boxes array.
When it comes to strokes and fills, the spans compositor calls another file to convert the paths into “polygons” or “boxes” by iterating over the contained opcodes. I’ll cover these tomorrow.
All of this code is complicated by Cairo trying to figure out which shortcuts it can take, but once the inputs are converted into polygons and boxes (with the clippath applied) it calls the optimal methods of it’s fallback implementation.
One path interpretor computes the bounding box (called upon initializing a Path). For the most part this just involves extending that box to hold each point, but curves are more complex and requires the current point to be tracked.
To translate the paths into polygons for fill operations (called by GState utilities & compositors including Spans), each subpath (seperated by MOVETO opcodes) is closed and curves are interpolated into their own LINETO ops.
To determine whether a given point is inside the shape described by a path (called by GState utilities), it counts the number of line segments directly to the left and the ones directly to the right. If the difference is even the point is inside.
In some cases paths to be stroked can be almost directly translated into an array of boxes to be filled. An intermediate array is used here so some nuances can be filled in once the subpath is closed, including a seperate codepath to split these lines into seperate boxes for a dash-pattern. There’s a flag used to determine for the Spans and other Compositors whether that more optimal codepath can be used.
Converting the path to a polygon (called by GState utilities, the Spans Compositor, etc) is more involved, using two “contour” intermediate datastructures (for both sides of the line) and mostly focusing on line butts.
There’s a very similar codepath which converts Cairo Graphics paths to be stroked into an array of trapagons, which is a slightly more straightforward conversion. As per triangles appropriate to be to be submitted to the GPU. Both of which are called by the Traps Compositor.
Or you can call a polymorphic (called by GState utilities, the Recording Surface, is a fallback for stroke-to-polygon in case of dashes, etc) version of this code.
Cairo “splines” (curve interpolations) are in part an abstraction around Cairo slopes, with some additional geometric formulas and a divide&conquer formula for splitting that curve into line segments at a given tolerance.
The slope itself is just a structure containing dx & dy.
Contours are used as an intermediate datastructure between paths to be stroked and polygons, represented as a segmented array of points. It’s main purpose appears to be allow discarding details too fine to be rendered, which is done in 3 passes.
The first pass discards points too close together. The second discards points that don’t make a difference to the slope it’s on. And the final passes actually removes those points, rather than marks them for removal.
That second pass is the most interesting and bears more explanation, being a divide & conquer algorithm.
The idea is to find the point in a path segment that is furthest from it’s straight line approximation. If it visually lands on that straight line approximation, we don’t actually need those intermediate points and can replace it with that approximation.
Otherwise we should split that path segment around that point and perform the same process for both.
Polygons are a common lower-level datastructure than Paths in Cairo Vector Graphics, storing a (non-segmented) array of (non-horizontal) “edges” and boxes to which they should be clipped. These edges in turn consist of two points, a top and bottom, and a direction, and enough space for 32 of them are allocated alongside each polygon.
Most of the logic is dedicated to that clipping, but intersection tests and reductions are done using Cairo’s central Bently-Ottman algorithm I’ll cover later.
Alternatively Cairo may lower paths to a segmented array of boxes (with it’s own clip boxes and Bently-Ottman intersection routine), array of trapazoids (focusing clipping and line butts), or array of triangles (focusing entirely on clipping).
To render the polygons it computes for stroked or filled polygons, the Cairo Vector Graphic’s Spans Compositor uses an algorithm from one Tor Andersson. It seems more suitable for smaller shapes (especially opposed to what I’ll cover tomorrow), but I’m struggling to see the fallback conditions.
Put simply this Bucket (or Heap) Sorts the Y axis before it, merging in data derived from previous rows, Merge Sorts the X axis. Which easily gets converted into a Run-Length-Encoded bitmap.
To start, the Spans Compositor “adds a (Cairo) polygon” to this “Scan Converter”. Which will then procede to convert each contained (non-horizontal) edge into it’s own internal datastructure and (in what I referred to as Bucket Sort) store it at the index given by it’s topmost Y-axis point.
This internal datastructure is arranged so p1 is always above p2, whilst storing the original order in a “dir” property. They’re allocated from a scan-converter specific “pool”.
Then upon being told to “generate” the “scanlines”, the Scan Converter will iterate over every relevant Y coordinate, checking if any lines start at that Y line. In doing so it’ll bucket the lines for anti-aliasing (which becomes part of the check) and extract the minimum height and whether all lines are vertical.
If there are lines starting at the top of those pixels, that linked list will be Merge Sorted and merged with with previously sorted “active list”.
After taking a shortcut to skip blank lines, it checks if there’s any lines ending, (starting,) or intersecting on this row. If so it has take more care with anti-aliasing and examine the other sub-buckets.
To do so it iterates over every subrow whilst updating the activelist to reflect the next one. At the sametime it computes a “winding factor” (as a running sum of line directions) to determine which line gaps should be added to the Run Length Encoded output with computed “coverage”.
Otherwise it can take some shortcuts, including making each span longer (skipping some lines) and using formulas rather than sampling to compute anti-aliasing. If the lines are vertical, it can take a further shortcut back to this bit of code.
The output from these passes is a linked list (with an embedded iterator) of “cells”, each tracking an X coordinate and data from which to compute subpixel coverage for antialiasing. It has it’s own allocation pool, with some preallocated memory.
Once the cell list has been gathered, it’s finalized into the run-length encoded opacity bitmask and handed off to the compositor. Much of this computation is skipped if it’s configured not to do anti-aliasing.
There’s multiple reimplementations of this scan converter, optimized for different uses. Which are choosen based on whether it can be constrained to boxes, needs clipping, or the anti-aliasing level.
Then another is used by the Direct Rendering Media surfaces.
To merge two sorted lists you repeatedly check which list starts with the greater (by some definition, in this case the top X coordinate) item and move it into the sorted list. This is how Cairo’s Scan Converters combine new rows of lines with it’s active list.
You can sort a list by recursively dividing the list in half, and using this Merge operation to combine both results.
Cairo’s implementations replaces most of this recursion with loops for efficiency, but it’s still Merge Sort.
I’ve mentioned allocation pools throughout this section, so they’re worth explaining further.
They serve to throttle memory allocations thereby addressing any bottlenecks it introduces. Instead it allocates out of “pool chunks”, the first of which is often allocated alongside the containing datastructure.
A pointer is incremented to allocate each new object, which’ll all be freed together once computation has completed via a linked list of those pool chunks.
Shape Mask Compositor
The Cairo Shape Mask Compositor very simply renders into a new scratch surface and applies the mask to that after-the-fact as it blends the image into the main surface. Both of these methods are dynamically dispatched with performance checks.
In the case of Image Surfaces, creating the scratch surface involves cloning the metadata and initializing a new bitmap image. And the masking is handed off to the compositors, the first of which will treat it like a “paint” operation.
Like with the Spans Compositor, “painting” (filling rectangles) mostly involves handling the complexities introduced by the configured crop shape. And “masking” is mostly the same as “painting” (again like with Spans), but this time it may first convert the pattern to a surface.
To fill a path: If the “rectilinear” flag for a path is set, then the Trapagons Compositor will composite it in as boxes. Otherwise it’ll convert the path into a polygon, which it’ll then crop, attempt to convert into boxes for rendering, and failing that convert into trapagons.
It’ll then try again to render those trapagons as boxes, before converting the given Pattern into a Surface and calling the most optimal underlying method.
To stroke a path, the Trapagons Compositor will attempt to composite it as boxes if the flag is set. Then for certain antialias levels in the presence of curves, it’ll attempt to composite it as a polygon. Otherwise it’ll convert the Path straight into trapagons.
Those trapagons will then be clipped, attempted to be rendered as boxes, and finally composited as trapagons using the most optimal method.
Seemingly just for the sake of other compositors, there’s a method to render text “glyphs”, but here it just calls some callback methods and composites it in.
Image Surfaces will then call Cairo’s font infrastructure (which I guess I’ll cover later), most dominantly ScaledGlyph, and composite it in via PixMan. Possibly utilizes it’s “glyph cache” if available.
The Bentley-Ottmann algorithm was designed for finding every intersection amongst a set of line segments, but Cairo Vector Graphics uses it to:
- Convert polygons into a set of trapagons for rendering, by pairing up line segments.
- Simplify complex geometries (whether formed out of boxes or line segments).
- Compute the intersection between two complex polygons.
This morning I’ll describe how this algorithm works.
P.S. Interestingly the original paper worked around many off-by-one errors.
Bentley-Ottman works by queueing up events (in a Priority Min Queue) for each line’s start. Upon a START event, it adds it to the Sweep Line, enqueues new STOP events, special cases line continuations, and the next line INTERSECTIONs on the left and/or right if present.
Upon STOP it removes the line from the Sweep Line and checks if it’s neighbouring lines have an INTERSECTION. And upon INTERSECTION, it swaps the order of the two lines in the Sweep Line and checks for the next INTERSECTION.
That Priority Queue is implemented as a Min Heap (merged with the sorted list received as input). This is an “implicit” binary tree where every node contains a value lower then (here defined as higher on the screen) both of it’s children.
It’s implicit in the sense that actual pointers to the children aren’t stored, but instead are computed from the node’s index in an array, making it fast to also the last node in that array.
To enqueue an item onto a min-heap, you first add it to the end of the array before reestablishing the invariants by swapping it with as many parents as necessary.
To peek at the minimum value you just look at index 1 (index 0 is unused). And to remove it you overwrite it with the last item in the array before repeatedly swapping it’s with it’s lesser child until the invariants are reestablished.
Both of which completes in O(log(n) time, and typically the children are placed at index 2x & 2x+1.
That Sweep Line meanwhile is an ordered doubly-linked list, with a cached iterator acting as a hueristic favouring line continuations without slowing down random orderings. Thereby making it obvious to the computer which line segments will next intersect.
The lines are stored in the order in which they intersect the current Y coordinate.
And Doubly-Linked Lists stores the sequence by storing pointers to both the next and previous items in the list.
To extract the output into trapagons, Cairo examines the Sweep Line whenever the dequeued event is on a new Y coordinate. It starts this by examining any lines that have ended, which indicates that a trapagon needs to be finalized and added to the output.
Otherwise it computes a running sum of line “directions” (the “winding factor”) to determine which line segments it should consider. Data is stored pairing up those line segments, and possibly outputting any completed trapagons.
As for where Cairo uses Bentley-Ottman:
- Most compositors will use it to compute & simplify the boxes to which it should clip paths
- It’s used to apply clips, whether or not those can be converted efficiently to boxes
- Some entrypoints are only called by X11 surfaces
- There’s one combination of Bentley-Ottmann & Tor algorithms used by Direct Rendering Media surfaces
- It’s used to compute the boxes or trapagons that should be used, possibly via the Path Fill interpretor.
The compositors choose dynamically-dispatched methods provided by the Cairo Image Surface Compositor, which will then hand the task off onto the PixMan library.
PixMan at it’s simplest iterates over each pixel in these simplified geometries and sets them to a value computed via dynamic dispatch.
What makes pixman valuable as a seperate library is that it heavily micro-optimizes these operations (it can’t macro-optimize them because that would involve finding a more efficient encoding than the monitor expects), by making heavy use of machine code.
I’m not that familiar with assembly-level optimization, but I can say it uses any available vector instructions. And the dynamic probably doesn’t hurt too much as it plays well with the branch target predictor, and aids a smaller code size.
In general there appears the dynamic dispatch appears to go through a fallback chain starting with a noop and falling back to:
- MIPS (DSPR2, MMX), PowerPC (VMX), ARM (Neon, MMX, SIMD), or x86 (SSSE3, SSE2, MMX) as determined by CPU flags.
- Possibly a “fast path”.
- A general path.
For filling a rectangle with a differently-encoded image, PixMan hands it directly off to it’s implementations.
The MIPS DSPR2 implementation requires the src and dst bits-per-pixel to be the same, and wraps it’s own memcpy for each line of the src image. This memcpy variant, amongst other things, copies data over in increasingly smaller registers and has a slow path for unaligned data.
MMX is similar, except it loads 8 bytes into registers before writing them back out for the memcpy.
PowerPC VMX nor X86 SSSE3 don’t have a “blt” operator.
The ARM Neon assembly-level code is a lot more obscured, but it has similar restrictions and seems to be mostly worrying about the data prefetcher. ARM SIMD is similar, but it seems less worried about prefetching data but rather using the largest registers available.
And X86 SSE2 is mostly written in C and mainly worries about memory alignment, and tries to restore it.
Failing to perform the “blt” operator, Cairo will instead use a “composite32” operator with a NULL “mask”.
Upon validating these inputs, it begins by computing a “composite region” to handle any clipping. Then a function is looked up (by iterating over and caching the options) for the blend mode (opaque images lead to a faster blend mode being chosen), and called for every row in the src image.
At a quick glance, these methods work a lot, but not quite, like the memcpys described earlier.
Alternatively those rects can be filled with a solid color, which PixMan’s “implementations” provides an optimized codepath for. Though it doesn’t support opacity, essentially just a memset variant.
This operation appears to be significantly easier on the backends as it doesn’t need to optimize read as well as write, but still I see many of the same optimizations and codestyles.
Trapazoids may be easier for Cairo Vector Graphics to generate, but it’s harder for PixMan to composite in.
Pixman implements it by walking over the Y axis, and all X points between the two bounding lines. From this it generates an opacity mask that can be filled using the “composite32” operator I described earlier. Though it possible to render multiple trapazoids into this image before using it as a composition mask.
There’s multiple versions of this function for different bitwidths.
PixMan also includes a Glyph Cache for text rendering, and shortcuts through the normal compositing routines. The Run-Length Encoded scanlines Cairo normal lowers strokes (via “polygons”) to are handed to PixMan as 1px high rectangles.
And PixMan provides getter and setter iterators constructed via it’s fallback “implementations” and optionally (as decided by a C macro) it’s “image” class, by which it implements various graidents.
cairo_t type is a reference-counted abstract class with a field to check for any errors. The caller are responsible for memory managing it.
If you’re not serializing the operations into an open standard vector graphics data format, you’ll typically use the
This tracks a
cairo_path_fixed_t and a linked-stack of
cairo_gstate_t to store property values, preallocating 2 & tracking a freelist, to pass as input to the constructing cairo_surface_t.
I’ll go over how all the
cairo_gstate_t properties impact rendering later (it deserves some dedicated focus), and discuss fonts on a seperate page. They’re partially freed on pop, and fully freed with their containing
But interestingly all the stroke styles are split out into their own inline structure to be passed around as a group, though it doesn’t contain any interesting types.
It also has 3 (inline) matrices, 3 “target” surfaces, a “source” pattern, and a clip.
cairo_matrix_t is a structure of 6 doubles (xx, xy, x0, yx, yy, y0) for the formulas x’ = xxx + xyy + x0 & y’ = yxx + yyy + y0. An “inverse” matrix is kept to make it easy to reverse this computation.
cairo_pattern_t contains several enum types, a matrix, a linked list of observers, and additional fields determined by it’s
cairo_pattern_type_t enum. Which may include (an array) of colours, points, and/or circles. Or a surface. These translate directly to PixMan Image objects.
cairo_clip_t (as assigned to a cairo_gstate_t) contains multiple representations of itself. There’s:
- A bounding box
- A linked-stack of reference-counted structure similar to
- An array of boxes (don’t ask me why boxes and rectangles are different things),
cairo_region_t(i.e. a reference counted PixMan region32 with an error code).
& a preallocated box.
It’s managed by it’s containing surface, while “compositors” often free their own copies.
On the otherhand
cairo_path_fixed_t (as referenced by
cairo_default_context_t) is mainly an array of
CLOSE_PATH opcodes, with the point operands moved out-of-line to it’s own array.
It also tracks the last point and the last
MOVE_TO point, a bunch of flags to determine whether shortcuts can be taken, and the bounding extents as a
cairo_default_context_t will free this when you start a new path or when it itself is freed.
Now moving onto Cairo Vector Graphic’s backend, there’s
cairo_gstate_t keeps a reference to it’s instantiating
cairo_surface_t is an abstract class class containing not only a refcount, error code, IDs, and dispath method table, but also:
- some bit flags to enable shortcuts,
- a matrix, it’s inverse, and a list of observers,
- some user data (from the user and input file),
- the resolution (as doubles) and fallback resolution,
cairo_surface_t, possibly in a linked-list,
- & some font options.
cairo_device_t is an abstract class allowing you to lock/unlock output device and flush data to it.
cairo_content_t meanwhile is a simple enum color &/or alpha.
The damage is tracked with an error code, as a
cairo_region_t and as a segmented list of boxes (some of which are preallocated).
Typically to get something you can display onscreen you use the cairo_image_surface_t subclass, which serves to wrap a PixMan Image and a (hardcoded) along with format metadata and a fallback linked-list of “compositors”.
This is constructed and managed by the caller, and receives as input the data built up by a
The compositors then have two methods for rendering boxes, stroke & fill methods, and (seamingly for the sake other surfaces) rendering glyphs.
cairo_image_surface_t uses are
cairo_spans_compositor_t, falling back to
cairo_traps_compositor_t. These don’t have any fields of note, but they do allow the caller to set some callback methods for it to output to.
These will then use various interpretors to lower the paths (either stroked or filled) into
cairo_polygon_t possibly via
cairo_contour_t, and on into a
cairo_scan_converter (taking a
Of these intermediate datastructures you have segmented lists and arrays of (typically with some preallocated space):
- points (
- lines with two points, top, bottom, & direction (
- boxes consisting of two points (
- trapazoids with a top & bottom, left & right line (cairo_traps_t) - NOTE: it’s easier to convert strokes directly to this, but harder on PixMan.
The compositors are responsible for freeing any it has computed for it.
There’s additional structures (including
cairo_pen_t) involved in converting between these until you ultimately have trapazoids, boxes, or spans (which are more efficient when they render immediately upon computation) PixMan can easily handle.
But that pretty much covers everything in Cairo, and don’t ask me how Cairo determines whether to use linked-lists, arrays, or segmented arrays to represent lists. It’s not evident in the code.
Let’s start with the
op(erator) property, which is passed almost directly (with trivial optimizations and conversions) to the underlying graphics library (like PixMan) via the backend Surface and Compositors upon performing a stroke/fill/glyph/mask/paint rendering.
opacity isn’t fully implemented, breaks rendering for any value other than 1.0, and the public API is commented out.
tolerance is used in determine whether dash styles will not be visable, hit testing (possibly via constructing the polygon), stroke/fill, and stored in the clips. In turn it’s used in interpolating curves to determine the step size has gotten too small, and to discard points too close together.
antialias is used in stroking/filling, computing new clips alongside which it’s stored, and computing rectilinear fill/stroke extents to determine whether the extents to should be rounded. When stroking/filling it’s used to determine which Tor Scan Converter, or various aspects around the Bentley-Ottmann rasterizer including the PixMan rendering.
On-stack copies of the
stroke_style is ultimately used in interpreting stroked paths, whether it’s rendered, hit-tested, or computing extents.
- 1/2 of
line_widthwill be added to either side of the path to draw two lines comprising the output shape, whether it’d be boxes, polygons, trapezoids, or triangles. Usually stored aside in a
line_capdetermines which codepath is used upon
MOVE_TOor CLOSE to join those two lines.
line_joindetermines which codepath is used to join a
cairo_pen_t’s points to each of those two lines.
miter_limitkicks in when
line_joinis set to
CAIRO_LINE_JOIN_MITER, and caps how far the top of the pyramid can reach, otherwise it’s treated like
dasha double array with a stored “offset” which is cyclically iterated over, with help from
cairo_stroker_dash_t, to split those lines up until multiple shapes.
fill_rule is used upon filling, fill/clip hittesting, computing fill extents, and computing new clips. It’s ultimately used in the hittesting/intersection path interpretors and the Tor & Bentley-Ottmann rasterization algorithms.
font_face is used to construct a
scaled_font which in turn is queried for it’s extents or path with or without specific glyphs, and used to render glyphs either directly or via the target surface.
previous_scaled_fontis no longer used.
font_matrixis used to construct that
scaled_fontand to geometrically transform the positioning of any glyphs before they are rendered.
font_optionsis only used to construct the
clip (which is unusually as it’s only ever added to, never replaced) is used in all the underlying Surface operations after a quick shortcircuiting test. Or you can hittest the clip directly or access it’s properties.
target is what receives all those backend method calls to do the actually rendering, rather than simply queueing up data for later use. When setting it stores aside the
original_target for no other reason than to let you access them.
It registers a callback to be updated of when the
target’s matrix changes so it can flag whether it’s still an identity transform, and stores the linked-list node in
device_transform_observer for later update/removal.
ctm stores the matrix transform, for which cairo_t (including GState) provides multiple shorthand methods. It’s passed (combined with the “target”’s matrix) to all method calls to
target, and used to evaluate whether dashes are visible as well as hittesting.
ctm_inverse has all the same operations applied to it as
ctm but in reverse (or it directly computes inverse of
ctm when that’s manually set”) and it’s used in stroking paths for computing dash distances.
source_ctm_inverse is copied over from
ctm_inverse when a new
source is set, to be applied to that
source upon rendering. That transformed version of
source is passed to all rendering methods on the “target”.