Glasgow Haskell Compiler

GHC is the production-quality implementation of Haskell, the lazy pure-functional programming language named after mathematician Haskell Curry. I love it’s pattern matching syntax!

Parsing

Parsing is done with the aid of Happy & Alex.

Usually I find the topic quite dull to discuss as it usually maps directly to the grammar you’re familiar with with few surprises. Which is a good thing! But I think there’s a few topics of note here for GHC…

Most of the complications here seems to be for the sake of language extensions, allowing you to give hints to the optimizer, and for Haddock.


Those primarily complicates comment lexing, in that {-# ... #-} needs to be passed through via the Abstract Syntax Tree to the optimizer, whilst comments starting with |, ^, *, or $ needs to be passed through to Haddock. Or everything does for it’s hscolour utility.

But even without that e.g. --* lexes to a (dynamically-defined) symbol rather than begins a comment.


Also each newline requires the tokenizer to infer {, ;, or } tokens from the indentation & a context stack which can disable this. Or there’s a special line-based syntax for compiler macros.

According to the Haskell specification e.g. in is a keyword but not int, builtin keywords must be tokenized after they’ve been lexed as an identifier. Similar for many, but not all, symbols.

Most language extensions primarily enables more parts of this lexer. And ofcourse there’s literals!


The parser puts a fair bit of effort into incorporating those comments the lexer made sure got through. It’s still got some code dealing with language extensions, but mostly it just relies on the lexer not passing those tokens those tokens through to disable them.

Patterns and expression are parsed using the same grammar.

The Abstract Syntax Tree it yields is extremely verbose, in part to yield better error messages. And in part because GHC hasn’t figured out operator precedance yet.


Once e.g. a pattern as been parsed to as an expression there’s a typeclass to convert it into the correct AST. Which in this case might just wrap the expression with a PatBuilderOpApp because, again, GHC hasn’t yet figured out operator precedance.

Also there’s plenty of logic to keep track of line/column numbers! And postprocessing to group consecutive function declarations with the same name, whilst associating any haddock comments, & performing other syntactic validations.

Imports

For each module (after some deconstruction & validation) it loads the referenced file.

To find which file it should load from the same hackage it replaces the dots in the path to slashes, adds a configurable suffix, & looks for it in configurable dirs, labelling it with a configurable package ID. Perhaps looking for precompiled files.

To locate from a package it first looks up the module in a mapping, then filter by package name. In either case there’s caching.


Loading Ghc.Prim is special cased to be part of the compiler. Otherwise (after some pre-processing, caching, & checking for loops) it parses the module in from a binary file format. I don’t see the code for when those files don’t exist, Cabal computes a partial ordering but maybe GHC tries again later?

After which it might traverse the parsed module to have unique IDs.

Then it adds the module to the mappings for the current file.


After some additional non-conditional renaming, which involves constructing an “environment” of which “instances” are available for each typeclass & datatype, it gathers this information together for the next step.

Where it figures which declarations should actually copied over to the new environment. If there’s any alow/block-list filtering to do. Which it repeats for the sake of good error messages.

Name Resolution

  1. Dispatch parsed top-level declarations into appropriate lists in the HsGroup.
  2. Construct a mapping of operator precedance, and add their names to the identifiers mapping.
  3. Makes a half-hearted attempt to resolve the typeclass for any implementations, updates the identifiers & fixities mappings.
  4. Update typeclass mapping, exiting if any errors have previously occurred.
  5. Resolve identifiers of any “pattern synonyms” & construct a mapping of them.
  6. Use that to resolve identifiers within the pattern-matching left-hand-side of any functions, constructing a mapping of them.
  7. Resolve identifiers within the program’s datamodel (typeclasses, type families, data, type synonyms, etc), and construct a mapping.
  8. Resolve any identifiers within the right-hand-side type signature & expression, propagating down any local variables.
  9. Perform some simpler syntactic checks, like verifying there’s no dead branches.
  10. Associate operator precedance to function definitions.
  11. Issues warnings for functions marked as deprecated.
  12. Resolves optimization rules, foreign functions, annotations (a language extension?), default declarations (within typeclasses?), derived type instances, & the painful Template Haskell “splices”.
  13. Gather into a return value to pass to the typechecker.

Identifiers are deduplicated so their unique IDs can be used for lookups & inserts in those mappings.

Not only is this “renaming” vital for figuring out what the program should compile to & whether it is valid, but also this is the most useful feature in Haddock.

Type Checking

These checks to determine whether it can successfully compile the program could, and do, occur later on less verbose ASTs, but this is what gives good error messages!


The first step is to extract type signatures for type constructors & fields, typeclass methods, & pattern synonyms. Haskell requires these to have explicit signatures it can infer everything else from.

Then it iterates over each strongly-connected (i.e. uses recursion) group of functions it models the required type constraints to be checked (by a component I’ll describe tomorrow).

Usually this’ll involve choosing between different strategies based on how explicit the type signature is, etc.


Beyond traversing the AST asserting that types match up to anything explicit whilst managing an “environment” of local variables’ types, modelling the type constraints in your program does require GHC make minor corrections. Different codepaths are taken for inferred & explicit types. If there’s no recursion this is relatively easy with barely any post-processing.

If it’s directly recursive with (semi-)implicit type signature, it first sees what it can infer from the pattern matching…


For recursive functions, only once it got the pattern matching (and maybe partial type-signature) converted into type constraints can it generate a minimal formula for what types the remaining arguments must have.

Things get even more complicated for the typechecker when there’s multiple functions that are mutually-recursive, especially if you don’t give any hints as to their type. If there are type signatures, it’ll solve the formulas earlier. If not it’ll simplify them with “tau” vars.


To simplify the type inference, it computes a “partial ordering” of dependencies (computed whilst the renamers adds identifiers to the environment), which whilst it’s traversing that list is further broken down by which functions have an explicit type.

Validating type signatures

To check whether an explicit type signature is valid, it first computes it’s “rank” (computation altered by RankNTypes & ImpredicativeTypes language extensions) & extracts it’s free variables which it loads a mapping. This info is gathered (along with whether another language extension is enabled) into a single object.

Which it can use to traverse that type signature to validate (ignoring type synonyms) the number of generic args (for which the environment is altered when checking), etc.

The forall language extension loosens some of those rules. Also it checks for needed typeclasses, with language extensions this can get involved!

A more thorough traversal is then done to catch any references to explicitly-invalid types.

And (unless disabled by a language extension) it’ll then check whether type signature’s constraints will inevitably lead to ambiguous types in it’s callers, via the unifier & simplifier.

Type synonyms makes these slightly more complex.

Type matching

To start checking an actual type matches a valid (as determined by last 2 toots) type signature, it flattens the type signature. If there isn’t an explicit type to check against, it’ll generate ID’d MutVars into which the inferred type can be populated; to be read out after the callback, relevant generics merged, & converted into a standard type.

Upon failure, reported types are tidied up…

This is done immediately after determining the actual type of each function or variable.

On the caller side GHC’s typechecker uses a similar technique, expecting the expression checker to pass the type of all parameters some of which might need to be resolved later. Where unification is harder. And there’s routines for doing the same for other types of expressions.

The expression (and pattern) typechecker may also directly call the returns for tidying up computed types for display in error messages, which is where all the complexity is here. Especially when flattening args…

Simplification

Looking for the next component to discuss, there’s extensive APIs for mutating & checking the type formulas the unifier understands. There’s a monad which queues up chunks of works for later checking & abstracts those APIs, which is mostly just called by TcInteract where that work will actually get done. I’m not sure what more to say there…

But particularly tricky cases requires the type constraints to be simplified before being checked/inferred.


To simplify more complex type constraints derived from the (implicitly-typed) source code, it first tackles the “wants”.

For simple cases of wants, retrying at most a configurable number of times, it dispatches them through the worklistS mentioned previously & canonicalizes into a common format, derives info from “inert” constraints, & from top-level declarations.

For more complex cases it then considers the given “implications”, which involves similarly solving simple “givens”.


An implication has some “givens” & “wanteds” from which can be derived additional type info stored in it. It’ll then repeat this process a number of times before dropping derived constraints from the result.

It’ll then try incorporating “defaultings” possibly solving “wanteds” again, before considering overlapping typeclass instances & erroring for any unsolved constraints.

(The code for solving simple wanteds/givens is the TcDerived mentioned earlier)

Errors

After skimming over APIs the less abstract type checker components, & the unifier, uses to construct the givens/wanteds/implications solved by the simplifier (and the TcInteractive which is part of said simplifier), the next chunks of code I see is for making sure Haskell error messages actually helps the programmer.

P.S. I have to admit not fully comprehending the simplifier…

Maybe there’s an ambiguous hole in the implied types? Maybe you want to skip some checks? Something else?

deriving

In Haskell, there’s a handful of typeclasses from the standard Prelude which a standards-compliant Haskell compiler like GHC can implement on your datatypes for you, given some preconditions. This is done during typechecking in GHC!

The typeclasses & methods which can be derived according to the standard are Eq, Ord subtypeclass, Enum, Bounded, Show, & Read for comparing, inputting, outputting data. Whilst GHC adds more, including Generic for defining your own.


The first step is to extract the relevant generic type variables & the “strategy” you asked GHC to use. Then for each it typechecks the declaration, validates it’s success, then derives it’s type (which it makes sure is a valid type) according to the strategy using the type unifier & validator. The results of this are then added to the type environments.

(NOTE: “you” here is a handy pronoun to refer to Haskell devs)

Then it tackles actually generating the code for each of them!


The methods are generated specially for each mechanism, which is determined from the strategy, desired typeclass, & the looked-up (if valid for the typeclass) datatype during the previous loop.

Generating most of “derivied” instances involves little more than traversing over each of the type’s variants, and each of those variants’ fields to construct some dead obvious, yet tedious to type, code.

Cmp & Enum meanwhile considers digging into the runtime representation for ints to compare.


From there the generated code needs to be typechecked & “renamed” (identifiers resolved), which has already been applied to the explicit code. Then GHC introduces “standalone derivings” to aid it in implementing these same typeclasses, which still needs to be handled…

Typeclasses

Typeclasses a major aspect of Haskell, similar to “interfaces” in OO languages like Java or “traits” in Rust. But a little more flexible by not being tied directly to specific “objects”.

There’s a typechecking pass that iterates over each strongly-connecting grouping of typeclasses, instances, & other datatypes.

Typeclasses from those groups are processed first! After some typechecking, for class declarations (other declarations are handled here as well…) it gathers some parsed data to pass to buildClass which I’ll describe next…

A few more iterations may be required to finalize associated types, etc.


buildClass in turn has 3 steps:

  1. Iterate over the methods to convert them into fields.
  2. Create a datatype to hold these fields, which type of declaration is determined based on whether there’s only method, etc.
  3. Wrap that datatype in a new typeclass representation used by the typechecker.

These typeclasses will become explicit arguments when “desugared” to Haskell Core.

This seems to be a good explanation for how typeclasses are lowered, but what about their instances?


Lowering typeclass instances involves two steps: data gathering (including regarding the types it’s applied to) & typechecking whilst actually generating the runtime representation. The latter is it’s own pass.

Postprocesses the methods to become global functions, for the methodtables (including determining which default methods to use) to be generated during the “simplify” pass. Optimizer pragmas are generated to bypass the methodtable when able.


To determine which instances are relevant at any given expression, it maintains a mapping from deduplicated typeclass identifiers to their instances in the typechecking monad (local variables).

I think that about covers it, though maybe I should reorder these toots when publishing? I’ll put that off a little…

Typecheck/Renamer Integration

The first step (all this is surrounded by time-profiling) is to initialize all the MVars & installed plugins used during renaming & typechecking.

Then it runs a different typechecker before determining which modules to import (and whether it should import Prelude implicitly), utilizing a “backpack” for cases where the module reexports something from another module & determining the dependencies for the build system.

It may now resolve those imports.


With that populated into the typechecking/renaming environment it applies a different renamer depending on whether the current file’s a preprocessed (or manually-written to break cyclic dependencies) module file or raw source code.

Once it knows everything that’s available in the current module, it can then determine what you told it to export for other modules to use. If this is the main module of your program, it’ll expect to find a “main” function amongst those exports.


If you did write a -boot file to break links, it’ll now check everything lines up.

Before finishing off by reporting unused names, dependencies (to the build system) via imports or Template Haskell, & running any plugins.

For normal Haskell files it’ll first categorise the different declarations before traversing the AST then simplifying the resulting type formulas. Updated type information is added to the environment for later use.

Pattern Matching Decomposition

Perhaps my favourite features of Haskell is it’s expressive pattern matching syntax! I find a single line of it more compact and clear than a boolean expression in any other language.

The routine that decomposes pattern matching takes a list of variables (function arguments, etc) being matched upon & the patterns corresponding to each per branch, which it’ll tackle one var’s patterns (if any) at a time.

First it has to desugur the outermost literal syntax, the AST’s verbosity, etc mostly into normal type constructors. Name bindings becomes let expressions around the body. Whilst “strict” !-prefixed patterns are compiled to seq calls, a GHC primitive function.

NOTE You might expect seq to be implemented in terms of !patterns, but it’s actually the other way around. seq is a primitive function specially understand by GHC’s backends.


After desugaring, it categorizes the patterns scrutinizing a given variable based on whether the match anything or are a constructor, the same synonym, (numeric) literal for which there’s a few cases, strict, coercion to the same type, or equivalent view patterns.

Recurse to construct new branches for each of those patterns!

For constructors &, seperately, synonym patterns it groups patterns by constructor (sorted by runtime tag) & prepends it’s fields onto the vars being pattern matched.


After desugaring, it categorizes the patterns scrutinizing a given variable based on whether the match anything or are a constructor, the same synonym, (numeric) literal for which there’s a few cases, strict, coercion to the same type, or equivalent view patterns.

Recurse to construct new branches for each of those patterns!

For constructors &, seperately, synonym patterns it groups patterns by constructor (sorted by runtime tag) & prepends it’s fields onto the vars being pattern matched.


Pattern synonyms introduce a new expression & pattern to match out-of-line, but otherwise treated same as constructors. Matching upon strings & negative numbers is converted into an == guard (yes, I’ll describe pattern guards today, but they’re somewhat seperate), whilst positive numbers are left in the case expression. No code is generated for variables & wildcards beyond the desuguring phase introducing let expressions. Rarer forms introduce new expressions to the branch body.


But what if multiple types of patterns are applied to the same field? The typechecker should minimize this, but each of these branch-generating branches returns a value indicating whether it’s pattern matching can fail & a callback taking an argument for what to do in that case.

From there it’s a matter of iterating over the groups in source code order & chaining these callbacks together. New variables are generated to hold each fail branch to avoid duplication.

Guards

As for any “guards” you might write to add more nuance to your patterns, those are added via a small wrapper around what I’ve just described & combined like any other decomposed pattern.

These guards have the syntax | pat <- expr, ..., meaning it has to desugur the expression then (via what I’ve just described) the pattern.

If there’s no pattern it defaults to using “True”, though it takes a shortcut or two because it doesn’t need to decompose that pattern. lets are valid.

Desugaring

Alongside decomposing the patterns as discussed above, GHC lowers various forms of syntactic sugar it supports to a Core subset of Haskell. The following sections will list the lowering rules GHC applies followed by the Haskell code it calls & any commentary. For example (yes, these are real examples!):

(e)
e
exp :: Type
exp

These lowering rules are applied after optionally inserting profiling code & desugars some special type of binding (typeclass instances?), but before remaining optimizer pragmas & imported exported foreign functions.

Booleans

if cond then truthy else falsy
case cond of {True -> truthy; False -> falsy}
data Bool = False | True

infixr 3 &&
infixr 2 ||

True && x = x
False && _ = False
True || _ = True
False || x = x
not True = False
not False = True

Hardly seems worth it to have this syntactic sugar! Bools are barely anything special to GHC! Though the integration into pattern guards is nice.

Lists

[x]
x:[]
[x,xs]
x:[xs]
Or it may chose to use build
[n..]
enumFrom n
[n..m]
enumFromTo n m
[n,x..]
enumFromThen n x
[n,x..m]
enumFromThenTo n x m
class Enum a where
    succ :: a -> a
    pred :: a -> a
    -- deriving(Enum) peeks at the runtime representations to implement these methods.
    toEnum :: Int -> a
    fromEnum :: a -> Int

    enumFrom :: a -> [a]
    enumFromTo :: a -> a -> [a]
    enumFromThen :: a -> a -> [a]
    enumFromThenTo :: a -> a -> a -> [a]

    succ = toEnum . (+1) . fromEnum
    pred = toEnum . (subtract 1) . fromEnum
    enumFrom x = map toEnum [fromEnum x..]
    enumFromTo x y = map toEnum [fromEnum x..fromEnum y]
    enumFromThen x y = map toEnum [fromEnum x,fromEnum y..]
    enumFromThenTo x y z = map toEnum [fromEnum x,fromEnum y..fromEnum z]

instance Enum Int where
    toEnum = id
    fromEnum = id
    -- The actual code is more microptimized using GHC primitive functions, but this is a decent summary of it.
    enumFrom x = enumFromTo x maxint
    enumFromTo x y | x >= y = []
        | otherwise = x : enumFromTo (x+1) y
    enumFromThen x y | y > x = enumFromThenTo x y maxint
        | otherwise = enumFromThenTo x y minint
    enumFromThenTo x y z = go x
        where
            delta = y - x
            go x' | x > z - delta = [x]
                | otherwise = x : go (x + delta)

List Comprehension

Given a list L to append defaulting to [], recursively apply:

[e | ]
e : L
[e | let B, qs]
let B in [e | qs]
[e | b, qs]
if b then [e | qs] ++ L else L
[e | P <- L1, qs]
let h [] = L, h (u2:u3) = (\p -> [e | qs] ++ h u3) u2 in h L1
& something for zip

Alternatively this can compile into calls to foldr & build

infixr 5 :
data [a] = [] | a : [a]

-- GHC-specific utility that teaches the optimizer how to remove it in the presence of `foldr`, `++`, etc
build g = g (:) []

foldr k z = go
    where
        go [] = z
        go (y:ys) = y `k` go yes

-- Despite what you might expect these aren't actually used by that syntactic sugar.
map _ [] = []
map f (x:xs) = f x : map f xs
filter _ [] = []
filter p (x:xs) | p x = x : filter p xs
    | otherwise = filter p xs
[] ++ ys = ys
(x:xs) ++ ys = x : xs ++ ys

Monads

do {expr}
expr
On it's own, may issue a warning
do {start;rest}
start >> do {rest}
do {let B;rest}
let B in do {rest}
do {P <- expr;rest}
expr >>= \P -> do {rest}
infixl 1 >>=
infixl 1 >>

class Applicative m => Monad m where
    (>>=) :: m a -> (a -> m b) -> m b
    (>>) :: m a -> m b -> m b
    return :: a -> m a

    m >> k = m >>= \_ -> k
    return = pure
-- Superclasses get a bit abstract, so I'll gloss over them.

-- The `Monad` typeclass might be the magical aspect of IO for GHC's frontend.
-- To the backend `State# RealWorld` is the magical component.
newtype IO a = IO { unIO :: State# RealWorld -> (# State# RealWorld, a #) }
instance Monad IO where
    IO m >> k = IO (\s -> case m s of (# new_s, _) -> unIO k new_s)
    IO m >>= k = IO (\s -> case m s of (# new_s, a) -> unIO (k a) new_s)
    return x = IO (\s -> (# s, x #))

-- Primitive types with no runtime representation, used to avoid the optimizer messing things up.
data State# a
data RealWorld

This is mostly here to look like imparative programming.

Bindings

There’s a desugarer pass which lowers let bindings differently whether they’re variables, functions, patterns, or groups thereof whilst processing “pragmas” to guide the optimizer.

“Core Haskell” Intermediate Language

Once that desugarer has run, the remaining operations are:

These are given it’s own datamodel, seperate from what’s parsed by GHC’s frontend.


Notably this intermediate language took a step towards lowering typeclasses: They’re now explicit function arguments derived from typechecking data!

Between the desugarer & this datamodel is a bunch of builder functions to handle some more minor desugarings. Not only including those explicit type arguments, but also the if-then-else construct.

Also the fact functions can only have a single parameter (return a callback to take the next, “currying”) makes this otherwise clunky to build.


Haskell Core’s typechecker propagates mostly implicit-types, apart from those annotating variables (as determined by the main typechecker) & case pattern-matching expressions. This may involve substituting generic types as dictated by a caller & computed how many arguments a function has.

Mostly though it’s just a recursive traversal over the Intermediate Language. A lot of data is computed via a tree traversal rather than stored explicitly, are they trying to cut down on memory?


Amongst this chunk of code are functions to apply “pragmas” allowing libraries to teach GHC how better to optimize them, and to count how “large” a function (or expression or “ticks” a.ka. performance counters) is to use as a hueristic. Another counts other terms, types, coercions, value bindings, & join bindings.

There’s a traversal for applying simple “tidying” rules. Others applies substitution upon variable references.


There’s a traversal for getting rid of compile-time laziness, further hinting that GHC memory usage is a concern. There’s a routine preparing for lowering to the next Intermediate Language, which I’ll discuss later alongside that lowering.

There’s a lightweight optimizer handle trivial cases of inlining, “beta reduction”, branches upon semi-constants, & dead code elimination. There’s a traversal for finding “free variables”.

They implement some tries for this IL.


The list of free variables for an expression is used during desugaring the frontend & optimizing Haskell Core.

I’ll shortly discuss the full:

And I’ve already discussed the full desugarer above.

Let-floating Optimization

The main, unconditional, optimization pass for Core runs for a configurable number of iterations & mainly focuses on moving lets into their optimal locations. As well as applying optimization rules provided by imported libraries, but that’s a later topic. I’ll focus on this today, other passes tomorrow!

If the bindings are marked as (possibly) recursive, it’ll split them here before it can optimize them semi-individually. This may introduce inlining rules, which may also need to be simplified.


For each of these bindings it determines whether it should be inlined (is used only once, etc), or is a “join point” (truly recursive?) which needs to be treated specially.

But mostly it:

  1. Extracts any type or value arguments to the variable if it references a lambda,
  2. Saves relevant type & value substitutions within bindings.
  3. Moves those lets down the tree, making some other minor adjustments, & ensures laziness doesn’t leak the verbose memory-intensive AST.
  4. Remaining substitutions are turned back into let operations.
  5. Applies some relatively minor yet vital optimizations to the adjusted expression. Like “eta expansion”, that is moving lambdas to be contiguous. May be impacted by how “trivial” the expression appears to be, as is the case with function inlining!
  6. Determines whether it find more lets to float in the next iteration, or whether it should adjust types first.
  7. Cleans up, performing dead store elimination.

Optimizations

Common Subexpression Elimination traverses all the lets in the program/module whilst building & applying a rewrite mapping of them. It doesn’t apply CSE to a finer level, but later passes might nuke that point. Or maybe rely on equivalent optimizations in the next intermediate language.

Liberate Case inlines simple-enough recursions where there’s a case between it & the recursive call, because that often reveals further optimizations.


Floating variables towards of the leaves control flow tree reduces the conditions on which their (lazy) values are allocated, even pushing them down into function arguments. But not lambda functions. The variables being floated into a branch are kept in a list and floated down until duplication, relying on the float out to remove degenerate cases. Taking special care around primitive functions & bindings.

Float up mostly removes lets from callbacks which do not change per-call.


Another means of moving loop invariants out of loops is to examine the function’s arguments & building a inner function which doesn’t duplicate the unchanging args. By traversing the expression maintaining a list of interesting IDs sourced from the lambda’s calls, info which may require a merging step.

There’s a pass which counts up how many args each local function is passed, so it knows to optimize those functions to receive that many arguments by consolidating lambda ops together.


If a tailcall-recursive function has only a single non-recursive branch (which is very common), it can be helpful to extract that into it’s own function to make future optimizations more obvious. There’s an optimization for that!

There’s a pass to annotate the code with which variables are required in each branch in a postoder traversal, for the sake of STG optimizations. And another analysis pass to determine the strictness of each function parameter.

“Specializer” optimizations

The most central component of letting libraries teach GHC how to optimize their callers is the process for figuring out which optimization rules from a parsed/desugared list to apply. First it sees which are relevant, than it sees which are the most detailed.

In either case it’s matching a pattern against an expression or other pattern, for which it uses the same routine. Which is done via a traversal over the pattern roughly resembling an equality check but with more leniences.


There’s an optimization pass (running alongside the other Core passes) amongst the Specialization passes for determining in which recursions to unbox numeric values & use the primitive functions/types directly.

This is attempted in two passes: The first extracts usages from previous analysis, the second applies the optimization to recursive functions of a reasonable non-inlined size. With a little bit of post-processing.


But wasn’t I talking about applying library-provided optimizations!?

That’s another optimization pass applied to Core alongside all the others consisting mostly of a traversal figuring out which lets might have expressions which needs rewriting & which functions that call for which it’ll look up rewrite rules to consider.

If the rewrite rule is provided by the built-in Ghc.Prim API, these might be callbacks e.g. evaluating the math at compiletime.

Lower to STG

Once some of the higher-level optimizations are out of the way, GHC lowers Haskell Core to STG. STG is GHC’s closest-to-the-metal functional (intermediate) language.

But first GHC, after a little tidyup, serializes the module as compiled to Haskell Core to an equivalent of C header files to be read when this module’s imported. But for Haskell, and in a binary format.

This intermediate language is only used when the module’s actually being compiled…


First the GHC’s driver refactors the Haskell Core code to something more closely resembling STG.

Which it does by first iterating over all the top-level bindings to locate all the “cost centers”, which is derived from the profiling ops inserted into the code by the desugarer.

For any data constructors it generates a normal function to wrap it. Then it iterates over both those bindings & actual ones.

Both the cost centres & rewritten bindings are returned.


Iterate over top-level functions/”bindings” it’ll consider whether non-recursive ones are small enough to be inlined & thus should be lowered specially before checking whether the number of parameters matches up between the type signature & body.

As for the function bodies themselves:

  1. Numeric literals are expanded slightly
  2. Function calls are rewritten to remove currying & extract non-variable expressions into wrapping lets to be placed appropriately
  3. Apply that logic to lets

Once Haskell Core has been refactored thusly, whilst ensuring all variables are uniquely named, it’s time to do the conversion.

The iteration over top-level bindings involves some apparantly crucial validation, but most of the logic is in traversing the expressions.

Coercions are converted into function calls. Profiling, casts, cases without branches are dropped. Types are lowered into the second representation (Haskell Core shares a typesystem with Haskell).


Functions in STG takes a fixed number of arguments, there’s no language-level currying & as such the must be lowered away at this stage. Profiling ops around these function calls is preserved.

It may simplify things for the preparation phase to introduce functions in place of data constructors here, but that’s removed after this stage. Prim ops & foreign functions are handled specially.

Type, coercion (caveat), & profiling arguments are removed.

STG

GHC’s closest-to-the-metal functional (intermediate) programming language is called “STG” for “Spineless Tagless GMachine” (no, I don’t recall what that jargon means), which essentially enforces certain conventions upon Haskell Core.

It consists of the following opcodes:

The branches of a case are flagged for whether they’re a literal, default (always first), or data constructor & specifies the variables to which the dereferenced fields should be assigned.


There’s simple utilities for substituting variable references for something else.

There’s a typechecker to help assure the GHC devs their optimizations haven’t messed anything up. Which traverses the tree making sure:

And there’s a traversal to locate all the “free variables”.

STG Optimization

A driver determines when/whether to call stats, CSE, lambda-lifting, or unarise passes with optional debugging info. And there’s API’s for various conditions upon the typesystem.

The Common Subexpression Elimination pass maintains mappings of possible rewrite rules that would remove duplication. Mostly this maps from identifiers (which in STG includes function arguments), but it also seeks out cases where the value being pattern matched is reconstructed in that branch.

In the process it renames all variables to ensure the transformation doesn’t shadow anything important.

Another pass counts up the different kinds of opcodes used to report to the GHC devs.


The unarisation pass involves converting most occurances of a special “unboxed tuple” type used by GHC’s internal GHC.Prim module into multiple variables. Unboxed tuples can only be used via an altered lexer via the MagicHash language extension.


Finally there’s the Left Lambdas pass which may decide to turn any lambda variable’s context into additional variables.

This is split into analysis & transformation phases held together with a custom monad.

The analysis pass of lambda lifting classifies all variables in the expression to indirectly determine which are appropriate to be lifted. Otherwise this transformation is trivial.

Lowering to C–

If certain profiling options are enabled, it starts out spitting out C– datatables for them to count into.

Then it iterates over the top-level functions generating code for them & saving their address to a mapping to use for an internal linking step. Strings are handled differently. This code is generated differently depending on whether it has a consolt a “closure” of captured variables, which a special STG optimization pass minimizes.

Then it generates labels for datatypes…


The main component of the STG to C– lowering is a traversal over the STG tree. During which it lowers function calls to it’s own calling convention (despite C– supporting function calls for the sake of foreign functions & more complex GHC.Prim primitives) & generates inline code to allocate & initialize any “thunks” using a bump-pointer. Before each batch of which it checks if it should run the garbage-collector.


When compiling case expressions it applies certain optimizations & special cases, namely when branching upon:

Normally it’ll first decide whether to call the GC & in which register to put the value being scrutinized. Generating the branching code for primitive numbers is a simple C– switch statement.

For normal Haskell values it’ll first scrutinize a few “tag bits” in the literal before pushing the return targetS to the callstack before calling the lazily-evaluated value. Which’ll put the value’s fields into the correct registers.

All function calls are either tail calls or lazy.


Functions which directly call a dataconstructor are compiled specially to actually allocate that data. At this point all such constructors should be wrapped in a function per STG rules.

The calling convention for any Haskell function (including those constructors) depends on several factors, possibly accepting a “closure” holding any variables captured from the caller. Each of these closures contain a reference to a “data info table” used by the garbage collector.


Like data constructor functions accessor functions are also compiled specially, as are partial (“curried”) function calls & (constant) values.

Upon compiler any other local function else it starts by determining what needs to be in the closure, before generating the function itself then the code to allocate that closure around it.

Whether global or local functions start with calling convention code, followed by a possible loop to replace direct recursion, conditional GC call, & profiling.

C--

The builtin GHC.Prim module, upon which GHC implements the standard Prelude & more, is implemented in C– with very special optimization rules & type signatures. As such C– has a parser (implemented via Alex & Happy) so you can write code in it directly.

In C– your program’s full control flow graph (rather than control flow tree) is exposed, which allows for further optimization & detecting register spillage.

Debugging information can be added to the code for aggregation.

C– supports C++-like “unwinding” so calls to exit can be recoverable.

Memory layout for the callstack, closures, etc is generated when parsing or lowering to C–.

The types consists of GC’d pointers, raw bits, floats, or (for LLVM) vectors of different bitwidths.


GHC compiles basically all branches to C– switch statements, so it’s very important that C– can choose how best to represent that in machine code. Which is done via a tree of dense jump tables upon locating “holes”. Like memory layout this is a pass between the parser or lowerer & the C– datamodels, or they’re left is for LLVM to deal with.

If a jump table has less than 5 entries it’ll be replaced with conditional code. Common instanstances of this are special cased for compiler speed.


Amongst this section of the codebase there’s a “sinking” optimization which rearranges assignments, discards dead ones, & “inlines” them into registers when used only once. Which is done by taking the results of a liveness analysis & a list of assignments, then walking forwards through the graph for possible optimizations to apply.

There’s an optimization that simplifies the control flow graph (including calls/returns), which also helps make GHC itself faster.


In C– you can have the compiler manage the callstack for you, or (like in STG lowering) you can do it manually by ommitting the arguments list on a function. There’s a pass for lowering those automatically-managed callstacks.

Tailcalls have to be explicitly designated. You may directly assign to CPU registers, usual caveats apply. In-memory variables must be explicitly pushed to the stack.

After parsing/lowering control flow is unstructured.


There’s a peephole optimization pass taking care of stuff like consecutive casts & constant propagation. Maths operators, etc are now explicitly part of the language itself rather than APIs.

The typechecking pass is extremely week & not even run by default.

Upon converting to “Continuous Passing Form” it generates “continuation info tables at runtime, which I guess means “stackframes”.

When converting from structured to unstructured control flow usually degenerate cases to optimize.


There’s a pass to remove duplicate blocks of code, and another to expose CPU state to the garbage collector in a similar format to the heap itself. Profiling instructions can be added to C– code. There’s plenty of utilities for processing “labels” for unstructured control flow.

Liveness information & more a tracked using bitmasks, so code for processing those is in the C– codebase.

Register Allocation

Assigning registers is a crucial yet complex optimization for modern CPUs, avoiding the bottleneck of RAM when you don’t need to. And in lowering Haskell’s calling convention it’s already partially done!

There’s two register allocators: Linear & Graph.


Both passes share code to perform liveness analysis determining the graph they need to colour & applying the assigned registers to the C– code being compiled.

The Linear Allocator covers where there’s no loops, which occurs frequently in code compiled from Haskell because of how it combines laziness & conditions. After checking some trivial cases the Linear Allocator first tackles each control flow “block” individually before validating how the results combine.


Once it loaded the preassigned registers for each control flow block it iterates over each instruction, handling trivial cases & removes unnecessary moves. Or it finds an as-yet unused register for reads, saves data aside this instruction would clobber, updates the code, releases & clobbers newly unused registers, & does similar for writes.

Failing to allocate a new reigster to allocates a new stack slot.


Without loops register allocation is relatively easy, and it wouldn’t take much to compute a near-perfect register allocation solution. Not that that’s what that code does.

With loops it’s more complicated because this is the NP-hard Graph Colouring Algorithm. In which case it’ll retry at a most 10 times.

By first reformatting the code as a (edge map) graph incorporating any coalesced, spilled, or conflicting variables. It’ll then compute spillcosts to choose which var to store on stack.


At this point it computes a (imperfect, remember this is NP-hard) graph colouring & determines whether any further (besides what it explicitly chose to spill) registers have been spilled to stack. If so it adjusts it’s input & tries again. If not it updates the code after tidying up the spillage.

To colour the graph it first “coalesces” any movs, generates an initial colouring, coalesce any graph nodes if it didn’t previously determine this code shouldn’t yield spillage, & updates colours.


For that initial colouring it looks for all trivial cases & coalesces the graph to reveal all the others. Or it falls back to choosing a stackslot.

The second passes mostly serves to take into account the variable’s preferences regarding where it wants to be stored. Which is needed for some CPU instructions.

The coalescing mostly involves iterating over the graph for edges or nodes to remove.

Lowering to X86

GHC can also compile to other machine codes (or use LLVM’s extensive optimizations & backends), but I’ll explore just X86 in detail.

A lot of C– code is generated for each Haskell file, so GHC ensures it’s pipeline-streamed to the output whilst using minimal memory.

The GHC runtime called & statically-linked by this C– code being compiled will be described on a seperate page.


Some of this native code generation is handled generically for all platforms, e.g. determining in which order to output control flow blocks, DWARF metadata, (parameterized, as described yesterday) register allocation, & memory efficiency optimizations.

The main codegen pass iterates over each instruction in each basic block (with some validation & debugging) outputting the corresponding opcodes. These results are annotated with exception catching info.


Memory loads & stores (used e.g. in generating foreign function calls) has it’s own lowering code. As does lowering more standard calling conventions, otherwise handling foreign function calls, & lowering branches to native x86 form. Because CPUs deal in comparison results, not booleans.

Jump tables (for large switch statements) are partially handled in the generic code. But inserting it into the machine code & looking up the next instruct address is X86-specific. Similar for unwinding.


When switching from boolean conditions to the comparison conditions supported directly by CPUs, sometimes it’s useful to invert those conditions into code more directly resembling the output. For some reason this is architecture-specific code.

Some x86 instructions use specific registers, which the register allocator needs to be told about. As for how to generate the code to “spill”/load any registers it needs to.


An optimal linear layout is computed for the control flow “blocks” in the generic code, which’ll call x86-specific code to mutate those blocks’ instructions.

There’s debugging code for pretty-printing the assembly, or viewing the graph colouring applied to the registers.

The instructions are kept in C– until that main iteration I started out describing lowers it to machine code, afterwhich DWARF metadata is added.