Haskell Webdev via Happstack

Most languages now have a nice webframework, like Django or Ruby on Rails, to aid implementing serverside webapps where the HTTP responses are dynamically generated (as opposed to clientside webframeworks like Elm).

Haskell has two main webframeworks: Happstack & Yesod. Happstack powers Haskell’s package repository Hackage, and I will be using it for a Rhapsode-compatible webpage debugger! So it’s the one I’ll discuss here…

Happstack Server

At its core, Happstack is a webserver that runs a callback (in a new userspace thread) for each parsed request to compute a response to serialize.

In more detail, it loops forever (despite any exceptions, which may get logged) accept()ing new connections on the configured port, starting a new greenthread for each with a configured timeout.
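As a concrete anchor before digging further into the internals, here’s roughly what handing Happstack that callback looks like. A minimal sketch assuming the happstack-server package (simpleHTTP, nullConf, ok, & toResponse are its real API; the greeting text is made up):

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Minimal Happstack app: the server loop runs `hello` once per request.
import Happstack.Server (Response, ServerPart, nullConf, ok, rsBody,
                         simpleHTTP, toResponse)

hello :: ServerPart Response
hello = ok (toResponse ("Hello from a fresh green thread!" :: String))

main :: IO ()
main = simpleHTTP nullConf hello  -- accept()s connections on port 8000 forever
```

Point a browser at http://localhost:8000/ to see the response.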

Parsing requests mostly involves splitting by newlines until an empty line; the first line is split on whitespace to get the method, URI, & optionally the HTTP version.

Header lines are split on “:” using Parsec, with any subsequent lines starting with whitespace treated as continuations. Postprocessing converts each key to lowercase & concatenates those continuation lines before converting them into a list of HeaderPairs with dedup’d keys.

It retrieves the Content-Length header to determine how to read the body, and parses the Cookie header. If successfully parsed, these results are assembled with some MVars into a Request object & your callback is called.

The HTTP response from that callback is optionally logged (using a configurable callback), serialized to the client, tmp files from the rqInputsBody MVar are cleaned up, & it decides whether to await another request on this socket.

There are three main cases for serializing HTTP responses: offloading the body onto the kernel via the sendfile() syscall, “chunking” the response body, or sending it plain. In every case sending the HTTP version, status code, & (postprocessed) headers is the same.

Happstack API

Providing a Request -> Response callback, which unlike nginx doesn’t trust you to manually serialize, is all Hoogle (no not a typo) wants.

But for a nicer API Happstack wraps this in a Parsec-like monadic API, using a large pile of 3rd-party generic wrapper monads, presumably to encourage GHC to optimize better.

And it uses typeclasses to convert the input text into typesafe values, and the (common) output types into a Response with a Content-Type header & lazy ByteString body.

The way Happstack uses monads in its API is by having the functions for retrieving values out of the query parameters, cookies, the next path component (which are queued), etc. fail over to the next alternative handler (specified via the msum or <|> monadic operations) when they fail to convert into the expected type. If they all fail it has a built-in 404 page to respond with.
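For example (a sketch assuming happstack-server; dir, path, look, notFound, & msum are its real combinators, while the routes themselves are invented):

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Each handler fails over to the next when a lookup is absent or
-- fails to convert; notFound is the final fallback.
import Control.Monad (msum)
import Happstack.Server (Response, ServerPart, dir, look, notFound, nullConf,
                         ok, path, simpleHTTP, toResponse)

routes :: ServerPart Response
routes = msum
  [ dir "echo" $ path $ \word ->          -- consumes the next path component
      ok (toResponse (word :: String))
  , dir "greet" $ do
      name <- look "name"                 -- query parameter; absent => fail over
      ok (toResponse ("Hello, " ++ name))
  , notFound (toResponse ("no route matched" :: String))
  ]

main :: IO ()
main = simpleHTTP nullConf routes
```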

Other APIs enqueue callbacks to alter the Response. And yet other APIs form common abstractions around the others.

Those wrapper APIs implement HTTP standards regarding localization, authentication, compression, etc. And the results they compute will be configurably validated before (or perhaps, due to laziness, during) serialization.

The same team provides other APIs which are arguably, mostly through use of Template Haskell, even more typesafe. Especially for webforms & URL routing.

Next I will be discussing the major HTML templating library, which Happstack knows how to convert into a Response.

Blaze HTML Templating

To render your serverside webapp’s data into human-readable HTML files, you’ll typically use some sort of “templating language”. There are some different aesthetics here, but Blaze lets you construct the HTML tree for output using a DSL-like syntax.

It advertises itself on speed, but most of the optimizations have by now been upstreamed into libs I’ve already discussed which makes things easier.

These optimizations are for serializing output; other modules are much better at processing XML trees.

Monads are used in this declarative API to let you list child elements using indentation rather than operator-based syntax, via the do syntactic sugar. You’re probably indenting anyways for legibility, though you can still use semicolons or >>.

The elements themselves are represented by function calls, attributes are attached via the ! infix operator, and with OverloadedStrings text can be written as normal Haskell string literals.
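Putting those three together (a sketch assuming the blaze-html package; the H./A. modules & renderHtml are its real API, the page content is made up):

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- do-notation nests children; ! attaches attributes; string literals
-- become escaped text via OverloadedStrings.
import           Text.Blaze.Html.Renderer.Text (renderHtml)
import           Text.Blaze.Html5 ((!))
import qualified Text.Blaze.Html5 as H
import qualified Text.Blaze.Html5.Attributes as A
import qualified Data.Text.Lazy as TL

page :: H.Html
page = H.docTypeHtml $ do
  H.head $ H.title "Demo"
  H.body $ do
    H.h1 ! A.class_ "greeting" $ "Hello"
    H.p "Plain Haskell string literals become (escaped) text"

rendered :: TL.Text
rendered = renderHtml page
```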

The core optimization challenge Blaze faces is making text concatenation rapid, which starts by minimizing how much of it you actually do! So the constructor function for a tag includes the preceding < in all the datatypes it knows how to output to.

And those strings are actually compiled to a function which takes the next String to concatenate on in place of the list terminator. GHC is great at turning this into a constant-time operation!
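That trick is the classic “difference list” (base’s ShowS is the same idea); a self-contained sketch with made-up names:

```haskell
-- A string compiled to "the function which takes the next String":
type DString = String -> String   -- a string awaiting its continuation

lit :: String -> DString
lit = (++)                        -- appending becomes function composition

render :: DString -> String
render d = d ""                   -- supply the list terminator at the very end

-- Baking the preceding '<' into the tag constructors, as Blaze does:
tagOpen, tagClose :: String -> DString
tagOpen  name = lit "<"  . lit name . lit ">"
tagClose name = lit "</" . lit name . lit ">"

example :: String
example = render (tagOpen "p" . lit "hi" . tagClose "p")
-- → "<p>hi</p>"
```

Composing with (.) never walks an already-built string, which is why GHC can make the appends so cheap.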

From there it’s a matter of traversing the tree to perform those concatenations.

Non-literal text is stored in the datatype (String, Text, or ByteString) it arrived in for the tree traversal to convert into its desired type. Literal text is compiled into all 3.

A secondary optimization challenge is microoptimizing the escaping of special HTML chars to &entities;.

Monads construct Append operations in the output tree, whilst the ! operator wraps the left-hand-side tag in an AddAttribute node which’ll concatenate together the text to be inserted into it.

“Acid State” NoSQL database

The way Acid State works is it lets you, the caller, define the datastructure it stores as well as the operations which can be performed upon it. Much like you’d do for the MVar API which it, in part, wraps.

At a surface-level Acid State provides methodtables for how to query & update the database live or upon startup, and others for how to (de)serialize the data to disk before & after converting them into “Entry” bytestrings.

The default Serializer calls Cereal. Then the Archiver takes that list of bytestrings & writes them out in sequence with added per-entry lengths & CRCs to ensure it’s read correctly.

There’s an Acid State implementation which works purely in-memory, wrapping MVar. Another communicates with an external server in case you need to scale.

And the main implementation extends the one wrapping an MVar to write each operation to disk via a Serializer & Archiver. Periodically it’ll write a “checkpoint” storing the current database state, so it doesn’t need to reconstruct all the data from scratch.

Syntactic Sugar

This allows you to define a datatype & functions operating upon it, and bundle them together into an Atomic, Consistent, Isolated, & Durable database. You may also specify a serializer.

Calling makeAcidic or makeAcidicWithSerialiser via Template Haskell (the $( ... ) syntax) generates the code for that at compiletime.
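The canonical shape of that Template Haskell call (a sketch assuming the acid-state, safecopy, & mtl packages; makeAcidic, deriveSafeCopy, Update, & Query are their real API, while Counter, bump, & peek are invented):

```haskell
{-# LANGUAGE DeriveDataTypeable, TemplateHaskell, TypeFamilies #-}
import Control.Monad.Reader (ask)
import Control.Monad.State (modify)
import Data.Acid (Query, Update, closeAcidState, makeAcidic,
                  openLocalState, query, update)
import Data.SafeCopy (base, deriveSafeCopy)
import Data.Typeable (Typeable)

newtype Counter = Counter Int deriving (Typeable)
$(deriveSafeCopy 0 'base ''Counter)

bump :: Update Counter ()                       -- an event function
bump = modify (\(Counter n) -> Counter (n + 1))

peek :: Query Counter Int
peek = (\(Counter n) -> n) <$> ask

$(makeAcidic ''Counter ['bump, 'peek])          -- generates Bump & Peek types

main :: IO ()
main = do
  st <- openLocalState (Counter 0)  -- replays any logged events from disk
  update st Bump                    -- serialized & appended to the event log
  n  <- query st Peek
  print n
  closeAcidState st
```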

Once the constructor & type arguments are extracted from the typename passed to makeAcidic[WithSerialiser], it iterates over the passed events.

A type constructor will be generated for each of those event functions if it’s not already present, alongside implementations of two relevant typeclasses for it. Plus whatever serializer you configure it to use, defaulting to Cereal/SafeCopy.

Then it generates an instance of IsAcidic mapping those new datatypes to the functions they correspond to. Some more magic is involved here that I don’t understand…

“Cereal” (de)Serialization

At a top-level this is primarily a typeclass implemented for a variety of common types, providing methods to put & get data. It also integrates with the Generic typeclass, allowing you to trivially implement Serialize for your own types via deriving Generic.
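For instance (a sketch assuming the cereal package; Serialize, encode, & decode are its real API, while Point is made up):

```haskell
{-# LANGUAGE DeriveGeneric #-}
import Data.Serialize (Serialize, decode, encode)
import GHC.Generics (Generic)

data Point = Point { px :: Int, py :: Int }
  deriving (Show, Eq, Generic)

instance Serialize Point   -- put & get filled in from the Generic structure

roundTrip :: Either String Point
roundTrip = decode (encode (Point 3 4))
-- → Right (Point {px = 3, py = 4})
```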

Lists are stored on disk (or in solidstate storage) as a length-prefixed array, to which several other types are lowered. To optimize for the common case where numbers are small, unbounded integers are stored as arrays of bytes unless they fit in a single machine integer. Floating point numbers are not compressed.

If a type has multiple variants (“sum types”) like Maybe or Either it’ll get prefixed with a byte indicating whether it is e.g. Left or Right, storing any of that variant’s fields afterwards.

The serializers are written using a monadic wrapper around ByteString Builders. To read that data back in it implements its own parser combinators (which I have done as well; it’s not scary if you only address your own needs).

Its parser combinators take success & failure callbacks, alongside the input bytes, an optional buffer, & whether we’ve already read everything, to return success, failure, or the need for more data. Also a number for some reason…
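A toy version of that callback style in plain Haskell (cereal’s real Get additionally threads a buffer & offsets; all names here are made up):

```haskell
{-# LANGUAGE RankNTypes #-}
-- Success & failure continuations instead of returning a sum type.
newtype Parser a = Parser
  { runParser :: forall r
              .  String               -- input (cereal uses ByteStrings)
              -> (String -> r)        -- failure continuation: error message
              -> (String -> a -> r)   -- success continuation: rest & value
              -> r
  }

item :: Parser Char
item = Parser $ \inp kf ks -> case inp of
  []       -> kf "unexpected end of input"
  (c : cs) -> ks cs c

satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser $ \inp kf ks -> case inp of
  (c : cs) | p c -> ks cs c
  _              -> kf "predicate failed"

parse :: Parser a -> String -> Either String a
parse (Parser run) inp = run inp Left (\_rest a -> Right a)
```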

“SafeCopy” data migration

Over time you’ll probably want to change the datamodels you’re persisting to disk in your webapp. So Acid State uses a module called SafeCopy to allow for this.

In a nutshell it extends Cereal as I described yesterday to persist a datamodel version number alongside the serialized data, in order to determine when to run a provided typeclass method to “migrate” the data from an older datamodel to the new one.
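In code, the migration story looks roughly like this (a sketch assuming the safecopy package; deriveSafeCopy, base, extension, & Migrate are its real API, while Person is invented):

```haskell
{-# LANGUAGE TemplateHaskell, TypeFamilies #-}
import Data.SafeCopy (Migrate (..), base, deriveSafeCopy, extension,
                      safeGet, safePut)
import Data.Serialize (runGet, runPut)

data Person_v0 = Person_v0 String            -- the old datamodel: just a name
$(deriveSafeCopy 0 'base ''Person_v0)

data Person = Person String Int              -- version 1 adds an age field
  deriving (Show, Eq)
$(deriveSafeCopy 1 'extension ''Person)

instance Migrate Person where
  type MigrateFrom Person = Person_v0
  migrate (Person_v0 name) = Person name 0   -- default the new field

-- Bytes written under version 0 read back as the new datamodel:
upgraded :: Either String Person
upgraded = runGet safeGet (runPut (safePut (Person_v0 "Ada")))
```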

The central component of SafeCopy is a SafeCopy typeclass, specifying the type’s version number, its “kind”, & its putCopy/getCopy (de)serializers.

A global mapping stores how to read these datamodels from disk.

To find the right reader, it loops until it finds a type with the correct version number as stored on disk, from which it can retrieve its getCopy method. Otherwise it checks its “kind”.

Primitive, Base, & possibly Extended kinds throw errors. Extends recurses to figure out how to read the referenced type, before calling the type’s migrate method.

And unless disabled, Extended leads it to consider both the forward & backward “getters”, disabling this option in the recursion.

These safePut & safeGet functions are implemented upon the same monads that are provided by Cereal as described yesterday. Primitives are specialcased to not read/write a version field before the data itself.

SafeCopy is implemented for several common types used in Haskell programs.

FileLock - used for transactions

There’s really not much to it: it’s a nice Haskell wrapper around the flock() (Unix) or LockFileEx() (Windows) system call with custom types. It’s an exclusive lock, not a read/write lock, which is presumably implemented elsewhere.
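Usage is correspondingly small (a sketch assuming the filelock package; withFileLock & Exclusive are its real API, the lock path is made up):

```haskell
import System.FileLock (SharedExclusive (Exclusive), withFileLock)

main :: IO ()
main = withFileLock "demo.lock" Exclusive $ \_lock ->
  -- flock()/LockFileEx is held for the duration of this action
  putStrLn "only one process gets here at a time"
```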

btw corporate-scale databases like PostgreSQL take a more nuanced view of “linearity” for performance’s sake, and that’s where most of their complexity comes from. PostgreSQL, for example, tracks in which transaction each table row was added & removed, for invalid rows to be discarded during postprocessing or deleted during “vacuuming”.

Aeson - JSON (de)Serializer

I plan to use this to allow you to debug webpages in (close to) Rhapsode & Haphaestus using Selenium & its WebDriver JSON/HTTP protocol. Though it’s debatable which page of my site this page belongs on.

To start, Aeson models JSON as:

    type Object = HashMap Text Value
    type Array = Vector Value
    data Value = Object !Object | Array !Array
        | String !Text | Number !Scientific
        | Bool !Bool | Null

Using the vector (array slices) & unordered-containers (hash array mapped tries) hackages.

Serializing this data is done via ByteString Builders, with microoptimizations for keywords, wrapped in an Encoding object to guarantee the text is already-valid serialized JSON.

Deserialization is done via Attoparsec, with microoptimizations for parsing strings possibly even dropping down into C. And other text encoding microoptimizations requiring UTF-8.

The scientific hackage is used to avoid the full cost of parsing numbers when we’re just going to write them back out.

To make reading/writing (especially objects) nicer & more typesafe, Aeson provides typeclasses for converting between that AST and your own types. Instances of these typeclasses can be automatically generated using either Template Haskell or deriving Generic.
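For example (a sketch assuming the aeson package; ToJSON, FromJSON, encode, & decode are its real API, while Session is an invented stand-in for a WebDriver payload):

```haskell
{-# LANGUAGE DeriveGeneric #-}
import Data.Aeson (FromJSON, ToJSON, decode, encode)
import GHC.Generics (Generic)

data Session = Session { sessionId :: String, capabilities :: [String] }
  deriving (Show, Eq, Generic)

instance ToJSON Session     -- instances generated from the Generic structure
instance FromJSON Session

roundTrip :: Maybe Session
roundTrip = decode (encode (Session "abc123" ["browserName"]))
-- → Just (Session {sessionId = "abc123", capabilities = ["browserName"]})
```

The record field names become the JSON object keys, which is usually what a wire protocol wants.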

The derived FromJSON instances use a JSONPath type as infrastructure to extract the relevant keys & report errors for anything missing.