Most languages now have a nice webframework like Django or Ruby on Rails to aid implementing serverside webapps where the HTTP responses are dynamically generated, as opposed to clientside webframeworks like Elm.
Haskell has two main webframeworks: Happstack & Yesod. Happstack is used by Haskell’s package repository Hackage, and I will be using it for a Rhapsode-compatible webpage debugger! It’s the one I’ll discuss here…
At its core, Happstack is a webserver that runs a callback (in a new userspace thread) for each parsed request to get a response to serialize.
In more detail, it loops forever (despite any exceptions, which may get logged) accept()ing new connections on the configured port, starting a new greenthread for each with a configured timeout.
Parsing requests mostly involves splitting by newlines until an empty line; the first line is split by words to get method, URI, & optionally version. Header lines are split on “:”, with any subsequent lines starting with whitespace appended, using Parsec. Postprocessing converts the key to lowercase & concatenates those extra lines before converting them into a list of HeaderPairs with dedup’d keys.
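The two splits can be sketched in a few lines of plain Haskell (a toy approximation, not Happstack’s actual Parsec code):

```haskell
import Data.Char (toLower)

-- The request line splits on words: method, URI, & optional version.
parseRequestLine :: String -> Maybe (String, String, Maybe String)
parseRequestLine line = case words line of
  [method, uri]          -> Just (method, uri, Nothing)
  [method, uri, version] -> Just (method, uri, Just version)
  _                      -> Nothing

-- Header lines split on the first ':', lowercasing the key.
parseHeader :: String -> Maybe (String, String)
parseHeader line = case break (== ':') line of
  (key, ':' : value) -> Just (map toLower key, dropWhile (== ' ') value)
  _                  -> Nothing
```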
It retrieves the content-length header to determine how to read the body, and parses the Cookie header. If successfully parsed, these results are assembled with some MVars into a Request object & your callback is called.
The HTTP response from that callback is optionally logged (using a configurable callback), serialized to the client, tmp files from the rqInputsBody MVar are cleaned up, & it decides whether to await the next request on this socket.
There are three main cases for serializing HTTP responses, depending on whether to use the sendfile() syscall to offload work onto the kernel, & whether or not to “chunk” the response body. In every case sending the HTTP version, status code, & (postprocessed) headers is the same.
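The “chunked” case wraps each piece of the body in a hex length prefix, which can be sketched as (a toy illustration of HTTP/1.1 chunked transfer encoding, not Happstack’s code):

```haskell
import Numeric (showHex)

-- One chunk: its length in hex, CRLF, the bytes, CRLF.
chunk :: String -> String
chunk s = showHex (length s) "" ++ "\r\n" ++ s ++ "\r\n"

-- A zero-length chunk terminates the stream; empty pieces are
-- skipped so they don't accidentally end it early.
chunked :: [String] -> String
chunked parts = concatMap chunk (filter (not . null) parts) ++ "0\r\n\r\n"
```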
At heart, a Request -> Response callback is all Happstack wants (a type signature you can search on Hoogle, no not a typo); unlike nginx it doesn’t trust you to manually serialize the response.
But for a nicer API Happstack converts this into a Parsec-like monadic API, using a large pile of 3rd-party generic wrapper monads, presumably to encourage GHC to optimize better.
And it uses typeclasses to convert the input text into typesafe values, and the (common) output type into a Response with a content-type header & Lazy ByteString body.
The way Happstack uses monads in its API is by having the functions for retrieving values out of the query parameters, cookies, the next path component (which are queued), etc. fail over to the next alternative handler (specified via the <|> monadic operator) when they fail to convert into the expected type. If they all fail it has a built-in 404 page to respond with.
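That failover pattern is just the Alternative typeclass. Here’s a toy model of it (Req, Handler, dir, & look are my own simplified stand-ins for Happstack’s types, not its real API):

```haskell
import Control.Applicative (Alternative(..))
import Control.Monad (ap)

-- A request is a queue of path components plus query parameters.
data Req = Req { reqPath :: [String], reqQuery :: [(String, String)] }

-- A handler either produces a value or fails, letting <|> try the next one.
newtype Handler a = Handler { runHandler :: Req -> Maybe a }

instance Functor Handler where
  fmap f (Handler g) = Handler (fmap f . g)
instance Applicative Handler where
  pure  = Handler . const . Just
  (<*>) = ap
instance Monad Handler where
  Handler g >>= f = Handler $ \r -> g r >>= \a -> runHandler (f a) r
instance Alternative Handler where
  empty = Handler (const Nothing)
  Handler a <|> Handler b = Handler $ \r -> maybe (b r) Just (a r)

-- Match & consume the next queued path component.
dir :: String -> Handler a -> Handler a
dir name (Handler g) = Handler $ \r -> case reqPath r of
  p : rest | p == name -> g r { reqPath = rest }
  _                    -> Nothing

-- Retrieve a query parameter, failing over if it's absent.
look :: String -> Handler String
look key = Handler $ \r -> lookup key (reqQuery r)

route :: Handler String
route = dir "greet" (do name <- look "name"; pure ("Hello " ++ name))
    <|> dir "ping" (pure "pong")
    <|> pure "404 not found"      -- everything-failed fallback
```

Unmatched requests fall through to the final `pure` handler, mirroring the built-in 404 page.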
Other APIs enqueue callbacks to alter the Response. And yet other APIs form common abstractions around the others.
Those wrapper APIs implement HTTP standards regarding localization, authentication, compression, etc. And the results they compute will be configurably validated before (or perhaps, due to laziness, during) serialization.
The same team provides other APIs which are arguably, mostly through use of Template Haskell, even more typesafe. Especially for webforms & URL routing.
Next I will be discussing the major HTML templating library, which Happstack knows how to convert into a Response.
Blaze HTML Templating
To render your serverside webapp’s data into human-readable HTML files, you’ll typically use some sort of “templating language”. There are different aesthetics here, but Blaze lets you construct the HTML tree for output using a DSL-like syntax.
It advertises itself on speed, but most of those optimizations have by now been upstreamed into libs I’ve already discussed, which makes this description easier. These optimizations are for serializing output; other modules are much better at processing XML trees.
Monads are used in this declarative API to let you list child elements using indentation- rather than operator-based syntax via the do syntactic sugar, since you’re probably indenting anyways for legibility. Though you can still use semicolons & braces.
The elements themselves are represented by function calls, attributes are attached via the ! infix operator, and with OverloadedStrings text can be written as normal Haskell literals.
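To make that concrete, here’s a toy Blaze-like DSL (my sketch built on a Writer monad; blaze-html’s real internals differ): do-notation appends children, & ! attaches an attribute by rewrapping the element.

```haskell
import Control.Monad.Trans.Writer (Writer, execWriter, tell)

data Node = Elem String [(String, String)] [Node] | TextN String

type Markup = Writer [Node] ()

-- An element is a function call taking its children (a do-block).
el :: String -> Markup -> Markup
el tag kids = tell [Elem tag [] (execWriter kids)]

txt :: String -> Markup
txt = tell . pure . TextN

infixl 8 !
-- Attach an attribute to the element an element-function would build.
(!) :: (Markup -> Markup) -> (String, String) -> Markup -> Markup
f ! attr = \kids -> case execWriter (f kids) of
  [Elem t as ks] -> tell [Elem t (attr : as) ks]
  ns             -> tell ns

render :: Markup -> String
render = concatMap go . execWriter
  where
    go (TextN s)      = s
    go (Elem t as ks) = "<" ++ t ++ concatMap attr as ++ ">"
                        ++ concatMap go ks ++ "</" ++ t ++ ">"
    attr (k, v)       = " " ++ k ++ "=\"" ++ v ++ "\""

page :: Markup
page = el "ul" $ do
  el "li" ! ("class", "first") $ txt "one"
  el "li" $ txt "two"
```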
The core optimization challenge Blaze faces is making text concatenation rapid, which starts by minimizing how much of it you actually do! So the constructor function for each tag includes the preceding < in all the datatypes it knows how to output to.
And those strings are actually compiled to a function which takes the next String to concatenate on in place of the list terminator. GHC is great at turning this into a constant-time operation!
From there it’s a matter of traversing the tree to perform those concatenations.
Non-literal text is stored in the datatype (String, Text, or ByteString) it arrived in for the tree traversal to convert into its desired type. Literal text is compiled into all 3.
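That “string compiled to a function” trick is the classic difference list, which can be sketched as:

```haskell
-- A string stored as a function awaiting whatever comes next.
type DString = String -> String

-- Pre-apply (++): "<p>" is stored as ("<p>" ++).
lit :: String -> DString
lit = (++)

-- Appending is now function composition, which GHC handles in
-- constant time; the real list terminator is plugged in at the end.
renderD :: DString -> String
renderD d = d ""
```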
A secondary optimization challenge is to microoptimize escaping special HTML chars to &entities.
Monadic sequencing becomes Append operations in the output tree, whilst the ! operator wraps the left-hand-side tag in an AddAttribute node which’ll concatenate together the text to be inserted into it.
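For reference, the naive escaping being microoptimized looks like this in plain Haskell (a baseline sketch, not Blaze’s implementation):

```haskell
-- Replace HTML-special characters with their &entities.
escapeHtml :: String -> String
escapeHtml = concatMap esc
  where
    esc '<'  = "&lt;"
    esc '>'  = "&gt;"
    esc '&'  = "&amp;"
    esc '"'  = "&quot;"
    esc '\'' = "&#39;"
    esc c    = [c]
```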
“Acid State” NoSQL database
The way Acid State works is it lets you, the caller, define the datastructure it stores as well as the operations which can be performed upon it. Much like you’d do for the MVar API which it, in part, wraps.
At a surface-level Acid State provides methodtables for how to query & update the database live or upon startup, and others for how to (de)serialize the data to disk before & after converting them into “Entry” bytestrings.
The default Serializer calls Cereal. Then the Archiver takes that list of bytestrings & writes them out in sequence with added per-entry lengths & CRCs to ensure it’s read correctly.
There’s an Acid State implementation which works purely in-memory, wrapping MVar. Another communicates with an external server in case you need to scale.
And the main implementation extends the one wrapping an MVar to write each operation to disk via a Serializer & Archiver. Periodically it’ll write a “checkpoint” storing the current database state, so it doesn’t need to reconstruct all the data from scratch.
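The MVar-wrapping core can be sketched like so (hypothetical names, not acid-state’s actual API; the real thing also logs each update to disk through the Serializer & Archiver):

```haskell
import Control.Concurrent.MVar (MVar, newMVar, readMVar, modifyMVar)

-- A toy in-memory "acid" handle: just the state behind an MVar.
newtype Acid st = Acid (MVar st)

openMemory :: st -> IO (Acid st)
openMemory st = Acid <$> newMVar st

-- Updates run atomically inside the MVar, returning a result.
update :: Acid st -> (st -> (st, a)) -> IO a
update (Acid var) op = modifyMVar var (pure . op)

-- Queries just read the current state.
query :: Acid st -> (st -> a) -> IO a
query (Acid var) f = f <$> readMVar var
```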
This allows you to define a datatype & functions operating upon it, and bundle them together into an Atomic, Consistent, Isolated, & Durable database. You may also specify a serializer.
makeAcidicWithSerialiser via Template Haskell (the $( ... ) syntax) generates the code for that at compiletime.
Once the constructor & type arguments are extracted from the typename passed to makeAcidic[WithSerialiser], it iterates over the passed events.
A type constructor will be generated for each of those event functions if it’s not already present, alongside implementations of two relevant typeclasses for it. Plus whatever serializer you configure it to use, defaulting to Cereal/SafeCopy.
Then it generates an instance of IsAcidic mapping those new datatypes to the functions they correspond to. Some more magic is involved here that I don’t understand…
At a top-level Cereal is primarily a Serialize typeclass implemented for a variety of common types, providing methods to get & put data. It also hooks into the Generic typeclass, allowing you to trivially implement Serialize for your own types via GHC’s generic deriving.
Lists are stored on disk (or in solidstate storage) as a length-prefixed array, to which several other types are lowered. To optimize for common cases where numbers are small, they’re stored as arrays of bytes unless they only require a single one. Floating point numbers are not compressed.
If a type has multiple variants (“sum types”) like Either, it’ll get prefixed with a byte indicating which variant it is, e.g. Right, storing any of that variant’s fields afterwards.
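Both layouts are easy to sketch (toy encoders over byte lists; Cereal really uses ByteString Builders & wider length fields):

```haskell
import Data.Word (Word8)

-- Lists get a length prefix (a single byte here, for brevity only).
encodeList :: (a -> [Word8]) -> [a] -> [Word8]
encodeList enc xs = fromIntegral (length xs) : concatMap enc xs

-- Sum types get a variant-tag byte, then that variant's fields.
encodeEither :: (a -> [Word8]) -> (b -> [Word8]) -> Either a b -> [Word8]
encodeEither encLeft _  (Left a)  = 0 : encLeft a
encodeEither _ encRight (Right b) = 1 : encRight b
```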
The serializers are written using a monadic wrapper around ByteString Builders. To read that data back in it implements its own parser combinators (which I have done as well; it’s not scary if you only address your own needs).
Its parser combinators take success & failure callbacks, alongside the input bytes, an optional buffer, & whether we’ve already read everything, to return success, failure, or the need for more data. Also a number for some reason…
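Such continuation-passing parser combinators can be sketched like this (my own toy version, not Cereal’s actual types, which also track that buffer & number):

```haskell
{-# LANGUAGE RankNTypes #-}

-- Callers supply a success & a failure continuation.
newtype Parser a = Parser
  { runParser :: forall r.
      String ->               -- input bytes (a String here)
      (String -> a -> r) ->   -- success: remaining input & parsed value
      (String -> r) ->        -- failure: error message
      r }

anyByte :: Parser Char
anyByte = Parser $ \inp ok err -> case inp of
  c : rest -> ok rest c
  []       -> err "need more input"

pureP :: a -> Parser a
pureP a = Parser $ \inp ok _ -> ok inp a

-- Sequencing threads the continuations through both parsers.
andThen :: Parser a -> (a -> Parser b) -> Parser b
andThen p f = Parser $ \inp ok err ->
  runParser p inp (\rest a -> runParser (f a) rest ok err) err

parse :: Parser a -> String -> Either String a
parse p inp = runParser p inp (\_ a -> Right a) Left
```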
“SafeCopy” data migration
Over time you’ll probably want to change the datamodels you’re persisting to disk in your webapp. So Acid State uses a module called SafeCopy to allow for this.
In a nutshell it extends Cereal as I described yesterday to persist a datamodel version number alongside the serialized data, in order to determine when to run a provided typeclass to “migrate” the data from an older datamodel to the new one.
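The idea can be sketched with a hypothetical two-version datatype (not SafeCopy’s real API, which derives this machinery for you):

```haskell
-- Version 1 stored only a name; version 2 added an age field.
newtype PersonV1 = PersonV1 String
data Person = Person String Int deriving (Eq, Show)

-- The "migrate" step: fill in a default for the new field.
migrateV1 :: PersonV1 -> Person
migrateV1 (PersonV1 name) = Person name 0

-- The stored version number picks a decoder; old data is migrated forward.
decodePerson :: (Int, [String]) -> Either String Person
decodePerson (1, [name])      = Right (migrateV1 (PersonV1 name))
decodePerson (2, [name, age]) = Right (Person name (read age))
decodePerson (v, _)           = Left ("unknown version " ++ show v)
```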
The central component of SafeCopy is a SafeCopy typeclass, specifying the type’s:
- version number & “profile”
- whether it’s a primitive (unversioned), base, extends, or extended type
- how to read & write the data, possibly generated via deriving Generic or Cereal
- a cached consistency check (enforcing unique version numbers & correct use of kinds)
- & the name of the type for error messages
A global mapping stores how to read these datamodels from disk.
To find the right reader, it loops until it finds a type with the correct version number as stored ondisk, from which it can retrieve its getCopy method. Otherwise it checks its “kind”.
Base, & possibly Extended, kinds throw errors. Extends traverses to figure out how to read the referenced type, before calling the type’s migrate method.
And unless disabled, Extended leads it to consider both the forward & backwards “getters”, disabling this option in the recursion.
safeGet functions are implemented upon the same monads that are provided by Cereal as described yesterday.
Primitives are specialcased to not read/write a version field before the data itself.
SafeCopy is implemented for several common types used in Haskell programs.
FileLock - used for transactions
There’s really not much to it: it’s a nice Haskell wrapper around the flock() (Unix) or LockFileEx() (Windows) system call with custom types. It’s an exclusive lock, not a read/write lock; that’s probably implemented elsewhere.
btw Corporate Scale databases like PostgreSQL take a more nuanced view of “linearity” for performance’s sake, and that’s where most of their complexity comes from. PostgreSQL for example tracks in which transaction each tablerow was added & removed, for invalid rows to be discarded during postprocessing or deleted during “vacuuming”.
Aeson - JSON (de)Serializer
I plan to use this to allow you to debug webpages in (close to) Rhapsode & Haphaestus using Selenium & its WebDriver JSON/HTTP protocol. Though it’s debatable which page of my site this page belongs on.
To start, Aeson models JSON as:

```haskell
type Object = HashMap Text Value
type Array  = Vector Value

data Value = Object !Object
           | Array  !Array
           | String !Text
           | Number !Scientific
           | Bool   !Bool
           | Null
```
The Object & Array types wrap the vector (array slices) & unordered-containers (hash array mapped tries) hackages.
Serializing this data is done via ByteString Builders, with microoptimizations for keywords, to be wrapped in an Encoding object for some typesafety reason.
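A naive version of that serialization looks like this (a sketch over a simplified Value, with plain lists standing in for HashMap & Vector, and String for the Builders):

```haskell
import Data.List (intercalate)

-- Simplified model of Aeson's Value (toy types, not aeson's own).
data Value = Object [(String, Value)] | Array [Value]
           | String String | Number Double | Bool Bool | Null

render :: Value -> String
render Null         = "null"
render (Bool True)  = "true"
render (Bool False) = "false"
render (Number n)   = show n
render (String s)   = show s   -- show quotes & escapes, roughly like JSON
render (Array vs)   = "[" ++ intercalate "," (map render vs) ++ "]"
render (Object kvs) = "{" ++ intercalate ","
                        [show k ++ ":" ++ render v | (k, v) <- kvs] ++ "}"
```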
Deserialization is done via Attoparsec, with microoptimizations for parsing strings, possibly even dropping down into C, & other text-encoding microoptimizations requiring UTF-8.
The scientific hackage is used to avoid the full cost of parsing numbers when we’re just going to write them back out.
To make reading/writing (especially objects) nicer & more typesafe, Aeson provides typeclasses for converting between that AST and your own types. Instances of these typeclasses can be automatically generated using either Template Haskell or GHC Generics. FromJSON instances use a JSONPath type as infrastructure to extract the relevant keys & report errors for anything missing.