Odysseus "Prosody" templating

While I usually focus on describing code written by other people that I call into, today I’m tempted to describe how some of my own software works.

For Odysseus I implemented my own very powerful templating engine to help me build its internal pages, which present information stored in a local SQLite database, or maybe even from the Web.

To start interpreting one of the template files describing an internal page, Odysseus has a manually-written two-pass lexer. The first pass looks for a given separator outside of quotes, and the second builds on that to lex {% ... %}, {{ ... }}, and {# ... #} as “template tags”.
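Here’s a rough Python sketch of that idea (the real code is Vala, and the helper names here are my own invention for illustration):

    def find_unquoted(text, needle, start=0):
        """First pass: find `needle` at or after `start`, skipping any
        occurrences inside single- or double-quoted spans."""
        quote = None
        i = start
        while i < len(text):
            c = text[i]
            if quote is not None:
                if c == quote:
                    quote = None
            elif c in ('"', "'"):
                quote = c
            elif text.startswith(needle, i):
                return i
            i += 1
        return -1

    CLOSER = {'{%': '%}', '{{': '}}', '{#': '#}'}

    def lex(template):
        """Second pass: split into literal text and "template tags",
        assuming well-formed input."""
        i = 0
        while i < len(template):
            openers = [p for p in (template.find(o, i) for o in CLOSER)
                       if p != -1]
            if not openers:              # no more tags: all literal text
                yield ('text', template[i:])
                return
            start = min(openers)
            if start > i:
                yield ('text', template[i:start])
            opener = template[start:start + 2]
            end = find_unquoted(template, CLOSER[opener], start + 2)
            yield ('tag', opener, template[start + 2:end].strip())
            i = end + 2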

After that the parser dispatches most of its work to classes looked up in a hashtable by the name of each {% ... %} tag, while {{ ... }} has its own class. These may reentrantly recurse back into the parser for “blocks”.
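In Python terms, that dispatch might look something like this (a minimal sketch; the registry contents and node classes are hypothetical):

    class TextNode:
        def __init__(self, text):
            self.text = text

    class VariableNode:                 # the {{ ... }} class
        def __init__(self, expr):
            self.expr = expr

    TAG_REGISTRY = {}                   # the {% ... %} hashtable

    def register(name):
        def decorator(cls):
            TAG_REGISTRY[name] = cls
            return cls
        return decorator

    class Parser:
        def __init__(self, tokens):
            self.tokens = iter(tokens)  # shared with recursive calls

        def parse(self, until=()):
            """Build AST nodes until a closing {% ... %} tag in `until`."""
            nodes = []
            for token in self.tokens:
                if token[0] == 'text':
                    nodes.append(TextNode(token[1]))
                elif token[1] == '{#':
                    continue            # comments are dropped
                elif token[1] == '{{':
                    nodes.append(VariableNode(token[2]))
                else:
                    name = token[2].split()[0]
                    if name in until:
                        return nodes, name
                    nodes.append(TAG_REGISTRY[name](token[2], self))
            return nodes, None

    @register('if')
    class IfNode:
        def __init__(self, args, parser):
            self.condition = args
            # Reentrantly recurse back into the parser for our block.
            self.body, _ = parser.parse(until=('endif',))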

Those parsers yield an abstract syntax tree that is interpreted through dynamic method calls. While theoretically that’s the slow approach, in practice all the problems were caused by poor memory use, the main one of which I fixed upstream in Vala.

They write their output into my wrapper around GIO’s OutputStreams, which silences any errors, often copying text straight from the source code into the output. GIO’s GResource and GLib’s GBytes are a huge help with that.

The main task of this language is to move data into place (possibly via the SQLite DB), which is what the “variable tags” are for. These may be constructed by {{ ... }} (in which case the output is autoescaped) or as part of other template tags.

The unquoted-delimiter splitter described previously is a huge help here, for separating variable paths and (hashtable-looked-up) filters.
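Continuing the Python sketch from above (the filter names and signatures here are simplified guesses, not Prosody’s actual set):

    FILTERS = {                  # filters are looked up by name too
        'upper':   lambda value, arg: str(value).upper(),
        'default': lambda value, arg: value if value else arg,
    }

    def split_unquoted(text, sep):
        """Split on `sep` wherever it occurs outside of quotes."""
        parts, i = [], 0
        while True:
            j = find_unquoted(text, sep, i)
            if j == -1:
                parts.append(text[i:])
                return parts
            parts.append(text[i:j])
            i = j + len(sep)

    def evaluate_variable(expr, context):
        """Resolve a dotted variable path, then apply each |filter."""
        path, *filters = split_unquoted(expr, '|')
        value = context
        for key in path.strip().split('.'):
            value = value[key]
        for f in filters:
            name, _, arg = f.partition(':')
            value = FILTERS[name.strip()](value, arg.strip().strip('\'"'))
        return value

    # e.g. evaluate_variable('user.name|upper', {'user': {'name': 'ada'}})
    # returns 'ADA'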

These operate on a trait implicitly casting between lists, maps, numbers, and text.

That datamodel serves very nicely to abstract away the differences between a range of untyped inputs: GValue, json-glib, libxml2, SQLite, etc. And there are also a couple of implementations used to construct the input for a template or subblock.
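Here’s a Python approximation of such a trait, with a couple of toy implementations (grossly simplified; not Prosody’s actual interface):

    from abc import ABC, abstractmethod

    class Data(ABC):
        """Every input source implements this, coercing itself on demand."""
        @abstractmethod
        def to_text(self): ...

        def to_number(self):
            try:
                return float(self.to_text())
            except ValueError:
                return 0.0

        def to_list(self):
            return [self]           # a scalar acts as a one-item list

        def lookup(self, key):
            return Empty()          # a scalar acts as an empty map

    class Text(Data):
        def __init__(self, s):
            self.s = s
        def to_text(self):
            return self.s

    class Mapping(Data):            # e.g. wrapping a JSON object
        def __init__(self, d):
            self.d = d              # values are already Data instances
        def to_text(self):
            return str(self.d)
        def to_list(self):
            return list(self.d.values())
        def lookup(self, key):
            return self.d.get(key, Empty())

    class Empty(Data):
        def to_text(self):
            return ''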

Testing

Testing is done by rendering out an HTML page I leave open as a tab in my browser session, with a special templating tag thoroughly exercising the template parser and providing a means to test its interpretation. This tag outputs its results in a coloured <details> HTML tag and counts up the fraction of passing tests in a per-outputstream global.

Another special templating tag outputs those variables and renders out a new favicon/title.

Next I’ll describe my diffing algorithm.

To help me determine what is failing, I reimplemented a diffing algorithm I learnt about at university, working off Wikipedia (their CompSci coverage is excellent).

It constructs a 2D matrix tracking how many in-sequence characters each pair of prefixes of the two inputs can share. Then I can read that table from the bottom right to find a most favourable “longest common subsequence”, which with a tiny bit of processing and output formatting can become a diff.
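In Python, the core of it looks like this (the textbook dynamic-programming LCS, as on Wikipedia, not a transcription of my Vala):

    def diff(a, b):
        # table[i][j] = length of the longest common subsequence
        # of the prefixes a[:i] and b[:j]
        table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i, x in enumerate(a, 1):
            for j, y in enumerate(b, 1):
                table[i][j] = (table[i-1][j-1] + 1 if x == y
                               else max(table[i-1][j], table[i][j-1]))
        # Read the table back from the bottom right to recover a diff.
        ops, i, j = [], len(a), len(b)
        while i > 0 or j > 0:
            if i > 0 and j > 0 and a[i-1] == b[j-1]:
                ops.append(('=', a[i-1])); i -= 1; j -= 1
            elif j > 0 and (i == 0 or table[i][j-1] >= table[i-1][j]):
                ops.append(('+', b[j-1])); j -= 1
            else:
                ops.append(('-', a[i-1])); i -= 1
        return list(reversed(ops))

    # diff('abc', 'axc') == [('=', 'a'), ('-', 'b'), ('+', 'x'), ('=', 'c')]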

The git and diff commands are built around this same longest-common-subsequence problem.

Internationalization

For internationalization, the runtime is trivial. I just need to parse a localization “catalogue” file (chosen based on an environment variable) and look up translations by key. I reuse the same lexer for this, and I cache all in-use translations using weak references.
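Sketched in Python (with the caveat that plain strings can’t be weakly referenced there, so this wraps them; the catalogue parsing itself is omitted):

    import weakref

    class Translation:
        """Wrapper so the cached text can be weakly referenced."""
        __slots__ = ('text', '__weakref__')
        def __init__(self, text):
            self.text = text

    class Catalogue:
        def __init__(self, entries):
            self.entries = entries  # key -> translated text, via the lexer
            # In-use translations are cached behind weak references,
            # so they vanish once nothing rendered still holds them.
            self.cache = weakref.WeakValueDictionary()

        def lookup(self, key):
            hit = self.cache.get(key)
            if hit is None:
                hit = Translation(self.entries.get(key, key))
                self.cache[key] = hit
            return hit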

This is triggered by a {% trans %} block template tag, which compiles down into a {% with %} tag.

What’s harder is creating those translation files, and that requires some development tools.

Creating these files is essentially a two step process:

  1. Find all the translatable text in the templates and write them out to a file for translations.
  2. Associate translations with each of those translatable texts.

For this I created a Python 3 program using a regular expression to lex my template files, and the standard library to walk the directories. Since I’m keeping the syntax basically the same, this was trivial.
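A stripped-down version of that extraction script might look like this (the {% trans %}/{% endtrans %} spelling is an assumption here):

    import os
    import re
    import sys

    TRANS = re.compile(r'\{%\s*trans\s*%\}(.*?)\{%\s*endtrans\s*%\}', re.S)

    def extract(root):
        """Walk the template directories, collecting translatable text."""
        found = {}
        for dirpath, _dirs, filenames in os.walk(root):
            for filename in filenames:
                path = os.path.join(dirpath, filename)
                with open(path, encoding='utf-8') as f:
                    for match in TRANS.finditer(f.read()):
                        found.setdefault(match.group(1).strip(), path)
        return found

    if __name__ == '__main__':
        for text, path in sorted(extract(sys.argv[1]).items()):
            print('# from', path)
            print('{% trans %}' + text + '{% endtrans %}')
            print()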

But that isn’t enough, as localizers need to be able to keep their existing translations when pulling in new translatable texts from the “base file” generated by the last script.

Since, again, I’m sticking with the same basic syntax, I could reuse the same code to work with it. From there I parse the base file into a hashmap, update it with the existing localization file, and then write it back out.
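The merge itself then reduces to a few lines of Python (assuming both files have been parsed into dicts keyed on the source text):

    def merge(base, localized):
        """Overlay existing translations onto the freshly extracted base,
        so new texts appear while old translations survive."""
        merged = dict(base)         # source text -> translation
        for key, translation in localized.items():
            if key in merged:       # drop translations of removed texts
                merged[key] = translation
        return merged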

I also created two more scripts, to guide entering translations and to count up the progress.

Database access

As for database access, I have a template tag which compiles its subblock into a SQLite “prepared statement”. This is a bit of a compilation process, translating the AST into raw text (taken from the text nodes) and an array of variables, after which it has SQLite compile that text into its own bytecode.

To allow for macros, this can involve chaining two variable tags into one.

Then at runtime it evaluates all those variables and passes the results to the prepared statement’s inputs.
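Roughly, in Python terms (reusing the hypothetical node classes and evaluate_variable from the sketches above; Python’s sqlite3 module prepares statements implicitly on execute, where the C API does so explicitly):

    class QueryTag:
        def __init__(self, ast_nodes):
            # Compile: text nodes become raw SQL, variable tags become
            # ? placeholders with their expressions remembered for later.
            sql, self.variables = [], []
            for node in ast_nodes:
                if isinstance(node, TextNode):
                    sql.append(node.text)
                else:
                    sql.append('?')
                    self.variables.append(node.expr)
            self.sql = ''.join(sql)  # SQLite compiles this to bytecode

        def run(self, connection, context):
            # Runtime: evaluate each variable and bind the results.
            params = [evaluate_variable(v, context)
                      for v in self.variables]
            return connection.execute(self.sql, params)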

{% if %} tags

Of all the parsers I’ve seen, I personally find those dealing in terms of “operators” (rather than “grammar rules”) most interesting.

The simplest of these is the classic “shunting yard” algorithm, which uses a stack to move operators from infix form to postfix based on some predefined operator precedence values. Basically it just pops higher precedences before pushing lower ones.

From there it can be trivially interpreted using another stack.
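A minimal Python rendition of both halves (with a toy operator set, nothing like a full expression language):

    PRECEDENCE = {'or': 1, 'and': 2, '==': 3, '+': 4, '*': 5}

    def to_postfix(tokens):
        """Shunting yard: pop operators of higher (or equal, for left
        associativity) precedence before pushing a lower one."""
        output, stack = [], []
        for token in tokens:
            if token in PRECEDENCE:
                while stack and PRECEDENCE[stack[-1]] >= PRECEDENCE[token]:
                    output.append(stack.pop())
                stack.append(token)
            else:
                output.append(token)        # operand
        while stack:
            output.append(stack.pop())
        return output

    OPS = {'or': lambda a, b: a or b, 'and': lambda a, b: a and b,
           '==': lambda a, b: a == b,
           '+': lambda a, b: a + b, '*': lambda a, b: a * b}

    def evaluate(postfix, context):
        """The second stack: interpret the postfix form."""
        stack = []
        for token in postfix:
            if token in OPS:
                b, a = stack.pop(), stack.pop()
                stack.append(OPS[token](a, b))
            else:
                stack.append(context.get(token, token))
        return stack.pop()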

But if you want something more in between the typical grammar-rules parsers and shunting-yard parsers, there are “Top Down Operator Precedence” (TDOP) parsers. TDOP is like shunting yard but uses recursion and indirect function calls.

Those indirect function calls allow the operators to define their own grammar rules, even if usually they just call back into the reentrant “driver” method for infix and prefix operators.

It is most readily expressed in a dynamic programming language like Python.

TDOP’s driver method only needs to do two things:

  1. Tell the first token to parse itself as a prefix or literal expression.
  2. Tell all subsequent tokens to parse themselves as infix or postfix expressions while they still have higher precedence than specified, storing the previously parsed expression in them.

Recursing back into this from those methods does the rest.
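Here’s about the smallest Python TDOP parser I can write (Crockford-style, with hypothetical token classes; nud is the “parse yourself as a prefix/literal” method, led the infix/postfix one):

    import itertools

    class Token:
        lbp = 0                     # left binding power, i.e. precedence
        def nud(self):              # "null denotation": prefix or literal
            raise SyntaaxError('unexpected token') if False else None
        def led(self, left):        # "left denotation": infix or postfix
            raise SyntaxError('unexpected token')

    class Literal(Token):
        def __init__(self, value):
            self.value = value
        def nud(self):
            return ('lit', self.value)

    class Not(Token):
        def nud(self):              # prefix: recurse back into the driver
            return ('not', expression(3))   # 3 = made-up binding power

    class BinaryOp(Token):
        def __init__(self, name, lbp):
            self.name, self.lbp = name, lbp
        def led(self, left):        # infix: store the left side in it
            return (self.name, left, expression(self.lbp))

    class End(Token):
        pass                        # lbp = 0 stops the driver

    def expression(rbp=0):
        """The TDOP driver method."""
        global current
        token, current = current, next(stream)
        left = token.nud()          # 1. prefix or literal
        while rbp < current.lbp:    # 2. stronger-binding infix/postfix
            token, current = current, next(stream)
            left = token.led(left)
        return left

    def parse(tokens):
        global stream, current
        stream = itertools.chain(iter(tokens), itertools.repeat(End()))
        current = next(stream)
        return expression()

    # parse([Literal('a'), BinaryOp('and', 2), Not(), Literal('b')])
    # -> ('and', ('lit', 'a'), ('not', ('lit', 'b')))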

This algorithm is used to parse {% if %} tags in my templating language, as it is in Django’s.