The parser itself is actually quite simple, there’s not too much for it to deal with. But converting many of these constructs to “character ranges” proves to be trickier, thereby outputting a flattened tree of “pattern terms”.
The interpreter extracts “disjunctions” (control flow branches) from these “pattern terms” and will interpret them using an explicit stack whilst jumping between code for forward execution and backtracking.
For each kind of pattern term it has a method to interpret it, check if it occurs the minimum of times, and allow it to match up to the maximum of times no matter whether it’s “greedy” or not. The latter logic is repeated for each one, but looks near identical everywhere.
The JIT works in much the same as the interpreter, except it outputs machine code to do it’s real work with specific registers hardcoded to specific tasks. Also:
- Tests on character ranges are compiled into a “binary search”, where the middlemost ranges are checked first.
- It generates a bytecode as an intermediate language to the machine code, largely dealing with disjunctions.
- Backtracking logic is generated after forward execution, most of which jumps to forward execution.