PostScript is a language, from which PDF is derived, for describing the vector graphics you want your printer to rasterize & print, regardless of the hardware specs of whatever's printing that image. Here I’ll explore how I’d design a processor specifically for evaluating PostScript programs, and how I’d implement a rasterizer like Cairo or Rasterific upon it!
For the raw hardware I’d design something halfway-resembling The Reduceron upon which I can implement all the PostScript operators!
Executing each item in the array would pop a configured number of items from either of the sidestacks before copying its data into any of the 3 stacks. There’d be a BRANCH opcode (or would that just be CALL?) which references a table of such functions for the top of the literals stack to determine which branch is taken whilst copying referenced values to the literals stack for it to process.
That should make it easy to implement most PostScript operators…
Numbers are a huge part of PostScript! So I’d have builtin support for integers (tagged as one of 8 types) complete with builtin multiplication & division/remainder operators requiring dedicated ALUs. I’d parallelise evaluation of these whilst copying entries to the stack.
Maybe that’d be enough to make floating-point numbers as fast to emulate as to hardcode…
Each stackentry would consist of 32bit integer/pointer, 4bit opcode/numeric-type, & 4bit PostScript read/write/execute permissions.
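To make that layout concrete, here's a minimal sketch of packing & unpacking such a 40bit stackentry, assuming the fields sit high-to-low as permissions, tag, then value (the exact bit order is my own assumption):

```python
# Sketch of the hypothetical 40-bit stack entry described above:
# a 32-bit integer/pointer, a 4-bit opcode/numeric-type tag, & a
# 4-bit read/write/execute permissions nybble. Field order is my own.

VALUE_BITS, TAG_BITS, PERM_BITS = 32, 4, 4

def pack_entry(value: int, tag: int, perms: int) -> int:
    assert 0 <= value < (1 << VALUE_BITS)
    assert 0 <= tag < (1 << TAG_BITS)
    assert 0 <= perms < (1 << PERM_BITS)
    # Layout (high to low): perms | tag | value
    return (perms << (TAG_BITS + VALUE_BITS)) | (tag << VALUE_BITS) | value

def unpack_entry(entry: int):
    value = entry & ((1 << VALUE_BITS) - 1)
    tag = (entry >> VALUE_BITS) & ((1 << TAG_BITS) - 1)
    perms = entry >> (TAG_BITS + VALUE_BITS)
    return value, tag, perms
```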
Finally I’d need to add builtin support for arrays, which (according to a PostScript reference I found) sometimes behave like the functions I described, except they don’t directly pop any arguments - they pop them indirectly. So add some operators to mutate them. At which point this really doesn’t resemble the Reduceron.
Arrays would be allocated into either side of a heap, with a copying garbagecollector which kicks in as the wrong generation fills up. I’d include a fastpath dealloc, which could involve zero-cost refcounting!
Those arrays are a standard PostScript type, and will be vital to implement rasterization!
Also I’d natively have “file IDs” for I/O (again a standard PostScript type), backed by a lexing coprocessor traversing a DFA to populate the initial executionstack (or executable string objects in the heap) or power the file operators. This DFA would have a couple adders (maybe a divider?) to parse ints & allocate name IDs. It’d be able to extend its DFA to lex new names.
Above I described a hypothetical CPU design (halfway between standard PostScript & the Reduceron, with a lexing coprocessor) upon which to implement all the PostScript operators. Today I wish to describe how I’d implement the standard PostScript types upon this CPU.
As I design this software I may refine the underlying hardware, and today I will be clarifying aspects I haven’t stated explicitly.
But first, which of these have I defined to be builtin?
Types I define to be builtin:
- Integers (with 3bit typetag)
- Operators (dereferenced in own memory page, copied to heap & stacks with relative & argument references resolved)
- Packed Arrays (with optionally-separate “header” section used internally & in-heap typetag)
Also (not standard PostScript):
- “Symbols” to be branched over
- “Branch” operator-table (or are these operators?)
All opcodes would consist of a read/write/execute permissions nybble, an opcode nybble, & a 32bit pointer/int.
I’d use tagged ints to represent:
- Font IDs (later topic)
- Names (lexed via dynamic trie in the coprocessor, converted into a perfect hashfunction for the main processor)
- Files (has own CPU-builtin operators utilizing coprocessor, compatible with packed arrays & operators)
- Save (has own CPU-builtin operators)
The builtin int operators would internally store comparison flags in the typetag, which would be hidden by the wrappers. As for real numbers…
If I speculatively evaluate maths while (and after) copying ints out of the “operator memory”, it’d be just as fast to emulate floating-point support, representing reals on the heap in my packedarray-variant. Especially if I have speculative evaluation upon populating the stack.
The highlevel wrappers around the internal int operators would add special handling for reals & hide the comparison bitflags behind booleans!
The last division+remainder results would be cached due to lack of deduplication.
Symbols would be used to represent booleans (true & false), marks, & nulls. These can be branched over alongside the int-type bits & typefields in the heap; all in their own namespaces. Branchtables might need to be compressed via bitmasks.
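One way to compress such a sparse branchtable with a bitmask, as hinted above, is to store only the populated targets plus a mask saying which symbol indices are present; a popcount of the mask bits below the symbol's bit gives the index into the dense table. A minimal sketch (all names hypothetical):

```python
# Compressed branch table: a bitmask of populated symbol indices plus
# a dense list of targets. Branching popcounts the mask below the
# symbol's bit to find the dense index. Names are my own assumptions.

def make_branchtable(targets: dict):
    mask = 0
    for sym in targets:
        mask |= 1 << sym
    dense = [targets[sym] for sym in sorted(targets)]
    return mask, dense

def branch(mask: int, dense: list, sym: int, default=None):
    bit = 1 << sym
    if not (mask & bit):
        return default
    # Count populated entries below this symbol to find the dense index.
    index = bin(mask & (bit - 1)).count("1")
    return dense[index]
```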
Which I guess means I’d be leaving 30bits unused in this opcode; those might be useful in internal code…
Booleans would mostly abstract numeric comparisons. Marks would be used in pop-until-mark operators. & null would be used to indicate silent errors.
Most “composite objects” built upon (packed) arrays would include a refcount in their header utilizing an ALU unused for those operators.
Upon packedarrays, arrays add a “length” field & a slice upon the referenced array. They’d be allocated to powers-of-two & dynamically resized.
Strings would be an array of ints & be pre-lexed for the code to run upon “executing” (if permitted) this string. Not sure whether it’d be UTF32 or UTF8 encoding…
More complex are dictionaries…
Dictionaries would mostly XOR a value’s type with its integral value to get a number to divide by the dictionary’s capacity (specified upon initialization), taking the remainder to serve as an index. If the value is a String it’s sent through the perfect-hash-function to convert it into a Name first. If it’s an array, packed or not, I’d use a more complex hashfunction. This index would be doubled to store both keys & values inline.
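The hashing scheme above can be sketched as follows; the collision handling (linear probing here) and all names are my own assumptions, not anything the design above pins down:

```python
# Sketch of the dictionary hashing described above: XOR the value's
# typetag with its integral value, take the remainder by the capacity
# fixed at initialization, & double the index so key & value sit
# inline next to each other. Probing strategy is my own assumption.

class PSDict:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.slots = [None] * (capacity * 2)  # key, value interleaved

    def _index(self, tag: int, value: int) -> int:
        return (tag ^ value) % self.capacity

    def put(self, tag: int, value: int, entry):
        i = self._index(tag, value)
        for probe in range(self.capacity):
            slot = ((i + probe) % self.capacity) * 2
            if self.slots[slot] is None or self.slots[slot] == (tag, value):
                self.slots[slot] = (tag, value)
                self.slots[slot + 1] = entry
                return
        raise MemoryError("dictfull")  # akin to PostScript's dictfull error

    def get(self, tag: int, value: int, default=None):
        i = self._index(tag, value)
        for probe in range(self.capacity):
            slot = ((i + probe) % self.capacity) * 2
            if self.slots[slot] is None:
                return default
            if self.slots[slot] == (tag, value):
                return self.slots[slot + 1]
        return default
```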
A special dictionary would be used to hold error-handling executable-arrays (a.k.a. “procedures”) triggered by other ops.
Usually the default handler for these errors would be the stop operator, which would be significantly faster if it were made a built-in operator that clears all the memory. Of course I’d still need to leave any bootstrapping code running… Special interrupts?
Btw builtin operators might be a separate opcode from their wrappers, entirely hidden from normal PostScript code.
I’d implement loops & EXIT by defining additional symbols to push on the stack…
Continuing on my PostScript hypothetical, I’ve now defined the CPU & core datatypes. So I can commence thinking about how to implement rasterization!
To do so I first need to gather shapes & parameters how they’re stroked and/or filled! I’ll discuss the show operators upon this state later…
For this I’d introduce another datatype to the public typesystem: a “Graphics State”! With validating fieldaccessor operators.
The fields of this graphics state would correspond to:
- Transformation matrix as a numeric array (not sure when the formula’s applied to points in the path)
- x & y position (2 fields, one accessor pair)
- Path (say more later…)
- clipping path (either resolved via a scanline intersection algorithm or a bitmask blocking certain pixels from being written)
- clipping path stack (array of those)
- colourspace array (branch over its head to select a parameterised colour-conversion function)
- Color interpreted via above.
- Font (next section)
- Strokeadjustment boolean determines whether to correct thin strokes.
- Device specific…
Used in strokes as I’ll discuss later:
- Numeric linewidth
- Linejoin exposed as integer (internally converted to a “symbol” for easier branching)
- Miterlimit as per linejoin
- Dashpattern numeric array & numeric dash offset
There’d be unit operators, converting to points.
Duplicate graphics states can be pushed onto & popped from a stack.
Paths would be represented as an array whose symbol entries are interpreted as drawing opcodes, with subsequent numbers representing parameters to them. Post matrix-transform these would be at whatever resolution the output hardware can print (tends to be high!)…
The opcodes would be: move, line, arc, & curve. Passes over this would apply the matrix transform, remove points too close together, remove points that don’t make a perceptable difference to their line, & interpolate curves.
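The curve-interpolation pass can be sketched as lowering each cubic Bézier "curve" entry into "line" entries. A minimal sketch assuming uniform parameter steps (real implementations subdivide adaptively) and my own point representation:

```python
# Lower a cubic Bézier curve into line segments by sampling it
# uniformly. De Casteljau evaluation; step count & representation
# are my own assumptions for illustration.

def cubic_point(p0, p1, p2, p3, t):
    # De Casteljau evaluation of a cubic Bézier at parameter t.
    lerp = lambda a, b: (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)
    a, b, c = lerp(p0, p1), lerp(p1, p2), lerp(p2, p3)
    d, e = lerp(a, b), lerp(b, c)
    return lerp(d, e)

def flatten_curve(p0, p1, p2, p3, steps=16):
    return [cubic_point(p0, p1, p2, p3, i / steps) for i in range(steps + 1)]
```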
Text & Fonts
Text is an extremely common sight in both printed documents & onscreen! So how’d I implement rendering text with a given font upon the PostScript vector graphics API I described a hypothetical implementation of above?
There is a rabbithole here that I’ll avoid digging into until I’m exploring real projects! Regardless TrueType font-parsing would be necessary, which shouldn’t require much more than the lexer hardware.
For embedding fonts PostScript defines a standard representation.
PostScript specifies fonts are dictionaries with:
- FontType (int enum)
- FontMatrix (numeric array)
- FontName (name)
- additional properties (dict)
- version number (int)
- writing mode (int enum)
- font ID (special type)
- Encoding (dictionary)
- max bbox array
- unique ID (int or numeric array)
- some vector graphics parameters
- & mappings for glyph sizes, descriptions, & more!
Characters are looked up in an Encoding array (a rabbithole akin to Harfbuzz! Won’t explore here) to get the key to look up for char sizing & shapes.
Fonts are cached, and there’s a routine to lookup fontfiles for desired fontnames & parameters. Probably use a linear-scan here, keyed in cache by name.
But past all the layout (some of which is offloaded to the PostScript program being run) & “glyph selection” standard vector paths would be looked up, tweaked to be more or less bold, positioned & concatenated, & handed off to normal rasterization.
That mostly covers the public PostScript operators; from here on I’ll be discussing the internal rasterization implementation!
As for the remaining 4 most important PostScript operators (namely show), now we move from the realm of APIs into the realm of rasterization!
To convert a stroke into a fill you concatenate the raw path with its reversed representation, both offset along their “normals” by half the linewidth in opposite directions! Add some more geometric math to handle line joins & ends.
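Half of that conversion, offsetting each line segment along its normal by half the linewidth, can be sketched as below; joining the offset segments (miter/round/bevel) & capping the ends is the "more geometric math" mentioned above and is omitted:

```python
# Offset a line segment along its unit normal by half the linewidth.
# A sketch of one piece of stroke-to-fill conversion; joins & caps
# are omitted. Representation is my own assumption.
import math

def offset_segment(p0, p1, halfwidth):
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    length = math.hypot(dx, dy)
    # Unit normal, perpendicular to the segment direction.
    nx, ny = -dy / length, dx / length
    return ((p0[0] + nx * halfwidth, p0[1] + ny * halfwidth),
            (p1[0] + nx * halfwidth, p1[1] + ny * halfwidth))
```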
Also you could add a fastpath converting directly to trapezoids, bypassing the more general filling…
Dashes can be a pre-processing step iterating concurrently over the dashpattern (cyclically) & the path, whilst computing where to split those path components.
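That concurrent iteration can be sketched as follows, handling only straight segments & a zero dash offset; the path & dash representations are my own assumptions:

```python
# Walk the path & dashpattern concurrently, cycling the pattern &
# splitting segments wherever a dash boundary falls inside them.
# Returns only the "on" dash runs. A sketch under my own assumptions.
import math

def dash_path(points, pattern):
    dashes, on = [], True
    idx, remaining = 0, pattern[0]
    current = [points[0]]
    for p0, p1 in zip(points, points[1:]):
        seglen = math.hypot(p1[0] - p0[0], p1[1] - p0[1])
        start = 0.0
        while seglen - start > remaining:
            # A dash boundary falls inside this segment: split it.
            t = (start + remaining) / seglen
            cut = (p0[0] + (p1[0] - p0[0]) * t, p0[1] + (p1[1] - p0[1]) * t)
            current.append(cut)
            if on:
                dashes.append(current)
            current = [cut]
            start += remaining
            idx = (idx + 1) % len(pattern)
            remaining, on = pattern[idx], not on
        remaining -= seglen - start
        current.append(p1)
    if on and len(current) > 1:
        dashes.append(current)
    return dashes
```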
Fills require significant preprocessing, though the fill operation itself deserves its own thread.
The first step is applying the configured “matrix transform”, as a couple multiply-sums, to all the points, allowing you to position, rotate, resize, etc them.
Another step is to interpolate along curves to lower everything to lines.
You’d want a pass which removes points too close together to make a difference to the output. Another pass locates the furthest point on a subpath from the approximating straightline: if it’s indistinguishable that subpath is (later) deleted; if not it recursively applies the same check to both sides.
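That recursive furthest-point check is essentially Ramer-Douglas-Peucker simplification; a minimal sketch, assuming 2D points & a Euclidean distance tolerance:

```python
# Ramer-Douglas-Peucker line simplification: find the point furthest
# from the approximating straight line; if it's within tolerance drop
# the midpoints, otherwise recurse on both sides.
import math

def perpendicular_distance(pt, a, b):
    # Distance from pt to the straight line through a & b.
    (x, y), (ax, ay), (bx, by) = pt, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(x - ax, y - ay)
    return abs(dy * (x - ax) - dx * (y - ay)) / length

def simplify(points, tolerance):
    if len(points) < 3:
        return list(points)
    # Locate the point furthest from the approximating straight line.
    dists = [perpendicular_distance(p, points[0], points[-1])
             for p in points[1:-1]]
    worst = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[worst - 1] <= tolerance:
        return [points[0], points[-1]]  # indistinguishable: drop midpoints
    # Otherwise recursively apply the same check to both sides.
    left = simplify(points[:worst + 1], tolerance)
    right = simplify(points[worst:], tolerance)
    return left[:-1] + right
```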
All of this requires nontrivial arithmetic, which can be incorporated into the copy-to-heap/stacks circuit. Using integral math post-matrix-transform can help too!
To convert these matrix-transformed simplified linepaths into high res raster graphics, the Bentley-Ottmann Algorithm (some variant of which Cairo might use) seems to be in order…
Bentley-Ottmann starts with a minheap of LINESTART events & an empty list of lines called the “scanline” (Cairo uses a doubly-linkedlist iterator; with special hardware an array’s probably better here).
Each LINESTART or INTERSECTION it computes the position of the next INTERSECTION or LINEEND event to enqueue in the minheap along the line(s).
Each INTERSECTION it also swaps the 2 lines in the scanline & yields a trapezoid to fill.
Each LINEEND yields a trapezoid & removes the line.
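The core computation those LINESTART/INTERSECTION events need, where (if anywhere) two segments cross so an INTERSECTION event can be enqueued in the minheap, can be sketched as below; the full Bentley-Ottmann bookkeeping (scanline ordering, event dedup) is omitted:

```python
# Solve a0 + t*(a1-a0) == b0 + u*(b1-b0) for t, u in [0, 1] via the
# 2D cross product. Returns the crossing point, or None if the
# segments are parallel or don't cross within their extents.

def segment_intersection(a0, a1, b0, b1):
    dax, day = a1[0] - a0[0], a1[1] - a0[1]
    dbx, dby = b1[0] - b0[0], b1[1] - b0[1]
    denom = dax * dby - day * dbx
    if denom == 0:
        return None  # parallel (or collinear): no single crossing
    t = ((b0[0] - a0[0]) * dby - (b0[1] - a0[1]) * dbx) / denom
    u = ((b0[0] - a0[0]) * day - (b0[1] - a0[1]) * dax) / denom
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (a0[0] + t * dax, a0[1] + t * day)
    return None
```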
A similar algorithm could find the intersections between the clip & fill paths to yield the actual path to fill, though there are several other techniques. This one cuts down on the sheer number of memory writes.
Superscalar comparisons (like I mentioned above) could help speed up the minheap implementation…
Filling axis-aligned trapezoids trivially lowers into pixel-ranges to fill on each row of the output image; these fills may or may not require some computation or memory lookups.
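That lowering can be sketched as interpolating both slanted sides of the trapezoid at each row; the representation & rounding rules here are my own assumptions, and real rasterizers also handle antialiasing/pixel-center sampling at this point:

```python
# Lower an axis-aligned trapezoid (horizontal top & bottom edges, as
# the sweep yields) into per-row (y, x_start, x_end) pixel spans.

def trapezoid_spans(y_top, y_bottom, x_left_top, x_left_bottom,
                    x_right_top, x_right_bottom):
    spans = []
    for y in range(y_top, y_bottom):
        # Interpolate both slanted sides at this row.
        t = (y - y_top) / (y_bottom - y_top)
        xl = x_left_top + (x_left_bottom - x_left_top) * t
        xr = x_right_top + (x_right_bottom - x_right_top) * t
        spans.append((y, round(xl), round(xr)))
    return spans
```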
Ultimately generating the actual raster involves writing or copying a lot of memory! Especially at high resolutions, that’s defined by the sheer number of output pixels! There’d be a special codepath for copying images into the output.
So make sure that this PostScript processor has wide memory busses (which it so happens to inherit from the Reduceron).
And make sure it is prefetching the top-of-stacks because I doubt a highres image would fit in its caches…
Ideally upon the show operator I could write the generated image (write iterator running concurrently with compute) directly to the printhead spitting the ink onto paper, with minimal to no postprocessing…
The writes involving compute would be for patterns/gradients.
USB has a standard all wired printers should implement. Not that they do, unfortunately; I like griping that there’s no reason why I should be installing printer drivers when called on for tech support. Things should just work! (Is it printer vendors’ fault, Microsoft’s fault, or both? It’s not Apple’s or Linux’s fault.) The IETF has a standard for wireless printers which I’ll discuss next.
So how would I program my hypothetical PostScript CPU to support USB printing?
If I understand the spec correctly (lots of precision, little clarity), sending a document to the printer involves catching the printer’s attention then uploading the document chunked into USB frames.
Internally this would get concatenated then queued up (in a ringbuffer perhaps?) to be run as a PostScript program, yielding images as described before to send to the raw output hardware.
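That ringbuffer could be as simple as the following fixed-capacity sketch; the drop-when-full policy & all names are my own assumptions:

```python
# Fixed-capacity ring buffer for queueing reassembled documents to be
# run as PostScript programs. Refuses new jobs when full so the USB
# side can apply backpressure. A sketch, not a definitive design.

class RingBuffer:
    def __init__(self, capacity: int):
        self.buf = [None] * capacity
        self.head = self.tail = self.count = 0

    def enqueue(self, job) -> bool:
        if self.count == len(self.buf):
            return False  # full: caller must apply backpressure
        self.buf[self.tail] = job
        self.tail = (self.tail + 1) % len(self.buf)
        self.count += 1
        return True

    def dequeue(self):
        if self.count == 0:
            return None
        job = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        self.count -= 1
        return job
```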
All of which are trivial operations to program that hypothetical PostScript CPU to do, given interrupt handlers!
The other interesting thing to do with USB is to allow selecting a file from a USB stick! This programming would be more complex, but once we can output a highres image to the printhead surely it’d be trivial to output a lowres image to an LCD.
From there it’s a matter of communicating with the USB stick, deciding what/where to draw, rendering sample pages within the UI, & responding to button presses.
Also we can send status info back to the computer upon request…
Internet Printing Protocol is a federated HTTP(S) webservice running on every Mac or Linux-based machine (via the CUPS project), scheduling your documents to be printed on a nearby printer. The benefit of physical proximity in this domain helps IPP to remain federated, even when Google attempted to centralize it. So to implement it on my hypothetical PostScript CPU I’d need a networking stack & webframework.
If we’re storing packets in RAM, a networking stack doesn’t ask much from the hardware…
So have WiFi/Ethernet hardware writing into memory & interrupting the CPU so it can process the packets in RAM. I’d need to add hardware to harvest entropy (I doubt there's much in the I/O to harvest…) to power the pseudo-random number generation for TCP initialization & TLS encryption, which also requires RAM. And I’d need taskswitching, copying the stacks into the heap & back, for TCP's sake. Not much more compute to add…
HTTP needs some actual parsing, with parsing paths being called “routing”.
The hypothetical lexer circuit I threw in to parse PostScript programs could also be programmed to lex HTTP & the binary files POSTed in IPP. Or even simplify routing.
The central IPP “operation” is “print-job” for uploading a document & its metadata. Software in the printer can then enqueue this print-job with its validated metadata outputting status information.
“print-uri” has the printer download the document rather than receiving it directly.
“validate-job” skips the enqueueing.
“create-job” registers a print job to be later populated then probably enqueued.
“get-printer-attributes” supplies requested metadata.
“get-jobs” responds with registered printjobs.
“pause-printer” blocks the printer from enqueueing, though without stopping its current operation. “resume-printer” does the opposite. Again there's a need for task switching…
“send-document” attaches uploaded document to the registered print job.
“send-uri” has printer download a document to attach.
“cancel-job” removes a printjob from the queue & deregisters it.
“hold-job” removes a printjob from the queue but doesn't deregister it. May attach metadata for when to reenqueue it.
“release-job” unblocks a printjob from being enqueued.
“get-job-attributes” responds with the requested printjob metadata.
See IETF RFC8011 for most of the details!
Like with networking this appears to be heavily memory-bound with some compute. Little need to change the processor. Though I suspect in the realworld most IPP-compatible printers would embed a processor with just enough of an OS to run CUPS, possibly with an internal USB connection to the actual printer.