Fetching Pages on Hypothetical Device

Given the hardware established previously, who would we program it to send & receive network requests? What Operating System infrastructure might be needed for that?

Here we’re focusing on HTTP & HTTPS (but not versions 3 or greater) for relative concision, support for other protocols & URI schemes is left as an exercise for the reader.

Requesting a URL

I’d start requesting a URL by parsing white-space-separated relative URLs into registers & the datastack (interpreting “../” with a pop) to be concatenated into an absolute URL.

Zine listing of common HTTP headers

Zine annotating the structure of a HTTP request

This absolute URL would be parsed by a cache, in-memory parser/trie, falling back to dispatching based upon prefix (URI scheme). The HTTP(S) handler would combine the reformatted URL with locally & globally configured headers (implementing content negotiation, user agent identification, & more), etc. This assembled HTTP request would be sent to a cache of tasks (“sockets”) keyed by protocol+host so we can reuse those sockets & avoid overhead.

Process-flow diagram illustrating TLS negotiation

We may prepend data (TLS) negotiating random numbers alongside system parameters to introduce some privacy to this network connection using encryption.

ASCII-art diagram illustrating the structure of a TCP packet

Regardless we negotiate random numbers (TCP) to add reliability, using timeouts & counters.

Zine showing annotated DNS request & response

From here we we parse the named destination to indicate where intermediaries (“routers”, typically running Linux with a barely-present userspace) should forward this data to. With naming’s own cache, usually they’re “parsed” via blocking DNS requests to query remote servers for the name. The response may redirect us to another DNS server, but that’s usually handled for us by the configured (typically via DHCP upon connecting to the network) DNS servers we’re sending the request to. Wrapped in UDP & IP headers to a pre-configured destination typically set upon connecting to the network by DHCP responses.

ASCII-art diagram illustrating the structure of a IPv4 frame

ASCII-art diagram illustrating the structure of a IPv6 frame

Once we’ve got the destination IP address we prepend IP headers (whether v4 or v6) leaving most blank for intermediary “routers” to use.

To figure out which machine to send the packet to on the local network we send a cached & blocking ARP request, falling back to a DHCP-configured gateway.

That allows us to prepend a WiFi header leaving many of the fields blank to hardwired live processing. Maybe it encrypts the body, if configured to do so upon connecting to the WiFi hotspot. From there it’d output to a queue of WiFi packets to be sent when there’s a gap in the conversation, requesting silence for longer packets & repeating after a random interval when we’re talked over. Adds a CRC using a minor tweak on a shift register so receivers know they’ve received the packet correctly.

WiFi Response

Now that I’ve described how I’d send a network request, I’ll start discussing what to do with the response.

Block diagram of the circuitry involved in sending & receiving WiFi packets, from OpenWiFi

The WiFi receiver would decode radio transmissions into a ringbuffer, with live processing deciding whether to queue an ACK directly to the transmitter buffer & whether to notify the Parsing Unit to handle this frame. Checks the CRC.

Whether for WiFi (which may need to be decrypted via the Arithmetic Core or possibly other hardware), IP (discarding packets not routed to us), UDP, or TCP the Parsing Unit dispatches the body to be parsed according to a port number in the header. The countdown & shift registers in the input preprocessor would be handy for this parsing.


TCP additionally needs to track counters exposed in its header using the Arithmetic Core. Packets arriving out of order would be discarded. Recieved bitflags would increment & decrement the number of packets sent simultaneously.

ASCII-art statemachine diagram of how TCP should behave

Additionally TCP has nontrivial requirements for the scheduler. Received responses are pushed, possibly with priority depending on a bitflag. It pulls request data to send with acknowledgements (that is, an updated counter in TCP header). And timeouts are set to resend packets, or give additional opportunity for request data to be sent with the ack.


Next in our pipeline TCP may dispatch to a TLS implementation, which HTTPS may have wrapped around HTTP.

TLS would have the Parsing Unit split the stream into sized, typed, & encrypted “records”. The Arithmetic Unit would decrypt them. The Output Unit would dispatch those records & negotiate which cipher to use alongside it’s randomized parameters.

TLS includes application (dispatched to caller), negotiation, switch-cipher, & error/warning records. Then there’s authorization…


In the pipeline we’re tracing TCP would dispatch by port number to HTTP, with or without wrapping it in TLS.

Zine annotating an HTTP response

HTTP is where our Parsing Unit really starts becoming useful! Until then the IETF ensured even a CPU as trivial as our Arithmetic Core can easily handle their protocols. It’s header would be parsed into an error status & key-value trie.

This response may then be loaded into the URL cache to intercept the next request for this URL, or replace with a HEAD request. The Cookie request header for this domain may be updated (I prefer to limit to webform POST to strictly enforce GDPR). Also upon 3xx status codes I’d lookup the Location header to defer to a new HTTP request, unless we’ve done so too many times in a row.

Zine illustrating HTTP redirect process

Most importantly though a pipeline to handle the response body would be constructed out of compression, charset, & MIMEtype headers. Falling back to sniffing, aided by most charsets being ASCII supersets.

Compression would typically parse the preceding “codebook” into a bitwise-parser for the rest of the stream, and/or use the Lempel-Ziv hardware. Text transcoding would use the Parsing Unit aided by non-ASCII charclass (traditionally) with or (commonly) without lookuptables outputting, lets say, UTF-8. As for the Content-Type header, I’ll discuss XML/HTML handlers on the next page. But I’m sure you’ll enjoy imagining alternate codepaths!

Support Infrastructure

The process of sending those requests does require some basic operating system facilities, which will be explored here.

Arithmetic Core Usage

In sending or receiving a network request as I described yesterday, we need timeouts, cryptography, randomness, counters, & all the redundant 1’s Complement checksums I didn’t mention.

That checksum & the counters don’t need to be further explored here. The clock would be it’s own counter periodically incremented even when the computers’ otherwise off. From this timeout events could be dispatched via the Parse Unit to the Output Unit as I’ll explore later.

As for randomness & encryption…


Block diagram illustrating process for Salsa encryption cipher

Cryptography (or “crypto”, guess what the “crypto” in “cryptocurrency” stands for?) takes “trapdoors functions” from mathematics, Number Theory specifically, which can be trivially computed but near-impossible to reverse. From which we assemble useful formulas for securing network communications. These may (“encryption”) or may not (“hashing”) be reversible. If they are you may need the same (“symmetric”) or related (“assymetric”) key to “decrypt” it. Upon hashing we build random number generators, or combine with encryption to create “signatures”.

XOR is a handy operator, it preserves any randomness in it’s input! With equal probability of having outputted a 1 or 0 for an unknown input or 2!

Except if you’ve ever solved a cryptogram puzzle you’ll know this isn’t enough. For it not to be vulnerable to frequency analysis we need to scramble the bits! So add bit-rotates to shuffle the bits & additions to merge the bits in ways that are reversible. These “Add-Rotate-XOR” (ARX) cyphers have been becoming popular!


As for randomness, we cannot program a deterministic machine to produce random numbers. Any process it can follow can be used to predict the randomness. What we can do is approximate, extrapolate, mix, & estimate randomness, & adjust their probability distributions.

Block diagram illustrating process for ChaCha20 PRNG/hash function

So where does randomness come from?

Also if we don’t have sufficient entropy (which we can low-bound estimate based on how much we trust the different sources), the timing of that can be another source of entropy! It’s harmless to mix in non-random sources as long as we mark them as untrusted regarding our estimate of the amount of randomness.

What I’ve just described reportedly strongly resembles how Linux actually implements this!


TCP & hardware drivers incurs non-trivial demands on our hypothetical device’s scheduler. Inputs may need to be parsed right-now, outputs may need to be slowed down to be perciptable to other hardware or humans.

Responses would be pushed to whichever parser is associated with the port number (also input may say “parse this right now!”). Looking it by port number in a collection triggering the stacks to be replaced, before sending it the packet’s body. We may need to incur priority scheduling, using multiple input queues.

Requests are pulled after receiving acks, exposed as parsing desired number of bytes. Converting a push pipeline to a pull pipeline would buffer the relatively-compressed output opcodes with minor tweaks (Parsing Unit would be free at that time, use it for rewriting?) to be evaluated later. Wrapping it in a parser setting the limit.

Output opcodes would be freed as they’re evaluated. Pushing data into a pull pipeline enqueues additional Output opcodes. The Arithmetic Core would need to save it’s state (say, into the remaining opcodes?) if it’s the one to hit the limit.


Then sometimes we need to schedule operations to happen sometime in the future, after a delay. Say because a packet hasn’t been acknowledged & as such needs to be resent.

To support this I’d hook a capacitor (or quartz crystal) up to a latch to turn the saw-wave into a square wave whilst periodically draining the capacitor to produce said saw wave. Each edge triggers the processor to evaluate the next operation. A counter reduces the ticks to a more manageable rate. These ticks interrupt the Arithmetic Core, even when the device is otherwise off, to increment a 64bit counter (avoiding y2038 issue) & compare the lowest 16bits against the next timeout, outputting an event if so. Which the Parsing Unit would dispatch to the Output Unit freeing the opcodes, before everything considers whether there’s more timeouts to process & recomputes the dispatch table.

To enqueue a new timeout event I’d insertion-sort (later topic) the new Output Unit opcodes into place & parse that Output code into a dispatch table. With the reformatting code appended to the end.


How’d we turn on our hypothetical auditory web browsing device?

The powerbutton (joystick?) would interrupt the always-on (for timekeeping’s sake) Arithmetic Core (AC). The AC would hand off (with some data as per hardware architecture, say something describing the hardware we’re running on?) to the Parsing Unit & on to motherboard-specific code in the Output Unit. Once it’s sent test output & event handlers received test input after a (overwrite clock interrupt in AC to support this) timeout it’d hand off to a bootscript starting daemons & last-visited page. Or homepage.

Alongside that Power-On-Self-Test I may have a rudimentary UI for hardware configuration & more thorough testing. However without booting the whole OS (otherwise, what’s the point of this stage?) I/O would be limited to joystick, beeps, & prerecorded audio (ideally I’d hire Micheala Swee). Or we offload early-boot I/O over USB onto another computer? Or to a telephone answering machine? Or we document it using a website or audio CD? Or a combination of these?

The bootscript, and the rest of the OS, would be built upon a linker. This would parse commented machine code in reference to a rewritable filepath-parser/trie in the SSDs (the filesystem!) to grant read/write/execute permissions to specified files/directories. Adding abstractions where it helps. It’d also grant access to shared RAM or (rather, at a hardware-level, revoke) output peripherals. This in-depth access control should help contain most malware without incurring extra overhead, & enable (running each time we load a parser) hotswapped updates!

Keeping things simple, one of these daemons would upon periodic timeout download a handful (to keep each download small) of URLs, vitally, with caching. Parsing responses from, say, Zip to queryable directories mounted in the filesystem. Then notify the user offering technical details.

With the software I’ve described on this page this central server could easily run on the same hardware as the client! As long as we’re happy sticking with key-value stores, SQL requires more programming.


Putting the effort into sorting data often makes it easier to process, whether for humans or computers. For example putting HTML attributes in a consistent order aids extracting subsets of them. Or there’s selector specificity I’ll discuss tomorrow, as well as timeouts… The simplist way to implement it on the established hypothetical hardware would be Insertion Sort! For each unsorted item (Output Unit) iterate over the sorted list (Parsing Unit) to splice it into the correct spot (Output Unit) as determined by the Arithmetic Core!

GIF Animation of some bars being Insertion-Sorted by height

Except iterating over every sorted item for each unsorted one is O(n^2) inefficient! We can use pull pipelines to inject some laziness into the concatenation, so we can stop the sub-iterations once we find the place to splice the new item in. Bringing it down to an optimal average O(nlogn). For more efficiency include logic to concatenate contiguous Output Unit opcodes where possible. And address degenerate, yet common, cases by capturing a sorted prefix before sorting!

In summary:

The Input Preprocessor would lazily concatenate the sorted list under direction from the Parsing Unit. The Parsing Unit iterates over sorted & unsorted items, consulting the Arithmetic Core (via a special connection included just for this purpose) to determine where to splice. The Output Unit triggers the sub-iterations & actually splices the new item into place. The Arithmetic Core does comparison.

My entire hypothetical processor would be kept quite busy to perform any sorting…

Images mostly sourced from Julia Evans or screenshotted from IETF specs. Some images are from Wikipedia. Click an image for more details.