nginx is the webserver I use to deliver webpages to you, including this one. Here I’ll describe how it works.
I think my experience in self-hosting (on an old Mac Mini running Debian & nginx) illustrates a few things:
- It’s not hard at all, beyond some initial & unfortunate gatekeeping.
- Most Silicon Valley services offer less than even trivial static hosting does.
- A DHT as part of the web browser &/or OS would be liberating!
It’s worth remarking that nginx is a heavily optimized webserver, which made its logic harder to follow. Webservers don’t have to be this micro-optimized, but it can be worth it given how much load they often handle.
The main function initializes a custom cross-platform LibC, implementing much of what I described as being in GHC’s RunTime System, & starts a daemonized mainloop. This really illustrates the need for Cosmopolitan LibC; the standard POSIX one leaves a lot to be desired!
This mainloop may use a choice of dynamically-loaded modules/system calls (such as epoll, kqueue, or select), and may optionally use manager/worker multiprocessing.
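To make “a choice of modules/system calls” concrete, here is a minimal C sketch, assuming the mainloop only talks to its event backend through a table of function pointers. The `event_actions` struct, the toy backends & `choose_backend` are hypothetical names: real backends would wrap epoll, kqueue, poll() or select(), whereas these toys just return a marker saying which one ran. In nginx itself, the `use` directive inside the events block picks the method.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an event module: the mainloop only calls
 * through these pointers, so it never needs to know which system call
 * (epoll, kqueue, poll, select, ...) sits behind them. */
typedef struct {
    const char *name;
    int (*init)(void);
    int (*process_events)(int timeout_ms);
} event_actions;

/* Toy backends: real ones would wrap the corresponding system calls.
 * Here process_events just returns a marker identifying the backend. */
static int noop_init(void) { return 0; }
static int poll_process(int timeout_ms) { (void)timeout_ms; return 1; }
static int select_process(int timeout_ms) { (void)timeout_ms; return 2; }

static const event_actions backends[] = {
    { "poll",   noop_init, poll_process },
    { "select", noop_init, select_process },
};

/* Select a backend by name, as a config directive might; NULL if unknown. */
const event_actions *choose_backend(const char *name) {
    for (size_t i = 0; i < sizeof backends / sizeof backends[0]; i++)
        if (strcmp(backends[i].name, name) == 0)
            return &backends[i];
    return NULL;
}

/* One iteration of a mainloop written purely against the table. */
int run_once(const event_actions *ev, int timeout_ms) {
    return ev->process_events(timeout_ms);
}
```

Because the loop only sees the table, supporting another system call means adding one row rather than touching the loop itself.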
As part of this initialization it lexes the configuration file(s) & dispatches each declaration to callbacks in dynamically-loaded modules.
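A toy version of “lex, then dispatch each declaration to a callback” might look like the C below. It is a sketch under simplifying assumptions: the config is flat (no braces, quotes or includes), only two hypothetical directives are registered, & the `conf`/`directive` types are invented for illustration rather than taken from nginx, though nginx modules do similarly each declare a table of the directives they handle.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ARGS 8
#define MAX_TOK 64

/* Hypothetical configuration built up by the directive callbacks. */
typedef struct { int port; char server_name[MAX_TOK]; } conf;

/* Each "module" registers directive names with a callback. */
typedef int (*directive_handler)(conf *c, int argc, char argv[][MAX_TOK]);
typedef struct { const char *name; directive_handler handler; } directive;

static int set_listen(conf *c, int argc, char argv[][MAX_TOK]) {
    if (argc != 2) return -1;
    c->port = atoi(argv[1]);
    return c->port > 0 ? 0 : -1;
}

static int set_server_name(conf *c, int argc, char argv[][MAX_TOK]) {
    if (argc != 2) return -1;
    strncpy(c->server_name, argv[1], MAX_TOK - 1);
    c->server_name[MAX_TOK - 1] = '\0';
    return 0;
}

static const directive directives[] = {
    { "listen", set_listen },
    { "server_name", set_server_name },
};

static const directive *find_directive(const char *name) {
    for (size_t i = 0; i < sizeof directives / sizeof directives[0]; i++)
        if (strcmp(directives[i].name, name) == 0) return &directives[i];
    return NULL;
}

/* Lex whitespace-separated words; each ';' ends a declaration, which is
 * dispatched to its callback. Returns declarations handled, or -1. */
int parse_config(const char *p, conf *c) {
    char argv[MAX_ARGS][MAX_TOK];
    int argc = 0, count = 0;
    while (*p) {
        if (isspace((unsigned char)*p)) { p++; continue; }
        if (*p == ';') {
            const directive *d = argc ? find_directive(argv[0]) : NULL;
            if (!d || d->handler(c, argc, argv) != 0) return -1;
            argc = 0;
            count++;
            p++;
            continue;
        }
        if (argc == MAX_ARGS) return -1;
        size_t n = 0;  /* copy one word token */
        while (*p && !isspace((unsigned char)*p) && *p != ';') {
            if (n + 1 >= MAX_TOK) return -1;
            argv[argc][n++] = *p++;
        }
        argv[argc++][n] = '\0';
    }
    return argc == 0 ? count : -1;  /* a declaration missing its ';' is an error */
}
```

Unknown directives fail loudly instead of being ignored, which matches how a webserver generally refuses to start on a config it cannot understand.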
The callback for http blocks constructs mappings to dispatch connections to the appropriate server block. During its final pass it configures the mainloop to start listening on the configured addresses & ports.
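The mapping from listening sockets to server blocks can be sketched like this, assuming one port & one name per block. nginx keys on full address:port pairs & uses hash tables (plus regexes) for names; this hypothetical `build_listen_map`/`find_server` pair just scans arrays, falling back to the first block on a port, the way nginx treats the first listed server as the default unless another is marked default_server.

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_PER_PORT 4

/* Hypothetical parsed server block: one listen port & one server_name. */
typedef struct { int port; const char *name; } server_block;

/* Mapping from a listening port to its server blocks, in config order. */
typedef struct {
    int port;
    const server_block *servers[MAX_PER_PORT];
    size_t n;
} listen_entry;

/* Hostnames compare case-insensitively. */
static int name_eq(const char *a, const char *b) {
    for (; *a && *b; a++, b++)
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
    return *a == *b;
}

/* Group server blocks by port; returns the number of distinct ports.
 * In this sketch, blocks that don't fit are silently dropped. */
size_t build_listen_map(const server_block *s, size_t n,
                        listen_entry *out, size_t max) {
    size_t ports = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j = 0;
        while (j < ports && out[j].port != s[i].port) j++;
        if (j == ports) {
            if (ports == max) continue;  /* no room for another port */
            out[ports].port = s[i].port;
            out[ports].n = 0;
            ports++;
        }
        if (out[j].n < MAX_PER_PORT) out[j].servers[out[j].n++] = &s[i];
    }
    return ports;
}

/* Exact name match on the port, else that port's first (default) block. */
const server_block *find_server(const listen_entry *map, size_t ports,
                                int port, const char *host) {
    for (size_t j = 0; j < ports; j++) {
        if (map[j].port != port) continue;
        for (size_t k = 0; k < map[j].n; k++)
            if (name_eq(map[j].servers[k]->name, host)) return map[j].servers[k];
        return map[j].n ? map[j].servers[0] : NULL;
    }
    return NULL;
}
```

Building the map once at startup means each new connection only needs its local port to narrow down the candidate server blocks.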
The http module also loads & configures a whole bunch more loadable modules, which I won’t describe in detail. Maybe tomorrow I’ll list them?
Receiving HTTP requests
When a client opens a connection to the server, it iterates over the server blocks to find the right one & allocates memory before…
Logging is configured, & read/write callbacks are set depending on whether HTTP/2 &/or HTTPS is being used. Another loadable module, which depends on OpenSSL, is used for the latter. This read handler might be called immediately.
The timeout configuration is read & added to the eventloop. A queue of reusable connections is tracked.
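Adding a timeout to the eventloop boils down to two questions on each iteration: how long may the loop block waiting for I/O, & which connections have now expired? A minimal sketch, assuming timers are stored as absolute millisecond deadlines in a plain array (nginx keeps its timers in a red-black tree so the earliest one is cheap to find); `timer`, `next_timeout` & `expire_timers` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical timer record: a connection id & an absolute deadline. */
typedef struct { int conn_id; long deadline_ms; int active; } timer;

/* How long the mainloop may block in poll()/epoll_wait(): time until the
 * earliest active deadline, or -1 meaning "no timers, wait indefinitely". */
long next_timeout(const timer *t, size_t n, long now_ms) {
    long best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!t[i].active) continue;
        long left = t[i].deadline_ms - now_ms;
        if (left < 0) left = 0;  /* already overdue: don't block at all */
        if (best < 0 || left < best) best = left;
    }
    return best;
}

/* Deactivate every expired timer, reporting each through the optional
 * callback (e.g. to close the timed-out connection). Returns how many fired. */
size_t expire_timers(timer *t, size_t n, long now_ms,
                     void (*on_timeout)(int conn_id)) {
    size_t fired = 0;
    for (size_t i = 0; i < n; i++) {
        if (t[i].active && t[i].deadline_ms <= now_ms) {
            t[i].active = 0;
            if (on_timeout) on_timeout(t[i].conn_id);
            fired++;
        }
    }
    return fired;
}
```

The linear scan keeps the sketch short; with thousands of connections an ordered structure like nginx's tree avoids visiting every timer per iteration.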
Upon read it first checks whether the connection has been closed or timed out, before allocating a “buffer” to read the new text into, making sure to check for errors.
At this point the read callback is switched to parsing the “request line” (unless it timed out) & allocates a “request” to store what’s been parsed.
Parsing that line involves checking for known methods & running a state machine, which reports whether to try again on the next packet, possibly after enlarging the buffer. This might give it enough info to choose a server block, whether by hashtable or regex.
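Here is a minimal sketch of such a resumable request-line parser, assuming all bytes received so far sit in one growing buffer. The names (`parse_request_line`, `PARSE_AGAIN` & friends) are hypothetical rather than nginx’s own; this version only accepts GET, HEAD & POST, & it insists on CRLF line endings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical return codes: done, need more bytes, or malformed. */
enum { PARSE_OK = 0, PARSE_AGAIN = 1, PARSE_ERROR = -1 };
enum { METHOD_UNKNOWN, METHOD_GET, METHOD_HEAD, METHOD_POST };

typedef struct {
    enum { S_METHOD, S_URI, S_VERSION, S_LF } state;
    size_t pos;                 /* bytes of buf already consumed */
    size_t uri_start, uri_end;  /* offsets, so buf may be reallocated */
    size_t version_start;
    int method;
} request_line;

static int classify_method(const char *m, size_t n) {
    if (n == 3 && memcmp(m, "GET", 3) == 0) return METHOD_GET;
    if (n == 4 && memcmp(m, "HEAD", 4) == 0) return METHOD_HEAD;
    if (n == 4 && memcmp(m, "POST", 4) == 0) return METHOD_POST;
    return METHOD_UNKNOWN;
}

/* Resume parsing at r->pos; buf holds every byte received so far. */
int parse_request_line(request_line *r, const char *buf, size_t len) {
    for (; r->pos < len; r->pos++) {
        char ch = buf[r->pos];
        switch (r->state) {
        case S_METHOD:
            if (ch == ' ') {
                r->method = classify_method(buf, r->pos);
                if (r->method == METHOD_UNKNOWN) return PARSE_ERROR;
                r->uri_start = r->pos + 1;
                r->state = S_URI;
            } else if (ch < 'A' || ch > 'Z') {
                return PARSE_ERROR;
            }
            break;
        case S_URI:
            if (ch == ' ') {
                if (r->pos == r->uri_start) return PARSE_ERROR;  /* empty URI */
                r->uri_end = r->pos;
                r->version_start = r->pos + 1;
                r->state = S_VERSION;
            } else if (ch == '\r' || ch == '\n') {
                return PARSE_ERROR;
            }
            break;
        case S_VERSION:
            if (ch == '\r') {
                if (r->pos - r->version_start < 5 ||
                    memcmp(buf + r->version_start, "HTTP/", 5) != 0)
                    return PARSE_ERROR;
                r->state = S_LF;
            } else if (ch == '\n') {
                return PARSE_ERROR;  /* this sketch requires CRLF */
            }
            break;
        case S_LF:
            if (ch != '\n') return PARSE_ERROR;
            r->pos++;  /* consume the LF */
            return PARSE_OK;
        }
    }
    return PARSE_AGAIN;  /* ran out of bytes mid-line: wait for the next read */
}
```

Storing offsets instead of pointers is a choice of this sketch: if the caller has to enlarge (reallocate) the buffer before the next read, the saved state stays valid.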
The next callback similarly parses the HTTP request headers into a list…
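Header parsing can be sketched in the same spirit. This hypothetical `parse_headers` assumes it is simply re-run from the start of the buffer whenever more data arrives, stores name/value pointers into that buffer rather than copying, & calls a handler for the one “known” header it registers (Host), similar in spirit to how nginx gives well-known headers their own handlers.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_HEADERS 16

/* Name & value point into the caller's buffer; nothing is copied. */
typedef struct {
    const char *name;  size_t name_len;
    const char *value; size_t value_len;
} header;

typedef struct {
    header list[MAX_HEADERS];
    size_t count;
    const header *host;  /* shortcut filled in by a known-header handler */
} headers_in;

typedef void (*header_handler)(headers_in *h, const header *hd);
static void handle_host(headers_in *h, const header *hd) { h->host = hd; }

static const struct { const char *name; header_handler fn; } known_headers[] = {
    { "Host", handle_host },
};

/* Header names compare case-insensitively. */
static int header_name_is(const header *hd, const char *known) {
    size_t n = strlen(known);
    if (n != hd->name_len) return 0;
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)hd->name[i]) != tolower((unsigned char)known[i]))
            return 0;
    return 1;
}

/* Returns bytes consumed through the blank line, 0 if more data is
 * needed, or -1 on a malformed line or too many headers. */
long parse_headers(headers_in *h, const char *buf, size_t len) {
    size_t pos = 0;
    h->count = 0;
    h->host = NULL;
    for (;;) {
        const char *eol = memchr(buf + pos, '\n', len - pos);
        if (!eol) return 0;  /* incomplete line: wait for more data */
        size_t line_end = (size_t)(eol - buf);
        size_t content_end = line_end;
        if (content_end > pos && buf[content_end - 1] == '\r') content_end--;
        if (content_end == pos) return (long)(line_end + 1);  /* blank line */
        const char *colon = memchr(buf + pos, ':', content_end - pos);
        if (!colon || colon == buf + pos || h->count == MAX_HEADERS) return -1;
        header *hd = &h->list[h->count++];
        hd->name = buf + pos;
        hd->name_len = (size_t)(colon - (buf + pos));
        const char *v = colon + 1;
        while (v < buf + content_end && (*v == ' ' || *v == '\t')) v++;
        hd->value = v;
        hd->value_len = (size_t)(buf + content_end - v);
        for (size_t i = 0; i < sizeof known_headers / sizeof known_headers[0]; i++)
            if (header_name_is(hd, known_headers[i].name))
                known_headers[i].fn(h, hd);
        pos = line_end + 1;
    }
}
```

Returning the number of bytes consumed tells the caller where any request body would begin in the same buffer.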
Responding with HTTP responses
Once it sees a blank line it validates that it has received all the expected headers &, for HTTPS, that the right TLS certificate has been used, before actually “processing” the request. Upon receiving any more data it errors out.
All the different declarations inside the server blocks are parsed into callbacks, dispatched into the appropriate “phase” of request handling, to be called at this point in the appropriate order.
Now how’s the response sent? I’m now looking at
The phases into which declarations like try_files are bucketed are:
- REWRITE & POST_REWRITE
- PREACCESS, ACCESS, & POST_ACCESS
- PRECONTENT, & CONTENT
During these phases it writes any necessary bytes out the socket to the client.
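The phase machinery can be approximated as an array of handler lists, one per phase, run in order. This is a sketch with hypothetical names & simplified return conventions (DECLINED lets the next handler in the same phase try, NEXT_PHASE skips to the following phase, & any other value is a final HTTP status), loosely in the spirit of nginx’s phase handlers rather than a faithful copy.

```c
#include <stddef.h>

enum phase { REWRITE, POST_REWRITE, PREACCESS, ACCESS, POST_ACCESS,
             PRECONTENT, CONTENT, NPHASES };
enum { DECLINED = -5, NEXT_PHASE = 0 };  /* anything else: final status */

#define MAX_PER_PHASE 4

/* Hypothetical request with just enough fields for the example handlers. */
typedef struct { const char *uri; int allowed; } request;
typedef int (*phase_handler)(request *r);

typedef struct {
    phase_handler handlers[NPHASES][MAX_PER_PHASE];
    size_t count[NPHASES];
} phase_engine;

/* Config time: each parsed declaration registers a callback in its phase. */
int register_handler(phase_engine *e, enum phase p, phase_handler h) {
    if (e->count[p] == MAX_PER_PHASE) return -1;
    e->handlers[p][e->count[p]++] = h;
    return 0;
}

/* Request time: run the phases in order; returns the final HTTP status. */
int run_phases(const phase_engine *e, request *r) {
    for (int p = 0; p < NPHASES; p++) {
        for (size_t i = 0; i < e->count[p]; i++) {
            int rc = e->handlers[p][i](r);
            if (rc == DECLINED) continue;   /* let the next handler try */
            if (rc == NEXT_PHASE) break;    /* move on to the next phase */
            return rc;                      /* handler produced a response */
        }
    }
    return 404;  /* in this sketch, no CONTENT handler responded */
}

/* Example handlers standing in for access rules & static file serving. */
static int deny_unless_allowed(request *r) { return r->allowed ? DECLINED : 403; }
static int serve_static(request *r) { return r->uri[0] == '/' ? 200 : DECLINED; }
```

Bucketing at configuration time means each request just walks a fixed array, with no need to re-examine the declarations themselves.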
These declarations are fully parsed before being run, including compiling variables to array indexes. Callbacks are used to save variables for known HTTP headers.
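Compiling variables to array indexes means name lookups happen once, at configuration time, & each request only does array indexing. A minimal sketch, assuming a hypothetical `var_registry` that hands out indexes & a template (think of a log format) precompiled into literal parts & variable slots; none of these names come from nginx.

```c
#include <stddef.h>
#include <string.h>

#define MAX_VARS 8

/* Config-time registry mapping variable names to slot indexes. */
typedef struct { const char *names[MAX_VARS]; size_t n; } var_registry;

/* Returns the index for name, registering it if new; -1 if full. */
int var_index(var_registry *reg, const char *name) {
    for (size_t i = 0; i < reg->n; i++)
        if (strcmp(reg->names[i], name) == 0) return (int)i;
    if (reg->n == MAX_VARS) return -1;
    reg->names[reg->n] = name;
    return (int)reg->n++;
}

/* Per-request values, e.g. filled in by known-header callbacks. */
typedef struct { const char *values[MAX_VARS]; } request_vars;

/* A compiled template part: literal text, or (var >= 0) a variable slot. */
typedef struct { const char *literal; int var; } tpl_part;

/* Request time: no name lookups, just indexing. Output is truncated to
 * fit cap & always NUL-terminated; returns the rendered length. */
size_t render(const tpl_part *parts, size_t n, const request_vars *rv,
              char *out, size_t cap) {
    size_t len = 0;
    if (cap == 0) return 0;
    for (size_t i = 0; i < n; i++) {
        const char *s = parts[i].var < 0 ? parts[i].literal
                                         : rv->values[parts[i].var];
        if (!s) s = "";  /* unset variable renders as empty */
        for (; *s && len + 1 < cap; s++) out[len++] = *s;
    }
    out[len] = '\0';
    return len;
}
```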