UNIX Networking

Perhaps the most useful feature of modern computers is their ability to communicate with other computers around the world, and with the humans on the other end! This page documents some of the core software projects bringing this feature to life!

INetUtils

INetUtils provides a suite of more-or-less trivial networking clients & servers, several but not all of which are outdated & effectively deprecated. A few are indispensable debugging tools!

FTP

FTP is a widely-known protocol for transferring files between a client & a server. I believe it's past its heyday now, but that's not to say it's gone or outdated.

After initializing internationalization & with the aid of a support-library parsing commandline flags, the client ftp checks some system parameters, possibly registers some interrupt handlers and/or runs a “setpeer” command, & repeatedly prompts for a command. For each command read via readline or getline it discards empty lines (quitting upon EOF), does some minor parsing, & dispatches to the appropriate command with error reporting.

Most of the FTP-client logic is implemented in those REPL commands, with “$” being implemented in its own file with some integration into authentication; it sets one of 16 entries defined during login to a parsed command with macros substituted in. Also there's (unless otherwise stated, each defers to the corresponding server-side command once validated):

These are built upon utilities for parsing args (including globs) or otherwise communicating with people over text. To handle the FTP protocol itself, there are:


After initializing timezones + process-title & parsing commandline flags (via shared utils), validating no args remain, the server ftpd opens a logfile piping its stderr to /dev/null, either daemonizes or determines its caller, configures interrupt handlers, configures the socket connected on stdin, logs the connection, outputs a welcome message & optionally a version number, & Bison-parses the requests. The Bison-parser calls routines for each command, with those routines being syscall-bindings implemented alongside the main function, building upon supporting utils for outputting to the main socket, a side socket, or the logfile. The net effect is to let remote yet authenticated clients trigger those syscalls on the machine running ftpd.

Additional support files implement a wrapper around a choice of authentication backend, with PAM’s ugliness encapsulated in its own file. Other support files:

IfConfig

INetUtils provides a command ifconfig for listing or configuring network hardware-connections or “interfaces”.

After parsing commandline flags & identifiers in remaining args (defaulting to all interfaces), ifconfig opens an INET STREAM socket & iterates over the specified interfaces before cleaning up. For each interface it might use one IOCTL or another to convert the interface name to an index & output its name. Otherwise it checks which other actions it's been asked to perform.

Ping

To test whether a given machine can be reached over the internet, INetUtils provides the amusingly-named “ping” command. And a minor variation “ping6” for the IPv6 protocol.

After initializing internationalization & parsing commandline flags, ping initializes a raw INET socket with the ICMP protocol & a buffer, then configures the socket (broadcast, countdown/interval ICMP field, other caller-exposed options, time-to-live, & type-of-service) & stdout (enable buffering). With that initialized socket & a newly-allocated buffer, ping runs the commandline-flags-specified callback.

At the core of any of those callbacks is a routine which encodes the desired ICMP message (using a supporting library I'll describe later) & transmits it to the given destination, then keeps doing so with fresh timestamps, checking for responses (via the select() syscall) to decode & trigger the appropriate callback, as well as handling timeouts, until Ctrl-C. These callbacks can be one of:

Talk IM

INetUtils' “talk” is a rudimentary instant-messaging protocol, and you might have considered IRC rudimentary…

After initializing internationalization & parsing commandline flags with a couple additional args, the client talk opens a couple sockets, outputs a message (either plain or via NCurses) & configures a periodic interrupt to keep doing so, sends/receives data before attempting to connect to the server it specifies, & failing that waits for someone to connect to us whilst periodically sending/receiving invites.

Once we've got that connection it initializes an NCurses display + associated signal handlers followed by an editing subwindow, before notifying that we've established the connection & entering the mainloop. This mainloop uses the select() syscall to check both stdin & the socket for input with error-handling; any socket data is sent to NCurses adjusting the previous display, whilst anything from stdin is additionally sent via the socket.


To find other users to connect to, talk connects to the server talkd. After parsing commandline flags & initializing logging, talkd parses an optional access-control-list syslogging any errors, retrieves its own hostname or crashes, configures a watchdog timer, & repeatedly reads a request from stdin with error-handling, from which it computes a response to send.

Computing that response involves initializing some standard fields, normalizing/verifying the request, & branching over its type.

The lookuptable consulted there is a doubly-linked list of timestamped requests that's linearly scanned.

The access-control-lists (the global one in addition to per-user ones) are interpreted for each validated request to find any entries specifying whether to allow or deny access. For debugging, requests & responses may be syslogged as text.

Telnet

To access the terminal on a remote computer we now typically use SSH. But today I'm studying the client for its predecessor protocol, TELNET, as implemented by INetUtils. I smell the history here!

After initializing various things (internationalization, the terminal & 2 ringbuffers for it, 2 ringbuffers for networking, the TELNET protocol uploading envvars & deferring to a supporting library, stdout/stderr, & maybe at compile-time the TN3270 IBM terminal, a Cray escapes table, terminal state, & prompt string) & parsing flags, the client telnet reconstructs additional commandline arguments & parses them again as the open subcommand to open the corresponding socket (seems a little Rube Goldbergian…), before entering a mainloop with a bit of special handling for TN3270.

Each iteration of that mainloop carefully prompts you for a command to parse & dispatch. Many of those commands include subtables to dispatch through & calls into a protocol layer. Significantly more effort is spent parsing, reporting, & replying to responses (received upon “open” commands via the select() syscall), & providing those commands with utilities for sending control codes to the server.

There are some linker-hooks called by the underlying LibTelnet. Underneath there are abstractions for ringbuffered networking, with an underlying ringbuffer implementation also used elsewhere. There's a ringbuffered terminal abstraction, with special handling for BSDs or the IBM 3270 (I'm surprised not Linux). As well as light abstractions over other standard library calls adding human-legible logging.


After initializing logging & parsing commandline flags, validating no args remain, the server telnetd carefully looks up the client's hostname & its own, reconfigures the socket, initializes authenticators and/or encryptors followed by I/O globals, envvars, & a subshell, configures interrupt handlers, populates some default config, sends some initial messages (some conditional), & waits for initial requests, possibly replying. After the first few requests telnetd expects, it interprets the rest of the requests in that packet, messages the client to show which machine it's connected to, retrieves subshell config, reconfigures the subshell as the client requested whilst reporting back to said client, & enters the mainloop.

This mainloop wraps the select() syscall to decide when to shuttle data between the socket & the subshell, before cleaning up. That mainloop (and, as mentioned, the code immediately before it) repeatedly calls a function to interpret the client's requests. This consists primarily of a looped switch statement, many branches of which consult a will/do/don't/won't statemachine whose state-transitions are sent & received over that socket. But usually it defers to operations on the connection to the subshell.

Many of those subshell ops are triggered upon the mainloop receiving output from it, updating that statemachine. Furthermore there’s a suite of globals & numerous associated utility-functions to abstract buffered reading/writing on both the socket & subshell, as well as debugging (plenty to serialize here) & syslog output. Including some help with the encoding. Also there’s a pre-processor pass over the input from the socket before shuttling it to the subshell. Some of these utilities are instead called by LibTelnet, which I’ll discuss after the networking shell commands.

Whois

When you need to debug or otherwise-investigate the public-ownership of DNS records (the identities we rent online, e.g. adrian.geek.nz or floss.social), INetUtils’ whois is the tool for you!

After initializing internationalization & parsing commandline flags with an allocation stack, & validating (except in special circumstances) additional args remain, moving them to that allocation stack, whois might open a socket to “whois.internic.net”, send it a trivial “=” query, print the response, & exit upon success. Otherwise it has several conditions for choosing a default whois server (supported by plenty of dataheaders compiled from datafiles via Perl scripts), constructs a less-trivial textual query, registers interrupt handlers ensuring we clean up the socket, opens a socket to the chosen whois server, sends the query, lightly processes responses following referrals, & echoes them to stdout.

Trivial Networking tools

Having studied a bunch of simple networking utils from INetUtils, there’s a suite of even more trivial ones!

dnsdomainname is truly trivial! After parsing commandline flags it calls gethostname & getaddrinfo, outputting the result.

After parsing commandline flags hostname might do several things, fitting into 2 templates. Either it runs the selected action, postprocesses (often via gethostbyname) as specified by flags, & outputs results.

Or hostname might read input from an arg or given file, validate it has input, & run the selected action. The actions which can be selected for those templates include the gethostname, getdomainname, & sethostname syscalls.

After parsing commandline flags logger defaults the tag to the current user (USER envvar or the getpwuid(getuid()) syscall), carefully opens a socket whether local or remote, & retrieves input from args or stdin to reformat & send over that socket.

After parsing commandline flags rcp determines where to connect via getservbyname handling Kerberos/Shishi specially, retrieves the user, sets up a send or receive, performs some normalization/validation of args, constructs the remote command to run, configures a SIGPIPE interrupt handler, parses the remote path, & actually sends or receives the file whether remotely or not. If remote it iterates over args running some remote command or other with Kerberos and/or Shishi. Or rcp runs local commands. Sending files involves opening them, checking their type (possibly recursing), possibly sending last-modified times, reading network responses, writing file metadata to the socket, & copying the file in chunks to the socket. Receiving a file involves validating the request & network responses, communicating file metadata, recursing for directories, changing destination file metadata, copying from the socket to the file in chunks, & some final tweaks.

After parsing & validating commandline flags rexec might prompt for a password, before unconditionally & carefully opening a socket to the desired host, possibly opening its own socket for the server to connect to & use as stderr, writing the user, password, & command to the socket, & entering a mainloop shuttling data between the socket & stdin/stdout/stderr.

As for the serverside… After parsing commandline flags validating no more args remain & opening a logfile rexecd determines who connected, configures interrupt handlers, opens a tty handle, logs who connected, parses which port of the client to connect to (if any) as stderr, connects to it, reads username/password/command from socket, checks those credentials authenticating as them possibly partially-via PAM, forks a subprocess to shuttle stderr to the stderr socket, configures environment variables, & switches to running the specified command.


After parsing commandline flags & an additional arg handling missing usernames, rlogin might check which port to use for Kerberos, checks which port to use for login, looks up the terminal being used & its speed to serialize to text, retrieves the windowsize, registers a SIGPIPE handler, initializes event-polling data, possibly performs some Kerberos or Shishi checks via their APIs, reports errors, configures the socket, & starts a reading thread & a writing thread with cleanup. The writing thread shuttles data from stdin to the socket with outdated encryption & manual echoing. The reader copies data from the socket to stdout with outdated encryption. On various interrupts terminal info is fetched to be uploaded, amongst other interrupt handlers. What I thought was preparing the mainloop was actually setting up these handlers.

After parsing commandline flags, validating no args remain, & opening a logfile, the corresponding server rlogind ignores SIGHUP interrupts, looks up its own hostname, possibly daemonizes (opening/configuring a socket to listen on, with its own outer mainloop), & enters a mainloop. The mainloop (running separate threads if daemonized) retrieves the client's name & numerical address, logs the connection, reconfigures the socket, sets a watchdog timer, sends a nil byte, performs various authentication checks (possibly including Kerberos/Shishi/PAM), opens a terminal in a new thread, executes the configured login shell in that thread, starts the loop proper, & cleans up. All whilst reporting errors. This loop proper shuttles data between the socket & the subshell with special encodings for control chars.

After parsing commandline flags & an additional arg capturing remaining args, rsh looks up relevant ports possibly initializing Kerberos, uploads the operation with or without outdated encryption, diagnoses/reports any errors or uploads credentials, drops its own privileges, configures interrupt handlers, closes the socket early or forks a subprocess, reads & writes some initial data with or without outdated encryption, enters a mainloop shuttling between the terminal & the (outdatedly-encrypted) socket, & tidies up.

After parsing/validating commandline flags, validating none remain, & opening a logfile, rshd determines who connected to us, possibly reconfigures the socket syslogging failures, configures interrupt handlers, puts more effort into determining who connected to us, checks any options set on the socket to clear them, validates the client port number, carefully reads data from the client & connects to that port for stderr, maps the sockets to stdin/stdout/stderr, checks the client address again, initializes the current outdated-encryption, reads credentials, reconfigures encryption based on that, validates the data received so far, authenticates, opens some pipes, forks a subprocess possibly (for PAM) twice running the given user's loginshell as them, & before tidying up enters a mainloop which shuttles between the encrypted socket & the subshell.

After parsing commandline flags & possibly daemonizing, syslogd retrieves its own hostname & IP, configures interrupt handlers, allocates mainloop data, opens Linux's more-limited error-reporting channel (which they might use to debug the less limited ones) to treat as a client, opens a UNIX socket & possibly a network socket, opens a pidfile, kills its parent, & enters a mainloop. All whilst conditionally-reporting errors. The mainloop reads from the appropriate client copying a lightly reformatted message to the appropriate logfile, opening new ones when appropriate & periodically closing old logs. Also amongst its initialization (don't know how I missed this) syslogd closes all open log files, parses the specified configuration file & directory thereof, possibly outputting results. This can be retriggered by an interrupt via the mainloop. Another interrupt toggles whether syslogd's own debug output is shown.


After initializing internationalization & parsing commandline flags, tftp (literally named “trivial”) chooses which port to connect to whilst registering an interrupt handler & initializing a “mode” global, before running a REPL referring to a lookuptable of callbacks. There are other REPL commands in tftp, but (after parsing its args) “put” opens the specified file, configures a timer & timeout, & repeatedly: constructs a request (whether with initial metadata or the next chunk of the file) to send as a UDP packet, validating a response each time. “get” works similarly. Most other tftp REPL commands are accessors for globals used by those.

After parsing commandline flags, copying the remainder into a directories-array, & opening a logfile, the corresponding server tftpd reconfigures the socket, listens for a packet, & forks to return the socket to inetd. tftpd's subprocess then repeatedly listens for client data, opens a socket to them, sandboxes itself, validates the opcode in the request, normalizes & validates the specified filename, logs the request, & branches off to send or receive callbacks. To send a file it repeatedly, with a timeout, reads a file chunk to send over the socket, validating each corresponding response. Receiving works similarly, responding with success/error codes. All whilst syslogging.

After initializing internationalization & parsing commandline flags, traceroute validates a hostname was specified & resolves it (with a little preprocessing) twice to get the IP and canonical name, outputs this info, opens & configures a raw or UDP socket, & repeatedly sends “SUPERMAN” or mangled text whilst checking for replies to decode with some help from a shared library, outputting results & incrementing a couple counters (including the hops counter on the socket) as needed.

For yet another remote shell protocol… After parsing commandline flags & reconfiguring its environment uucpd queries who sent the request & forks a subprocess to return the socket to inetd. The subprocess reads credential info from the socket, authenticates, & runs the given command as the given user leaving stdin/stdout/stderr connected to the socket. Very trivial, but lacking in features!

To offload some of the serverside boilerplate we have the inetd daemon, which runs all these other servers, hence why they see a socket for stdin! After parsing commandline flags, copying the remainder to a config-array, & opening a logfile, the possibly-daemonized inetd attempts to grab a pidfile, parses those configfiles recursing over directories (plenty of code, syslogged errors), configures interrupt handlers, sets a dummy envvar, & enters a mainloop. This mainloop uses the select() syscall to determine which server-socket a client has connected to. Such connections are associated with the corresponding service, logged to stderr and/or the syslog, accepted, possibly given envvars, possibly handed a forked subprocess with blocked interrupts, & either run (for truly-trivial debug protocols) as a callback or (as a specified user, carefully syslogged) as a command. The server-sockets are opened & configured as it parses the corresponding configuration entries.

INetUtils Supporting Libraries

I'll skim some supporting libraries used by the various INetUtils commands, starting with a relatively-brief synopsis of the one making up for the limits of the not-quite-extensive-enough C standard library!

Including:

But mostly INetUtils' support library lightly wraps syscalls & other standard APIs (like memory de/allocation, memory copying, text formatting, & querying system databases) with additional error handling (crashing programs for rare errors), if it's not just reexporting the existing APIs. Often with Windows adaptors.

Within this general shared library there’s some sublibraries. One wraps the PThread locking APIs with some one-time initialization & (where needed) additional error handling, whilst building up a read-write lock upon its primitives. Meanwhile a “malloc” sublibrary implements Dynamic Array & “Scratch Buffer” datastructures.

LibICMP

INetUtils’ LibICMP aids encoding & decoding ICMP requests/responses.

LibICMP primarily defines a struct, the bulk of which consists of a union, declaring https://datatracker.ietf.org/doc/html/rfc792 (RFC792) in a way GCC understands.

With routines adding timestamps & 1's-Complement checksums to those requests, and for checking those checksums.

LibINetUtils

Aside from the general extended-LibC library I described recently, INetUtils includes a support library more specific to it.

Including:

LibLS

Abstracting upon directory listings as exposed by Linux, INetUtils’ “libls” is used by some of their server daemons. This includes:

LibTelnet

INetUtils’ Telnet client & server both share an underlying library handling the encoding of Telnet messages. Which I’ll describe today!

This includes:

Linux Networking

Linux includes a networking stack (commonly used on servers, the “routers” between you and them, & not infrequently, especially for my presumed audience, on your client machines) which I'll explore over the next few days. Starting with the socket syscall used to initialize it.

With some validation throughout & excessive function wrappers, this involves performing any configured security checks, allocating the memory & setting credentials, looking up the appropriate methodtable under lock, ensuring the corresponding kernel module is loaded, calling the initialization method, managing refcounts, performing some final security checks, & allocating a wrapping file-descriptor with a custom wrapping method-table for userspace to access. The (non-seekable) file-descriptor methodtable does little more than wrap the more specialized socket methodtable, adding locking, checks including access-control, & excessive layers of function calls as is typical for Linux. The interesting bit is that there are methods for piping another filedescriptor into the socket, since sending files over the internet is extremely common & we don't want to bottleneck bridging over to userspace…

There's a routine for registering an initializer-methodtable into the socket-type lookuptable, under lock.

UNIX Sockets

Creating a UNIX socket involves setting the appropriate methodtable (each of which includes a finalizer) for the subtype whilst validating/allocating/initializing various other fields.

There are variations of this methodtable for datagrams; these connect() with different validation checks, can't be accept()ed, & perform the main I/O tasks differently. I'm finding exactly how it differs a bit illegible. And the sequential one is a bit of a mix between the “stream” & datagram methodtables, with its own read/write methods.

INET Sockets

Linux’s constructor for INET sockets is a bit more involved than their UNIX sockets. This involves some validation, a linear-scan over a lookup table to determine which kernel-module to load under a lock. Also there’s all the allocations & setting default property values (some of which comes from that lookuptable, whilst others depending on various conditions) as is typical for a constructor. Protocol-specific hashing & initialization methods might be called.

That lookuptable had some preprocessing I needed to dig through so that the constructor can use a smallintmap to quickly cut down on the overhead. These INET sockets all provide 2nd-level wrappers doing little more than dispatching to the true methodtable, with some additional locking & checks, including Berkeley Packet Filter checks. ioctl()s may be dispatched to other components as well, very relevant to INetUtils!

Linux's INET sockets expose several IOCTL syscalls accessing networking configuration. The socket-subtype may refine these, but this morning I'm studying the general INET ones. After checking privileges, SIOCADDRT & SIOCDELRT under lock validate their arguments, filling in a configuration entry from parameters (with some conditions) including looking up a device from a user-space identifier, calling its getter-methods to get more fields. Before releasing the lock SIOCDELRT looks up the corresponding entry in the hashmap/trie & removes it whilst SIOCADDRT allocates a new routing table & inserts it. SIOCRTMSG is unsupported. After performing various checks & copying data from userspace, SIOCDARP, SIOCSARP, & SIOCGARP look up the identified device before deleting, setting, or getting the corresponding hashtable entry. Results are validated & copied back to userspace.

SIOCGIFADDR, SIOCGIFBRDADDR, SIOCGIFNETMASK, SIOCGIFDSTADDR, & SIOCGIFPFLAGS all use the same userspace struct to communicate parameters to/from kernelspace. After saving the original address aside & looking up the identified device these prepare an address struct (except PFLAGS), look up a subdevice if specified, & retrieve the appropriate field. SIOCSIFADDR, SIOCSIFBRDADDR, SIOCSIFNETMASK, SIOCSIFDSTADDR, SIOCSIFPFLAGS, & SIOCSIFFLAGS also use the same struct as input, just not output. And they reuse essentially the same routine; the differences being that it validates the address structure (and credentials) rather than preparing one, & that it (carefully) sets attributes rather than getting them. Whilst updating mappings. I'm receiving mixed messages from the code regarding PFLAGS.

TCP Sockets

TCP implements a reliable streaming protocol that can be dispatched to userspace processes by “port” numbers stored in its header. Today I’m studying Linux’s implementation of it!

There’ll be plenty of methods to go over… Including:

socketpair() is unsupported here. getname() is an accessor for the IP/port we've connected to. One wrapper-method registers a noop methodtable for memory-mapping TCP sockets; I'm not clear what this accomplishes… A few methods directly operate on the TCP incoming queue.

There's also a couple backend methodtables TCP refers to. One constructs & enqueues IP packets on the configured device, computes TCP checksums, reconstructs various header fields, conditionally-sets counter & response-socket fields, catalogs & responds to new connections, catalogues successfully-synced connections, accesses IP or Berkeley Packet Filter or NetFilter options, gets the destination port/address, or adjusts MTU/MSS fields. A final TCP methodtable consults hashmaps using the outdated MD5 hashfunction. Could be fine for this use, just can't rely on it for security.

UDP Sockets

For internet apps which need to trade off reliability to gain more performance there’s the UDP protocol, though if you attempt to reimplement TCP’s reliability upon UDP that can easily backfire! Here I’m summarizing Linux’s implementation of it.

ICMP Sockets

ICMP is a protocol for network debugging, so I’m studying Linux’s kernel-space implementation!

Its full methodtable includes both initializers & multiple finalizers with group-allocation, plus Berkeley Packet Filter pre-checks. Establishing a connection involves little more than for IP. Disconnection is as per UDP. ICMP has none of its own options to expose accessors for, only IP's. ICMP headers are built in userspace, with extensive kernel-space code connecting to sockets with various checks. Various properties are conditionally-set alongside portnumbers to handle bind() syscalls. I'm not clear on what's special about enqueueing packets. This ICMP implementation is much like the corresponding UDP implementation. And it shares a wrapper methodtable with IP, which strongly resembles UDP's.

Basically: Same as IP, with additional checks & privileges.

IP Sockets

The Internet Protocol (IP, not to be confused with Intellectual Property or Indigenous People; the latter's easier to differentiate based on context) serves to tell “routers” where to direct your packets. These routers run Linux with a barely-present userspace for (auto)configuration. Today I'll study Linux's widely-used implementation of IPv4.

Linux’s methodtable for this includes:

And yesterday I briefly described the wrapper methodtable.