System Commands

There’s a large suite of commands split across a handful of projects which do little more than call into the Linux kernel via “systemcalls”. This page documents how each of these works under the hood.

GNU CoreUtils

GNU CoreUtils provides the majority of the most-used commands on the commandline. GNU CoreUtils’ commands are largely mutually independent, with a tiny bit of shared code. There is some overlap with Bash’s builtin commands, in which case those take precedence.

All these commands I’m describing, and more, can be compiled into a single coreutils executable whose entrypoint compares argv[0] against all the commandnames known to coreutils.h & calls the corresponding function. Otherwise it parses some commandline longflags to determine the function to call, or performs typical initialization (with --help also listing those commandnames) to possibly output an error.

Yes, you don’t need to switch to BusyBox for this!
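To illustrate, here’s a minimal sketch of that argv[0] dispatch with a couple of hypothetical applets (the real table lives in coreutils.h, & the longflag parsing is elided):

```c
#include <stdio.h>
#include <string.h>
#include <libgen.h>

static int true_main(int argc, char **argv)  { (void)argc; (void)argv; return 0; }
static int false_main(int argc, char **argv) { (void)argc; (void)argv; return 1; }

/* Hypothetical command table; the real one is generated from coreutils.h. */
static const struct { const char *name; int (*main)(int, char **); } commands[] = {
    { "true",  true_main  },
    { "false", false_main },
};

int main(int argc, char **argv)
{
    const char *name = basename(argv[0]);   /* invoked as "true", "false", … */
    for (size_t i = 0; i < sizeof commands / sizeof *commands; i++)
        if (strcmp(name, commands[i].name) == 0)
            return commands[i].main(argc, argv);
    fprintf(stderr, "%s: unknown applet\n", name);
    return 2;
}
```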


yes (after initializing internationalization, registering an exit callback to safely fclose standardout, & parsing common GNU commandline flags namely --help & --version; yes there’s a minor shared library, mostly for normalizing UNIXes) adds “y” to the commandline args if missing, computes memory size needed for space-concatenating (unparsing) those commandline args then allocates with a minsize (duplicating text if too small) & does so.

Then infinite loops writing that to stdout!
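In outline that’s little more than this sketch (skipping the shared init & the minsize logic):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    char *dflt[] = { "y" };
    char **words = argc > 1 ? argv + 1 : dflt;   /* default the args to "y" */
    int nwords = argc > 1 ? argc - 1 : 1;

    size_t len = 0;                    /* measure the unparsed commandline */
    for (int i = 0; i < nwords; i++)
        len += strlen(words[i]) + 1;   /* word plus space-or-newline */

    char *line = malloc(len + 1), *p = line;
    if (!line) return 1;
    for (int i = 0; i < nwords; i++) { /* space-concatenate (unparse) the args */
        size_t n = strlen(words[i]);
        memcpy(p, words[i], n);
        p += n;
        *p++ = i + 1 == nwords ? '\n' : ' ';
    }
    *p = '\0';

    for (;;)                           /* then infinite-loop writing it */
        if (fputs(line, stdout) == EOF)
            return 1;
}
```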

whoami, after typical initialization & validating no additional args are given, wraps the geteuid syscall (a getter for a property of the current process) & getpwuid.
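Which amounts to little more than:

```c
#include <stdio.h>
#include <pwd.h>
#include <unistd.h>

int main(void)
{
    struct passwd *pw = getpwuid(geteuid());   /* effective UID -> passwd entry */
    if (!pw) {
        fprintf(stderr, "whoami: cannot find name for user ID %u\n",
                (unsigned)geteuid());
        return 1;
    }
    puts(pw->pw_name);
    return 0;
}
```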

uptime, after typical initialization, retrieves the optional commandline argument & reads that utmp file (later topic) before retrieving the system’s uptime in seconds via kernel-specific facilities (on Linux /proc/uptime) & manually converts that into a human-readable & localized value to write via printf.

unlink, after typical initialization, retrieves its single argument & wraps the unlink systemcall (deferred to an OO-method on the appropriate filesystem).

uname, on more standard UNIXes & after mostly-typical initialization parsing commandline flags itself with no other commandline args, optionally wraps the uname syscall outputting specified fields of it space-separated, & optionally makes a couple sysinfo calls to output CPU & hardware names.

tty, after mostly-typical initialization, parses -s, -h, & -v flags itself validating there’s no further commandline arguments, then either wraps isatty outputting via exit code or ttyname outputting via stdout depending on the presence of -s.

truncate, after initialization, parses & validates commandline flags, stats & opens the first (non-optional) commandline arg, & for each of the rest temporarily opens it, computes missing flags via fstat & maybe fseek before wrapping ftruncate.

true & false optionally run the typical initialization if there’s only two arguments, with their main purpose being to immediately exit with a specific errorcode.

touch, after mostly-typical initialization, parses & validates commandline args possibly lstating a reference file or calling gettime syscall to fill in missing fields, optionally parses the given date, before iterating over commandline args wrapping fopen and optionally fdutimensat.

tee, after mostly-typical init, parses flags, disables specified signal handlers, configures binary mode for stdin & stdout with sequential access optimizations, opens all the args into a newly-allocated array, repeatedly reads blocks from stdin writing them to all of those files, & once all outputs or the input closes it closes any remaining files.

sync, after mostly-typical init, parses & validates flags, for each arg temp-opens that file, fcntl(F_GETFL)s it, & wraps f[data]sync or syncfs.

sleep, after typical initialization, parses each commandline arg as doubles + units summing the results to pass to the xnanosleep wrapper around the nanosleep syscall.

runcon (simple sandboxing), after mostly-typical initialization, repeatedly parses & validates flags, calls into SELinux to construct a new context possibly based on the current one, validates & saves that context, & execvps the trailing arguments.

rmdir wraps the syscall of the same name maybe ignoring certain errors & maybe iterating over parents.

realpath, after mostly-typical initialization, parses commandline flags validating there are remaining commandline args, normalizes the base path in different modes, & for each arg normalizes it, maybe finds the common prefix with base to reformat into a relative path, & outputs it.

pwd, after mostly-typical initialization, parses commandline flags to determine whether to return normalized $PWD or getcwd with fallback concatenating all parent dirs from “.”.

readlink, after mostly-typical initialization, parses commandline flags validating argcount based on -n, & for each arg outputs the result of a readlink syscall wrapper (reallocating a larger buffer on failure) or of a hashmap-aided common utility iterating over the link normalizing it whilst readlinking any intermediate links realloc’ing as needed.
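That buffer-growing wrapper looks roughly like this sketch (hypothetical name; readlink(2) truncates silently & doesn’t NUL-terminate, hence the retry loop):

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static char *xreadlink(const char *path)
{
    size_t size = 128;
    for (;;) {
        char *buf = malloc(size);
        if (!buf) return NULL;
        ssize_t n = readlink(path, buf, size);
        if (n < 0) { free(buf); return NULL; }
        if ((size_t)n < size) {   /* it fit, with room left for the NUL */
            buf[n] = '\0';
            return buf;
        }
        free(buf);                /* possibly truncated: grow & retry */
        size *= 2;
    }
}

int main(int argc, char **argv)
{
    char *target = argc > 1 ? xreadlink(argv[1]) : NULL;
    if (!target) return 1;
    puts(target);
    free(target);
    return 0;
}
```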

printenv, after mostly-typical initialization, parses commandline flags & iterates over libC’s environ to output all or specified envvars.

printf, after mostly-typical initialization, checks for --help or --version flags performing their special tasks, validates there’s at least 1 commandline argument, & interprets that first argument akin to C’s standard lib concatenating the given text with appropriate escaping, unescaping, & validation.

nproc, after mostly-typical initialization, parses commandline flags before checking OpenMP envvars before consulting traditional kernel-specific syscalls (nowadays sched_getaffinity’s standard).

nohup, after typical initialization, checks $POSIXLY_CORRECT to determine which error code to use, validates there are args, checks whether we’re running in the terminal, possibly NULLs out stdin, opens a new stdout/stderr, disables SIGHUP signals, & execvp’s its remaining args.

nice, after mostly-typical initialization, parses flags with special path, parses offset if given, wraps nice or setpriority, & execvps remaining args. If there are none another codepath outputs niceness.

mktemp, after mostly-typical initialization, parses its commandline flags, validates its commandline args possibly providing defaults, concatenates on a given suffix and/or containing directory, & calls gen_tempname_len with appropriate bitflags. Which GNU CoreUtils reimplements itself in case it’s not available in LibC.

mknod, after mostly-typical initialization, parses & validates its commandline flags including SELinux/SMACK context, & wraps makedev + mknod or mkfifo.

mkfifo, after mostly-typical initialization, parses commandline flags including SELinux or SMACK context, possibly calls mode_compile/umask/mode_adjust, & for each commandline arg error-handled mkfifo possibly with preceding SELinux defaultcon.

logname, after typical initialization, validates there are no commandline args, wraps getlogin, & puts the result.

link, after typical initialization, validates there’s exactly 2 commandline args & wraps the link syscall.

mkdir parses flags including SMACK/SELinux context, validates there are args, possibly calls mode_compile/adjust & umask, & for each arg temp-registers the security context whilst calling a thoroughly error-handled wrapper around mkdir syscall. With or without a callback which mkdirs any missing parents.

kill parses commandline flags to either call the kill syscall on each arg as a pid, or serialize each arg as a signal name with or without its number. Or the same for all signals.

id, after mostly typical initialization, parses & validates commandline flags, & for each commandline arg parses it as a user+group (an int or two) to wrap getpwuid & print specified properties. If there were no commandline args it instead retrieves its own user/group to print the same specified properties.

hostname, after typical initialization, wraps sethostname if single commandline arg or gethostname if there’s none. Otherwise complains.

hostid wraps gethostid.

groups, after typical initialization, wraps getpwnam for each arg. With getpwuid, getgroups, and/or getgrgid calls to help output its result. Or if there were no args retrieves the process’s own uid, & (effective) gid to output the same way.

getlimits, after typical initialization, prints various numeric constants from LibC.

echo, after mostly-standard initialization if --help or --version aren’t given, possibly parses flags and/or unescapes each arg & outputs each in turn.

dirname, after mostly-typical initialization, parses flags & for each arg (complains if none) locates last “/” in the path & outputs up to that point.

date, after mostly-typical initialization, parses & validates flags largely into a format string with a default, computes input time via posixtime or gettime or stat+get_stat_mtime or parse_datetime2, optionally calls settime, & ultimately wraps fprintftime. Or if given -f parses & reformats each of that file’s lines.

chown, after mostly-typical initialization, parses & validates commandline flags & retrieves ownership from stating a reference file or parses the specified user, before opening & normalizing all the specified files deciding whether to actually call chown (consulting fstatat), possibly then doing so with the appropriate chown variant. Using a shared filetree-traversal (FTS) abstraction to implement recursion, which I’ll study tomorrow.

chgrp works very similarly, using the exact same core function.

chmod, after mostly-typical initialization, parses & validates flags, parses the given mode (usually from the first arg) manually parsing & reformatting octal ints or text, or copies this info from stating a reference file (also there’s normalization), uses that same FTS abstraction to implement optional recursion, & for each given file dequeued from FTS it handles (possibly outputting) any dequeued errors, calls chmod with validation/normalization, & possibly outputs info about it.

chroot, after mostly-typical initialization, parses & validates commandline flags, wraps chroot, parses & retrieves groups (as ints or via getgrnam) to also wrap the setgroups syscall if available, & execvps remaining args.

And finally basename, after mostly-typical initialization, parses & validates commandline flags before iterating over each commandline arg looking for “/”s & maybe a given suffix to strip off the result it fputs.


Some of GNU CoreUtils’ commands for manipulating the filesystem are more interactive, allowing optional recursion, prompts, & error reporting. Namely cp, mv, & rm. ln & ls appear to be in a similar bucket. These are the topics of today’s study!

cp, after mostly-typical initialization, checks whether SELinux is enabled & initializes a context, parses & validates commandline flags into that context (& a global for e.g. backup suffixes), registers the SELinux context, & with a global closed hashmap performs some further flags validation. If the destination is a directory it might start populating that global hashmap & iterates over the remaining args to maybe strip off trailing slashes, maybe concatenate on a parent directory possibly creating it or preprocess away “..”, performing the core logic if the parent dir exists possibly followed by chown/chmod/SELinux syscalls. Or (after handling a backup edgecase) it directly calls this core logic!

That core logic for copying a file involves possibly considering (for mv not cp) calling renameat2 or an equivalent available syscall, possibly carefully fstats to warn about non-recursively copying directories, uses the hashmap to validate args aren’t duplicated, if the move didn’t succeed or wasn’t performed might [l,f]stat it, maybe performs validation against copying a file to itself emitting an error upon failure, maybe checks timestamps, considers whether to abandon the op, warns about overwriting a file with a directory or vice versa, performs checks against losing data, maybe applies autobackup operations, and/or attempts unlinking the destination. Then cp/mv validates it won’t copy into a symlink it created in the process of copying, considers writing verbose output, updates or consults the hashmap to perform yet more validation, for mv attempts to apply a rename op followed by optional SELinux updates, verbose/error output, and/or hashmap updates.

Then it computes permissions whilst configuring SELinux state; if it’s copying a directory it performs more validation, possibly updates the hashmap and/or verbose output, & co-recurses over a multistring collected from the readdir syscall. If it’s a symlink it carefully calls the symlinkat syscall. If it’s a hardlink it carefully calls the linkat syscall. If it’s a fifo it calls mknod falling back to mkfifo. If it’s a device file it calls mknod. There’s another link copying case. Then cleans up.

To copy a regular file cp/mv opens & fstats it with a bit more validation, opens the destination file configuring SELinux permissions & cleaning up on failure with a couple more alternate open paths, validates & fstats the destination, possibly attempts to use the clone syscall, or carefully reads from one file to write the data to the other whilst carefully avoiding/punching “holes” then copies permissions over.


mv works basically the same as cp, running the same logic in a different mode & incorporating a trailing call to rm’s core logic.

rm in turn, after mostly-typical initialization, initializes some parameters for its core logic then parses & validates flags into it possibly prompting the user (via a possibly localized shared utility function) with the argcount. Before calling the core logic shared with mv. Which incorporates the FTS utility for optional recursion’s sake.

Deleting each file involves checking the file’s type. If it’s a directory it might complain (especially if it’s “.”, “..”, or “/”) possibly flagging ancestor dirs before prompting & carefully unlinking it. If it’s a regular file, stat failure, symbolic link with or without target, postorder or unreadable directory, dangling symlink, etc it possibly flags ancestor dirs, possibly prompts (gathering data to report), & unlinks the file. It skips cyclic directories & reports errors.

ln, after mostly-typical initialization, parses & validates commandline flags before extracting the target & destination filepaths from commandline args possibly creating & fstating it. Then ln initializes autobackup globals, considers initializing a deduplication hashmap, & with some preprocessing runs the core logic per argument. There’s a separate codepath to this core logic for single arguments.

This core logic involves possibly attempting to call [sym]linkat before diagnosing errors whilst applying backups & tweaking filepaths to try again. If either succeeded it possibly updates the hashmap and/or verbose output. Or reports the failure undoing the backup.

ls, after typical initialization, parses & validates its extensive commandline flags followed by $LS_COLORS/$COLORTERM (in which case it disables tabs & performs postprocessing), if recursion is enabled initializes a hashmap & obstack, retrieves the timezone, initializes “dired” obstacks, possibly initializes a table for escaping URIs whilst retrieving the hostname, mallocs a “cwd file”, clears various state, enqueues commandline args whilst lstating them & deciding how to render (dirs are enqueued in a linked list), optionally mpsorts results then separates out dirs to be enqueued, prints the current batch of files to stdout in a selection of formats, then dequeues each directory skipping cycles & repeating similar logic, then cleans up after colours, dired, and/or loop detection if any of those were used.

ls reimplements a tiny subset of ncurses (& lookuptables from e.g. filetypes to colours) for the sake of columnized & colourized output.


To help implement optional recursion in rm, chown, chmod, etc GNU CoreUtils implements a “FTS” utility.

To open a FTS filesystem traversal it validates the arguments, callocs some memory whilst saving some properties, might test-open “.”, computes the maximum length of its arguments to allocate memory to store any of them in, allocs a parent FTS entry & one for each commandline argument referencing it possibly qsorting them, & allocs the current entry.

To dequeue the next entry from FTS it performs some validation, considers re-yielding & re-lstating the previous entry for certain kernel errors, & considers calling diropen for recursion’s sake. If the current entry’s a directory it may close it if instructed to by the caller, possibly clears its fts_child property, possibly calls diropen whilst carefully avoiding “..” whilst updating a ringbuffer and/or the system current directory, & traverses to the directory’s child.

Then it iterates to the next child by calling dirfd or opendirat+lstat whilst handling errors, decides whether to descend into directories, initializes some vars, & before cleanup/sorting repeatedly calls readdir, allocing/initializing memory to hold the new entry & its filepath carefully handling errors, lstats the file for more fields to store, & inserts into a linked list.

If it found a next entry it gets validated & tweaked as instructed by caller whilst recalling lstat.

Once all the entries in a directory have been traversed it follows the parent pointer freeing previous memory, & validates/tweaks it before yielding that virtual entry.

This tree traversal may be augmented with a hashmap to detect cycles.

I don’t see much use of that ringbuffer…
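GNU CoreUtils vendors its own fts derived from BSD’s, so the caller’s side looks much like fts(3):

```c
#include <fts.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    char *dflt[] = { ".", NULL };
    char **paths = argc > 1 ? argv + 1 : dflt;   /* NULL-terminated path list */

    FTS *fts = fts_open(paths, FTS_PHYSICAL, NULL);  /* lstat; don't follow links */
    if (!fts) { perror("fts_open"); return 1; }

    FTSENT *ent;
    while ((ent = fts_read(fts)) != NULL) {
        switch (ent->fts_info) {
        case FTS_D:  printf("dir  %s\n", ent->fts_path); break;  /* preorder */
        case FTS_DP: break;                                      /* postorder */
        case FTS_F:  printf("file %s\n", ent->fts_path); break;
        case FTS_DC: fprintf(stderr, "cycle at %s\n", ent->fts_path); break;
        default:     break;          /* symlinks, errors, etc. */
        }
    }
    fts_close(fts);
    return 0;
}
```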


Beyond exposing language bindings for LibC’s/Linux’s syscalls, the other dominant task GNU CoreUtils’ commands perform is textual transformation or summarization of stdin. The simpler cases of which I’ll describe today!

uniq, after mostly-typical initialization, parses & validates commandline flags whilst gathering an array of 2 filepaths. Which if not “-“ will be freopened over stdin & stdout with sequential access & (internal util) linebuffering optimizations enabled.

The fastpath reads each line from the input stream (enlarging the buffer until it finds the configured delimiter, defaulting to newline), skips the configured number of whitespace-separated fields & then chars, compares against the previous row case-sensitively or not, & depending on grouping mode outputs a delimiter and/or outputs the line whilst updating state.
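Stripped of field-skipping & grouping modes, that fastpath boils down to this sketch:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *prev = NULL, *line = NULL;
    size_t prevcap = 0, linecap = 0;
    ssize_t n;

    while ((n = getline(&line, &linecap, stdin)) > 0) {
        if (!prev || strcmp(prev, line) != 0)
            fputs(line, stdout);          /* first line of a run: emit it */
        /* swap buffers so this line becomes the "previous" one */
        char *t = prev; size_t tc = prevcap;
        prev = line; prevcap = linecap;
        line = t; linecap = tc;
    }
    free(prev); free(line);
    return 0;
}
```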

The slowpath also tracks an error-checked count of repeated lines & whether we’ve seen our first delimiter, moving a line read & write out of the loop.

unexpand converts spaces back to tabs by, after mostly-typical initialization & parsing & normalizing commandline flags into e.g. a tabstop array & temp-allocated filename list both via a shared util with expand, popping & fopening (with sequential optimizations) the first file off that array, mallocing a blank column, & repeatedly: reading the next char popping the next file upon EOF, looking up the appropriate tabstop from the array upon blank chars stopping future conversions if it goes beyond the end, validating the line wasn’t too long, replacing the whitespace with a tab char if it was already one or we’ve changed tabstops, decrementing column upon \b recalculating tabstops, & otherwise incrementing column. If it has prepared pending whitespace to write it’ll finalize & output it. Then output the non-whitespace char.

Repeating until end-of-line (innerloop) & end-of-files (outerloop).

tac, after mostly-typical initialization & commandline flags parsing including regexp-parsing the “sentinel” & bytesize validation, finds the remaining commandline args defaulting to “-“, configures binary output mode, before flushing remaining output & cleaning up for each arg it opens it in binary mode (handling “-“ specially), lseeks to the end, handles seekable & non-seekable files differently, & cleans up.

For nonseekable files it copies over to a seekable file before tacing it.

For seekable files, or after converting non-seekable files, it normalizes the computed seek offset to be multiple of a precomputed read_size lseeking there, before lseeking back then forward a page at a time looking for EOF, & repeatedlies runs the configured regex to find the configured line seperator OR performs a simpler fixed-size string-in-string search, if it didn’t find a match at filestart it outputs a line & exits. Or it reads from line start into newly realloced memory.

If it found a match it outputs that line with or without trailing line seperator, updating past_end & maybe match_start properties.

There’s also an in-memory codepath I don’t see used.

paste, after mostly-typical initialization & parsing commandline flags, defaults args to “-“ escaping those filepaths, runs serial or parallel core logic before cleaning up based on one of those flags.

That serial logic involves opening each file (handling “-“ specially) with sequential optimization.

After opening each file checking for empties, it then copies individual chars from input to output replacing any line delims & adding a trailing one if needed.

The parallel logic involves opening each file validating stdin handling, then repeatedly iterates over each file considering outputting extra delims from a preprepared buffer before repeatedly copying chars from input to output.

Or if the file was already closed it considers which state it needs to update or delims to output.

nl, after mostly-typical initialization & parsing flags, prepares several buffers, processes each file (defaulting to just “-“ a.k.a. stdin) & maybe fcloseing stdin, & returns whether all of those files were successful.

Processing a file involves fopening it (handling “-“ specially) with sequential optimizations, reading each line determining via memcmp whether we’re in a header, body, footer, or (where the real logic/incrementing+output happens) text. Resets counter for non-text.

join, after mostly-typical initialization, registers to free an array on exit, parses & validates commandline flags, gathers an array of filenames whilst determining which join fields to use, & with the two files open handling “-“ specially it runs the core logic.

This core logic consists of enabling sequential read optimizations, initializes state for both of the input files populated with their first line, maybe updates some autocounts, maybe runs an initial join, & repeatedlies…

For each pair of lines join memcmps the appropriate field case-sensitively or not, might output a linked list of fields or just the fields being joined from file at lower key whilst advancing it to next line, advances leftfile until no longer equal then same for rightfile, maybe outputs those lines, & updates each file’s state whilst checking for EOF.

Trailing lines from either file are possibly printed after this loop & memory is cleaned up.

Fields are split upon reading each line.

head, after mostly-typical initialization, parses & validates commandline flags possibly with special handling for integral flags, defaults remaining args to “-“, & in binary output mode iterates over all those args for each temporarily opens the filepath (handling “-“ specially) optionally outputs a filepath header surrounded by fat arrows & uses different core logic for whether we’re operating in terms of lines or bytes & whether we’re outputting a finite number of them.

For fixed number of bytes head copies a bufferful of data at a time until we’ve met that target.

For a fixed number of lines head copies a bufferful of data at a time whilst counting newlines, until we’ve met that line count. Or rather it decrements the linecount until 0.
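That decrementing loop looks roughly like this sketch (without GNU’s buffer-size tuning):

```c
#include <unistd.h>

/* Copy the first `lines` lines of fd to stdout, a bufferful at a time. */
static int head_lines(int fd, long lines)
{
    char buf[8192];
    ssize_t n;
    while (lines > 0 && (n = read(fd, buf, sizeof buf)) > 0) {
        ssize_t end = n;
        for (ssize_t i = 0; i < n; i++)
            if (buf[i] == '\n' && --lines == 0) {
                end = i + 1;              /* include the final newline */
                break;
            }
        if (write(STDOUT_FILENO, buf, end) != end)
            return -1;
    }
    return 0;
}

int main(void) { return head_lines(STDIN_FILENO, 10) < 0; }
```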

To output all but the last n bytes head, if it can query the filesize, copies a computed number of bytes a bufferful at a time. Or in 1 of 2 ways copies a buffer of computed size at a time, chopping off n bytes once it reaches EOF.

To output all but the last n lines of a seekable file head reads it backwards a bufferful at a time counting newlines until it finds the stop point in bytes. Then copies a bufferful at a time until it reaches that point.

To output all but last n lines on a pipe head allocs a linked list & repeatedly reads a bufferful at a time maybe immediately outputting the line if we’re not eliding anything, counts newlines in that buffer, & considers merging buffers or outputting the old head.

To wrap text to a fixed width fold, after mostly-typical initialization & flags parsing, iterates over every arg falling back to “-“. For each it temporarily opens the file (handling “-“ specially) with sequential optimizations & reads a char at a time adding them to a buffer.

Upon newlines it writes the buffered text. Otherwise it computes the new column handling \b, \r, & \t specially. If it overflows the given width it might locate the last buffered whitespace to output up to, or outputs the full buffer.
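That column arithmetic amounts to this sketch (assuming fixed 8-column tabstops):

```c
#include <stdio.h>

static size_t adjust_column(size_t column, char c)
{
    switch (c) {
    case '\b': return column > 0 ? column - 1 : 0;  /* backspace steps back */
    case '\r': return 0;                            /* carriage return resets */
    case '\t': return column + 8 - column % 8;      /* jump to next tabstop */
    default:   return column + 1;
    }
}

int main(void)
{
    size_t col = 0;
    for (const char *p = "ab\tc"; *p; p++)
        col = adjust_column(col, *p);
    printf("%zu\n", col);   /* "ab" -> 2, tab -> 8, "c" -> 9 */
    return 0;
}
```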

fmt, after mostly-typical initialization & parsing flags handling digits specially, iterates over & fopens the args fallingback to stdin as “-“. For each it enables sequential optimizations, handles a configured prefix, handles preceding blank lines then optionally reads the rest of the paragraph collapsing solo-newlines. For each such paragraph it performs split-costed linewrapping & outputs the lines in a separate pass. Then tidies up errors after the loop.

fmt’s a more sophisticated fold!

To replace tabs with spaces expand, after mostly-typical initialization & parsing commandline flags, finalizes tabstops & saves an array of commandline arguments fallingback to “-“ then dequeues one, & before possibly cleaning up reading stdin repeatedlies: reads each char from each file in turn, upon tab looks up the corresponding tabstop & outputs the appropriate number of spaces, decrements column upon \b, or increments column, then outputs the read char (except tabs translated to spaces).
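A minimal sketch of that innerloop, assuming a single fixed tabstop of 8 rather than the configurable tabstop array:

```c
#include <stdio.h>

int main(void)
{
    int c;
    size_t column = 0;
    while ((c = getchar()) != EOF) {
        if (c == '\t') {          /* pad with spaces up to the next tabstop */
            do { putchar(' '); column++; } while (column % 8 != 0);
        } else {
            if (c == '\n') column = 0;
            else if (c == '\b') { if (column) column--; }
            else column++;
            putchar(c);
        }
    }
    return 0;
}
```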

cut, after mostly-typical initialization & parsing & validating flags including field selection, iterates over all remaining args falling back to “-“, & cleans up after parsed fields & reading stdin. For each it temporarily fopens the file (handling “-“ specially) with sequential optimizations, handling byte & field counts differently.

For byte cuts it counts non-delimiter chars locating appropriate cut entries to determine when to output the delimiter, whilst copying all chars out unless the current cut entry indicates otherwise.

Field cuts work essentially the same, except reading entire fields split by configurable delimiters instead of individual chars.

These cut entries are tracked in an array with high & low bounds.

csplit, after mostly-typical initialization & parsing & validating commandline flags, validates there are remaining commandline args, reopens the given file over stdin, parses given regexps, registers a signals handler, iterates over & applies the given “controls” before carefully temporarily opening an output file to write all buffered lines to. This output file might also be opened when processing any of those controls.

For regexp controls at an offset repeatedlies looks up a line (upon failure to find this it either outputs the rest of the file or reports to stderr) before evaluating the regexp over that line to determine whether to output it.

For non-offset regexp controls it does basically the same logic but slightly simpler.

For linecount controls it creates the output file, reports errors, repeatedly dequeues lines to save to the file until reaching the desired linecount, & tidies up.

Upon dequeueing a line csplit considers whether it needs to read a new bufferful of data & split it into lines.

comm, after mostly-typical initialization & parsing flags, validates the args count before running its core logic. Which involves fopening each specified file (handling “-“ specially) with sequential optimizations, perfile data allocated, & the firstline read in. Then mergesorts the lines from both files into stdout with or without collation, closes those files, & optionally outputs intersection/exclusion counts.

And last but not least cat, after mostly-typical initialization & flags parsing, fstats stdout to help determine most optimal codepath & maybe sets it to binary mode.

Then cat iterates over its args fopening & fstating each one, retrieving the optimal blocksize, validates it’s not cating a file to itself, & if various flags aren’t set it’ll simply repeatedly copy data from input to output an optimally sized & memaligned buffer at a time OR with plenty of microoptimization it iterates over the buffer reading more as-needed looking for newlines possibly inserting linenumbers and/or escaping lines between them.

Before writing remaining text & cleaning up.


GNU CoreUtils provides several useful commands for rearranging & summarising text files!

wc, after mostly-typical initialization, retrieves the optimal buffersize, configures line buffering mode, checks $POSIXLY_CORRECT, parses & normalizes flags indicating which counts it should output, possibly opens & fstats the file specified as listing other files to summarize OR consults args, possibly fstats all input files again to estimate the width of the eventual counts, iterates over & validates all files whether listed in a file or args running the core logic for each (or reads from stdin), & tidies up whilst outputting desired counts.

This core logic involves temp-opening the file again handling “-“ specially, possibly enabling sequential optimization, & possibly trying fstat again & seeking near the end in case the size is approximate, using repeated reads for the exact size.

Or the core logic may involve considering whether we can use AVX2-specific microoptimizations before repeatedly reading a bufferful at a time counting bytes & newlines. AVX2 allows x86 CPUs to do this in 256bit chunks (or rather wc uses 2 of them for 512bit chunks) by summing equality results.
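The portable fallback amounts to this sketch; the AVX2 path replaces the inner scan with 256-bit compares & accumulation over 32-byte chunks:

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char buf[65536];
    unsigned long bytes = 0, lines = 0;
    ssize_t n;
    while ((n = read(STDIN_FILENO, buf, sizeof buf)) > 0) {
        bytes += n;
        /* memchr is usually vectorized by LibC already; AVX2 goes further */
        for (char *p = buf; (p = memchr(p, '\n', buf + n - p)); p++)
            lines++;
    }
    printf("%lu %lu\n", lines, bytes);
    return n < 0;
}
```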

Or it reads a bufferful at a time whilst decoding UTF8 chars (with ASCII fastpath) via mbrtowc handling \n, \r, \f, \t, space, \v, & kanji specially. Or maybe it’s compiled to not support Unicode.

After counting lines, words, chars, and/or bytes in each file it outputs those numbers before adding to the total sums across all files in case we want those values too.

tr, after mostly-typical initialization & flag parsing possibly altering locally-configured locale, validates args count, initializes a linkedlist & escapes then parses a regex-like pattern (or two) via an internal scanner, validates them, switches to binary input mode with sequential optimizations, & under various differing conditions consults a rename table and/or a couple smallintset (both compiled from parsed input pattern) to determine which input chars to output.

tail, after typical initialization & parsing & validating commandline flags (obsolete syntax first), defaults remaining commandline args to “-“, locates/validates/warns about “-“ in those args, shortcircuits if certain flags are all unset, allocs an array, considers whether to output headers, enables binary output mode, iterates over all those args performing the core logic, then given -f goes back to output any additional lines written to those files ideally using INotify syscalls & a hashmap to interpret its responses.

The core logic involves temporarily opening the specified file in binary mode (handling “-“ specially) & if successful possibly outputs the filename surrounded by fat arrows, locates the last n bytes or lines to write, & given -f validates & populates properties for the above -f loop.

To output all but the first n bytes tail consults fstat before lseeking & copying bufferfuls from input to output, before outputting any remaining text with or without headers.

To output the last n bytes tail lseeks to the end of the file less n, decides whether it needs to apply pipe logic or seek back to start, & refines the seek position before outputting any remaining text with or without headers.

For pipes it buffers into a segmented linkedlist until EOF then outputs said buffer.

To output all but the first n lines tail fstats the file, reads bufferfuls counting (or rather decrementing) newlines until it reaches desired count, possibly outputs remainder of that buffer, & outputs the remainder of the file.

To output the last n lines tail fstats the file, tries seeking to end, reading the file backwards a bufferful at a time counting newlines until reaches desired count, outputs the buffer from that point followed by the remainder of the file.

To output the last n lines of a pipe tail reads the pipe into a linkedlist of buffers counting the number of newlines in each until EOF, uses those counts to locate the start of those lines, & outputs them before cleaning up their memory.

split, after mostly typical initialization & parsing & validating commandline flags, extracts & validates commandline args, freopens the specified input file over stdin, enables binary input mode, fstats the input to get the optimal blocksize, mallocs a memaligned buffer, performs some trial reads to get the filesize, if in filtering mode registers SIGPIPE to be ignored, & decides whether it wants to apply the logic for digits/lines, bytes, byteslines, chunkbytes, chunklines, or RR.

For linesplits it reads input in bufferfuls counting newlines to determine which output file to write to.

For bytesplits it reads bufferfuls tracking a bytes countdown to determine which output to write to.

To split into lines of a maximum bytesize split reads buffers of the specified size counting newlines within them to determine which output to write that buffer to. Or splits buffer at a line break.

To split into byte chunks it either behaves equivalently to splitting by bytes or it seeks to a computed start index copying buffers of the specified size to output.

Or there’s a variant which avoids splitting lines.

Or a variant that cycles between the array of output files in a “round-robin” fashion.

sort, after initializing locale as per usual, loads some envvars, further initializes locale, generates some lookuptables, alters signal handlers (mostly to ignore, resets SIGCHLD handler away from parent’s), registers to clean up tempfiles & close stdout on exit, allocs/clears some configuration, parses extensive commandline flags with various conditional tweaks into those structures & other localvars, opens & tokenizes the file specified by the --files0-from flag if present, propagates various properties of fields to compare by whilst determining whether any requires randomness, ensures there’s at least one entry in that linkedlist before validating it, maybe outputs debugging information, initializes randomness if required via getrandom through a wrapper (reading from a file instead for debugging purposes), sets a default tmpdir of $TMPDIR or /tmp, defaults remaining args to “-“, normalizes the amount of RAM to use, optionally extensively validates instead, validates it can read all the inputs & write all the outputs, & with minimal tidyup commences bigishdata sort! (Might be overengineered for modern hardware…)

You can configure sort to only use its disk-based mergesort assuming the given input files are already sorted. This involves an initial loop which merges each pair (or whatever) of input files (by parsing each file’s head line into fields for comparison, copying whichever’s lower to output & advancing that file) then each pair of those.

Or when compute’s the bottleneck rather than RAM (which it was on early computers) it retrieves the CPU core count as the max pthreads count & for each bufferful of input from each file in turn temp-initializes a mutex-locked priority queue & merge tree node (possibly in a new pthread) to apply an in-RAM mergesort with its chunks prioritized via the priority queue. Once these sorted arrays get too large they’re written to disk for the disk-based mergesort.

The comparator either interprets the relevant commandline flags parsed into an array selecting certain fields from the pre-lexed line applying a choice of comparison logics (including randomly-salted MD5 for shuffling), or it uses possibly-collated memcmp.

shuf, after mostly-typical initialization & parsing & validating commandline flags, gathers inputs whether empty, echoed from commandline args, a numeric range, or specified files, initializes the random number generator, maybe populates an array listing the new random index for each line in the file taking great care to preserve the randomness distribution, considers closing stdin, maybe computes the array via a sparse hashmap & randomly swapping indices, writes the randomly chosen indices or lines at those indices or the “reservoir”, & tidies up.

seq, after mostly-typical initialization & parsing/validating commandline flags, determines whether it can use a fastpath.

The fastpath keeps the number in text form incrementing chars carrying at ‘9’, populating a bufferful before outputting them. Uses memcmp to decide when to end.
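That text-form increment might look like this sketch:

```c
#include <stdio.h>
#include <string.h>

/* `buf` holds a NUL-terminated decimal int with room for one extra digit. */
static void incr(char *buf)
{
    char *end = buf + strlen(buf);
    char *p = end - 1;
    while (p >= buf && *p == '9')
        *p-- = '0';                       /* carry ripples over trailing 9s */
    if (p < buf) {                        /* all nines: prepend a '1' */
        memmove(buf + 1, buf, end - buf + 1);
        buf[0] = '1';
    } else
        ++*p;
}

int main(void)
{
    char buf[32] = "999";
    incr(buf);
    puts(buf);   /* prints 1000 */
    return 0;
}
```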

Otherwise it parses & validates the given floats whilst checking whether it’s actually an int & computing output width, reconsiders the fastpath, generates a formatstring in the absence of -f, & outputs each multiple of the given step (carefully avoiding precision loss) up to the limit printfing each adding separators & a terminator.

ptx, after mostly-typical initialization, calls setchrclass(NULL) if that function’s available, parses flags, gathers an array of filenames with linecount & text buffer sidetables whether or not args are given or GNU extensions are left enabled, chooses a default output format, compiles the given regexp whilst compiling a char rewrite table, loads all chars from a file into that rewrite table as “breaks”, loads a second sorted sidetable of “break words” from a given file, initializes some counts, for each given input file reads it all into memory to run the core logic & update line counts, sorts results, applies various normalization, & iterates over these results to compute the charwidth of its fields & output in a choice of syntax including TeX.

That core logic involves iterating over the file’s text probably running the regexp to locate the start index for the next iteration, & repeatedlies locates the next word start & end via regexp or scanning over chars present in the rewrite table skipping empty words, updates max length & counts, binary searches the sidetable of allow & block sorted wordlists skipping words as dictated by them, possibly allocs an occurs_table entry & populates it partially as directed by the caller, & possibly skips trailing chars other than whitespace.

pr (which reformats text for printing), after mostly-typical initialization & parsing & validating/normalizing commandline flags, copies trailing commandline args into a new array, iterates over all those filenames defaulting to stdin or tells the core logic to render them in parallel, & tidies up.

That core logic involves computing various layout parameters, opens each file being laidout in parallel (handling “-“ specially) with sequential read optimizations whilst laying out pageheader text, possibly allocs memory to store columns, partially lays out a given number of pages to skip them using that parameter as the initial page number, computes some per-column layout parameters whilst choosing per-column callbacks representing whether to directly output text or buffer it in columns, then renders each subsequent page.

For each output page it resets some layout parameters, validates there’s text to layout, repeatedly outputs lines, updates flags, resets state, & outputs padding.

Laying out a line involves iterating over cols calling their callback (possibly skipping the rest of the input’s line) until it has no more lines to output, in parallel mode ensures columns remain aligned even when empty, & considers adding newlines.

Whilst laying out columns per-page it reads in the first line for each of them & reshuffles lines between columns to keep them balanced.

That callback applies text alignment, line numbers, textwrapping, etc & buffers text via the other callback.

od, after mostly-typical initialization & initializing a couple lookup tables, parses, validates, & normalizes commandline flags, chooses a “modern” or “traditional” parser for extracting commandline arguments, possibly into a printf-string & read bytesize, defaults commandline args to “-“, opens the first of those files, carefully skips a specified number of bytes, computes the least common multiple (shared util) between all given readwidths & uses that to compute the number of bytes per block, computes necessary padding to align output, in some builds outputs debugging info, & runs one of two variants of the core logic before possibly attempting to close stdin.

If we’re “dumping strings” from the file it repeatedly keeps reading bytes looking for at least a given number of ASCII chars loading them into a buffer, then reads until it’s found the NUL terminator resizing the buffer as necessary, then outputs the address via the configured callback & escapes/outputs the found string.

Otherwise od reads bufferfuls at a time with or without (two mainloops) checking against end offset, outputting last byte specially consulting computed lowest-common-multiple, & outputs end address.

Upon reading a bufferful of data it considers closing the current file on EOF & opening the next one. To write that block it first compares against the previous block. If they were equal it’ll output at most “*\n”. Or outputs the address followed by each specifier’s callback possibly followed by hex format.

digest, after typical initialization & enabling output buffering, parses & verifies flags, defaults args to “-“, & iterates over each arg either checking the hash or computing the hash & outputting it via the caller-specified callback.

To check a file’s hash digest opens the file (handling “-“ specially) repeatedly reads a line & strips it, extracts the hash, whether it’s a binary file, & filepath, runs core logic, compares to expected value, & decides which results to output.

The shared library for computing CRCs (not really a hashfunction, but works well for detecting transmission errors!) embeds via the C preprocessor a script to generate its own lookuptable headerfile using bittwiddling & 2 intermediary tables. It has a separate codepath specifically for making optimal use of x86/x64 CPUs to take advantage of their pclmul instructions.

CRC at its core involves repeated bitwise shifts & XORs.
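As a minimal table-free sketch (this is the common reflected CRC-32 polynomial; cksum’s POSIX variant differs in polynomial orientation & finalization):

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

static uint32_t crc32_bitwise(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)   /* shift & conditionally XOR */
            crc = crc & 1 ? (crc >> 1) ^ 0xEDB88320 : crc >> 1;
    }
    return ~crc;
}

int main(void)
{
    /* prints cbf43926, the standard check value for "123456789" */
    printf("%08x\n", crc32_bitwise((const unsigned char *)"123456789", 9));
    return 0;
}
```

The lookuptables trade that inner loop for processing a whole byte (or more) per step.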

Finally base## (e.g. base64) commands, after mostly-typical initialization & parsing flags possibly (depending on build flags) including a separate option for the desired base, validates it has at most one argument defaulting to “-“ which it then temporarily fopens (handling “-“ specially) with sequential read optimizations, & either decodes or encodes the data as specified by -d.

For decoding it mallocs some buffers, reads a full buffer in, & calls the appropriate decode function.

Encoding works basically the same way but possibly with added text wrapping.

The logic (which is in a library shared within GNU CoreUtils) for encoding & decoding base32 or base64 text involves bitshifts, bitmasks, & character lookup tables. There’s wrappers around this, as well as similar code written inline with the command, tweaking the behaviour for additional basenc options.
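The encoding core amounts to this sketch:

```c
#include <stdio.h>
#include <stddef.h>

static const char b64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* `out` needs room for 4 * ((inlen + 2) / 3) + 1 bytes. */
static void base64_encode(const unsigned char *in, size_t inlen, char *out)
{
    while (inlen >= 3) {               /* 3 input bytes -> 4 output chars */
        *out++ = b64[in[0] >> 2];
        *out++ = b64[((in[0] & 0x03) << 4) | (in[1] >> 4)];
        *out++ = b64[((in[1] & 0x0F) << 2) | (in[2] >> 6)];
        *out++ = b64[in[2] & 0x3F];
        in += 3; inlen -= 3;
    }
    if (inlen) {                       /* 1 or 2 trailing bytes, '=' padded */
        *out++ = b64[in[0] >> 2];
        if (inlen == 1) {
            *out++ = b64[(in[0] & 0x03) << 4];
            *out++ = '=';
        } else {
            *out++ = b64[((in[0] & 0x03) << 4) | (in[1] >> 4)];
            *out++ = b64[(in[1] & 0x0F) << 2];
        }
        *out++ = '=';
    }
    *out = '\0';
}

int main(void)
{
    char out[9];
    base64_encode((const unsigned char *)"hi", 2, out);
    puts(out);   /* prints aGk= */
    return 0;
}
```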


Not every system call GNU CoreUtils exposes to the commandline is very text centric.

dd, after typical initialization & configuring signal handlers, retrieves the system’s optimal buffersize, initializes a translation table with sequentially-incrementing numbers (identity transform), decodes keyword args in a different syntax from usual into e.g. filenames & bitflags & ints, updates that transform table, opens the if= given input file with specified flags checking whether it’s seekable, opens the of= given output file with specified flags possibly ftruncateing it using fstat to diagnose failures, retrieves the current time via whatever highprecision syscall is available, & runs the core logic before diagnosing errors, ensuring any signals have been handled, cleaning up, & outputting final status.

dd’s core logic involves maybe fstating & lseeking the given offset fallingback to reading twice (the second time outputting zeroes in their place for “seek” options), allocs input & output buffers, possibly retrieves the current time to determine whether to output status, stops if we’ve copied enough records, maybe zeroes the input buffer, reads a full or possibly partial (depending on a keyword arg) buffer of conditional size whilst handling signals & warning about errors, updates counters possibly clearing the output cache or possibly lseeks past bad blocks whilst invalidating the cache maybe ending this loop, possibly zeroes the input buffer’s tail, maybe takes a fastpath outputting that input buffer immediately, maybe translates all the bytes according to the translation table, maybe swaps every two bytes, & lseeks & writes that postprocessed buffer either in full or a char at a time whilst tracking columns.

After dd’s mainloop it outputs the final byte if present, maybe pads with spaces, maybe adds a final newline, outputs last block if necessary, if the final op was a seek fstats, lseeks, & ftruncates the file, & f[data]syncs the file whilst handling signals.

Signals are handled around several of these syscalls. Clearing output caches involves some throttling & posix_fadviseing.

Some standard translation tables are bundled for e.g. EBCDIC.

df, after mostly-typical initialization & parsing & validating flags including resolving filesize units filling in certain missing ones from envvars, test-opening each file whilst stating it, parses the syscall or device file listing currently mounted filesystems complaining upon error, maybe ensures all data is syncd so we can analyze it, allocs fields to hold each row data & outputs a header, gathers desired entries, probably outputs them ideally nicely aligned, & cleans up.

Gathering desired entries may involve for each commandline arg iterating over the mount linkedlist looking for the specified device whilst canonicalizing filepaths & stating the files looking for the closest match, then reformatting into text adding filesize units back in whilst calling whichever stat[v]fs-variant syscall is available. Or complaining if that device has been “eclipsed”.

Then another couple iterations where the device is used.

Or it might iterate over the mountlist deduplicating it via a temporary hashmap & filtering the entries as specified by the parsed commandline args whilst stating them, populating a templinkedlist before possibly copying it over, then iterates over the now-filtered mountlist to populate the table as per before.

In populating the table it increases some counts to also possibly output.

The alignment code is quite sophisticated, & in an internally-shared library.

du, after mostly-typical initialization & parsing & validating commandline flags & $DU_BLOCK_SIZE & maybe $TIME_STYLE envvars, determines where to read the argument list from possibly freopening the file given by --files0-from over stdin, mallocing a hashset of device-inode pairs, tweaks some bitflags, repeatedlies retrieves & validates the next specified file to apply the core logic to, tidies up all the allocated memory, & possibly prints the total count.

du’s core logic involves reusing the filetree traversal used by chown, chgrp, rm, etc.

Upon anything other than errors (which it reports) or entering directories it checks whether the commandline flags specified to exclude the file. If not it configures NSOK entries to be revisited & proceeds to the next one validating it’s not an error & reconsidering whether to exclude. If so it tells the traversal to skip this entry.

For dirs it does no further processing & errors are reported.

Then per-file du gathers a local struct from the provided stat info, callocs or reallocs to form a tree out of this data, adds to appropriate counters, & maybe outputs the filesize, maybe date, & label.

There’s cycle detection logic in traversing directories referring to a lazily-loaded mounttable & the device+inode hashset.


env, after mostly-typical initialization & initializing a signals table, parses & validates commandline flags & validates it hasn’t received the = op, resets all signal handlers to either default or ignore, maybe configures a signal mask, & maybe outputs these new signal handlers, maybe switches to a new current directory possibly with debug/error messages, maybe outputs which command it’s about to run, & execvps it tidying up on failure.

There’s a shared util func for traversing up a filepath until just before it changes st_dev/st_ino.

install, after mostly-typical initialization & parsing & validating commandline flags largely into a struct but also globals like (from a shared util mentioned previously) backup suffixes, validates there’s at least one additional commandline arg, further parses a couple flags, & either with a preserved current working directory & SELinux context creates the specified directory with any missing parents.

Or with a global hashmap (and with or without creating any missing parent dirs) stats the file, if successful copies the file over into the new location as per cp if needed, if successful maybe runs the strip command over it, copies timestamps over via the utimens syscall, & copies permission attributes (both traditional UNIX & SELinux) over.

Or it prepends a directorypath first before doing that.

pinky starts by mostly-typical initialization & parsing commandline flags.

In short mode pinky reads the specified UTmp file, determines which datetime format to use, outputs a heading line, & iterates over that UTmp file looking for user processes possibly filtered by a provided arrayset, stats the listed file & consults LibC’s flatfile databases to determine what text to output.

In longmode it iterates over the commandline args, consults getpwnam for more extensive info to output, followed by the user’s ~/.project & ~/.plan files.

There’s shared utils for manipulating SELinux devicefiles. There’s other shared utils for parsing field references from commandline flags. And another involved in quite sophisticated commandline flags parsing.

In storing data in magnetic fields harddisks leave residual traces of deleted data, so it can be useful to repeatedly overwrite them with whitenoise to ensure that data is truly gone. Solidstate drives I believe don’t have the same issue, and using these “secure erase” tools on them just serves to shorten their lifespans. GNU CoreUtils provides shred for this.

shred, after mostly-typical initialization & parsing commandline flags, validates there’s additional commandline args, initializes a random-number generator & registers for it to be cleaned up on exit, & iterates over its commandline arguments (handling “-“ specially, jumping near-straight to the core logic) temp-opening & maybe chmoding if necessary to apply the core logic before repeatedly renaming the file to progressively shorter names & unlinking it whilst syncing each step.

shred’s core logic fstats the file validating the result, computes the optimal buffersize whilst retrieving the exact filesize, populates the buffer with random data with various counters possibly reusing previous chunks of randomness, goes over the buffer again to improve randomness slightly, & repeatedlies seeks back to the start of each block a given number of times, possibly bittwiddles the buffer, outputs status, & repeatedly verified-writes random segments of the buffer to the file being shredded.

After the innermost loop shred outputs status info & syncs to disk so it actually has an effect.

stat, after mostly-typical initialization & parsing flags, validates there’s remaining args & (filling in dynamically-generated defaults) its -c/--printf flag, & for each commandline arg calls fstat or statfs or an available variant syscall, before manually interpreting (lots of options) the given format string to generate output text whilst maybe locating the mountpoint or SELinux context.

stdbuf, after mostly-typical initialization & parsing commandline flags, validates there’s additional commandline args, sets specified envvars, extracts the directory containing this command possibly referring to the /proc/self/exe symlink, configures LD_PRELOAD envvars to “libstdbuf.so” wherever that is, & execvps the remaining args. libstdbuf in turn adds a little code to the executable(s) which parses those envvars to pass to the setvbuf LibC call.
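A minimal sketch of such an LD_PRELOAD shim (with a hypothetical envvar name, not libstdbuf’s real protocol):

```c
/* compile with: cc -shared -fPIC -o libsketchbuf.so sketchbuf.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

__attribute__((constructor))        /* runs before the host program's main() */
static void apply_buffering(void)
{
    const char *mode = getenv("_SKETCH_BUF");   /* hypothetical variable */
    if (!mode) return;
    if (strcmp(mode, "0") == 0)
        setvbuf(stdout, NULL, _IONBF, 0);       /* unbuffered */
    else if (strcmp(mode, "L") == 0)
        setvbuf(stdout, NULL, _IOLBF, 0);       /* line-buffered */
}
```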

stty, after mostly-typical initialization & parsing & thoroughly validating commandline flags, possibly reopens the specified file over stdin turning off blocking mode, retrieves the input’s mode, possibly parses $COLUMNS whilst outputting the specified subset of hardcoded controlchars. Or iterates over all specified settings (amongst other IOCTLs) encoding a new mode to subsequently pass to tcsetattr & reports if tcgetattr yields anything different.

Includes lightweight text wrapping.

test/[, after mostly-typical initialization, specifically checks for sole --help & --version flags on [, validates there are args, & runs an immediately-evaluated pushdown parser with a scanner but no lexer beyond the caller splitting commandline args. Whose leaves call various syscalls, typically stat variants to retrieve/compare different returned properties.

timeout, after mostly-typical initialization & parsing flags, validates there’s at least 2 args remaining, parses the next commandline arg as a timeout duration with an optional unit, maybe calls setpgid(0, 0) so all subprocesses are killed with timeout, configures signal handlers, & forks execvping the remaining commandline args in the child process with reset SIGTTIN & SIGTTOU signal handlers; in the parent process it ensures we receive SIGALRM signals, calls the appropriate available syscall to schedule its triggering, blocks several signals, & waits for the child process to end.

Once the child process has ended (or a signal was received) it checks a flag set by the SIGALRM callback & kills self without coredumps.

Various signal handlers, including SIGALRM’s, consider killing the child process or resetting a second timeout.

users, after typical initialization, parses the given UTmp (or default) file, iterates over every userprocess therein extracting names to qsort then output & deallocate.

And finally for today who, after mostly-typical initialization & parsing & validating commandline flags into various bools, selects a time format with max charsize to allocate for interpreting it, decides how to behave based on the commandline args count, temporarily-parses the given or default UTmp file, & decides how to handle it by the presence of -q.

If it’s present it iterates over all user processes, extracts & outputs the trimmed name, & outputs how many of those entries it counted.

Otherwise it considers outputting a heading line, considers calling ttyname for data to filter entries to only list ourself, & for each entry in that file outputs an (if enabled for its type) appropriate line. This last bit gets fairly involved yet tedious, sharing an internal utility for outputting tablelines & formatting times/periods.


tsort, after typical initialization & validating there’s at most one arg defaulting to “-“, mallocs a root node, freopens the specified file over stdin if not “-“ enabling sequential read optimizations & initializing a multistring tokenizer, for each token it locates where to place it in the tree whilst balancing & inserts a new treenode there, validates there was an even number of tokens, counts all treenodes, & computes output from it.

To compute output from its binary tree tsort gathers a linkedlist of binary tree nodes with no dependencies, outputs each of their strings whilst removing them from the binary tree & decrements the counts on their dependencies to determine which to add to this linked list. If there’s tree nodes left after this that indicates there’s a loop, in which case it iterates over the tree to find & output these loops, removing an edge to break the cycle so it can try again.
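That’s Kahn’s algorithm; over a small hardcoded dependency graph it looks like this sketch (CoreUtils threads the same counts through its balanced tree instead):

```c
#include <stdio.h>

#define N 4
/* edge[i][j] nonzero means i must be output before j */
static const int edge[N][N] = { {0,1,1,0}, {0,0,0,1}, {0,0,0,1}, {0,0,0,0} };

int main(void)
{
    int indegree[N] = {0}, queue[N], head = 0, tail = 0, done = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (edge[i][j]) indegree[j]++;
    for (int i = 0; i < N; i++)
        if (indegree[i] == 0) queue[tail++] = i;   /* no dependencies: ready */
    while (head < tail) {
        int i = queue[head++];
        printf("%d\n", i);
        done++;
        for (int j = 0; j < N; j++)                /* release i's dependents */
            if (edge[i][j] && --indegree[j] == 0)
                queue[tail++] = j;
    }
    if (done < N)
        fprintf(stderr, "cycle among remaining nodes\n");  /* tsort reports these */
    return 0;
}
```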

pathchk, after mostly-typical initialization & parsing commandline flags into global bools determining which checks are performed, validates there’s at least one additional commandline arg, & iterates over them. For each it maybe checks if there’s a leading hyphen (an early UNIX bug treated all of those as stdin), maybe checks if the filename is empty, maybe checks whether the filename is pure ASCII (excluding symbols & whitespace) or checks whether the file exists via lstat, partially based on that it might check the charlength of the filepath, & it might check the charlength of each path component in 2 (fast & slow) passes. Appropriate error messages for any of these failing checks are written to stderr.

Though I sure hope no modern system requires these portability checks!

numfmt, after mostly-typical initialization whilst maybe setting the CPU’s floating point precision & retrieving the default decimal point from locale, parses & validates flags including a printf-like format string, reallocs a buffer according to configured padding, & iterates over commandline args (with a possible warning in presence of --header) OR stdin’s lines (the first given number of which are treated as a header).

For each (surrounded by delimiters) it iterates over the line’s fields, removes specified suffixes & whitespace, maybe reconfigures the padding buffer based on input charlength, carefully parses the number leniently & resiliently, computes charsize to validate it’s not too large, reassembles the printf format string whilst applying a choice of rounding to the parsed number & reconsidering whether to show the decimal point, applies that format string, possibly adds a suffix & applies alignment via mbsalign, & outputs that formatted number with appropriate prefix & suffix. Or it outputs the raw input text.
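The heart of conversions like --to=iec is just repeated division against a suffix table; a toy sketch (real numfmt is far more careful about rounding modes, precision, & locale):

```c
#include <stdio.h>

/* scale a count into a short human-readable IEC string */
static void human(double n, char *out, size_t outsz)
{
    static const char *suffix[] = { "", "K", "M", "G", "T", "P", "E" };
    int i = 0;
    while (n >= 1024 && i < 6) {
        n /= 1024;
        i++;
    }
    snprintf(out, outsz, "%.1f%s", n, suffix[i]);
}

int main(void)
{
    char buf[32];
    human(123456789, buf, sizeof buf);
    puts(buf);   /* prints 117.7M */
    return 0;
}
```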

expr, after typical initialization & validating there’s non-“--” commandline args, runs an immediately-evaluated pushdown parser with a scanner over the commandline args, operating upon a tagged enum holding either multiprecision integers (mpz_t) or multibyte strings. Values can also be tested for falsiness, & the : string infix operator evaluates regexps.

Results are converted into a textual output and a boolean errorcode.
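A sketch of that tagged value type & its falsiness rule, assuming GMP (the declarations are my guesses, not expr’s actual ones):

```c
#include <gmp.h>

/* a guess at the shape of expr's value type, not its actual declarations */
enum valtype { VAL_INT, VAL_STR };

struct value {
    enum valtype type;
    union {
        mpz_t i;   /* arbitrary-precision integer */
        char *s;   /* multibyte string */
    } u;
};

/* falsiness: integer zero or the empty string
 * (simplified: GNU expr also treats strings parsing as zero as false) */
static int is_false(const struct value *v)
{
    return v->type == VAL_INT ? mpz_sgn(v->u.i) == 0 : v->u.s[0] == '\0';
}

int main(void)
{
    struct value v = { .type = VAL_STR, .u.s = "" };
    return is_false(&v);   /* exit 1, matching expr's errorcode convention */
}
```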

dircolors, after mostly-typical initialization & parsing & validating flags, either outputs some pre-compiled hardcoded text, OR guesses which shell is used based on $SHELL (if not explicitly stated via commandline flags) before reformatting the input file or stdin surrounded by shell-specific prefix/suffix text.

Parsing/reformatting the specified input involves possibly temporarily freopening the specified file over stdin if not “-“, retrieving $TERM, & repeatedly (with linecounts) reading & straightforwardly parsing each line. If the keyword was “TERM” it checks whether it matches $TERM, & if it does it reformats keyword-arg pairs quoting each separated by “=”, possibly adding or removing punctuation, replacing keys with acronyms via a pair of lookuptables, or dropping “OPTIONS”, “COLOR”, & “EIGHTBIT” keywords.

This reformatted text is buffered into a string for output.


To relatively efficiently extract prime factors from a number factor, after mostly-typical initialization with an added exit-handler outputting any remaining buffered text, parses a handful of commandline flags, possibly zeroes out a frequencies buffer to output after the core logic, & iterates over trailing commandline args or tokenized stdin.

For each it parses the number, considers taking a fastpath or reporting any errors, or falls back to using multiprecision arithmetic.

The fastpath (if the number’s small enough, i.e. 2 words) recursively divides by 1,000,000,000 taking remainders to aid outputting the int, to which it adds a “:” (a similar technique is used to output factors once computed), & computes the actual factors by first extracting some obvious factors & iterating over a pre-generated (I’ll describe how soon) table of prime numbers which are factors of the input, in two passes to quickly discard options.

If there’s more prime factors to find it’ll check if the simplified input itself is prime using some math theorems (Miller-Rabin & Lucas) I’m not familiar with, after discarding any additional factors of 2. If so it adds it to a large-prime-factors array to be output separately.

Otherwise it computes the square root checking if that’s prime, then iterates over 1 of 2 tables & does more computation involving squareroots, of course remainders, & recursion.

Within or after that pass it tries using Pollard’s rho algorithm (recursive, involving modulo multiplies/adds/subtracts) to narrow down prime candidates to record.

There’s variants of most of these functions for operating on one or two words, and a variant of all of them operating on a dynamic number of words.
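For a taste of the Pollard’s rho step, here’s a compact single-word sketch (factor’s real code is much more optimized, avoids looping on primes by testing primality first, & has the 2-word & n-word variants just described):

```c
#include <inttypes.h>
#include <stdio.h>

static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m)
{
    return (uint64_t)((unsigned __int128)a * b % m);  /* GCC/Clang extension */
}

static uint64_t gcd(uint64_t a, uint64_t b)
{
    while (b) { uint64_t t = a % b; a = b; b = t; }
    return a;
}

/* find one nontrivial factor of an odd composite n */
static uint64_t pollard_rho(uint64_t n)
{
    for (uint64_t c = 1;; c++) {          /* retry with a new constant on failure */
        uint64_t x = 2, y = 2, d = 1;
        while (d == 1) {
            x = (mulmod(x, x, n) + c) % n;            /* tortoise: one step */
            y = (mulmod(y, y, n) + c) % n;            /* hare: two steps */
            y = (mulmod(y, y, n) + c) % n;
            d = gcd(x > y ? x - y : y - x, n);
        }
        if (d != n)
            return d;
    }
}

int main(void)
{
    printf("%" PRIu64 "\n", pollard_rho(8051));   /* 8051 = 83 * 97 */
    return 0;
}
```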

To autogenerate the smallprimes table it parses the first arg as an int, allocs/zeroes some tables, iterates over that range, & outputs results.

For each number “i” in that range it populates a table entry with p = 3 + 2i, then marks each multiple of p starting from index (p·p − 3)/2 up to the given max number.

To output the actual primes from those tables it counts the number of bits in a wide_uint & outputs that as a C-preprocessor macro, outputs P macro calls for each prime giving the diff from the last prime, the diff from 8 ahead, & (via bitwise shifts & ORs) the inverse. Then it uses the inverse & limits to locate the next prime to output as FIRST_OMITTED_PRIME.
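The underlying sieve over odd numbers might look like this (a sketch of the marking scheme described above, not the generator’s actual code):

```c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    unsigned long max = argc > 1 ? strtoul(argv[1], NULL, 10) : 1000;
    /* composite[i] covers the odd number 3 + 2*i */
    unsigned long n = (max - 1) / 2;
    char *composite = calloc(n, 1);

    for (unsigned long i = 0; i < n; i++) {
        unsigned long p = 3 + 2 * i;
        if (composite[i] || p * p > max)
            continue;
        /* mark p*p, p*(p+2), ...; the index of odd m is (m - 3) / 2 */
        for (unsigned long m = p * p; m <= max; m += 2 * p)
            composite[(m - 3) / 2] = 1;
    }

    printf("2\n");
    for (unsigned long i = 0; i < n; i++)
        if (!composite[i])
            printf("%lu\n", 3 + 2 * i);
    free(composite);
    return 0;
}
```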

Ext4FS

Having just studied GNU CoreUtils, most of which are more or less simple wrappers around various syscalls (for trivial wrappers, see historical code): LibC forwards those syscalls to Linux via a special Assembly opcode, & Linux forwards each to its appropriate implementation via a caching/mounting lookup layer called the “Virtual File System”.

Linux supports various filesystems but the one you’re probably using is called Ext4FS which I’ll study today!


To aid allocation of the blocks in which to store files Ext4FS computes group numbers & offsets from a block number, ideally via a divide, & provides per-group checksummed bitmasks to test whether that memory’s free.
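The group/offset split is simple enough to sketch, assuming the two superblock fields involved (hypothetical struct; the in-kernel helper does this on 64-bit block numbers):

```c
#include <stdint.h>

/* hypothetical mirror of the two superblock fields involved */
struct sb {
    uint32_t first_data_block;
    uint32_t blocks_per_group;
};

/* split an absolute block number into (group number, offset within group) */
static void group_and_offset(const struct sb *sb, uint64_t blk,
                             uint32_t *group, uint32_t *offset)
{
    uint64_t rel = blk - sb->first_data_block;
    *group  = (uint32_t)(rel / sb->blocks_per_group);
    *offset = (uint32_t)(rel % sb->blocks_per_group);
}

int main(void)
{
    struct sb sb = { .first_data_block = 1, .blocks_per_group = 32768 };
    uint32_t g, off;
    group_and_offset(&sb, 100000, &g, &off);
    return g == 3 && off == 1695 ? 0 : 1;   /* sanity check */
}
```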

There’s functions for retrieving cluster counts, maybe summing/subtracting/counting them.

Another for dereferencing a block’s descriptor.

There’s a function to carefully retrieve & validate (partially via checksum) a group’s allocation bitmask, with or without blocking on locks, initializing that bitmask if needed by populating it with a scan.

One to check a desired allocation count against various (mostly per-CPU) counters in the superblock’s info to see if desired memory is available, possibly claiming via a wrapper function.

Another retrieves the # of blocks used by a group, or the supergroup possibly via GCD.

There’s a function for counting free blocks by summing each group’s descriptor count (where the bitmask is valid), which is validated against the bitmasks in debugging builds.

It may compute a hint for this allocation by first considering bitmasking & maybe incrementing which blockgroup the rootnode specified it should use, applying a rootnode-specified multiplicand & offset to get the first block number of that group, reading its blockcount property, & computing from the thread ID.

There’s a multiblock buddy allocator implemented around the single-block allocator & a redblack tree.

There’s code for migrating a file’s blocks between allocation groups.


There’s code to serialize & deserialize access control lists between a file attribute & a Linux-shared type.

To handle the readdir syscall Ext4FS retrieves encryption data if present. If the dir’s htree-indexed it initializes an iterator if necessary with hashes, checks we haven’t reached EOF, reformats into a redblack tree, & possibly unsets a bitflag if there’s a checksum. It checks if there’s inline data as a file attribute (useful for configuration & lock files) which it reads specially, maybe allocates some memory to decrypt into, & repeatedly checks for fatal signals, maps in some more blocks handling error & empty cases, maybe validates blocksizes and/or checksums setting a flag on success, & in an innerloop validates directory entries, increments an offset, & emits each directory entry with or without decryption.

llseeking Ext4 dirs isn’t special.

HTree directories are converted into redblack trees & on into linear-scan dirs if the client wishes to list them. Data from this conversion may need to be freed upon closing the directory.

There’s a validation function alongside this readdir implementation.

Lots of encoding details are defined as structs, enums, & macros, with inlined functions handling byteorder.
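Those byteorder helpers follow the kernel’s le*_to_cpu pattern; a userspace approximation for one width (Ext4FS stores its on-disk integers little-endian):

```c
#include <stdint.h>

/* decode a little-endian 32-bit on-disk field regardless of host byteorder */
static inline uint32_t le32_decode(const uint8_t *p)
{
    return (uint32_t)p[0]       | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

int main(void)
{
    /* ext4's magic number 0xEF53, as it appears on disk */
    const uint8_t magic[4] = { 0x53, 0xef, 0, 0 };
    return le32_decode(magic) == 0xef53 ? 0 : 1;
}
```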

There’s a “journal” which tracks all ops in a ringbuffer to aid recovering from unexpected shutdowns.

Extent references have checksums, credits (akin to currency, to determine when to merge), access permissions, dirty flags, meta & space blocks & roots with indexes, & (pre)caching.

They can be split, validated, read, sought via a binary search (2 variants), initialized & deinitialized, “mapped”, & zeroed.

To “map” an extent (ignoring debug output) it first traverses the extents tree, reading & validating entries as needed thus flattening it for a binary search, gets & validates depth, retrieves the treetraversal path at that depth, & if found considers expanding certain holes before returning that block; otherwise, unless this is disabled, it creates the extent.

Which involves gathering some fields, allocating some space with good memorylocality (ideally by extending the allocations on either side), inserting into the extents tree merging where profitable, updating reservedspace counts, & considering syncing to the journal.

To truncate some extents it calculates the range to remove, deletes it from the tree (a redblack tree) under lock retrying slightly later upon failure, then removes any trailing holes.

To fallocate some extents, depending on the given mode it’ll return -EOPNOTSUPP, flush state & remove those extents with journalling, try inlining data into file attributes, flatten that range of the tree, remove the given range from the tree with journalling, zero it out, or alloc new ranges.

There’s a couple wrappers around mapping extents converting a selected range to an IOVec to be copied to userspace.

To map some “fiemap” extents it checks the cache if indicated by a bitflag (or clears the bitflag), validates the given range, & defers to Linux’s generic filesystem code with a callback to read from the file attribute or have it wrap the map blocks code.

To precache it might check inlined data under lock, retrieve the cache if bitflagged, run generic logic, validate the range, & flatten the tree.

Related to the extents tree there’s an extents status tree used for partially locking files and more. This is implemented similarly to extents trees but is entirely in-memory as a redblack tree.

There’s a journalling fastpath for smaller ops.

Underlying the read syscall it constructs an iterator in one of three types if not shutting down & non-empty.

Upon refcount deallocation it frees dynamically-allocated blocks, discards preallocations under lock, & frees HTree dir info.

Underlying the write syscall it constructs an iterator in one of three types if not shutting down based on bitflags. The “dax” write iter, after validation, starts journalling, registers & journals an “orphan” inode, & hands a callback to filesystem-generic code possibly wrapped in a decorator or two. I’ll describe these callbacks later. One of the decorators handles most of the journalling, another is filesystem-generic.

The “DIO” iter checks memalignment, obtains locks, journals, hands one of two callbacks (whether we’re overwriting or not) to filesystem-generic code possibly decorated with journalling, cleans up, & commits writes via filesystem-generic code.

Most methods on mmap’d Ext4FS DAX files segfault unless it’s copy-on-write, though I’m not making much sense of this callback. The methods for normal Ext4FS files are largely filesystem-generic though they ensure blocks are mapped before being written to.

The implementation for mmap decides which of those methodtables to hand to the given virtual-memory struct to integrate into the process’s memory mapping, provided it’s not shutting down & any needed DAX mapping is supported.

The implementation for open, with several wrappers, defers to filesystem-generic code. Ext4FS’s llseek is also largely filesystem-generic.

There’s an internal filesystemmap object, which I think lives ondisk.

Syncing is reflected in the journal as “barriers”, and flushes the underlying blockdevice.

There’s an internal hashfunction.

There’s functions for allocating & deallocating inodes directly out of the relevant allocation bitmasks.

There’s a handful of functions dealing with some concept of “chains”, blocks, & paths.

Functions for operating upon file bodies stored “inline” within the file’s attributes.

This section’s quite long, so I won’t cover the highlevel inode objects exposed externally.

There’s several supported IOCTLs, updating properties & deferring to the other lowerlevel components.

It uses “MMP” checksumming for the allocation bitmasks, filebodies, filemetadata, etc.

Extent slices can be shifted around.

There’s a concept of orphaned inodes, which sometimes is just a step of allocating inodes.

The I/O methods are an abstraction around internal paged I/O functions.

There’s several functions dedicated to resizing allocation groups.

There’s an internal rate-limited “superblock” structure, defined alongside some of the methods for mounting & unmounting Ext4FS filesystems.

Symlinks are their own type of inode (or rather 3 types), as exposed to the outside world.

There’s a pagecache with (publicly exposed) “verity descriptors”.

And it natively understands several file attributes, including HURD’s.

Util-Linux

In order to interact with the Linux kernel, you need userspace commands (or apps) communicating with it. Since vital I/O components like Bash or Gettext operate in userspace, Linux cannot provide its own UI.

Disk Utilities

Perhaps the most useful commands Util-Linux provides are for handling the disks (persistent storage) containing the filesystems on your computer, & splitting those disks into multiple “partitions” via a table the earliest boot stages understand.


After initializing internationalization, parsing commandline flags, colourful output (with config), & several other shared libraries, sfdisk configures libfdisk, gathers an array of fields, chooses which subcommand to run, & cleans up.


swaplabel, after initializing internationalization & parsing commandline flags validating further commandline args are given, uses a shared library to locate the devicefile, & opens that devicefile to either write new metadata at appropriate offsets or read that metadata via the shared library.

After initializing internationalization & parsing version/help commandline flags resizepart opens the specified devicefile, locates the devicefile for the specified partition, & applies the BLKPG IOCTL.

After initializing internationalization & parsing a few commandline flags raw opens a special devicefile to gain privileges, runs the RAW_GETBIND IOCTL on each “minor”, validates the commandline arg, possibly stats that devicefile to inform a subsequent RAW_GETBIND IOCTL, extracts major & minor numbers from commandline args, & runs the RAW_SETBIND IOCTL.

After the same initialization isosize iterates over commandline args. For each it opens & validates the devicefile & outputs its size textually.

There’s a support file to render menus to aid creating scripts, amongst a few other simpler supportfiles.

After typical initialization fdformat validates the next commandline arg & opens the specified devicefile calling the FDGETPRM IOCTL on it to retrieve various statistics to output, applies formatting IOCTLs (FDFMTBEG, repeated & flushed FDFMTTRK, & FDFMTEND) with status messages, possibly attempts reading the devicefile to see if there’s anything to repair, & cleans up.

After typical initialization delpart opens the specified device file & runs the BLKPG IOCTL on it.

After initializing internationalization blockdev checks for -V/--version or -h/--help args, checks for --report in which case it instead iterates over all remaining args or all partitions to retrieve various attributes from Linux & textually output them; otherwise it validates commandline args, opens each specified devicefile, & interprets commandline flags as IOCTLs upon it.

addpart wraps a different variation of the BLKPG IOCTL.

IOCTLs are basically special methods upon devicefiles.
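For instance delpart’s BLKPG call boils down to something like this sketch (error handling elided; addpart & resizepart swap in their own ops & fields):

```c
#include <fcntl.h>
#include <linux/blkpg.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    /* usage: delpart <disk devicefile> <partition number>; no error handling */
    int fd = open(argv[1], O_RDONLY);
    struct blkpg_partition part = { .pno = atoi(argv[2]) };
    struct blkpg_ioctl_arg arg = {
        .op = BLKPG_DEL_PARTITION,     /* addpart uses BLKPG_ADD_PARTITION */
        .datalen = sizeof part,
        .data = &part,
    };
    int ret = ioctl(fd, BLKPG, &arg);
    close(fd);
    return ret ? EXIT_FAILURE : EXIT_SUCCESS;
}
```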


After initializing internationalization & parsing/normalizing commandline flags partx validates the commandline args possibly stat’ing the specified file, possibly parses an integer out of the path, possibly outputs parsed commandline args, upon add or delete subcommands stats & validates the wholedisk devfile using helper functions to correct it, opens the wholedisk devfile, & chooses a subcommand before cleaning up.

partx’s delete subcommand possibly scans the filesystem for unspecified partition numbers before deferring to a partx_del_partition function from a shared library wrapped in optional human-readable reporting.

Other commands require a “probe” implemented by another shared library.

After initializing internationalization & parsing a few commandline flags validating a single arg remains fsck.cramfs performs file validation: simple structural checks, error-detection checks via CRC32, & binary-syntactic checks with decompression, recreating what files/directories/etc it can.

fsck.minix (after initializing internationalization, configuring the exitcode for a utility lib, validating theoretical sizes, & parsing a couple more commandline flags validating commandline args remain) validates the devicefile isn’t mounted, opens it, & validates metadata. After checking quick exit conditions fsck.minix continues by parsing/validating the filesystem possibly outputting status info, configures signalhandlers, alters shell input handling flags, parses the filesystem recreating it with occasional user-prompts for how to recover from errors, possibly outputs summary info, if anything changed flushes & syncs final pieces (the superblock is always written in repair mode), & cleans up.

After configuring buffering, initializing internationalization, configuring exit-handling, parsing extensive commandline flags & envvars with a preprocessing step, configuring a SIGCHLD handler, & loading the mounttable fsck performs some validation & iterates over the given devices before synchronizing & cleaning up.

For each given device it considers exiting as indicated by signal-handlers, looks up the mount or adds a new one, considers skipping it based on type or whether it’s mounted, & runs the appropriate command.

After initializing internationalization & debugging options as well as parsing commandline flags into a new FDisk Context & configuring the shell fdisk branches upon a subcommand before cleaning up.

After initializing columns the List or List-Details subcommands iterate over commandline args or all partitions, for each parsing various info to output textually. For each commandline arg the showsize subcommand opens the devicefile & calls the IOCTL to count its sectors for output.

After validating a single arg remains the main subcommand colourfully outputs some welcome text, “assigns” the devicefile, if successful warns if it’s being used, flushes stdout, validates it’s writable & locks the devfile, determines the “wipe mode” if in collision, adds a label if missing or warns about GPT labels, reads partition info if it isn’t readonly, & presents an interactive menu for constructing an sfdisk script via a pre-configured callback. Most of the code is dedicated to this menu.

After initializing internationalization, colouring, debug options, & a new FDisk Context as well as parsing a few commandline flags cfdisk fills in the missing commandline arg, “assigns” the devicefile, locks the devfile if it isn’t readonly, initializes/runs/ends the columnar NCurses menu UI (navigable with arrow-keys, including a main WYSIWYG view of the partition table) with a pre-configured callback, & cleans up.

mkfs

Before you can use a disk, you need to write an empty filesystem to it. Util-Linux provides a handful of commands to do just this for different filesystems.

Let’s start by discussing swapspace, where data is written when RAM overflows. After initializing internationalization & parsing commandline flags capturing up to 2 additional commandline args mkswap possibly generates an appropriate UUID, computes an appropriate pagesize from Linux’s figures, performs some validation whilst getting the disk (as opposed to RAM) pagesize, stats/opens/locks/validates the devicefile, validates the devicefile is fully usable & has no holes, carefully zeroes out the devicefile in reference to partitions, populates some structural metadata, possibly reports the partition size as human-readable text, writes some magic numbers, writes the UUID & label, cleans up, & performs some SELinux adjustments.
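The format mkswap writes is pleasantly small; a sketch of the key pieces, with the field layout as in util-linux’s swapheader.h (an assumption worth double-checking):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* first-page layout, per util-linux's swapheader.h */
struct swap_header_v1_2 {
    char     bootbits[1024];   /* left alone so boot sectors survive */
    uint32_t version;          /* always 1 */
    uint32_t last_page;        /* last usable page of swap */
    uint32_t nr_badpages;
    unsigned char uuid[16];
    char     volume_name[16];
    uint32_t padding[117];
    uint32_t badpages[1];
};

/* the magic "SWAPSPACE2" lands in the last 10 bytes of the first page */
static void stamp_magic(char *page, long pagesize)
{
    memcpy(page + pagesize - 10, "SWAPSPACE2", 10);
}

int main(void)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    char *page = calloc(1, (size_t)pagesize);
    stamp_magic(page, pagesize);
    free(page);
    return 0;
}
```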

After initializing internationalization & parsing commandline flags plus the subsequent couple args mkfs.minix performs some validation opening & locking the devicefile, gathers & computes various structural metadata including optimal pagesize & a bitmask (reporting counts textually to stdout), optionally validates it can seek to & read each zone updating the bitmask or reads a file of bad blocks, serializes this structure to the disk (root, then bad inodes), marks good inodes & serializes out filesystem tables, & cleans up.

After initializing internationalization & parsing commandline flags followed by 2 additional args mkfs.cramfs retrieves the ideal pagesize, stats & opens the devicefile, initializes permissions, populates & (via MD5 hashes) deduplicates from a template dir, allocates & zeroes a large chunk of memory, optionally loads a file into it, writes file metadata in breadth-first order, postprocesses the CramFS entries compressing any filebodies, computes a CRC32 to protect against data degradation, checks it allocated enough space, writes this buffer to the specified disk closing the file, & reports any flagged warnings.

I believe CramFS is what distros use when they’re running off a USB or DVD, before they’ve installed onto rewritable disk.

After initializing internationalization & parsing commandline flags extracting 1 or 2 additional args (opening & stating the first as a devicefile) mkfs.bfs retrieves the optimal pagesize for the disk or a user-specified value, computes some sizing info gathering it into structural metadata, optionally reports various data, writes that structural metadata to the specified disk, gathers & writes more structural metadata, zeroes out the body, & seeks to the root inode to write “.” & “..” entries.

And finally all these variants are wrapped by a mkfs command!

After initializing internationalization & parsing a few commandline flags mkfs defers immediately to its specified variant, defaulting to mkfs.ext2 (which doesn’t appear to be in this project). I hardly see the point of this command… Ah, it’s deprecated!

libblkid

Underlying the suite of disk commands described above are a suite of shared libraries, including libblkid for accessing a disk’s metadata!

Some of these APIs simply wrap IOCTLs like BLKGETSIZE[64], but most are more complex. The library can report & parse its own version number.

There’s a routine for parsing a given key=value configuration file defaulting to $BLKID_CONF. There’s a blkid_dev object containing a linkedlist of tags with a human readable serialization.


There’s an iterator over a cache for retrieving these blkid_devs with configurable filters.

Much of the logic is involved in locating, reading, & throttled-writing an aggregated XML-like (doesn’t use an XML-parser library though) cachefile of all disk metadata, storing entries in a garbage-collected linkedlist.

Wrapper functions retrieve tag values & device names from the cache, & verify devices in the cache.

There’s a parser for which metadata to retrieve.


They’ve implemented their own UTF-8 encoder library for some reason, & wrap it with escaping logic. I’ll discuss a use for this tonight.

There’s a suite of utilities for querying a disk’s tags, including an iterator.

There’s utilities for traversing the /dev filesystem for disk devices & their partitions possibly populating a linkedlist.


There’s a routine for iterating over the cache multiple times (strict vs fuzzy match) looking for a disk by name, adding & verifying a new entry if needed. A wrapper looks up by disk number with filepath normalization, checking multiple paths. This in turn has its own wrappers consulting the /proc/partitions/, /proc/lvm/VGs/, and/or /sys/block virtual filesystems.

There’s parsers for strings Linux might send programs to notify them of new hardware.


There’s a probe object which retrieves metadata directly from a disk devicefile via its fstat info with a few additional IOCTLs, & aids reading it via a values-linkedlist & 3 method tables it can step between (treating 2 as semi-fallbacks), temporarily switch to a different methodtable, or unset. And it has routines for linkedlist-buffered reading from a configurable slice of the devicefile.

These methodtables, each implemented in separate subdirectories, provide facades upon further methodtables parsing a wide variety of disk formats including partition tables, filesystem superblocks, & to a lesser extent “topology” tables. Each of those sublibraries further provides common utilities to aid the parsers in yielding common datastructures.

libfdisk

Another supporting library behind Util Linux’s disk commands is libfdisk!

There’s a “labelitem” object with name, id, 64bit, & type-tagged data fields.

There’s an iterator object holding a direction.

There’s code (generated largely by the C PreProcessor) for parsing a bitmask of what info to return.

There’s a “label” tree-object with name, type, & flags as well as a methodtable. The context object stores a forest of these, & an active one.


There’s a routine which calls the probe & possibly deinit methods on labels to decide which one to make current. I see several datamanagement methods on these label objects, as well as some structural metadata I failed to mention.

There’s a partition tree-object with a type, name, uuid, fstype, fsuuid, fslabel, & slices. There’s a routine for extracting the next step of the linear partition order, possibly asking the user, or serializing to a human-readable string.


There’s partition tree traversal/manipulation methods, and ones to compute possible slices of the disk with human readable debug output whilst enforcing invariants. May defer to the label’s methods.

There’s a field object containing an id, name, width, & flags.

There’s a partition-type object associated with labels, each holding a name, typestring, & code. These can be queried from a label, possibly based on textual user input.


An FDisk script holds an FDisk table, linkedlist key-value headers, an FDisk context, a refcount, a getter method, a linecount, an FDisk label, & a couple flags.

Amongst normal object routines these “scripts” have routines for parsing from a context, line-by-line with a tokenizer from a file, or as JSON. Or for saving such files.

There’s routines for requesting user input via callbacks, and to align bytes as required by hardware.


An FDisk context is a refcounted object wrapping a filedescriptor, labels, callbacks, structural metadata, a script, etc. The routines for associating & deassociating with an open devicefile are nontrivial, populating the partitiontable via an IOCTL.

An FDisk table holds a partitions list with mutation (especially regarding freespace), sorting, serialization, validation, & diffing routines.

There’s some I/O utilities, & partitions may have a list of areas to be wiped, possibly applied via libblkid.

Partitioning table formats

Util Linux’s libfdisk’s Context object contains an array of labels, each corresponding to a different partitioning format via a methodtable & propertytables. This is where most of libfdisk’s code is! The commandline menus may directly call accessors specific to each format.

I’ll specifically describe how its GPT support works; the others are implemented in basically the same way.


To “probe” the GPT label it casts the read data to the appropriate C struct, iterates over some arrays to validate the structure, & normalizes the format to what the CPU can process, whilst dereferencing pointers from disk with CRC32 error detection. There’s a couple copies of these headers saved far apart in case one gets corrupted. This load can fail, requiring cleanup.

To save the GPT label, after casting & validating (ensuring there’s no partition overlaps) it positions the backup at the end, computes CRC32s, & writes segments to appropriate offsets.

Verification involves various structural & CRC32 checks, issuing warnings (via callbacks) to the commandline upon failure. Otherwise might output some basic info with a success message.
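That CRC scheme is the UEFI-specified one: hash the header with its own CRC field zeroed. A sketch using zlib’s crc32 (offset 16 per the GPT header layout):

```c
#include <stdint.h>
#include <string.h>
#include <zlib.h>

/* verify a GPT header's self-checksum: hdr points at the raw header sector,
 * hdr_size & stored_crc are its little-endian fields already decoded */
static int gpt_header_crc_ok(const uint8_t *hdr, uint32_t hdr_size,
                             uint32_t stored_crc)
{
    uint8_t tmp[512];
    if (hdr_size > sizeof tmp)
        return 0;
    memcpy(tmp, hdr, hdr_size);
    memset(tmp + 16, 0, 4);     /* header_crc32 lives at byte offset 16 */
    uLong crc = crc32(0L, Z_NULL, 0);
    crc = crc32(crc, tmp, hdr_size);
    return crc == stored_crc;
}
```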

Creating a new GPT system involves allocating & initializing various properties, possibly sourcing structural metadata from the context or script with CRCs. Some in disk byteorder, some in CPU byteorder.


The locate-disklabel method branches over the given number to return a property name, offset, & size. The offset & possibly size are sourced from properties in-memory.

get-disklabel-item involves branching over the given item’s ID to populate its name, type, & data fields. The datafield again comes from properties in-memory in disk byteorder.

Setting a disklabel-ID involves parsing a GUID asking the user for it if not given, saving it with new CRCs, & reporting change to the commandline.


After a little validation getting a GPT partition involves an array lookup & converting the result into a format-independent structure, serializing miscellaneous properties to text. Setting a partition performs the same array lookup to determine where to parse/copy the given data to. Simpler variants are offered.

Adding a partition works similarly to setting a partition, but with additional (slow) allocation logic & may request userinput for (or generate) missing data. Reports results.


Deleting an entry involves zeroing out the array entry once we’ve validated it’s unused. This requires new CRCs to be computed, & dirtiness is flagged.

Reordering partitions involves a check to see if it’s actually needed before running a standard qsort (which I believe is actually Merge Sort in GNU LibC). Again CRCs need to be recomputed & dirtiness flagged.

There’s a special unused-entry GUID used to check whether a partition’s empty; valid partitions need a start.


There’s a method for toggling specified bitflags (determined via branching over a given int), reporting to the commandline via a callback. Yet again CRCs need to be recomputed & dirtiness flagged.

There’s a simple finalizer. And a method to extract alignment info from the GPT structural metadata to the context object.

libmount

Another library underlying Util Linux’s disk-management commands is LibMount!

It has a handful of parsers for different bitmasks, namely debug flags & mount flags. There’s a simple parser for option strings, with mutation utils for the resulting array. And one for the mount TSV. There’s routines for reporting the library’s version & features.

There’s language bindings for Python.

There’s a routine performing various checks to see if the given context is a loopback device.


There’s a routine for iterating over the mount table locating the loopback mount corresponding to given validated parameters. And a wrapper around it which extracts loopback mount options, mounts the loopback if missing via a loop context whilst validating the backing file.

There’s a couple routines for deleting such a loop device. These all build upon a more general shared library here offering an object mostly just wrapping /dev/loop* devicefiles.


There’s a context object (which those are treated as methods upon) consisting of various properties including mount tables, status codes, bitflags, subcommand, options, callbacks, filesystem objects, directory/file paths, etc. Has accessor routines.

There’s a linkedlist of mounts including routines for creating a new mount reusing options on an existing one. And a routine that preprocesses mountoptions possibly through that & similar routines, or SELinux. Further wrappers actually mount it.


libmount context table accessors parse in the table, configured as per other properties, if it hasn’t been already. A cache object further avoids reparsing the mounttable. The namespace accessor has various validating wrappers.

It has a routine for running the appropriate/configured helper command, running the mount syscall, or a stack of wrappers that do both/either as appropriate, each layer handling additional options.

Can yield human-readable errors.
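At the bottom of that stack sits either an exec of a /sbin/mount.&lt;type&gt; helper or the raw syscall; the latter looks roughly like this (a hypothetical standalone example, not libmount’s code):

```c
#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
    /* hypothetical example: mount /dev/sda1 at /mnt as ext4, read-only */
    if (mount("/dev/sda1", "/mnt", "ext4", MS_RDONLY, "") != 0) {
        perror("mount");   /* libmount turns errno into friendlier messages */
        return 1;
    }
    return 0;
}
```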


There’s routines for loading the mounttable & locating specific entries whilst switching namespaces. Has wrappers & similar routines. Some querying mount options.

The main & “u” mount tables have similar but separate routines around them.

There’s routines for parsing mount options into a context.

There’s a routine for reading BTRFS subvolumes via a special IOCTL.

There’s a key-value refcounted cache falling back to a libblkid cache for tags, as described previously.


There’s a filesystem object holding a linkedlist, source, bindsource, tagname & value, root path, swap behaviour, target, filesystem type, various option strings, attrs, a comment, bitflags, userdata, etc. These optionstrings can be merged.

There’s an iterator object over mountpoints.

There’s an object for generating lockfiles, ensuring signals are handled correctly.

There’s a “monitor” wrapping inotify syscalls with logic to handle lockfiles.


There’s a “table” object wrapping a linkedlist with a common “intro” & “tail” as well as comments (including intro & tail ones) & userdata. It can have an associated cache, can be iterated over to locate particular filesystems, & can of course mutate the list. Includes special BTRFS support.

There’s a tabdiff object holding a linkedlist upon new and/or old filesystems, & a list of unused filesystems. Can compute a diff between two tables.


There’s an update object holding a target path, filesystem, filename, mountflags, userspace-only flag, ready flag, & mountinfo table. Accessors often include heavy validation. Can serialize out to a standard file.

Finally there’s utilities it builds upon for:

libsmartcols

To improve the output of its commands Util-Linux implements “libsmartcols”, which better aligns columns & exposes more fine-grained controls.

It includes parsers for a debug bitmask & its own version number, & has internal routines for outputting human-readable debug output for a column or columns.

There’s a wrapper around linewrapping, mbs_width, or mbs_safe_width with postprocessing. A further wrapper adds table/tree traversal computing min & natural widths to compute column widths.

Another wrapper-layer iterates over the table’s columns using that routine to compute the size of each one, ensuring the total width is reasonable for the terminal size via extensive postprocessing with a choice of loops.
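The gist of that width negotiation, minus terminal fitting & multibyte safety, is a scan for each column’s widest cell (a toy sketch with hypothetical sample data):

```c
#include <stdio.h>
#include <string.h>

#define ROWS 2
#define COLS 3

int main(void)
{
    /* hypothetical sample table */
    const char *cell[ROWS][COLS] = {
        { "sda",  "disk", "931.5G" },
        { "sda1", "part", "512M"   },
    };
    size_t width[COLS] = {0};

    for (int r = 0; r < ROWS; r++)       /* natural width = widest cell */
        for (int c = 0; c < COLS; c++) {
            size_t w = strlen(cell[r][c]);
            if (w > width[c])
                width[c] = w;
        }

    for (int r = 0; r < ROWS; r++) {
        for (int c = 0; c < COLS; c++)
            printf("%-*s ", (int)width[c], cell[r][c]);
        putchar('\n');
    }
    return 0;
}
```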


There’s a cell object holding data, colour, userdata, bitflags, & alignment with comparator & clone routines.

There’s a column object holding colour, headercell, min/max/avg/hint/final width, bitflags, JSON type, table reference, & comparison/wrap callbacks.

Column objects have hard line-wrapping routines.

There’s a “grouping” object wrapping a linkedlist multistring to yield a zip iterator over the lines of each cell in a row, thus lowering tabular linesplitting into something closer to what can be written out.

There’s a column iterator object.

There’s a line object wrapping an array of cells with arbitrary userdata, colour, & an optional tree structure, supporting column rearrangement & cloning.


There’s a symbol object holding a tree branch, vertical & right tree-links, vertical & horizontal groups, first/last/middle member, middle & last child, & title/cell padding.

There’s a table object holding an output filedescriptor, desired dimensions sourced from the terminal, desired output format, lines, columns, groups, symbols, a title cell, & flags. Has lots of accessor routines, & wrappers around its properties’ methods. Has sorting routines.

There’s treetraversal utilities.


Building upon all that, the bulk of the logic outputs these objects as a tree utilizing the groupings & the sizing routines. The concept of symbols is used to serialize the tree structure into ASCII art.

With some lower-level utils…

To print an empty cell libsmartcols outputs any colour markers, considers outputting tree symbols, considers finalizing the line, & otherwise fills with spaces before ending the colourspan & probably emitting a column separator.

There’s a routine for printing a treeline, handling JSON specially. And another to traverse the tree with that routine.

There’s routines to initialize & cleanup such printing.

Further wrappers ensure the table title & headers are included in those ranges, enforce validation, & allow printing to a string in-memory.

libuuid

Partitioning formats may incorporate UUIDs, so Util-Linux implements its own trivial libuuid module!

This has routines for zeroing-out 16 bytes, parsing 2 UUIDs & comparing their components, copying one UUID to another, checking whether a UUID is zeroed, (de)serializing UUIDs, (un)parsing UUIDs, & resolving known (“dns”, “url”, “oid”, “x500”, or “x.500”) namespaces. Or extracting the unpacked components (time, type, or variant) out of a UUID.


Most of the logic is in generating UUIDs.

There’s a few variants of UUID generation. You can take an MD5 or SHA1 hash & tweak its time components.

Or you can, possibly falling back to time generation, generate random data (via the getrandom syscall or reading /dev/[u]random scrambled with the time) a given number of times, tweaking the time.

Or you can (“time generation”) consult a daemon via a socket, tweak the time, or retrieve the time alongside ensuring an “init” message is present, randomly generating it if needed.
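The random variant boils down to 16 random bytes with the version & variant bits forced; a sketch using getrandom(2) (libuuid prefers it where available, falling back to /dev/[u]random as described):

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/random.h>

int main(void)
{
    uint8_t u[16];
    if (getrandom(u, sizeof u, 0) != sizeof u)
        return 1;                    /* real code falls back to /dev/urandom */
    u[6] = (u[6] & 0x0F) | 0x40;     /* version 4: random */
    u[8] = (u[8] & 0x3F) | 0x80;     /* RFC 4122 variant */
    printf("%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
           "%02x%02x%02x%02x%02x%02x\n",
           u[0], u[1], u[2], u[3], u[4], u[5], u[6], u[7],
           u[8], u[9], u[10], u[11], u[12], u[13], u[14], u[15]);
    return 0;
}
```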

Login Utilities

Util Linux has a suite of accounts utilities.

After initializing internationalization & parsing commandline flags nologin opens & fstats /etc/nologin.txt, & copies its contents to stdout if valid.

After initializing internationalization & parsing/validating commandline flags su & runuser check whether we’re running as root, save some globals in a mini-library, initialize the terminal as desired, read a password from stdin without echo, retrieve the old & new accounts, call into PAM (a fairly verbose API), determine which subcommand to run, configure groups, configure limits, open a PAM session, optionally create a new pseudo-teletype devicefile, fork a parent process to clean up after the subshell, optionally configure the new pseudo-teletype devicefile, make the setgid/setuid syscalls, modify the envvars & commit them to PAM, clean up PAM, & run the subshell!


After initializing internationalization & parsing -V/-h newgrp retrieves your username & calls setgid with an implicit or explicitly given value before restoring UID via setuid syscall & running a subshell.

After initializing internationalization & parsing -V/-h vipw & vigr (which operate on different files) edit the main file, then edit the corresponding “shadow file” if the user accepts & it exists.

To edit those files vipw/vigr clears some limits & signals, locks the file, opens it, creates a tempfile validating it doesn’t already exist, runs the configured editor, possibly reopens the edited tempfile, validates whether any changes have actually been made, configures UNIX permissions, backs up the old file, & cleans up.

After initializing internationalization & parsing a few commandline flags, utmpdump configures the specified output & input before choosing the dump or undump subcommand.

Undumping involves tokenizing each line & writing out the gathered binary data. Dumping involves reading in that binary data (possibly with some tailing logic) & serializing it back to text.


After initializing internationalization & parsing a few commandline flags + a single commandline arg chsh looks up the given user, possibly validates they’re in the /etc/passwd file, possibly performs some SELinux checks, retrieves the previously-set shell, validates we’re altering ourselves, calls into stdlibc to check whether the old shell is listed in /etc/shells, possibly checks with PAM, prompts if needed for a new shell (via libreadline if available) then validates it, checks whether we actually need to change the shell, & actually applies the change via a shallow wrapper around the external libuser library or by semi-manually altering the passwords file via an intermediate file.

chfn works largely the same way except it formats & sets finger information, which consists of multiple subfields to prompt for, rather than a valid shell.


After initializing internationalization & parsing commandline flags last outputs the latest logins by iterating over each of the given files defaulting to /var/log/btmp or wtmp. For each (handling fuzzing slightly specially) it retrieves the boot-time from the kernel, opens the file with a configured buffersize, carefully reads the binary data fstating on failure, seeks to the end, & iterates over the records in that file. That is, it repeatedly reads & validates an entry, possibly & extensively reformats it (including a DNS lookup) into human-readable output, applies some normalization, tweaks locals and/or outputs additional info in a similar format, & clears state upon seeing shutdown loglines. After which it attempts to output a timestamp from the file & cleans up, including freeing the filepath that was opened.

Upon processing user-process records it may open /proc/?/loginuid to determine if it’s a “phantom”.


After initializing internationalization alongside some collections & parsing commandline flags sulogin validates it’s running as superuser, ignores various signals, possibly ensures necessary device-filesystems are mounted (upon exit it’ll unmount any it mounted), retrieves the commandline arg or $CONSOLE envvar, retrieves possible console devices via /proc, /sys, /proc/cmdline, or an IOCTL with a fallback validating results are returned, reconnects the pipeline files, retrieves the root account, & iterates over consoles. For each console sulogin attempts to open one which isn’t overloaded, initializing it to support Plymouth (a.k.a. splashscreens), various IOCTLs, locales, & baudrate.

If successful, soft-ignoring SIGCHLD signals, sulogin iterates over those opened console connections. After validation it forks a child which’ll repeatedly ask for a password if needed & run a subshell.

Trailing loops wait for subprocesses to end, close opened consoles, & wait for subprocesses again, in between restoring signal handlers.


After initializing internationalization, configuring signal handlers & scheduling, & parsing commandline flags + a singular arg login configures a process group & the terminal, opens the system logger, configures PAM possibly requesting authentication from it, loads the password entry, loads groups, opens a PAM session, clears a watchdog timeout, closes the passwords database, logs some info, switches terminal info, configures environment variables, generates a new process title, logs some more, optionally outputs login messages, forks a subprocess with a watchdog parent, drops privileges, cds to home, & runs the appropriate shell.

This command generates the btmp logs consulted by last. Can consult a “hushlogin” file via a mini-sharedlib.

And finally… After initializing internationalization & parsing/validating commandline flags with the aid of libsmartcols (checking at most one arg remains) lslogins parses wtmp or btmp files to reformat for libsmartcols. Includes optional systemd logd support, which I suspect mostly just serves to reinforce systemd’s poor reputation.