System Commands

There’s a large suite of commands split across a handful of projects which do little more than call into the Linux kernel via “system calls”. This page documents how each of these works under the hood.

GNU CoreUtils

GNU CoreUtils provides the majority of the most-used commands on the commandline. GNU CoreUtils’ commands are largely mutually independent, with a tiny bit of shared code. There is some overlap with Bash’s builtin commands, in which case those take precedence.

All these commands, and more, that I’m describing can be compiled into a single coreutils executable whose entrypoint compares argv[0] to all commandnames known to coreutils.h & calls the corresponding function. Otherwise it parses some commandline longflags to determine the function to call, or performs typical initialization (with --help also listing those commandnames), possibly outputting an error.

Yes, you don’t need to switch to BusyBox for this!

yes (after initializing internationalization, registering an exit callback to safely fclose stdout, & parsing common GNU commandline flags namely --help & --version; yes, there’s a minor shared library, mostly for normalizing UNIXes) adds “y” to the commandline args if missing, computes the memory size needed for space-concatenating (unparsing) those commandline args, then allocates with a minsize (duplicating text if too small) & does so.

Then infinite loops writing that to stdout!

whoami, after typical initialization & validating no additional args are given, wraps the geteuid syscall (a getter for a property of the current process) & getpwuid.

uptime, after typical initialization, retrieves the optional commandline argument & reads that utmp file (later topic) before retrieving the system’s uptime in seconds via kernel-specific facilities (on Linux /proc/uptime) & manually converting that into a human-readable & localized value to write via printf.

unlink, after typical initialization, retrieves its single argument & wraps the unlink syscall (deferred to an OO-method on the appropriate filesystem).

uname, on more standard UNIXes & after mostly-typical initialization parsing commandline flags itself with no other commandline args, optionally wraps the uname syscall outputting specified fields of it space-separated, & optionally makes a couple sysinfo calls to output CPU & hardware names.

tty, after mostly-typical initialization, parses -s, -h, & -v flags itself validating there are no further commandline arguments, then either wraps isatty outputting via error code or ttyname outputting via stdout, depending on the presence of -s.

truncate, after initialization, parses & validates commandline flags, stats & opens the first (non-optional) commandline arg, & for each of the rest temporarily opens it, computes missing flags via fstat & maybe fseek before wrapping ftruncate.

true & false optionally run the typical initialization if there are only two arguments, with their main purpose being to immediately exit with a specific errorcode.

touch, after mostly-typical initialization, parses & validates commandline args possibly lstating a reference file or calling gettime syscall to fill in missing fields, optionally parses the given date, before iterating over commandline args wrapping fopen and optionally fdutimensat.

tee, after mostly-typical init, parses flags, disables specified signal handlers, configures binary mode for stdin & stdout with sequential access optimizations, opens all the args into a newly-allocated array, repeatedly reads blocks from stdin & writes them to all of them, & once all outputs or the input closes it closes any remaining files.

sync, after mostly-typical init, parses & validates flags, for each arg temp-opens that file, fcntl(F_GETFL)s it, & wraps f[data]sync or syncfs.

sleep, after typical initialization, parses each commandline arg as a double + unit, summing the results to pass to the xnanosleep wrapper around the nanosleep syscall.

runcon (simple sandboxing), after mostly-typical initialization, repeatedly parses & validates flags, calls into SELinux to construct a new context possibly based on the current one, validates & saves that context, & execvps the trailing arguments.

rmdir wraps that syscall maybe ignoring certain errors & maybe iterating over parents.

realpath, after mostly-typical initialization, parses & validates commandline flags & remaining commandline args, normalizes the base path in different modes, & for each arg normalizes it, maybe finds the common prefix with the base to reformat into a relative path, & outputs it.

pwd, after mostly-typical initialization, parses commandline flags to determine whether to return normalized $PWD or getcwd with fallback concatenating all parent dirs from “.”.

readlink, after mostly-typical initialization, parses commandline flags validating argcount based on -n, & for each arg outputs the result of a readlink syscall wrapper (reallocating a larger buffer on failure) or of a hashmap-aided common utility iterating over the link normalizing it whilst readlinking any intermediate links, realloc’ing as needed.

printenv, after mostly-typical initialization, parses commandline flags & iterates over libC’s environ to output all or specified envvars.

printf, after mostly-typical initialization, checks for --help or --version flags performing their special tasks, validates there’s at least 1 commandline argument, & interprets that first argument akin to C’s standard lib concatenating the given text with appropriate escaping, unescaping, & validation.

nproc, after mostly-typical initialization, parses commandline flags before checking OpenMP envvars before consulting traditional kernel-specific syscalls (now sched_getaffinity’s standard).

nohup, after typical initialization, checks $POSIXLY_CORRECT to determine which error code to use, validates there are args, checks whether we’re running in the terminal, possibly NULLs out stdin, opens a new stdout/stderr, disables SIGHUP signals, & execvp’s its remaining args.

nice, after mostly-typical initialization, parses flags with special path, parses offset if given, wraps nice or setpriority, & execvps remaining args. If there are none another codepath outputs niceness.

mktemp, after mostly-typical initialization, parses its commandline flags, validates its commandline args possibly providing defaults, concatenates on a given suffix and/or containing directory, & calls gen_tempname_len with appropriate bitflags. Which GNU CoreUtils reimplements themselves in case it’s not available in LibC.

mknod, after mostly-typical initialization, parses & validates its commandline flags including SELinux/SMACK context, & wraps makedev + mknod or mkfifo.

mkfifo, after mostly-typical initialization, parses commandline flags including SELinux or SMACK context, possibly calls mode_compile/umask/mode_adjust, & for each commandline arg error-handled mkfifo possibly with preceding SELinux defaultcon.

logname, after typical initialization, validates there are no commandline args, wraps getlogin, & puts the result.

link, after typical initialization, validates there are exactly 2 commandline args & wraps the link syscall.

mkdir parses flags including SMACK/SELinux context, validates there are args, possibly calls mode_compile/adjust & umask, & for each arg temp-registers the security context whilst calling a thoroughly error-handled wrapper around mkdir syscall. With or without a callback which mkdirs any missing parents.

kill parses commandline flags to either wrap the kill syscall for each arg as a pid, or convert each arg between signal name & number. Or does the same for all signals.

id, after mostly-typical initialization, parses & validates commandline flags, & for each commandline arg parses it as a user+group (an int or two) to wrap getpwuid & print specified properties. If there were no commandline args it instead retrieves its own user/group to print the same specified properties.

hostname, after typical initialization, wraps sethostname if single commandline arg or gethostname if there’s none. Otherwise complains.

hostid wraps gethostid.

groups, after typical initialization, wraps getpwnam for each arg. With getpwuid, getgroups, and/or getgrgid calls to help output its result. Or if there were no args retrieves the process’s own uid & (effective) gid to output the same way.

getlimits, after typical initialization, prints various numeric constants from LibC.

echo, after mostly-standard initialization if --help or --version aren’t given, possibly parses flags and/or unescapes each arg & outputs each in turn.

dirname, after mostly-typical initialization, parses flags & for each arg (complains if none) locates last “/” in the path & outputs up to that point.

date, after mostly-typical initialization, parses & validates flags largely into a format string with a default, computes input time via posixtime or gettime or stat+get_stat_mtime or parse_datetime2, optionally calls settime, & ultimately wraps fprintftime. Or if given -f parses & reformats each of that file’s lines.

chown, after mostly-typical initialization, parses & validates commandline flags & retrieves ownership from stating a reference file or parses the specified user, before opening & normalizing all the specified files & deciding via fstatat whether to actually call chown, possibly then doing so with the appropriate chown variant. Using a shared fileset abstraction to implement recursion, which I’ll study tomorrow.

chgrp works very similarly, using the exact same core function.

chmod, after mostly-typical initialization, parses & validates flags, parses the given mode (usually from the first arg) manually parsing & reformatting octal ints or text, or copies this info from stating a reference file (also there’s normalization), uses that same FTS abstraction to implement optional recursion, & for each given file dequeued from FTS it handles (possibly outputting) any dequeued errors, calls chmod with validation/normalization, & possibly outputs info about it.

chroot, after mostly-typical initialization, parses & validates commandline flags, wraps chroot, parses & retrieves groups (as ints or via getgrnam) to also wrap the setgroups syscall if available, & execvps remaining args.

And finally basename, after mostly-typical initialization, parses & validates commandline flags before iterating over each commandline arg looking for “/”s & maybe a given suffix to strip off the result it fputs.

Some of GNU CoreUtils’ commands for manipulating the filesystem are more interactive, allowing optional recursion, prompts, & error reporting. Namely cp, mv, & rm. ln & ls appear to be in a similar bucket. These are the topics of today’s study!

cp, after mostly-typical initialization, checks whether SELinux is enabled & initializes a context, parses & validates commandline flags into that context (& a global for e.g. backup suffixes), & registers the SELinux context. With a global closed hashmap cp performs some further flag validation, & if the destination is a directory it might start populating that global hashmap & iterates over the remaining args to maybe strip off trailing slashes, maybe concatenate on a parent directory (possibly creating it) or preprocess away “..”, performing the core logic if the parent dir exists possibly followed by chown/chmod/SELinux syscalls. Or (after lowering a backup edgecase) it directly calls this core logic!

That core logic for copying a file involves possibly considering (for mv not cp) calling renameat2 or an equivalent available syscall, possibly carefully fstats to warn about non-recursively copying directories, uses the hashmap to validate args aren’t duplicated, if the move didn’t succeed or wasn’t performed might [l,f]stat it, maybe performs validation against copying a file to itself emitting an error upon failure, maybe checks timestamps, considers whether to abandon the op, warns about overwriting a file with a directory or vice versa, performs checks against losing data, maybe applies autobackup operations, and/or attempts unlinking the destination. Then cp/mv validates it won’t copy into a symlink it created in the process of copying, considers writing verbose output, updates or consults the hashmap to perform yet more validation, & for mv attempts to apply a rename op followed by optional SELinux updates, verbose/error output, and/or hashmap updates.

Then computes permissions whilst configuring SELinux state, if it’s copying a directory it performs more validation, possibly updates hashmap and/or verbose output, & co-recurses over a multistring collected from readdir syscall. If it’s a symlink carefully calls the symlinkat syscall. If it’s hardlink carefully calls the linkat syscall. If it’s a fifo calls mknod falling back to mkfifo. If it’s a device file calls mknod. There’s another link copying case. Then cleans up.

To copy a regular file cp/mv opens & fstats it with a bit more validation, opens the destination file configuring SELinux permissions & cleaning up on failure with a couple more alternate open paths, validates & fstats the destination, possibly attempts to use the clone syscall, or carefully reads from one file to write the data to the other whilst carefully avoiding/punching “holes” then copies permissions over.

mv works basically the same as cp, running the same logic in a different mode & incorporating a trailing call to rm’s core logic.

rm in turn, after mostly-typical initialization, initializes some parameters for its core logic then parses & validates flags into it, possibly prompting the user (via a possibly localized shared utility function) with the argcount, before calling the core logic shared with mv. Which incorporates the FTS utility for optional recursion’s sake.

Deleting each file involves checking the file’s type. If it’s a directory it might complain (especially if it’s “.”, “..”, or “/”) possibly flagging ancestor dirs before prompting & carefully unlinking it. If it’s a regular file, stat failure, symbolic link with or without target, postorder or unreadable directory, dangling symlink, etc it possibly flags ancestor dirs, possibly prompts (gathering data to report), & unlinks the file. It skips cyclic directories & reports errors.

ln, after mostly-typical initialization, parses & validates commandline flags before extracting the target & destination filepaths from commandline args possibly creating & fstating it. Then ln initializes autobackup globals, considers initializing a deduplication hashmap, & with some preprocessing runs the core logic per argument. There’s a separate codepath to this core logic for single arguments.

This core logic involves possibly attempting to call [sym]linkat before diagnosing errors whilst applying backups & tweaking filepaths to try again. If either succeeded it possibly updates the hashmap and/or verbose output. Or reports the failure, undoing backups.

ls, after typical initialization, parses & validates its extensive commandline flags followed by $LS_COLORS/$COLORTERM, in which case it disables tabs & performs postprocessing. If recursion is enabled it initializes a hashmap & obstack, retrieves the timezone, initializes “dired” obstacks, possibly initializes a table for escaping URIs whilst retrieving the hostname, mallocs a “cwd file”, & clears various state. Commandline args are enqueued whilst lstating them & deciding how to render; dirs are enqueued in a linked list; results are optionally mpsorted then dirs are separated out to be enqueued. The current batch of files is printed to stdout in a selection of formats, then each directory is dequeued (skipping cycles) repeating similar logic. Finally it cleans up after colours, dired, and/or loop detection if any of those were used.

ls reimplements a tiny subset of ncurses (& lookuptables from e.g. filetypes to colours) for the sake of columnized & colourized output.

To help implement optional recursion in rm, chown, chmod, etc GNU CoreUtils implements a “FTS” utility.

To open a FTS filesystem traversal it validates the arguments, callocs some memory whilst saving some properties, might test-open “.”, computes the maximum length of its arguments to allocate memory to store any of them in, allocs a parent FTS entry & one for each commandline argument referencing it (possibly qsorting them), & allocs the current entry.

To dequeue the next entry from FTS it performs some validation, considers re-yielding & re-lstating the previous entry for certain kernel errors, & considers calling diropen for recursion’s sake. If the current entry’s a directory it may close it if instructed to by the caller, possibly clears its fts_child property, possibly calls diropen whilst carefully avoiding “..” whilst updating a ringbuffer and/or the process’s current directory, & traverses to the directory’s child.

Then it iterates to the next child by calling dirfd or opendirat+lstat whilst handling errors, decides whether to descend into directories, initializes some vars, & before cleanup/sorting repeatedly calls readdir, allocing/initializing memory to hold the new entry & its filepath (carefully handling errors), lstats the file for more fields to store, & inserts into a linked list.

If it found a next entry it gets validated & tweaked as instructed by caller whilst recalling lstat.

Once all the entries in a directory have been traversed it follows the parent pointer freeing previous memory, & validates/tweaks it before yielding that virtual entry.

This tree traversal may be augmented with a hashmap to detect cycles.

I don’t see much use of that ringbuffer…

Beyond exposing language bindings for LibC’s/Linux’s syscalls, the other dominant task GNU CoreUtils’ commands perform is textual transformation or summarization of stdin. The simpler cases of which I’ll describe today!

uniq, after mostly-typical initialization, parses & validates commandline flags whilst gathering an array of 2 filepaths. Which if not “-“ will be freopened over stdin & stdout with sequential access & (internal util) linebuffering optimizations enabled.

The fastpath reads each line from the input stream (enlarging the buffer until it finds the configured delimiter, defaulting to newline), skips the configured number of whitespace-separated fields & then chars, compares against the previous row case-sensitively or not, & depending on grouping mode outputs a delimiter and/or outputs the line whilst updating state.

The slowpath also tracks an error-checked count of repeated lines & whether we’ve seen our first delimiter, moving a line read & write out of the loop.

unexpand converts spaces back to tabs. After mostly-typical initialization & parsing & normalizing commandline flags into e.g. a tabstop array & temp-allocated filename list (both via a shared util with expand), it pops & fopens (with sequential optimizations) the first file off that list, mallocs a blank column, & repeatedly: reads the next char popping the next file upon EOF, looks up the appropriate tabstop from the array upon blank chars (stopping future conversions if it goes beyond the end), validates the line wasn’t too long, replaces the whitespace with a tab char if it was already one or we’ve changed tabstops, decrements the column upon \b recalculating tabstops, & otherwise increments the column. If it has prepared pending whitespace to write it’ll finalize & output it. Then outputs the non-whitespace char.

Repeating until end-of-line (innerloop) & end-of-files (outerloop).

tac, after mostly-typical initialization & commandline flag parsing including regexp-parsing the “sentinel” & bytesize validation, defaults the remaining commandline args to “-“, configures binary output mode, & before flushing remaining output & cleaning up, for each arg opens it in binary mode (handling “-“ specially), lseeks to the end, handles seekable & non-seekable files differently, & cleans up.

For nonseekable files it copies over to a seekable file before tacing it.

For seekable files, or after converting non-seekable files, it normalizes the computed seek offset to be multiple of a precomputed read_size lseeking there, before lseeking back then forward a page at a time looking for EOF, & repeatedlies runs the configured regex to find the configured line seperator OR performs a simpler fixed-size string-in-string search, if it didn’t find a match at filestart it outputs a line & exits. Or it reads from line start into newly realloced memory.

If it found a match it outputs that line with or without trailing line seperator, updating past_end & maybe match_start properties.

There’s also an in-memory codepath I don’t see used.

paste, after mostly-typical initialization & parsing commandline flags, defaults args to “-“ escaping those filepaths, runs serial or parallel core logic before cleaning up based on one of those flags.

That serial logic involves opening each file (handling “-“ specially) with sequential optimization.

After opening each file checking for empties, it then copies individual chars from input to output replacing any line delims & adding a trailing one if needed.

The parallel logic involves opening each file validating stdin handling, then repeatedly iterates over each file considering outputting extra delims from a preprepared buffer before repeatedly copying chars from input to output.

Or if the file was already closed it considers which state it needs to update or delims to output.

nl, after mostly-typical initialization & parsing flags, prepares several buffers, processes each file (defaulting to just “-“ a.k.a. stdin) & maybe fcloseing stdin, & returns whether all of those files were successful.

Processing a file involves fopening it (handling “-“ specially) with sequential optimizations, reading each line determining via memcmp whether we’re in a header, body, footer, or (where the real logic/incrementing+output happens) text. Resets counter for non-text.

join, after mostly-typical initialization, registers to free an array on exit, parses & validates commandline flags, gathers an array of filenames whilst determining which join fields to use, & with the two files open handling “-“ specially it runs the core logic.

This core logic consists of enabling sequential read optimizations, initializes state for both of the input files populated with their first line, maybe updates some autocounts, maybe runs an initial join, & repeatedlies…

For each pair of lines join memcmps the appropriate field case-sensitively or not, might output a linked list of fields or just the fields being joined from file at lower key whilst advancing it to next line, advances leftfile until no longer equal then same for rightfile, maybe outputs those lines, & updates each file’s state whilst checking for EOF.

Trailing lines from either file are possibly printed after this loop & memory is cleaned up.

Fields are split upon reading each line.

head, after mostly-typical initialization, parses & validates commandline flags possibly with special handling for integral flags, defaults remaining args to “-“, & in binary output mode iterates over all those args for each temporarily opens the filepath (handling “-“ specially) optionally outputs a filepath header surrounded by fat arrows & uses different core logic for whether we’re operating in terms of lines or bytes & whether we’re outputting a finite number of them.

For fixed number of bytes head copies a bufferful of data at a time until we’ve met that target.

For a fixed number of lines head copies a bufferful of data at a time whilst counting newlines, until we’ve met that line count. Or rather it decrements the linecount until 0.

To output all but the last n bytes head, if it can query the filesize, copies a computed number of bytes a bufferful at a time. Or in 1 of 2 ways copies a buffer of computed size at a time, chopping off n bytes once it reaches EOF.

To output all but the last n lines of a seekable file head reads it backwards a bufferful at a time counting newlines until it finds the stoppoint in bytes. Then copies a bufferful at a time until it reaches that point.

To output all but last n lines on a pipe head allocs a linked list & repeatedly reads a bufferful at a time maybe immediately outputting the line if we’re not eliding anything, counts newlines in that buffer, & considers merging buffers or outputting the old head.

To wrap text to a fixed width fold, after mostly-typical initialization & flags parsing, iterates over every arg falling back to “-“. For each it temporarily opens the file (handling “-“ specially) with sequential optimizations & reads a char at a time adding them to a buffer.

Upon newlines it writes the buffered text. Otherwise computes new column handling \b, \r, & \t specially. If overflows given width it might locate last buffered whitespace to output until, or outputs full buffer.

fmt, after mostly-typical initialization & parsing flags handling digits specially, iterates over & fopens the args falling back to stdin as “-“. For each it enables sequential optimizations, followed by a configured prefix, handles preceding blank lines then optionally reads the rest of the paragraph collapsing solo-newlines. For each such paragraph it performs split-costed linewrapping & outputs them in a separate pass. Then tidies up errors after the loop.

fmt’s a more sophisticated fold!

To replace tabs with spaces expand, after mostly-typical initialization & parsing commandline flags, finalizes tabstops & saves an array of commandline arguments falling back to “-“ then dequeues one, & before possibly cleaning up reading stdin repeatedlies: reads each char from each file in turn, upon tab looks up the corresponding tabstop & outputs the appropriate number of spaces, decrements the column upon \b, or increments the column, then outputs the read char (except tabs translated to spaces).

cut, after mostly-typical initialization & parsing & validating flags including field selection, iterates over all remaining args falling back to “-“, & cleans up after parsed fields & reading stdin. For each it temporarily fopens the file (handling “-“ specially) with sequential optimizations, handling byte & field counts differently.

For byte cuts it counts non-delimiter chars locating appropriate cut entries to determine when to output the delimiter, whilst copying all chars out unless the current cut entry indicates otherwise.

Field cuts works essentially the same, except reading entire fields split by configurable delimiters instead of individual chars.

These cut entries are tracked in an array with high & low bounds.

csplit, after mostly-typical initialization & parsing & validating commandline flags, validates there are remaining commandline args, reopens the given file over stdin, parses given regexps, registers a signals handler, iterates over & applies the given “controls” before carefully temporarily opening an output file to write all buffered lines to. This output file might also be opened when processing any of those controls.

For regexp controls at an offset repeatedlies looks up a line (upon failure to find this it either outputs the rest of the file or reports to stderr) before evaluating the regexp over that line to determine whether to output it.

For non-offset regexp controls it does basically the same logic but slightly simpler.

For linecount controls it creates the output file, reports errors, repeatedly dequeues lines to save to the file until reaching the desired linecount, & tidies up.

Upon dequeueing a line csplit considers whether it needs to read a new bufferful of data & split it into lines.

comm, after mostly-typical initialization & parsing flags, validates arg count before running its core logic. Which involves fopening each specified file (handling “-“ specially) with sequential optimizations, per-file data allocated, & the first line read in. Then mergesorts the lines from both files into stdout with or without collation, closes those files, & optionally outputs intersection/exclusion counts.

And last but not least cat, after mostly-typical initialization & flags parsing, fstats stdout to help determine most optimal codepath & maybe sets it to binary mode.

Then cat iterates over its args fopening & fstating each one, retrieving the optimal blocksize, validates it’s not cating a file to itself, & if various flags aren’t set it’ll simply repeatedly copy data from input to output an optimally-sized & memaligned buffer at a time OR with plenty of microoptimization it iterates over the buffer reading more as-needed looking for newlines, possibly inserting linenumbers and/or escaping lines between them.

Before writing remaining text & cleaning up.

GNU CoreUtils provides several useful commands for summarising or rearranging text files!

wc, after mostly-typical initialization, retrieves the optimal buffersize, configures line buffering mode, checks $POSIXLY_CORRECT, parses & normalizes flags indicating which counts it should output, possibly opens & fstats the specified file listing other files to summarize OR consults args, possibly fstats all input files again to estimate the width of the eventual counts, iterates over & validates all files whether listed in a file or args running the core logic for each (or reads from stdin), & tidies up whilst outputting desired counts.

This core logic involves temp-opening the file again handling “-“ specially, possibly enabling sequential optimization, & possibly trying fstat again & seeking near the end in case the size is approximate, using repeated reads for an exact size.

Or the core logic may involve considering whether we can use AVX2-specific microoptimizations before repeatedly reading a bufferful in, counting bytes & newlines. AVX2 allows x86 CPUs to do this in 256bit chunks (or rather wc uses 2 of them for 512bit chunks) by summing equality results.

Or it reads a bufferful at a time whilst decoding UTF8 chars (with ASCII fastpath) via mbrtowc handling \n, \r, \f, \t, space, \v, & kanji specially. Or maybe it’s compiled to not support Unicode.

After counting lines, words, chars, and/or bytes in each file it outputs those numbers before adding to the total sums across all files in case we want those values too.

tr, after mostly-typical initialization & flag parsing possibly altering locally-configured locale, validates args count, initializes a linkedlist & escapes then parses a regex-like pattern (or two) via an internal scanner, validates them, switches to binary input mode with sequential optimizations, & under various differing conditions consults a rename table and/or a couple smallintset (both compiled from parsed input pattern) to determine which input chars to output.

tail, after typical initialization & parsing & validating commandline flags (obsolete syntax first), defaults remaining commandline args to “-“, locates/validates/warns about “-“ in those args, shortcircuits if certain flags are all unset, allocs an array, considers whether to output headers, enables binary output mode, iterates over all those args performing the core logic, then given -f goes back to output any additional lines written to those files, ideally using inotify syscalls & a hashmap to interpret its responses.

The core logic involves temporarily opening the specified file in binary mode (handling “-“ specially) & if successful possibly outputs the filename surrounded by fat arrows, locates the last n bytes or lines to write, & given -f validates & populates properties for the above -f loop.

To output all but the first n bytes tail consults fstat before lseeking & copying bufferfuls from input to output, then outputs any remaining text with or without headers.

To output the last n bytes tail lseeks to the end of the file less n, decides whether it needs to apply pipe logic or seek back to start, & refines the seek position before outputting any remaining text with or without headers.

For pipes it buffers into a segmented linkedlist until EOF then outputs said buffer.

To output all but the first n lines tail fstats the file, reads bufferfuls counting (or rather decrementing) newlines until it reaches desired count, possibly outputs remainder of that buffer, & outputs the remainder of the file.

To output the last n lines tail fstats the file, tries seeking to the end, reads the file backwards a bufferful at a time counting newlines until it reaches the desired count, & outputs the buffer from that point followed by the remainder of the file.

To output the last n lines of a pipe tail reads the pipe into a linkedlist of buffers counting the number of newlines in each until EOF, uses those counts to locate the start of those lines, & outputs them before cleaning up their memory.
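
That last case, keeping only the most recent n lines of an unseekable stream, has a neat Python analogue to tail's linkedlist of counted buffers: a bounded deque that discards from the front as new lines arrive. (A sketch of the idea only, not GNU's implementation.)

```python
from collections import deque

def tail_lines(stream, n: int):
    # deque(maxlen=n) retains only the n most recently appended items,
    # so after consuming the stream we hold exactly the last n lines.
    return list(deque(stream, maxlen=n))
```

E.g. `tail_lines(["a", "b", "c", "d"], 2)` gives `["c", "d"]`.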

split, after mostly-typical initialization & parsing & validating commandline flags, extracts & validates commandline args, freopens the specified input file over stdin, enables binary input mode, fstats the input to get the optimal blocksize, mallocs a memaligned buffer, performs some trial reads to get the filesize, if in filtering mode registers SIGPIPE to be ignored, & decides whether it wants to apply the logic for digits/lines, bytes, byteslines, chunkbytes, chunklines, or round-robin.

For linesplits it reads input in bufferfuls counting newlines to determine which output file to write to.

For bytesplits it reads bufferfuls tracking a bytes countdown to determine which output to write to.

To split into lines of a maximum bytesize split reads buffers of the specified size counting newlines within them to determine which output to write that buffer to. Or splits buffer at a line break.

To split into byte chunks it either behaves equivalently to byte splits, or it seeks to a computed start index copying buffers of the specified size to output.

Or there’s a variant which avoids splitting lines.

Or a variant that cycles between the array of output files in a “round-robin” fashion.
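
The round-robin variant is the simplest to sketch (in Python here, purely for illustration): line i goes to output i modulo the number of outputs.

```python
def round_robin_split(lines, n_outputs: int):
    # Cycle between outputs one line at a time, like split -n r/N.
    buckets = [[] for _ in range(n_outputs)]
    for i, line in enumerate(lines):
        buckets[i % n_outputs].append(line)
    return buckets
```

So 5 lines over 2 outputs land as `[["a", "c", "e"], ["b", "d"]]`.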

sort, after initializing locale as per usual, loads some envvars, further initializes locale, generates some lookuptables, alters signal handlers (mostly to ignore, resets SIGCHLD handler away from parent’s), registers to cleanup tempfiles & close stdout on exit, allocs/clears some configuration, parses extensive commandline flags with various conditional tweaks into those structures & other localvars, opens & tokenizes file specified by –files0-from flag if present, propagates various properties of fields to compare by whilst determining whether any requires randomness, ensures there’s at least one entry in that linkedlist before validating it, maybe outputs debugging information, initializes randomness if required via getrandom via a wrapper reading from a file instead for debugging purposes, sets a default tmpdir of $TMPDIR or /tmp, defaults remaining args to “-“, normalizes the amount of RAM to use, optionally extensively validates instead, validates it can read all the inputs & write all the outputs, & with minimal tidyup commences a biggish-data sort! (Might be overengineered for modern hardware…)

You can configure sort to only use its disk-based mergesort, assuming the given input files are already sorted. This involves an initial loop which merges each pair (or whatever) of input files (parsing each head line into fields for comparison & advancing whichever head is lower, copying it to output) then each pair of those.

Or when compute is the bottleneck rather than RAM (which RAM was on early computers) it retrieves the CPU core count as the max pthreads count & for each bufferful of input from each file in turn temp-initializes a mutex-locked priority queue & merge tree node (possibly in a new pthread) to apply an in-RAM mergesort with its chunks prioritized via the priority queue. Once these sorted arrays get too large they’re written to disk for the disk-based mergesort.

The comparator either interprets the relevant commandline flags parsed into an array selecting certain fields from the pre-lexed line applying a choice of comparison logics (including randomly-salted MD5 for shuffling). Or it uses a possibly-collated memcmp.
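
The disk-based mergesort described above, sort runs in RAM, spill them to tempfiles, then stream a k-way merge over the runs, can be sketched like this in Python (assumptions: tiny chunk size for demonstration, `heapq.merge` standing in for sort's pairwise merge loop):

```python
import heapq
import tempfile

def external_sort(lines, chunk_size=2):
    # Sort fixed-size chunks in RAM & spill each to a temp file ("run"),
    # then lazily merge the sorted runs, never holding everything in RAM.
    runs = []
    for i in range(0, len(lines), chunk_size):
        f = tempfile.TemporaryFile(mode="w+")
        f.writelines(sorted(lines[i:i + chunk_size]))
        f.seek(0)
        runs.append(f)          # iterating a file yields its lines
    return list(heapq.merge(*runs))
```

Real sort merges a bounded number of runs at a time & reuses tempfiles, but the shape is the same.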

shuf, after mostly-typical initialization & parsing & validating commandline flags, gathers inputs whether empty, echoed from commandline args, numeric range, or specified files, initializes the random number generator, maybe populates an array listing the new random index for each line in the file taking great care to preserve randomness distribution, considers closing stdin, maybe computes the array via a sparse hashmap & randomly swapping indices, writes the randomly chosen indices or lines at those indices or the “reservoir”, & tidies up.
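
That "randomly swapping indices whilst preserving the distribution" is essentially the Fisher–Yates shuffle; here's a minimal Python sketch of it (illustrative, not shuf's C code):

```python
import random

def fisher_yates(items, rng=None):
    # Swap each position with a uniformly-random position at or before it;
    # this yields every permutation with equal probability.
    rng = rng or random.Random()
    out = list(items)
    for i in range(len(out) - 1, 0, -1):
        j = rng.randrange(i + 1)   # uniform in [0, i]
        out[i], out[j] = out[j], out[i]
    return out
```

The care shuf takes is in that `randrange(i + 1)`: a naive `randrange(len(out))` at every step would bias the result.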

seq, after mostly-typical initialization & parsing/validating commandline flags, determines whether it can use a fastpath.

The fastpath keeps the number in text form incrementing chars carrying at ‘9’, populating a bufferful before outputting them. Uses memcmp to decide when to end.
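
Keeping the number in text form means incrementing is just schoolbook carry arithmetic on digits; a Python sketch of that trick (names mine, not seq's):

```python
def incr_decimal(s: str) -> str:
    # Increment a non-negative decimal numeral held as text,
    # carrying at '9' exactly as seq's fastpath does.
    digits = list(s)
    i = len(digits) - 1
    while i >= 0 and digits[i] == '9':
        digits[i] = '0'      # 9 rolls over to 0, carry continues leftward
        i -= 1
    if i < 0:
        return '1' + ''.join(digits)   # carried off the left edge
    digits[i] = chr(ord(digits[i]) + 1)
    return ''.join(digits)
```

So `"41"` becomes `"42"` & `"999"` becomes `"1000"`, with no float precision loss at any magnitude.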

Otherwise it parses & validates the given floats whilst checking whether it’s actually an int & computing output width, reconsiders the fastpath, generates a formatstring in the absence of -f, & outputs each multiple of the given step (carefully avoiding precision loss) up to the limit, printfing each & adding separators & a terminator.

ptx, after mostly-typical initialization, calls setchrclass(NULL) if that function’s available, parses flags, gathers an array of filenames with linecount & text buffer sidetables whether or not args are given or GNU extensions are left enabled, chooses a default output format, compiles the given regexp whilst compiling a char rewrite table, loads all chars from a file into that rewrite table as “breaks”, loads a second sorted sidetable of “break words” from a given file, initializes some counts, for each given input file reads it all into memory, runs the core logic, & updates line counts, sorts results, applies various normalization, & iterates over these results to compute the charwidth of its fields & output in a choice of syntax including TeX.

That core logic involves iterating over the file’s text probably running the regexp to locate the start index for the next iteration, & repeatedlies locates the next word start & end via regexp or scanning over chars present in the rewrite table skipping empty words, updates max length & counts, binary searches the sidetable of sorted allow & block wordlists skipping words as dictated by them, possibly allocs an occurs_table entry & populates it partially as directed by the caller, & possibly skips trailing chars other than whitespace.

pr (which reformats text for printing), after mostly-typical initialization & parsing & validating/normalizing commandline flags, copies trailing commandline args into a new array, iterates over all those filenames defaulting to stdin or tells the core logic to render them in parallel, & tidies up.

That core logic involves computing various layout parameters, opens each file being laidout in parallel (handling “-“ specially) with sequential read optimizations whilst laying out pageheader text, possibly allocs memory to store columns, partially laysout a given number of pages to skip them using that parameter as the initial page number, computes some per-column layout parameters whilst choosing per-column callbacks representing whether to directly output text or buffer it in columns, then renders each subsequent page.

For each output page it resets some layout parameters, validates there’s text to layout, repeatedly outputs lines, updates flags, resets state, & outputs padding.

Laying out a line involves iterating over cols calling their callback (possibly skipping the rest of the input’s line) until it has no more lines to output, in parallel mode ensures columns remain aligned even when empty, & considers adding newlines.

Whilst laying out columns per-page it reads in the first line for each of them & reshuffles lines between columns to keep them balanced.

That callback applies text alignment, line numbers, textwrapping, etc & buffers text via the other callback.

od, after mostly-typical initialization & initializing a couple lookup tables, parses, validates, & normalizes commandline flags, chooses a “modern” or “traditional” syntax for extracting commandline arguments, possibly into a printf-string & read bytesize, defaults commandline args to “-“, opens the first of those files, carefully skips a specified number of bytes, computes the least common multiple (a shared util) between all given readwidths & uses that to compute the number of bytes per block, computes necessary padding to align output, in some builds outputs debugging info, & runs one of two variants of the core logic before possibly attempting to close stdin.

If we’re “dumping strings” from the file repeatedly it keeps reading bytes looking for at least a given number of ASCII chars loading them into a buffer, then reads until it’s found the NUL terminator resizing the buffer as necessary, then outputs the address via the configured callback & escapes/outputs the found string.

Otherwise od reads bufferfuls at a time with or without (two mainloops) checking against end offset, outputting last byte specially consulting computed lowest-common-multiple, & outputs end address.

Upon reading a bufferful of data it considers closing the current file on EOF & opening the next one. To write that block it first compares against the previous block. If they were equal it’ll output at most “*\n”. Or it outputs the address followed by each specifier’s callback possibly followed by hex format.
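
That repeated-block squeezing & address-prefixed output looks roughly like this Python sketch (illustrative only; real od works through per-specifier callbacks & configurable radixes):

```python
def dump(data: bytes, width: int = 8):
    # Hexdump with od-style "*" squeezing of runs of identical blocks
    # & octal addresses, ending with the final address.
    out, prev, starred = [], None, False
    for addr in range(0, len(data), width):
        block = data[addr:addr + width]
        if block == prev:
            if not starred:          # emit "*" once per run of repeats
                out.append("*")
                starred = True
            continue
        starred, prev = False, block
        out.append("%07o " % addr + block.hex(" "))
    out.append("%07o" % len(data))   # final address line
    return out
```

Sixteen zero bytes followed by a 0x01 byte dump as one line of zeros, a “*”, the odd byte, & the end address.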

digest, after typical initialization & enabling output buffering, parses & verifies flags, defaults args to “-“, & iterates over each arg either checking the hash or computing the hash & outputting it via the caller-specified callback.

To check a file’s hash digest opens the file (handling “-“ specially) repeatedly reads a line & strips it, extracts the hash, whether it’s a binary file, & filepath, runs core logic, compares to expected value, & decides which results to output.

The shared library for computing CRCs (not really a hashfunction, but works well for detecting transmission errors!) embeds via the C preprocessor a script to generate its own lookuptable headerfile using bittwiddling & 2 intermediary tables. It has a separate codepath specifically for making optimal use of x86/x64 CPUs to take advantage of their pclmul instructions.

CRC at its core involves repeated bitwise shifts & XORs.
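
Here's that shift-&-XOR core as a table-free Python sketch, using the common reflected CRC-32 polynomial (the zlib/PNG one; cksum's own CRC uses a different orientation & post-processing, so this is the technique rather than cksum's exact output):

```python
def crc32(data: bytes, poly: int = 0xEDB88320) -> int:
    # Bit-at-a-time reflected CRC-32: XOR each byte into the low bits,
    # then shift right 8 times, XORing in the polynomial on carry-out.
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (poly if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF
```

The lookup-table version the headerfile generator produces just precomputes those 8 inner iterations for all 256 byte values.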

Finally base## (e.g. base64) commands, after mostly-typical initialization & parsing flags possibly (depending on build flags) including a separate option for the desired base, validates it has at most one argument defaulting to “-“ which it then temporarily fopens (handling “-“ specially) with sequential read optimizations, & either decodes or encodes the data as specified by -d.

For decoding it mallocs some buffers, reads a fullbuffer in, & calls the appropriate decode function.

Encoding works basically the same way but possibly with added text wrapping.

The logic (which is in a library shared within GNU CoreUtils) for encoding & decoding base32 or base64 text involves bitshifts, bitmasks, & character lookup tables. There’s wrappers around this, as well as similar code written inline with the command, tweaking the behaviour for additional basenc options.
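
The bitshift/bitmask/lookup-table approach for base64 encoding looks like this Python sketch (not gnulib's code; names are mine):

```python
_B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64encode(data: bytes) -> str:
    # Pack 3 input bytes into one 24-bit group, then emit four 6-bit
    # indices into the 64-char alphabet; '=' pads a partial final group.
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        n = int.from_bytes(chunk.ljust(3, b"\0"), "big")
        sextets = [(n >> s) & 0x3F for s in (18, 12, 6, 0)]
        enc = [_B64[x] for x in sextets][:len(chunk) + 1]
        out.append("".join(enc).ljust(4, "="))
    return "".join(out)
```

Decoding is the same dance in reverse with a 256-entry reverse lookup table.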

Not every system call GNU CoreUtils exposes to the commandline is very text centric.

dd, after typical initialization & configuring signal handlers, retrieves the system’s optimal buffersize, initializes a translation table with sequentially-incrementing numbers (an identity transform), decodes keyword args in a different syntax from usual into e.g. filenames & bitflags & ints, updates that transform table, opens the if= given input file with specified flags checking whether it’s seekable, opens the of= given output file with specified flags possibly ftruncateing it using fstat to diagnose failures, retrieves the current time via whatever highprecision syscall is available, & runs the core logic before diagnosing errors, ensuring any signals have been handled, cleaning up, & outputting final status.

dd’s core logic involves maybe fstating & lseeking the given offset fallingback to reading twice (the second time outputting zeroes in their place for “seek” options), allocs input & output buffers, possibly retrieves the current time to determine whether to output status, stops if we’ve copied enough records, maybe zeroes the input buffer, reads a full or possibly partial (depending on a keyword arg) buffer of conditional size whilst handling signals & warning about errors, updates counters possibly clearing output cache or possibly lseeks past bad blocks whilst invalidating cache maybe ending this loop, possibly zeroes the input buffer’s tail, maybe takes a fastpath outputting that input buffer immediately, maybe translates all the bytes according to the translation table, maybe swaps every two bytes, & lseeks & writes that postprocessed buffer either in full or a char at a time whilst tracking columns.

After dd’s mainloop it outputs the final byte if present, maybe pads with spaces, maybe adds a final newline, outputs the last block if necessary, if the final op was a seek fstats, lseeks, & ftruncates the file, & f[data]syncs the file whilst handling signals.

Signals are handled around several of these syscalls. Clearing output caches involves some throttling & posix_fadviseing.

Some standard translation tables are bundled for e.g. EBCDIC.
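
As a sketch of dd's byte postprocessing, here's conv=swab ("swaps every two bytes" above) in Python; GNU dd notably copies a trailing odd byte as-is rather than dropping it:

```python
def swab(data: bytes) -> bytes:
    # dd conv=swab: exchange each pair of input bytes.
    # A trailing odd byte is left in place (GNU behaviour).
    out = bytearray(data)
    for i in range(0, len(out) - 1, 2):
        out[i], out[i + 1] = out[i + 1], out[i]
    return bytes(out)
```

The conv=ascii/ebcdic translations are the same idea as tr above: a single pass through a 256-entry table.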

df, after mostly-typical initialization & parsing & validating flags including resolving filesize units filling in certain missing ones from envvars, test-opens each file whilst stating it, parses the syscall or device file listing currently mounted filesystems complaining upon error, maybe ensures all data is synced so we can analyze it, allocs fields to hold each row’s data & outputs a header, gathers desired entries, probably outputs them ideally nicely aligned, & cleans up.

Gathering desired entries may involve for each commandline arg iterating over the mount linkedlist looking for the specified device whilst canonicalizing filepaths & stating the files looking for the closest match, then reformatting the data into text adding filesize units back in whilst calling whichever stat[v]fs-variant syscall is available. Or complaining if that device has been “eclipsed”.

Then another couple iterations where the device is used.

Or it might iterate over the mountlist deduplicating it via a temporary hashmap & filtering the entries as specified by the parsed commandline args whilst stating them, populating a templinkedlist before possibly copying it over, then iterates over the now-filtered mountlist to populate the table as per before.

In populating the table it increases some counts to also possibly output.

The alignment code is quite sophisticated, & in an internally-shared library.

du, after mostly-typical initialization & parsing & validating commandline flags & $DU_BLOCK_SIZE & maybe $TIME_STYLE envvars, determines where to read the argument list from possibly freopening the file given by –files0-from over stdin, mallocing a hashset of device-inode pairs, tweaks some bitflags, repeatedlies retrieves & validates the next specified file to apply the core logic to, tidies up all the allocated memory, & possibly prints the total count.

du’s core logic involves reusing the filetree traversal used by chown, chgrp, rm, etc.

Upon anything other than errors (which it reports) or entering directories it checks whether the commandline flags specified to exclude the file. If not it configures NSOK entries to be revisited & proceeds to the next one validating it’s not an error & reconsidering whether to exclude. If so it tells the traversal to skip this entry.

For dirs it does no further processing & errors are reported.

Then per-file du gathers a local struct from the provided stat info, callocs or reallocs to form a tree out of this data, adds to appropriate counters, & maybe outputs the filesize, maybe date, & label.

There’s cycle detection logic in traversing directories referring to a lazily-loaded mounttable & the device+inode hashset.

env, after mostly-typical initialization & initializing a signals table, parses & validates commandline flags & validates it hasn’t received the = op, resets all signal handlers to either default or ignore, maybe configures a signal mask, & maybe outputs these new signal handlers, maybe switches to a new current directory possibly with debug/error messages, maybe outputs which command it’s about to run, & execvps it tidying up on failure.

There’s a shared util func for traversing up a filepath until just before st_dev/st_ino changes.

install, after mostly-typical initialization & parsing & validating commandline flags largely into a struct but also globals like (from a shared util mentioned previously) backup suffixes, validates there’s at least one additional commandline arg, further parses a couple flags, & either with a preserved current working directory & SELinux context creates the specified directory with any missing parents.

Or with a global hashmap (and with or without creating any missing parent dirs) stats the file, if successful copies the file over into the new location as per cp if needed, if successful maybe runs the strip command over it, copies timestamps over via the utimens syscall, & copies permission attributes (both traditional UNIX & SELinux) over.

Or it prepends a directorypath first before doing that.

pinky starts by mostly-typical initialization & parsing commandline flags.

In short mode pinky reads the specified UTmp file, determines which datetime format to use, outputs a heading line, & iterates over that UTmp file looking for user processes possibly filtered by a provided arrayset, stats the listed file & consults LibC’s flatfile databases to determine what text to output.

In longmode it iterates over the commandline args, consults pwnam for more extensive info to output, followed by the user’s ~/.project & ~/.plan files.

There’s shared utils for manipulating SELinux devicefiles. There’s other shared utils for parsing field references from commandline flags. And another involved in quite sophisticated commandline flags parsing.

In storing data in magnetic fields harddisks leave residual traces of deleted data, so it can be useful to repeatedly overwrite them with whitenoise to ensure that data is truly gone. Solidstate drives I believe don’t have the same issue, & using these “secure erase” tools on them just serves to shorten their lifespans. GNU CoreUtils provides shred for this.

shred, after mostly-typical initialization & parsing commandline flags, validates there’s additional commandline args, initializes a random-number generator & registers for it to be cleaned up on exit, & iterates over its commandline arguments (handling “-“ specially, jumping near-straight to the core logic) temp-opening & maybe chmoding if necessary to apply the core logic before repeatedly renaming the file to progressively shorter names & unlinking it whilst syncing each step.

shred’s core logic fstats the file validating the result, computes optimal buffersize whilst retrieving exact filesize, populates the buffer with random data with various counters possibly reusing previous chunks of randomness, goes over the buffer again to improve randomness slightly, & repeatedlies seeks back to the start of each block a given number of times, possibly bittwiddles the buffer, outputs status, & repeatedly verified-writes random segments of the buffer to the file being shredded.

After the innermost loop shred outputs status info & syncs to disk so it actually has an effect.

stat, after mostly-typical initialization & parsing flags, validates there’s remaining args & (filling in dynamically-generated defaults) it’s -c/–printf flag, & for each commandline arg calls fstat or statfs or available variant syscall, before manually interpreting (lots of options) the given format string to generate output text whilst maybe locating mountpoint or SELinux context.

stdbuf, after mostly-typical initialization & parsing commandline flags, validates there’s additional commandline args, sets specified envvars, extracts the directory containing this command possibly referring to the /proc/self/exe symlink, configures the LD_PRELOAD envvar to point at libstdbuf wherever that is, & execvps the remaining args. libstdbuf in turn adds a little code to the executable(s) which parses those envvars to pass to setvbuf.

stty, after mostly-typical initialization & parsing & thoroughly validating commandline flags, possibly reopens the specified file over stdin turning off blocking mode, retrieves the input’s mode, possibly parses $COLUMNS whilst outputting the specified subset of hardcoded controlchars. Or iterates over all specified settings (amongst other IOCTLs) encoding a new mode to subsequently pass to tcsetattr & reports if tcgetattr yields anything different.

Includes lightweight text wrapping.

test/[, after mostly-typical initialization, specifically checks for sole –help & –version flags on [, validates there are args, & runs an immediately-evaluated pushdown parser with a scanner but no lexer beyond the caller splitting commandline args, whose leaves call various syscalls, typically stat variants to retrieve/compare different returned properties.

timeout, after mostly-typical initialization & parsing flags, validates there’s at least 2 args remaining, parses the next commandline arg as a timeout duration with an optional unit, maybe calls setpgid(0, 0) so all subprocesses are killed with timeout, configures signal handlers, & forks execvping the remaining commandline args in the child process with reset SIGTTIN & SIGTTOU signal handlers; in the parent process it ensures we receive SIGALRM signals, calls the appropriate available syscall to schedule its triggering, blocks several signals, & waits for the child process to end.

Once the child process has ended (or a signal was received) it checks a flag set by the SIGALRM callback & kills self without coredumps.

Various signals including SIGALRM consider killing the child process or resetting a second timeout.

users, after typical initialization, parses the given UTmp (or default) file, iterates over every userprocess therein extracting names to qsort then output & deallocate.

And finally for today who, after mostly-typical initialization & parsing & validating commandline flags into various bools, selects a time format with max charsize to allocate to interpreting it, decides how to behave based on commandline args count, temporarily-parses the given or typically UTmp file, & decides how to handle it by presence of -q.

If it’s present it iterates over all user processes, extracts & outputs the trimmed name, & outputs how many of those entries it counted.

Otherwise it considers outputting a heading line, considers calling ttyname for data to filter entries to only list ourself, & for each entry in that file outputs an appropriate line (if enabled for its type). This last bit gets fairly involved yet tedious, & shares an internal utility for outputting tablelines & formatting times/periods.

tsort, after typical initialization & validating there’s at most one arg defaulting to “-“, mallocs a root node, freopens the specified file over stdin if not “-“ enabling sequential read optimizations & initializing a multistring tokenizer, for each token it locates where to place it in the tree whilst balancing & inserts a new treenode there, validates there was an even number of tokens, counts all treenodes, & computes output from it.

To compute output from its binary tree tsort gathers a linkedlist of binary tree nodes with no dependencies, outputs each of their strings whilst removing them from the binary tree & decrementing the counts on their dependencies to determine which to add to this linkedlist. If there’s tree nodes left after this that indicates there’s a loop, in which case it iterates over the tree to find & output these loops, removing an edge to break the cycle so it can try again.
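
That count-decrementing output loop is Kahn's topological-sort algorithm; here's a Python sketch over (before, after) token pairs (this substitutes dicts for tsort's balanced binary tree & raises on cycles rather than breaking edges):

```python
from collections import defaultdict, deque

def tsort(pairs):
    # Kahn's algorithm: repeatedly emit nodes with no remaining
    # dependencies, decrementing their successors' in-degrees.
    indeg, succ, nodes = defaultdict(int), defaultdict(list), set()
    for before, after in pairs:
        nodes.update((before, after))
        if before != after:          # "a a" merely declares the node
            succ[before].append(after)
            indeg[after] += 1
    ready = deque(sorted(n for n in nodes if indeg[n] == 0))
    out = []
    while ready:
        n = ready.popleft()
        out.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    if len(out) != len(nodes):
        raise ValueError("input contains a loop")
    return out
```

`tsort([("a", "b"), ("b", "c")])` gives `["a", "b", "c"]`.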

pathchk, after mostly-typical initialization & parsing commandline flags into global bools to determine which checks are performed, validates there’s at least one additional commandline arg, & iterates over them. For each it maybe checks if there’s a leading hyphen (an early UNIX bug treated all of those as stdin), maybe checks if the filename is empty, maybe checks whether the filename is pure ASCII (excluding symbols & whitespace) or checks whether the file exists via lstat; partially based on that it might check the charlength of the filepath, & it might check the charlength of each path component in 2 (fast & slow) passes. Appropriate error messages for any of these failing checks are written to stderr.

Though I sure hope no modern system requires these portability checks!

numfmt, after mostly-typical initialization whilst maybe setting the CPU’s floating point precision & retrieving the default decimal point from locale, parses & validates flags including a printf-like format string, reallocs a buffer according to configured padding, & iterates over commandline args (with a possible warning in presence of –header) OR stdin’s lines (the first given number of lines of which are treated as a header).

For each (surrounded by delimiters) it iterates over the line’s fields, removes specified suffixes & whitespace, maybe reconfigures the padding buffer based on input charlength, carefully parses the number more leniently & resiliently, computes charsize to validate it’s not too large, reassembles the printf format string whilst applying a choice of rounding to the parsed number to reconsider whether to show the decimal point, applies that format string, possibly adds a suffix & applies alignment via mbsalign, & outputs that formatted number with appropriate prefix & suffix. Or outputs raw input text.

expr, after typical initialization & validating there’s non-“–” commandline args, runs an immediately-evaluated pushdown parser with a scanner over the commandline args operating upon a tagged enum holding either multiprecision integers (mpz_t) or multibyte strings. Values can also be tested for falsiness, or matched against regexps upon the : string infix operator.

Results are converted into a textual output and a boolean errorcode.

dircolors, after mostly-typical initialization & parsing & validating flags, either outputs some pre-compiled hardcoded text, OR guesses which shell is used based on $SHELL if not explicitly stated via commandline flags before reformatting the input file or stdin surrounded by shell-specific prefix/suffix text.

Parsing/reformatting the specified input streams involves possibly temporarily-freopening the specified file over stdin if not “-“, retrieving $TERM, & repeatedlies with linecounts reads & straightforwardly-parses each line, if the keyword was “TERM” checks if it matches $TERM, & unless it didn’t match reformats keyword-arg pairs quoting each separated by “=” possibly adding or removing punctuation, replacing keys with acronyms in a pair of lookuptables, or dropping “OPTIONS”, “COLOR”, & “EIGHTBIT” keywords.

This reformatted text is buffered into a string for output.

To relatively efficiently extract prime factors from a number factor, after mostly-typical initialization with added exit-handler outputting any remaining buffered text, parses a handful of commandline flags, possibly zeroes out a frequencies buffer to output after core logic, iterates over trailing commandline args or tokenized stdin.

For each it parses the number, considers taking a fastpath or reporting any errors, or fallsback to using multiprecision arithmetic.

The fastpath (if the number’s small enough i.e. 2 words) recursively divides by 1,000,000,000 & takes the remainder to aid outputting the int, to which it adds a “:” (a similar technique is used to output factors once computed), & computes the actual factors by first trying to extract some obvious factors & iterating over a pre-generated (I’ll describe how soon) table of prime numbers which are factors of the input in two passes to quickly discard options.
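
Stripped of the precomputed prime table, the two-pass discarding, & the Pollard rho fallback described below, the underlying idea is trial division up to the square root; a naive Python sketch for orientation:

```python
def factorize(n: int):
    # Trial division: pull out 2s, then odd candidates up to sqrt(n).
    # Whatever survives above the loop is itself a prime cofactor.
    factors = []
    while n % 2 == 0:
        factors.append(2)
        n //= 2
    p = 3
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 2
    if n > 1:
        factors.append(n)
    return factors
```

factor's refinements exist because this loop is hopeless for large semiprimes; hence the primality tests & rho below.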

If there’s more prime factors to find it’ll check if the simplified input itself is prime using some math theorems (Miller-Rabin & Lucas) I’m not familiar with, after discarding any additional 2 factors. If so it adds it to a large-prime-factors array to be outputted separately.

Otherwise it computes the square root checking if that’s a prime then iterates over 1 of 2 tables & does more computation involving squareroots, of course remainders, & recursion.

Within or after that pass it tries using Pollard’s recursive rho algorithm involving modulo multiplies/adds/subtracts to narrow down prime candidates to record.

There’s variants of most of these functions for operating on one or two words, & a variant of all of them operating on a dynamic number of words.

To autogenerate the smallprimes table it parses the first arg as an int, allocs/zeroes some tables, iterates over that range, & outputs results.

For each number “i” in that range it populates a table entry with p=3+2i, then marks each multiple of p from (p*p - 3)/2 up to the given max number.

To output the actual primes from those tables it counts the number of bits in a wide_uint & outputs it as a C-preprocessor macro, outputs P macro calls for each prime as the diff from last prime, diff from 8 ahead, & (via bitwise shifts & ORs) the inverse. Then uses the inverse & limits to locate the next prime to output as FIRST_OMITTED_PRIME.
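
The generator's sieve over odd numbers only (index i standing for 3+2i, composites first marked at p*p) can be sketched in Python like so (the function name is mine; the real generator emits C macros rather than a list):

```python
def odd_sieve(limit: int):
    # Index i in the table represents the odd number 3 + 2*i.
    size = max(0, (limit - 3) // 2 + 1)
    composite = [False] * size
    for i in range(size):
        p = 3 + 2 * i
        if composite[i] or p * p > limit:
            continue
        # Mark odd multiples of p starting at p*p, i.e. index (p*p - 3)/2,
        # stepping by p indices (= 2*p in value).
        for j in range((p * p - 3) // 2, size, p):
            composite[j] = True
    return [2] + [3 + 2 * i for i in range(size) if not composite[i]]
```

Halving the table like this is why the marking starts at (p*p - 3)/2 in the description above.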


Having just studied GNU CoreUtils, most of which are more or less simple wrappers around various syscalls (for trivial wrappers, see historical code), I’ll now follow those syscalls down: LibC forwards them to Linux via a special Assembly opcode, & Linux forwards them to the appropriate implementation via a caching/mounting lookup layer called the “Virtual File System”.

Linux supports various filesystems but the one you’re probably using is called Ext4FS which I’ll study today!

To aid allocation of blocks in which to store files Ext4FS computes group numbers & offsets from a block number ideally via a divide, & provides per-group checksummed bitmasks to test whether that memory’s free.

There’s functions for retrieving cluster counts, maybe summing/subtracting/counting them.

Another for dereferencing a block’s descriptor.

There’s a function to carefully retrieve & validate (partially via checksum) a group’s allocation bitmask, with or without blocking on locks. Initializing that bitmask if needed, populated with a scan.

One to check a desired allocation count against various (mostly per-CPU) counters in the superblock’s info to see if desired memory is available, possibly claiming via a wrapper function.

Another for retrieving the # of blocks used by a group, or to retrieve the supergroup possibly via GCD.

There’s a function for counting free blocks by summing each group’s descriptor’s count (where its bitmask is valid), which is validated against the bitmasks in debugging builds.

It may compute a hint for this allocation by first considering bitmasking & maybe incrementing which blockgroup the rootnode specified it should use, applying a rootnode-specified multiplicand & offset to get the first block number of that group, reading its blockcount property, & computing from the thread ID.

There’s a multiblock buddy allocator implemented around the single-block allocator & a redblack tree.

There’s code for migrating a file’s blocks between allocation groups.

There’s code to serialize & deserialize access control lists between a file attribute & a Linux-shared type.

To handle the readdir syscall Ext4FS retrieves encryption data if present, if it’s htree-indexed initializes an iterator if necessary with hashes & checks that we haven’t reached EOF before reformatting into a redblack tree & possibly unsets a bitflag if there’s a checksum, checks if there’s inline data as a file attribute (useful for configuration & lock files) which it reads specially, maybe allocates some memory to decrypt into, & repeatedlies checks for fatal signals, maps in some more blocks handling error & empty cases, maybe validates blocksizes and/or checksums setting a flag on success, & in an innerloop validates directory entries, increments an offset, & emits each directory entry with or without decryption.

llseeking Ext4 dirs isn’t special.

HTree directories are converted into redblack trees & on into linear-scan dirs if the client wishes to list them. Data from this conversion may need to be freed upon closing the directory.

There’s a validation function alongside this readdir implementation.

Lots of encoding details are defined as structs, enums, & macros, with inlined functions handling byteorder.

There’s a “journal” which tracks all ops in a ringbuffer to aid recovering from unexpected shutdowns.

Extent references have checksums, credits (akin to currency, to determine when to merge), access permissions, dirty flags, meta & space blocks & roots with indexes, & (pre)caching.

They can be split, validated, read, sought via a binary search (2 variants), initialized & deinitialized, “mapped”, & zeroed.

To “map” an extent (ignoring debug output) it first traverses the extents tree reading & validating entries as needed thus flattening it for a binary search, gets & validates depth, retrieves the treetraversal path at that depth & if present considers expanding certain holes before returning that block, otherwise unless this is disabled creates it.

Which involves gathering some fields, allocating some space with good memorylocality ideally by extending the allocations on either side, inserting into the extents tree merging where profitable, updating reservedspace counts, & considering syncing to the journal.

To truncate some extents it calculates the range to remove, deletes it under lock from the tree via a redblack tree (retrying slightly later upon failure), then removes any trailing holes.

To fallocate some extents, depending on the given mode it’ll return -EOPNOTSUPP, flush state & remove those extents with journalling, try inlining data into file attributes, flatten that range of the tree, remove the given range from the tree with journalling, zero it out, or alloc new ranges.

There’s a couple wrappers around mapping extents converting a selected range to an IOVec to be copied to userspace.

To map some “fiemap” extents it checks cache if indicated to by bitflag (or clears bitflag), validates the given range, & defers to Linux’s generic filesystem code with a callback to read from the file attribute or have it wrap the map blocks code.

To precache it might check inlined data under lock, retrieve cache if bitflagged, run generic logic, validate range, & flatten the tree.

Related to the extents tree there’s an extents status tree used for partially locking files and more. This is implemented similarly to extents trees but is entirely in-memory as a redblack tree.

There’s a journalling fastpath for smaller ops.

Underlying the read syscall it constructs an iterator in one of three types if not shutting down & non-empty.

Upon refcount deallocation it frees dynamically-allocated blocks, discards preallocations under lock, & frees HTree dir info.

Underlying the write syscall it constructs an iterator in one of three types if not shutting down based on bitflags. The “dax” write iter, after validation, starts journalling, registers & journals an “orphan” inode, & hands a callback to filesystem-generic code possibly wrapped in a decorator or two. I’ll describe these callbacks later. One of the decorators handles most of the journalling, another is filesystem-generic.

The “DIO” iter checks memalignment, obtains locks, journals, hands one of two callbacks (whether we’re overwriting or not) to filesystem-generic code possibly decorated with journalling, cleans up, & commits writes via filesystem-generic code.

Most methods on mmap’d Ext4FS DAX files segfault unless it’s copy-on-write, though I’m not making much sense of this callback. The methods for normal Ext4FS files are largely filesystem-generic though it ensures blocks are mapped before being written to.

The implementation for mmap decides which of those methodtables to hand to the given virtual-memory struct to integrate into the process’s memory mapping, unless it’s shutting down or a needed DAX mapping is unsupported.

The implementation for open, with several wrappers, defers to filesystem-generic code. Ext4FS’s llseek is also largely filesystem-generic.

There’s an internal filesystem-map object, which I think lives ondisk.

Syncing is reflected in the journal as “barriers”, and flushes the underlying blockdevice.

There’s an internal hashfunction.

There’s functions for allocating & deallocating inodes directly out of the relevant allocation bitmasks.

There’s a handful of functions dealing with some concept of “chains”, blocks, & paths.

Functions for operating upon file bodies stored “inline” within the file’s attributes.

This section’s quite long, so I won’t cover the highlevel inode objects exposed externally.

There’s several supported IOCTLs, updating properties & deferring to the other lowerlevel components.

It uses “MMP” checksumming for the allocation bitmasks, filebodies, filemetadata, etc.

Extent slices can be shifted around.

There’s a concept of orphaned inodes, which sometimes is just a step of allocating inodes.

The I/O methods are an abstraction around internal paged I/O functions.

There’s several functions dedicated to resizing allocation groups.

There’s an internal rate-limited “superblock” structure, defined alongside some of the methods for mounting & unmounting ExtFS filesystems.

Symlinks are their own type of inode, or rather 3, as exposed to the outside world.

There’s a pagecache with (publicly exposed) “verity descriptors”.

And it natively understands several file attributes including HURD’s.


In order to interact with the Linux kernel, you need userspace commands (or apps) communicating with it. Since vital I/O components like Bash or Gettext operate in userspace, Linux cannot provide its own UI.

Disk Utilities

Perhaps the most useful commands Util-Linux provides are for handling disks (persistent storage) containing the filesystems on your computer, & splitting these disks into multiple “partitions” via a table the earliest boot stages understand.

After initializing internationalization, parsing commandline flags, & configuring colourful output (with config) & several other shared libraries, sfdisk configures libfdisk, gathers an array of fields, chooses which subcommand to run, & cleans up.

swaplabel, after initializing internationalization & parsing commandline flags validating further commandline args are given, uses a shared library to locate the devicefile, then opens that devicefile to either write new metadata at appropriate offsets or read that metadata via the shared library.

After initializing internationalization & parsing version/help commandline flags resizepart opens the specified devicefile, locates the devicefile for the specified partition, & applies the BLKPG IOCTL.

After initializing internationalization & parsing few commandline flags raw opens a special devicefile to gain privileges, runs the RAW_GETBIND IOCTL on each “minor”, validates commandline arg, possibly stats that devicefile to inform a subsequent RAW_GETBIND IOCTL, extracts major & minor numbers from commandline args, & runs RAW_SETBIND IOCTL.

After the same initialization isosize iterates over commandline args. Foreach it opens & validates the devicefile & textually outputs its size.

There’s a support file to render menus to aid creating scripts, amongst a few other simpler supportfiles.

After typical initialization fdformat validates the next commandline arg & opens the specified devicefile calling the FDGETPRM IOCTL on it to retrieve various statistics to output, applies formatting IOCTLs (FDFMTBEG, repeated & flushed FDFMTTRK, & FDFMTEND) with status messages, possibly attempts reading the devicefile to see if there’s anything to repair, & cleans up.

After typical initialization delpart opens the specified device file & runs the BLKPG IOCTL on it.

After initializing internationalization blockdev checks for -V/--version or -h/--help args, checks --report in which case it instead iterates over all remaining args or all partitions to retrieve various attributes from Linux & textually output them, validates commandline args, opens each specified device file & interprets commandline flags as IOCTLs upon it.

addpart wraps a different variation of the BLKPG IOCTL.

IOCTLs are basically special methods upon devicefiles.

After initializing internationalization & parsing/normalizing commandline flags partx validates the commandline args possibly stat’ing the specified file, possibly parses an integer out of the path, possibly outputs parsed commandline args, upon add or delete subcommands stats & validates the wholedisk devfile using helper functions to correct this, opens the wholedisk devfile, & chooses a subcommand before cleaning up.

partx’s delete subcommand possibly scans the filesystem for unspecified partition numbers before deferring to a partx_del_partition function from a shared library wrapped in optional human-readable reporting.

Other commands require a “probe” implemented by another shared library.

After initializing internationalization & parsing a few commandline flags validating a single arg remains, fsck.cramfs performs file validation. This validation includes simple structural checks, error-detection checks via CRC32, & binary-syntactic checks with decompression recreating what files/directories/etc it can.

fsck.minix (after initializing internationalization, configuring exitcode for a utility lib, validating theoretical sizes, & parsing a couple more commandline flags validating commandline args remain) validates the devicefile isn’t mounted, opens it, & validates metadata. After checking quick-exit conditions fsck.minix continues by parsing/validating the filesystem possibly outputting status info, configures signalhandlers, alters shell input handling flags, parses the filesystem recreating it with occasional user-prompts for how to recover from errors, possibly outputs summary info, if anything changed flushes & syncs final pieces (superblock always written in repair mode), & cleans up.

After configuring buffering, initializing internationalization, configuring exit-handling, parsing extensive commandline flags & envvars with preprocessing step, configuring SIGCHLD handler, & loading the mounttable fsck performs some validation & iterates given devices before synchronizing & cleaning up.

For each given device it considers exiting as indicated by signal-handlers, looks up the mount or adds a new one, considers skipping it based on type or whether it’s mounted, & runs the appropriate command.

After initializing internationalization & debugging options as well as parsing commandline flags into a new FDisk Context & configuring the shell fdisk branches upon a subcommand before cleaning up.

After initializing columns the List or List-Details subcommands iterate over commandline args or all partitions, foreach parsing various info to output textually. Foreach commandline arg the showsize subcommand opens the devicefile & calls the IOCTL to count its sectors for output.

After validating a single arg remains the main subcommand colourfully outputs some welcome text, “assigns” the devicefile, if successful warns if it’s being used, flushes stdout, validates it’s writable & locks the devfile, determines the “wipe mode” if in collision, adds a label if missing or warns about GPT labels, reads partition info if it isn’t readonly, & presents an interactive menu for constructing a sfdisk script via a pre-configured callback. Most of the code is dedicated to this menu.

After initializing internationalization, colouring, debug options, & a new FDisk Context as well as parsing few commandline flags cfdisk fills in the missing commandline arg, “assigns” the devicefile, locks the devfile if it isn’t readonly, initializes/runs/ends the columnar NCurses menu UI (navigable with arrow-keys) with a pre-configured callback, & cleans up. This includes a main WYSIWYG view.


Before you can use a disk, you need to write an empty filesystem to it. Util-Linux provides a handful of commands to do just this for different filesystems.

Let’s start by discussing swapspace, where data is written when RAM overflows. After initializing internationalization & parsing commandline flags capturing up to 2 additional commandline args mkswap possibly generates an appropriate UUID, computes an appropriate pagesize from Linux’s figures, performs some validation whilst getting the disk (as opposed to RAM) pagesize, stats/opens/locks/validates the device file, validates the devicefile is fully usable & has no holes, carefully zeroes out the devicefile in reference to partitions, populates some structural metadata, possibly reports the partition size as human-readable text, writes some magic numbers, writes UUID & label, cleans up, & performs some SELinux adjustments.

After initializing internationalization & parsing commandline flags plus the subsequent couple args, mkfs.minix performs some validation opening & locking the devicefile, gathers & computes various structural metadata (including optimal pagesize & a bitmask) reporting counts textually to stdout, optionally validates it can seek to & read each zone updating the bitmask or reads a file of bad blocks, serializes this structure to the disk (root, then bad inodes), marks good inodes & serializes out filesystem tables, & cleans up.

After initializing internationalization & parsing commandline flags followed by 2 additional args, mkfs.cramfs retrieves the ideal pagesize, stats & opens the devicefile, initializes permissions, populates & (via MD5 hash) deduplicates from a template dir, allocates & zeroes a large chunk of memory, optionally loads a file into it, writes file metadata in breadth-first order, postprocesses the CramFS entries compressing any filebodies, computes a CRC32 to protect against data degradation, checks it has allocated enough space, writes this buffer to the specified disk closing the file, & reports any flagged warnings.

I believe CramFS is what distros use when they’re running off a USB or DVD, before they’ve installed onto rewritable disk.

After initializing internationalization & parsing commandline flags extracting 1 or 2 additional args, opening & stat’ing the first as a devicefile, mkfs.bfs retrieves the optimal pagesize for the disk or a user-specified value, computes some sizing info gathering that into structural metadata, optionally reports various data, writes that structural metadata to the specified disk, gathers & writes more structural metadata, zeroes out the body, & seeks to the root inode to write “.” & “..” entries.

And finally all these variants are wrapped by a mkfs command!

After initializing internationalization & parsing few commandline flags mkfs defers immediately to its specified variant, defaulting to mkfs.ext2 (which doesn’t appear to be in this project). I hardly see the point of this command… Ah, it’s deprecated!


Underlying the suite of disk commands described above are a suite of shared libraries, including libblkid for accessing a disk’s metadata!

Some of these APIs simply wrap IOCTLs like BLKGETSIZE[64]; most are more complex. The library can report & parse its own version number.

There’s a routine for parsing a given key=value configuration file defaulting to $BLKID_CONF. There’s a blkid_dev object containing a linkedlist of tags with a human readable serialization.

There’s an iterator over a cache for retrieving these blkid_devs with configurable filters.

Much of the logic is involved in locating, reading, & throttled-writing an aggregated XML-like (doesn’t use an XML-parser library though) cachefile of all disk metadata, storing entries in a garbage-collected linkedlist.

Wrapper functions retrieve tag values & device names from the cache, & verify devices in the cache.

There’s a parser for which metadata to retrieve.

They’ve implemented their own UTF-8 encoder library for some reason, wrapping it with escaping logic. I’ll discuss a use for this tonight.

There’s a suite of utilities for querying a disk’s tags, including an iterator.

There’s utilities for traversing the /dev filesystem for disk devices & their partitions possibly populating a linkedlist.

There’s a routine for iterating over the cache multiple times (strict vs fuzzy match) looking for disk by name adding & verifying a new entry if needed. A wrapper looks up by disk number with filepath normalization, checking multiple paths. This in turn has its own wrappers consulting the /proc/partitions/, /proc/lvm/VGs/, and/or /sys/block virtual filesystems.

There’s parsers for strings Linux might send programs to notify them of new hardware.

There’s a probe object which retrieves metadata directly from a disk devicefile via its fstat info with few additional IOCTLs, & aids reading it via a values-linkedlist & 3 method tables it can step between (treating it as 2 semi-fallbacks), temporarily switch to a different methodtable, or unset. And it has routines for linkedlist-buffered reading from a configurable-slice of the devicefile.

These methodtables, each implemented in separate subdirectories, provide facades upon further methodtables parsing a wide variety of disk formats including partition tables, filesystem superblocks, & to a lesser extent “topology” tables. Each of those sublibraries further provides common utilities to aid the parsers in yielding common datastructures.


Another supporting library behind Util Linux’s disk commands is libfdisk!

There’s a “labelitem” object with name, id, 64bit, & type-tagged data fields.

There’s an iterator object holding a direction.

There’s code (generated largely by the C PreProcessor) for parsing a bitmask of what info to return.

There’s a “label” tree-object with name, type, & flags as well as a methodtable. The context object stores a forest of these, & an active one.

There’s a routine which calls the probe & possibly deinit methods on labels to decide which one to make current. I see several datamanagement methods on these label objects, as well as some structural metadata I failed to mention.

There’s a partition tree-object with a type, name, uuid, fstype, fsuuid, fslabel, & slices. There’s a routine for extracting the next step of the linear partition order, possibly asking the user, or to serialize to a human-readable string.

There’s partition tree traversal/manipulation methods, and ones to compute possible slices of the disk with human readable debug output whilst enforcing invariants. May defer to the label’s methods.

There’s a field object containing an id, name, width, & flags.

There’s a partition-type object associated with labels each holding a name, typestring, & code. These can be queried from a label, possibly based on textual user input.

An FDisk script holds an FDisk table, a linkedlist of key-value headers, an FDisk context, refcount, getter method, linecount, an FDisk label, & a couple flags.

Amongst normal object routines these “scripts” have routines for parsing from a context or line-by-line with a tokenizer from a file, or as JSON. Or to save such files.

There’s routines for requesting user input via callbacks, and to align bytes as required by hardware.

An FDisk context refcounted object wraps a filedescriptor, labels, callbacks, structural metadata, script, etc. The routines for associating & deassociating with an open devicefile are nontrivial, populating the partitiontable via an IOCTL.

An FDisk table holds a partitions list with mutation (especially regarding freespace), sorting, serialization, validation, & diffing routines.

There’s some I/O utilities. And partitions may have a list of areas to be wiped, possibly applied via libblkid.

Partitioning table formats

Util Linux’s libfdisk’s Context object contains an array of labels, each corresponding to a different partitioning format via a methodtable & propertytables. This is where most of libfdisk’s code is! The commandline menus may directly call accessors specific to each format.

I’ll specifically describe how its GPT support works, the others are implemented in basically the same way.

To “probe” the GPT label it casts the read data to the appropriate C struct, iterates over some arrays to validate the structure, & normalizes the format to what the CPU can process, whilst dereferencing pointers from disk with CRC32 error detection. There’s a couple copies of these headers saved far apart in case one gets corrupted. This load can fail, requiring cleanup.

To save the GPT label, after casting & validating (ensuring there are no partition overlaps), it positions the backup at the end, computes CRC32s, & writes segments to appropriate offsets.

Verification involves various structural & CRC32 checks, issuing warnings (via callbacks) to the commandline upon failure. Otherwise might output some basic info with a success message.

Creating a new GPT system involves allocating & initializing various properties, possibly sourcing structural metadata from the context or script with CRCs. Some in disk byteorder, some in CPU byteorder.

The locate-disklabel method branches over the given number to return a property name, offset, & size. The offset & possibly size are sourced from properties in-memory.

get-disklabel-item involves branching over the given item’s ID to populate its name, type, & data fields. The datafield again comes from properties in-memory in disk byteorder.

Setting a disklabel-ID involves parsing a GUID asking the user for it if not given, saving it with new CRCs, & reporting change to the commandline.

After a little validation getting a GPT partition involves an array lookup, & converting the result into a format-independent structure serializing miscellaneous properties to text. Setting a partition performs the same array lookup to determine where to parse/copy the given data to. Simpler variants are offered.

Adding a partition works similarly to setting a partition, but with additional (slow) allocation logic & may request userinput for (or generate) missing data. Reports results.

Deleting an entry involves zeroing out the array entry, once we’ve validated it’s unused. Thus new CRCs need to be computed, & dirtiness is flagged.

Reordering partitions involves a check to see if it’s actually needed before running a standard qsort (which I believe is actually Merge Sort in GNU LibC). Again CRCs need to be recomputed & dirtiness flagged.

There’s a special unused-entry GUID used to check whether a partition’s empty & valid partitions need a start.

There’s a method for toggling specified bitflags (determined via branching over a given int) reporting to the commandline via a callback. Yet again CRCs need to be recomputed & dirtiness flagged.

There’s a simple finalizer. And a method to extract alignment info from the GPT structural metadata to the context object.


Another library underlying Util Linux’s disk-management commands is LibMount!

It has a handful of parsers for different bitmasks, namely debug flags & mount flags. There’s a simple parser for option strings, with mutation utils for the resulting array. And one for the mount TSV. There’s routines for reporting the library’s version & features.

There’s language bindings for Python.

There’s a routine performing various checks to see if the given context is a loopback device.

There’s a routine for iterating over the mount table locating the loopback mount corresponding to given validated parameters. And a wrapper around it which extracts loopback mount options, mounts the loopback if missing via a loop context whilst validating the backing file.

There’s a couple routines for deleting such a loop device. These all build upon a more general shared library here offering an object mostly just wrapping /dev/loop* devicefiles.

There’s a context object (upon which those are treated as methods) consisting of various properties including mount tables, status codes, bitflags, subcommand, options, callbacks, filesystem objects, directory/file paths, etc. Has accessor routines.

There’s a linkedlist of mounts including routines for creating a new mount reusing options on an existing one. And a routine that preprocesses mountoptions possibly through that & similar routines, or SELinux. Further wrappers actually mount it.

libmount context table accessors parse in the table, configured as per other properties, if it hasn’t been already. A cache object further avoids reparsing the mounttable. The namespace accessor has various validating wrappers.

It has a routine for running the appropriate/configured helper routine, running the mount syscall, or a stack of wrappers that do both/either as appropriate, each layer handling additional options.

Can yield human-readable errors.

There’s routines for loading the mounttable & locating specific entries whilst switching namespaces. Has wrappers & similar routines. Some querying mount options.

The main & “u” mount tables have similar but separate routines around them.

There’s routines for parsing mount options into a context.

There’s a routine for reading BTRFS subvolumes via a special IOCTL.

There’s a key-value refcounted cache falling back to a libblkid cache for tags, as described previously.

There’s a filesystem object holding a linkedlist, source, bindsource, tagname & value, root path, swap behaviour, target, filesystem type, various option strings, attrs, a comment, bitflags, userdata, etc. These optionstrings can be merged.

There’s an iterator object over mountpoints.

There’s an object for generating lockfiles, ensuring signals are handled correctly.

There’s a “monitor” wrapping inotify syscalls with logic to handle lockfiles.

There’s a “table” object wrapping a linkedlist with a common “intro” & “tail” as well as comments (including intro & tail ones) & userdata. Can have an associated cache. Can be iterated over to locate particular filesystems, & can of course mutate the list. Includes special BTRFS support.

There’s a tabdiff object holding a linkedlist upon new and/or old filesystems, & a list of unused filesystems. Can compute a diff between two tables.

There’s an update object holding a target path, filesystem, filename, mountflags, userspace-only flag, ready flag, & mountinfo table. Accessors often include heavy validation. Can serialize out to a standard file.

Finally there’s utilities it builds upon for:


To improve the output of its commands Util-Linux implements “libsmartcols” for better aligning columns, exposing more fine-grained controls.

Includes parsers for its debug bitmask & reports its own version number. Has internal routines for outputting human-readable debug output for a column or columns.

There’s a wrapper around linewrapping, mbs_width, or mbs_safe_width with postprocessing. There’s a further wrapper adding table/tree traversal computing min & natural widths to compute column widths.

Another wrapper-layer iterates over the table’s columns using that routine to compute the size of each one, ensuring the total width is reasonable for the terminal size via extensive postprocessing with a choice of loops.

There’s a cell object holding data, colour, userdata, bitflags, & alignment with comparator & clone routines.

There’s a column object holding colour, headercell, min/max/avg/hint/final width, bitflags, JSON type, table reference, & comparison/wrap callbacks.

Column objects have hard line-wrapping routines.

There’s a “grouping” object wrapping a linkedlist multistring to yield a zip iterator over the lines of each cell in a row, thus lowering tabular linesplitting into something closer to what can be written out.

There’s a column iterator object.

There’s a line object wrapping an array of cells with arbitrary userdata, colour, & an optional tree structure, supporting column rearrangement & cloning.

There’s a symbol object holding a tree branch, vertical & right tree-links, vertical & horizontal groups, first/last/middle member, middle & last child, & title/cell padding.

There’s a table object holding an output filedescriptor, desired dimensions sourced from the terminal, desired output format, lines, columns, groups, symbols, a title cell, & flags. Has lots of accessor routines, & wrappers around its properties’ methods. Has sorting routines.

There’s treetraversal utilities.

Building upon all that the bulk of the logic is to output these objects to a tree utilizing the groupings & the sizing routines. The concept of symbols are used to serialize the tree structure into ASCII art.

With some lower-level utils…

To print an emptycell libsmartcols outputs any colour markers, considers outputting tree symbols, considers finalizing the line, otherwise filling with spaces before ending the colourspan & probably emitting a column separator.

There’s a routine for printing a treeline, handling JSON specially. And another to traverse the tree with that routine.

There’s routines to initialize & cleanup such printing.

Further wrappers ensure the table title & headers are included in those ranges, enforce validation, & allow printing to a string in-memory.


Partitioning formats may incorporate UUIDs, so Linux Utils implements its own trivial libuuid module!

This has routines for zeroing-out 16bytes, parsing 2 UUIDs & comparing their components, copying one UUID to another, checking whether a UUID is zeroed, (de)serializing UUIDs, (un)parsing UUIDs, resolving known (“dns”, “url”, “oid”, “x500”, or “x.500”) namespaces. Or extracting the unpacked components (time, type, or variant) out of a UUID.

Most of the logic is in generating UUIDs.

There’s a few variants of UUID generation. You can take an MD5 or SHA1 hash & tweak its time components.

Or, possibly falling back to time generation, you can generate random data (via the getrandom syscall or reading /dev/[u]random scrambled with the time) a given number of times, tweaking the time.

Or you can (“time generation”) consult a daemon via a socket, tweak the time, or retrieve the time alongside ensuring an “init” message is present, randomly generating it if needed.

Login Utilities

Util Linux has a suite of accounts utilities.

After initializing internationalization & parsing commandline flags nologin opens & fstats /etc/nologin.txt, & copies its contents to stdout if valid.

After initializing internationalization & parsing/validating commandline flags su & runuser each check whether we’re running as root, save some globals in a mini-library, initialize the terminal as desired, read a password from stdin without echo, retrieve old & new accounts, call into PAM (a fairly verbose API), determine which subcommand to run, configure groups, configure limits, open a PAM session, optionally create a new pseudo-teletype devicefile, fork a parent process to cleanup from the subshell, optionally configure the new pseudo-teletype devicefile, make the setgid/setuid syscalls, modify the envvars & commit them to PAM, clean up PAM, & run the subshell!

After initializing internationalization & parsing -V/-h newgrp retrieves your username & calls setgid with an implicit or explicitly given value before restoring UID via setuid syscall & running a subshell.

After initializing internationalization & parsing -V/-h vipw & vigr (which operate on different files) edit the main file, then edit the corresponding “shadow file” if the user accepts it & it exists.

To edit those files vipw/vigr clears some limits & signals, locks the file, opens it, creates a tempfile validating that doesn’t exist, runs the configured editor, possibly reopens the edited tempfile, validates whether any changes have actually been made, configures UNIX permissions, backs up the old file, & cleans up.

After initializing internationalization & parsing few commandline flags, utmpdump configures specified output & input, before choosing the dump or undump subcmd.

Undumping involves tokenizing a line & writing out the gathered binary data. Dumping involves reading in that binary data (possibly with some tailing logic) & serializing it back to text.

After initializing internationalization & parsing a few commandline flags + a single commandline arg chsh looks up the given user, possibly validates they’re in the /etc/passwd file, possibly performs some SELinux checks, retrieves the previously-set shell, validates we’re altering ourselves, calls into stdlibc to check whether the old shell is listed in /etc/shells, possibly checks with PAM, prompts if needed for a new shell (via libreadline if available) then validates it, checks whether we actually need to change the shell, & actually applies the change via a shallow wrapper around the external libuser library or by semi-manually altering the passwd file via an intermediate file.
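The /etc/shells check chsh delegates to stdlibc can be approximated in Python by reading the file directly (a sketch, not chsh's actual code; Python lacks a getusershell binding):

```python
# Hedged sketch: is the given shell listed in /etc/shells?
# Blank lines & '#' comments are skipped, like getusershell does.
def shell_is_listed(shell, shells_file="/etc/shells"):
    try:
        with open(shells_file) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#") and line == shell:
                    return True
    except OSError:
        pass
    return False
```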

chfn works largely the same way except it formats & sets finger information (which consists of multiple subfields to prompt for) rather than a shell.

After initializing internationalization & parsing commandline flags last outputs the latest logins by iterating over each of the given files, defaulting to /var/log/wtmp (or btmp for lastb). For each (handling fuzzing slightly specially) it retrieves the boot-time from the kernel, opens the file with a configured buffersize, carefully reads the binary data fstating on failure, seeks to the end, & iterates over the records in that file. That is, it repeatedly reads & validates an entry, possibly & extensively reformats it (including a DNS lookup) into human-readable output, applies some normalization, tweaks locals and/or outputs additional info in a similar format, & clears state upon seeing shutdown loglines. After which it attempts to output a timestamp from the file, & cleans up, including freeing the filepath that was opened.

Upon processing user-process records it may open /proc/?/loginuid to determine if it’s a “phantom”.

After initializing internationalization alongside some collections & parsing commandline flags sulogin validates it’s running as superuser, ignores various signals, possibly ensures necessary device-filesystems are mounted (upon exit it’ll unmount any it mounted), retrieves the commandline arg or $CONSOLE envvar, retrieves possible console devices via /proc, /sys, /proc/cmdline, or an IOCTL with a fallback validating results are returned, reconnects the pipeline files, retrieves the root account, & iterates over consoles. For each console sulogin attempts to open one which isn’t overloaded & initialize it to support Plymouth (a.k.a. splashscreens), various IOCTLs, locales, & baudrate.

If successful, soft-ignoring SIGCHLD signals, sulogin iterates over those opened console connections. After validation it forks a child which’ll repeatedly ask for a password if needed & run a subshell.

A trailing loop waits for subprocesses to end, closes the opened consoles, & waits for subprocesses again, in between restoring signal handlers.

After initializing internationalization, configuring signal handlers & scheduling, & parsing commandline flags + a singular arg login configures a process group & the terminal, opens the system logger, configures PAM possibly requesting authentication from it, loads the password entry, loads groups, opens a PAM session, clears a watchdog timeout, closes the passwords database, logs some info, switches terminal info, configures environment variables, generates a new process title, logs some more, optionally outputs login messages, forks a subprocess with a watchdog parent, drops privileges, cds to home, and runs the appropriate shell.

This command generates the btmp logs consulted by last. It can consult a “hushlogin” file via a mini-sharedlib.

And finally… After initializing internationalization & parsing/validating commandline flags with the aid of libsmartcols, checking at most one arg remains, lslogins parses the wtmp or btmp files to reformat for libsmartcols. Includes optional systemd logd support, which I suspect mostly just serves to reinforce systemd’s poor reputation.

Miscellaneous Utils

After initializing internationalization & parsing/validating a few commandline flags mcookie initializes MD5 hashing, iterates over the -f flags opening those files & MD5 hashing them, frees that array, seeds some additional randomness into the MD5 hash, finalizes the MD5 hash & outputs it. Utilizes some smaller sharedlibs.
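That hash-files-then-add-randomness flow translates naturally to Python's hashlib (a sketch; the amount & source of the extra randomness here are assumptions, not mcookie's exact entropy gathering):

```python
# Minimal mcookie-style sketch: MD5-hash any -f seed files plus some
# randomness, then return the 128-bit digest as hex.
import hashlib
import os

def mcookie(seed_files=()):
    h = hashlib.md5()
    for path in seed_files:          # like the -f flags
        with open(path, "rb") as f:
            h.update(f.read())
    h.update(os.urandom(16))         # additional randomness (assumed amount)
    return h.hexdigest()             # 32 hex chars = 128 bits
```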

After initializing internationalization & parsing/validating commandline flags uuidgen, if a -N name is given, might (-x flag) decode that data from hex form. Then it chooses a subcommand to run & serializes the UUID via libuuid (as described earlier) to output it!

uuidgen -t calls libuuid’s uuid_generate_time, likewise uuidgen -r calls libuuid’s uuid_generate_random. uuidgen -m & uuidgen -s parse a template with or without a namespace before calling uuid_generate_md5 or uuid_generate_sha1 respectively. Otherwise it calls uuid_generate.
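Python's uuid module wraps the same generator family, so the subcommand mapping can be demonstrated directly:

```python
# The uuid module mirrors libuuid's entrypoints:
import uuid

time_based   = uuid.uuid1()                                   # uuidgen -t
random_based = uuid.uuid4()                                   # uuidgen -r
md5_based    = uuid.uuid3(uuid.NAMESPACE_DNS, "example.org")  # uuidgen -m
sha1_based   = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")  # uuidgen -s
```

Note the name-based variants are deterministic given the same namespace & name, while the time & random variants are not.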

After initializing internationalization & parsing commandline flags validating 3 args remain rename validates it’s actually making a change, possibly queries terminal text entry state, iterates over all trailing commandline args, & aggregates error-codes into an exit code. For each of those args it evaluates a template string & carefully calls rename (or for the -s flag symlink) with optional verbose output and/or confirmations.
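The per-argument core can be sketched as below (hypothetical: real rename evaluates a fuller template & the symlink path isn't shown; this naive version substitutes in the whole path, so directory components matching the expression would also be hit):

```python
# Hedged sketch of rename's per-file logic: substitute the expression
# in the filename, skip no-ops, & call os.rename.
import os

def rename_one(expr, replacement, path, verbose=False):
    new = path.replace(expr, replacement, 1)   # first occurrence only
    if new == path:
        return False                           # nothing to change
    os.rename(path, new)
    if verbose:
        print("`%s' -> `%s'" % (path, new))
    return True
```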

There’s a testsuite with a bespoke testrunner.

After initializing internationalization & parsing commandline flags validating more args remain namei initializes 2 ID caches (defined as linkedlists in a mini shared lib), if successful iterates over all commandline args then cleans up. For each arg it stats the filepath, adds each path component to the caches, & with or without following symlinks outputs each unique path.

After initializing i18n & parsing commandline flags uuidparse converts given args from libuuid to libsmartcols.

After initializing internationalization & parsing/validating commandline flags the uuidd daemon (mentioned whilst discussing libuuid, comes with systemd & RC init scripts) might defer to the existing running daemon in one of 4 ways. Or (with optional systemd buildtime tweaks) carefully creates a UNIX domain socket, timeout, & PID file, daemonizes & drops privileges, blocks signals preferring to poll for them, & repeatedly polls for events until error.

For each of these events uuidd considers whether to clean up & exit as requested by signals, accepts & reads the connection, possibly reads a numeric argument, branches over the opcode, outputs the response, & closes the socket. For GETPID it serializes the result of the getpid syscall. For GET_MAXOP it serializes a compiletime constant. For the other opcodes it wraps internal libuuid APIs. For unsupported opcodes it’ll close the socket & might complain.

After initializing internationalization & parsing commandline flags lsfd converts those options into libsmartcols configuration, optionally parses the given filter into a lsfd_filter object (handwritten lexer & parser) possibly attaching an error message causing the command to exit, possibly outputs some debug info, allocs counter objects with their own filters, qsorts by PID, initializes more data, & renders.

To initialize more data lsfd inits linkedlists into each entry of the nodev array, calls initializer methods on the dispatchtables for files, cdevs, bdevs, sockets, & unknown method tables, parses /proc/devices, iterates over /proc/ to parse all running processes, & converts that data into a libsmartcols table utilizing those methodtables.

As for rendering it involves outputting the libsmartcols table and/or constructing a summary table holding the counters to output, before cleaning up.

Much of the effort is involved in parsing the /proc/ filesystem.

Counter objects hold name, count, & a filter object mainly serving to tally the number of matches against the filter.

Filter objects contain an AST parsed by the aforementioned handwritten lexer/parser, which is evaluated via dynamic dispatch, possibly deferring to a regexp interpreter. They hold a libsmartcols table, an array of dynamically-typed libsmartcols columns with dynamically typed “parameters”, & an error message.

Finally I mentioned there’s methodtables for gathering table data.

The bdev methodtable parses /proc/partitions into a global linkedlist, the properties of which can populate a libsmartcols tablecolumn.

The cdev methodtable does likewise for /proc/misc.

The fifo methodtable, without any preparation, labels the appropriate parsed files as pipes.

The file methodtable wraps an ID cache to serialize the file’s extensive data. With additional methods to parse in more data.

The socket methodtable reads in all `system.sockprotoname` attributes from /proc/*/fd/ & /proc/*/map_files/ to populate tablerows.

Finally the unknown methodtable labels these rows as such…

After initializing internationalization, configuring signal & exit handlers, & parsing several commandline flags validating more arguments remain hardlink retrieves the current monotonic time, initializes a file comparator using the specified method falling back to memcmp, & utilizes a couple stdlib traversers.

The 1st, per commandline arg (after some checks including for interrupts & regexp matches), gathers sources.

The sources are gathered into a binary tree & linkedlist.

The second iteration, over that binary tree & the items in the contained linkedlists (after some checks including whether signals have occurred), iterates over subsequent nodes with its own validation including file comparison to finalize the linkedlist & call the link syscall or FICLONE IOCTL with error recovery & human-readable reports. Another linkedlist iteration cleans up the comparators.

On exit it outputs stats.
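The decisive compare-then-link step can be sketched with Python's filecmp & os.link (the FICLONE fallback & error recovery are omitted):

```python
# Hedged sketch of hardlink's final step: after a full content
# comparison, replace a duplicate with a hard link to the original.
import filecmp
import os

def link_duplicates(keep, dupe):
    if not filecmp.cmp(keep, dupe, shallow=False):
        return False                    # contents differ; leave alone
    os.unlink(dupe)
    os.link(keep, dupe)                 # the link syscall
    return True
```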

After initializing internationalization & manually parsing commandline flags validating more args remain kill iterates over the remaining arguments parsing them as integers to send the appropriate signals via syscall according to the given flags possibly outputting verbose messaging. Possibly iterating over the /proc/ directory to find the appropriate PID. A 2nd loop cleans up, before returning an appropriate errorcode.

May instead output hardcoded data like the signalname table.
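That signalname table & the parse-as-integer fallback are mirrored by Python's signal module; a sketch of kill-style argument parsing (the accepted spellings are an assumption about kill's leniency):

```python
# Hedged sketch: parse a kill argument as a signal number ("9", "-9")
# or a signal name ("KILL", "SIGKILL"), defaulting to SIGTERM.
import signal

def parse_signal(arg, default=signal.SIGTERM):
    if arg is None:
        return default
    if arg.lstrip("-").isdigit():
        return signal.Signals(int(arg.lstrip("-")))
    name = arg.upper()
    return signal.Signals[name if name.startswith("SIG") else "SIG" + name]
```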

After initializing internationalization alongside some linkedlists & parsing commandline flags (with a bit of validation) constructing a logging message, logger converts error-mode flags into that logging message structure, opens the UNIX or INET logging socket, & carefully writes the logging message structure populated with each arg or stdin line in turn, then cleans up, reopening the socket when necessary.

Has some minor systemd logd integration optionally builtin per buildflags.

After initializing internationalization & parsing commandline flags validating a couple args remain look validates the -t flag, opens & mmaps the specified file (from args, flags, envvar, or /usr/share/dict/words), reformats the query arg, performs a sorted binary then linear prefix-search, outputs any matches, & cleans up.
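look's binary-then-linear prefix search maps neatly onto Python's bisect module:

```python
# Sketch of look's search over a sorted wordlist: binary-search to the
# first candidate, then scan linearly while the prefix still matches.
import bisect

def look(words, prefix):
    i = bisect.bisect_left(words, prefix)      # binary search
    out = []
    while i < len(words) and words[i].startswith(prefix):
        out.append(words[i])                   # linear scan over matches
        i += 1
    return out
```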

After initializing internationalization & parsing commandline flags lsblk validates it can access /sys/dev/block, finalizes column selection & other options, initializes LibMount & LibSmartCols, parses path debugging options, initializes & configures LibSmartCols, iterates over all the gathered columns to carefully convert them into LibSmartCols’ format, allocates a “devicetree”, either iterates over /sys/dev/block or the commandline args to populate the devicetree including dependencies, optionally applies some deduplication, converts the devicetree to LibSmartCols rows, optionally sorts the table, outputs the table, & cleans up!

The device tree (implemented in a separate file) is formed from refcounted “device” objects holding various names & a deduplication key, tracking their own dependencies. The device tree (or rather forest) is a linkedlist of these devices with their own iterators. It can be parsed out of /pktcdvd/device_map.

Yet another file implements a more detailed “device” object populated by Linux device notifications (a.k.a. “libudev”).

Another file implements the integration between lsblk & LibMount.

After initializing internationalization & parsing commandline flags lslocks initializes a linkedlist, finalizes columns, initializes LibSmartCols, parses/iterates over /proc/locks to populate that linkedlist, if successful converts the configured columns & parsed locks into LibSmartCols data (where most of the effort is) so LibSmartCols can render it, & cleans up.

After initializing internationalization & validating there’s commandline args whereis parses debug options, gathers a list of various filepaths where they exist on the local system from hardcoded data or envvars, iterates over commandline args (--help & --version were handled specially earlier) performing the core logic or manually parsing commandline flags, then cleans up.

This core logic, run for each non-flag argument, involves (with a few tweaks) iterating over the gathered directory paths, opening each of those dirs to iterate over them until the desired command is found.
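A sketch of that core loop (the name.N suffix matching for manpages is an assumption simplifying whereis's real filename checks):

```python
# Hedged sketch of whereis's core: scan each gathered directory for
# entries matching the command name (or name.section for manpages).
import os

def find_command(name, dirs):
    hits = []
    for d in dirs:
        try:
            for entry in os.scandir(d):
                if entry.name == name or entry.name.startswith(name + "."):
                    hits.append(entry.path)
        except OSError:
            continue                    # unreadable dirs are skipped
    return hits
```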

After initializing internationalization & validating there’s 2 commandline args (handling -V/--version & -h/--help specially) findfs just calls LibBlkID’s blkid_evaluate_tag.

After initializing internationalization & parsing commandline flags validating additional args remain fincore finalizes specified columns, initializes & configures LibSmartCols, converts configured columns over via array lookups, iterates over the commandline args to populate the table, outputs said table, & cleans up!

For each commandline arg it temporarily opens & fstats the file, & mmaps it to count how many pages are in memory via the mincore syscall. The results are converted into a new tablerow.

After initializing internationalization & parsing commandline flags with a tad of postprocessing getopt reruns the getopt routines to yield output shellscripts can better handle.
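Python's getopt module exposes the same stdlibc routines, so the normalization getopt(1) performs for shellscripts can be demonstrated:

```python
# GNU getopt permutes the argument vector so flags come first & bare
# operands trail, which is exactly what getopt(1) re-emits for shells.
import getopt

argv = ["-a", "file", "-b", "out"]          # flags interleaved with operands
opts, operands = getopt.gnu_getopt(argv, "ab:")
# opts == [("-a", ""), ("-b", "out")], operands == ["file"]
```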

After initializing internationalization & parsing commandline flags validating additional commandline args remain & warning about meaningless --backup options wipefs either:

After initializing internationalization & parsing/minorly-validating commandline flags blkid may:

  1. Iterate over cmdline args stating each & gathering into a devices array.
  2. Apply various tweaks to the parsed flags.
  3. Retrieve the LibBlkID cache exiting upon failure.
  4. Garbage collect the LibBlkID cache.
  5. Gather data from LibBlkID and/or the kernel directly.
  6. Output a (non-interactive) tableheader.
  7. Iterate via a LibBlkID prober gathering its various data to output as (noninteractive) tablerows.
  8. Evaluate the configured search type & value via LibBlkID outputting the result.
  9. Load the given devices into LibBlkID’s cache & output each tag for the device search.
  10. Iterate a LibBlkID prober over all partitions/mounts printing their tags.
  11. Iterate over the given devices validating against the search outputting all their tags.

Options 7 through 11 are exclusive. After all this it, as usual, tidies up.

After initializing internationalization & parsing/validating commandline flags & envvars cal parses or retrieves the time, checks the trailing argcount, enforces some calendar invariants, retrieves localized weekdays & their order, parses colour configuration, tweaks various flags thus computing the initial layout, & renders a yearly or monthly calendar.

If there’s a 3rd trailing commandline arg it’s parsed/validated as the day, if there’s a 2nd it’s parsed/validated as the month, & if there’s 1 it’s parsed as the year reapplying leapyears. If there are none it applies some tweaks. More errors out.

A yearly calendar has an additional centred heading.

A monthly calendar (after some tweaks for multimonth calendars) computes the number of rows to iterate over. For each row it applies a tweak for the final one, iterates over the months in the span, & outputs results as determined by -v & layout.

For each month the row iterator computes which weekday the month starts on (handling September 1752 specially) & maybe which day of the year it is (if you’re doing Julian), populates an array representing the row, & possibly adds the weeknumber.
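The which-weekday-does-the-month-start-on & row-count computations correspond to Python's calendar module (which, unlike cal, ignores the September 1752 Gregorian switchover):

```python
# Sketch of cal's core layout arithmetic via the calendar module.
import calendar

def month_start_weekday(year, month):
    return calendar.weekday(year, month, 1)   # 0 = Monday ... 6 = Sunday

def month_rows(year, month):
    # monthcalendar pads partial weeks with zeroes; its length is the
    # number of rows cal must render for this month.
    return len(calendar.monthcalendar(year, month))
```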

Finally… after initializing internationalization & parsing commandline flags findmnt finalizes which columns to render, finalizes a matchtable & bitflags, initializes LibMnt, parses the given tabfiles via LibMnt, checks whether there’s a kernel FS in the mount table, finalizes some bitflags based on a couple conditions, optionally loads the LibMnt cache, optionally enforces unique filesystems, & optionally performs extensive validation with human-legible error messages.

If not performing a verify operation, findmnt initializes & configures LibSmartCols, populates its columns, might convert LibMnt into LibSmartCols rows outputting intermediaries with or without filtering and/or event handling including intermediate output via the repeated poll syscall (the bulk of the logic!), outputs the table, & cleans up!

Tablerow data may come from newdevice events via LibUDev (not to be confused with udevd).

Scheduling Utils

These commands control the multiplexing of running programs (termed “processes”) onto a CPU across time & the CPU’s cores.

After initializing i18n & parsing a few commandline flags validating at most 2 args remain (1 given -p) taskset retrieves the core count via the sched_getaffinity syscall & allocates a bitmask for it, parses the cpulist in one of two ways, possibly iterating over subprocesses applies the core logic, cleans up, & possibly runs a subcommand. This core logic involves calling the sched_getaffinity and/or sched_setaffinity syscalls, with an optional extra trailing getter syscall. The CPU bitmasks it retrieves are reported to the user in one of two formats matching the input ones, & errors are reported.
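Those two input formats (a cpulist like 0-2,5 or a hex mask like 0x13) describe the same CPU set; a sketch of parsing each (pure illustration, not taskset's actual parser):

```python
# Hedged sketch of taskset's two cpulist notations.
def parse_cpulist(s):
    # "0-2,5" -> {0, 1, 2, 5}
    cpus = set()
    for part in s.split(","):
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus

def parse_mask(s):
    # "0x13" = 0b10011 -> {0, 1, 4}: each set bit names a CPU
    mask, cpus, bit = int(s, 16), set(), 0
    while mask:
        if mask & 1:
            cpus.add(bit)
        mask >>= 1
        bit += 1
    return cpus
```

The real affinity getter/setter pair is also reachable from Python as os.sched_getaffinity/os.sched_setaffinity on Linux.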

After initializing internationalization & parsing/validating commandline flags uclampset either:

This is optionally followed (possibly iterating over all subprocesses) by calling sched_getattr for info to print out. And/or it may run another given command!

After initializing internationalization & parsing/validating commandline flags ionice may either:


After initializing internationalization & parsing/validating commandline flags chrt (possibly iterating over subprocesses) calls the sched_setscheduler xor (with gaps filled in by getpriority) sched_setattr syscall, possibly outputs info from sched_getattr falling back to sched_getscheduler/sched_getparam for the process & possibly all subprocesses, & possibly executes a subcommand.

System Utils

I’m not clear why commands get categorized here as opposed to “misc”, but there are a lot of commands here!

After initializing internationalization & parsing a few commandline flags validating more commandline args remain setsid may check the pgroup & if it matches the PID fork a subprocess reporting any errors, runs the TIOCSCTTY IOCTL, & executes a subcommand.

After initing i18n & parsing commandline flags validating args remain tunelp opens & fstats the specified file, tests the LPGETIRQ IOCTL, iterates over the IOCTLs enqueued from commandline flags outputting human-legible results from applying them to the file treating LPGETSTATUS specially, optionally runs LPGETIRQ last with special outputting, & cleans up.

After parsing minimal commandline flags validating there’s at least 3 present args, switch_root performs various validation via the stat syscall, unmounts each specified mount, cds to the new mountpoint, opens the root dir, mounts the new root, applies the chroot syscall & cds back to root, forks a subprocess which recursively deletes files, validates permissions, & executes a subcommand.

After initializing internationalization & parsing/validating few commandline flags swapoff initializes LibMount before iterating over given labels, UUIDs, commandline args, & possibly all mounted filesystems twice to carefully/verbosely apply the swapoff syscall. Followed by cleanup.

After initializing internationalization + LibMount & parsing commandline flags the counterpoint swapon command may:

Then cleans up.

Most of the code here is in validating whether it’s safe to apply the swapon syscall for a specific devicefile with error reporting. Shares a little bit of code with swapoff, mostly for parsing commandline flags or abstracting LibMount.

After initializing internationalization & manually parsing commandline flags renice manually parses given priority & iterates over subsequent args manually parsing those flags. For each it possibly resolves the specified user & necessarily runs the setpriority syscall (surrounded by getpriority syscalls to determine output).
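That setpriority-surrounded-by-getters pattern is directly available via Python's os module; a sketch (an unprivileged process may only raise the nice value, which the kernel silently clamps to the valid range):

```python
# Hedged sketch of renice's per-target logic: read the old priority,
# apply the increment, read it back for the report.
import os

def renice(pid, increment):
    old = os.getpriority(os.PRIO_PROCESS, pid)   # getpriority syscall
    os.setpriority(os.PRIO_PROCESS, pid, old + increment)
    new = os.getpriority(os.PRIO_PROCESS, pid)   # getpriority again
    return old, new
```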

After initializing internationalization & parsing minimal commandline flags pivot_root wraps the corresponding syscall.

mountpoint consults LibMount to determine whether the located/stated devicefile is a mountpoint.

After initializing internationalization, checking program name, & parsing/minorly-validating commandline flags setarch possibly looks up given architecture in a newly-initialized table, merges options with that optional lookup, wraps the personality syscall possibly called twice for errors, validates results via uname syscall, possibly outputs verbose info, & executes a subcommand.

After initializing internationalization + LibMount & parsing commandline flags some of which results in dropped privileges umount may before cleanup either:

After initializing internationalization & parsing commandline flags readprofile may attempt to write an optionally-given “multiplier” to /proc/profile as root. Otherwise the optionally-specified profile file is fully read into memory with early exit upon failure, it considers reversing the bytes, opens the specified “map” file (falling back to /boot/ or a release-specific file as per the uname syscall), locates the stext section, parses lines until the etext header reformatting for humans, outputs tallies, & cleans up.

After initializing internationalization & parsing commandline flags nsenter may:

  1. Apply given SELinux configuration
  2. Consult namespacefile from a hardcoded table which exists on the system to build a bitmask.
  3. Necessarily opens each namespace file in the bitmask (or a different file may be opened during flag parsing) rewriting that table.
  4. Open the root or CWD files.
  5. Necessarily rebuild that bitmask.
  6. Call the setgroups syscall.
  7. Necessarily calls the setns syscall on each opened namespace file closing successful files & retrying erroring files once
  8. Open the current working directory
  9. Change the current working directory
  10. Open the given working-directory namespace file
  11. Changedir to it
  12. Switch to a subprocess with parent waiting on it to close
  13. Call setgroups, setgid, and/or setuid syscalls
  14. Run a given subcommand

If it gets this far it executes the shell.

After initializing internationalization & parsing commandline flags largely into a “limits” linkedlist validating additional args remain prlimit fills in fallback columns, initializes LibSmartCols, fills in fallback limits, iterates over those limits calling the prlimit[64] syscall (possibly verbosely) for each removing successful modify ops, if there’s anything remaining to show configures a new LibSmartCols table (reformatting/removing limits into tablerows) & displays, & possibly runs a subcommand.
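The get half of the prlimit syscall is reachable through Python's resource module; a sketch of prlimit-style display (the column widths & row format are arbitrary, not prlimit's real layout):

```python
# Hedged sketch: read a soft/hard limit pair & format a display row,
# spelling RLIM_INFINITY as "unlimited" the way prlimit does.
import resource

def show_limit(res, name):
    soft, hard = resource.getrlimit(res)
    fmt = lambda v: "unlimited" if v == resource.RLIM_INFINITY else str(v)
    return "%-10s %12s %12s" % (name, fmt(soft), fmt(hard))

row = show_limit(resource.RLIMIT_NOFILE, "NOFILE")
```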

After initializing internationalization & parsing/validating (against the -a specified file or /etc/adjtime) commandline flags rtcwake opens the specified devicefile, configures the $TZ envvar, checks the RTC_RD_TIME IOCTL or [mk]time syscall possibly with verbose output, possibly calls the RTC_WKALM_SET IOCTL (possibly surrounded by calls to ctime_r) before sleeping, applies the specified task, maybe calls the RTC_WKALM_RD/SET IOCTL to disable it, & cleans up.

That task may involve either:

After initializing internationalization & parsing few commandline flags & other args rfkill may either:

After init’ing i18n + LibMount & parsing/validating extensive flags mount may:

Before telling LibMount to apply the mount operation retrying once, reporting result, & cleaning up. Debugging hints for systemd initd or SELinux may be outputted depending on buildflags. Most of the effort is in parsing flags, though LibMount holds most of that state.

After initializing internationalization & parsing commandline flags filling in fallback columns & devicefile (/dev/watchdog[0]) wdctl iterates over each commandline arg.

For each arg (or iterating once for fallback device) wdctl increments a counter, carefully writes “V” to the devicefile possibly followed by WDIOC_SETTIMEOUT and/or WDIOC_SETPRETIMEOUT IOCTLs outputting results, applies various IOCTLs (WDIOC_GETSUPPORT falling back to WDIOC_GET[BOOT]STATUS, WDIOC_GET[PRE]TIMEOUT, & WDIOC_GETTIMELEFT) & writes “V” to the devicefile falling back to corresponding sysfs file, & outputs various choices of that data possibly via LibSmartCols.

After initializing internationalization & parsing/validating commandline flags + debug envvars zramctl may either:

After initializing internationalization & parsing/validating extensive commandline flags setpriv might configure SELinux and/or AppArmor devicefiles, run SET_NO_NEW_PRIVS PRCTL, reset envvars, configure SETPCAP/SETUID/SETGID capabilities, get/set res UIDs, run setgroups/initgroups syscalls possibly to clear the groups, apply SET_SECUREBITS PRCTL, parse & apply bounding-set/inherit/ambient capabilities, or apply SET_PDEATHSIG PRCTL; before executing a subcommand. Most of the effort is in parsing or debugging commandline flags, reuses LibCapNG. Reads like sandboxing syscalls!

The lscpu command displays information on all the CPUs present in your computer.

After initializing internationalization & parsing/validating/normalizing commandline flags into a newly-allocated “context” ensuring no additional args remain lscpu parses custom debug flags, allocates some filepaths, reads sysfs for the CPU mask & count allocating a CPUs array, parses /proc/cpuinfo line-by-line utilizing cache lookups, considers calling the personality syscall, reads several more sysfs devicefiles with sorting, & depending on the given “mode” uses that data gathered from sysfs to (before cleaning up) either:

By far most of the effort is involved in reading this data in from various sources, the sourcecode for which spills across several files.

Util Linux includes several commands for listing present hardware as rich tables.

After lightly initializing internationalization & parsing/validating a handful of commandline flags lsirq parses given columns with a fallback, parses irq data line-by-line from /proc/softirqs or /proc/interrupts filling in missing descriptions sorting results & reformats into a LibSmartCols table via code shared with irqtop, & displays that table.

After lightly initializing internationalization & parsing some commandline flags & with fallback columns irqtop configures the terminal via some LibCurses variation, & configures/enters an epoll mainloop. In this mainloop (and for the most part immediately before it) it constructs an irqs table just like lsirq, optionally extracts CPU info from that data & reformats it into its own LibSmartCols table, adds a heading & serializes to text, & refreshes LibCurses after event handling involving:

After initializing internationalization & parsing/validating commandline flags ensuring no additional args remain lsmem may parse a handful of /sys/mem/ devicefiles including per-RAM ones & outputs a summary.

Or lsmem may parse given columns with fallbacks, initializes/configures a new LibSmartCols table with those columns, configures its own “splitting” config, reads that same info in, optionally populates the LibSmartCols table with that info & displays it, optionally outputs a summary, & cleans up. The initial codepath was a shortcut bypassing LibSmartCols. The split config is taken into account when reading in RAM devicefiles.

After initializing signal handlers + internationalization & parsing commandline flags validating 2 args remain ldattach opens the given devicefile ensuring it’s a teletype (implements job control), retrieves its configuration, alters the given properties of that configuration, sets it, if a -c flag is given writes that to the devicefile, runs the TIOCSETD IOCTL possibly followed by GSM0710-specific IOCTLs, daemonizes unless in debug-mode, & possibly sleeps until interrupt.

After initializing internationalization & parsing several commandline flags some into a new “loop context” losetup initializes path & sysfs debug options, normalizes the options including fallback columns taking into account additional commandline args, & depending on the given “action” either:

Relying heavily on a mini-shared library abstracting loopback devicefiles, which is where the “loopcontext” comes from.

After initializing internationalization + LibSmartCols & parsing/normalizing several commandline flags lsipc initializes a new LibSmartCols table with the given columns, retrieves the specified IPC data via a shared library discussed below reformatting it or summaries into LibSmartCols rows, & outputs the table. Different IPC mediums are handled separately, though sharing utils for summary, time, & account data. Has special codepaths for “pretty” tables.

After initializing internationalization & parsing/normalizing commandline flags + debug options partially into some new linkedlists lsns initializes LibSmartCols & an ID cache, considers opening a NETLINK socket and/or loading the LibMount table, iterates over the /proc virtual directory reading in a linkedlist of data to display, groups those into namespaces, possibly postprocesses & necessarily sorts that namespaces list, initializes/configures a LibSmartCols table populating it in one of 2 slightly different ways, & displays it. The NETLINK socket, wrapped in the ID cache, is used to retrieve namespace names for IDs.

After initializing internationalization & parsing numerous commandline flags unshare ignoring SIGCHLD signals may:

  1. Stat /proc/*/ns/mnt, fork a subprocess validating that file hasn’t changed, & mount all the namespacing virtual filesystems
  2. Run newuidmap or newgidmap in a subprocess
  3. Necessarily runs unshare syscall
  4. Write a special byte to child & wait for it to exit
  5. Write to /proc/self/timens_offsets
  6. Fork a subprocess with blocked signalhandlers
  7. Runs PR_SET_PDEATHSIG PRCTL and/or the kill syscall
  8. Write to /proc/self/uid_map
  9. Write to /proc/self/setgroups & /proc/self/gid_map
  10. Mount a “none” filesystem to root
  11. Call chroot syscall
  12. Call chdir syscall
  13. Mount a “none” filesystem at specified directory possibly twice
  14. Call setgroups & setgid syscall
  15. Call capget & capset syscalls alongside PR_CAP_AMBIENT PRCTLs.
  16. Run a given subcommand.

Or runs a subshell.

Several of Util Linux’s commands relate to facilities for processes to communicate between themselves.

After initializing internationalization & parsing/validating few commandline flags ipcmk may call the shmget, msgget, and/or semget syscalls with random data reporting any successes or failures.

After initializing internationalization & parsing commandline flags (handling deprecated syntax specially), during which it calls the IPC_RMID op via the shmctl, msgctl, or semctl syscalls, ipcrm might (given the -a flag) call the shmctl(SHM_INFO), semctl(SEM_INFO), and/or msgctl(MSG_INFO) syscalls for a list of IDs to remove once filtered by the corresponding STAT syscall.

After initializing internationalization & parsing/validating commandline flags ipcs may either:

  1. Parse sysfs devicefiles for the given IPC medium into linkedlists (or call its INFO syscall) & reformat into human-legible text in an inclusive choice of 3 codepaths, sharing code with lsipc.
  2. Read in data regarding resource limits (via sysfs) or status (via the msgctl(MSG_INFO) syscall) for info to output.
  3. Output a corresponding tableheader, parse in msg info like (2) did, & output the requested properties of it.

The (2) & (3) options have separate codepaths for message queues vs shared memory vs semaphores, largely shared between them.

After initializing internationalization & parsing few commandline flags validating 1 arg remains fsfreeze opens the specified directory to call the FIFREEZE or FITHAW IOCTLs on it.

After initializing internationalization & parsing/validating commandline flags the fstrim daemon (comes with Systemd initscripts) may initialize LibMount & path debug options, & iterates over & opens each given & existing mount tablefile. For each it parses it via LibMount validating it isn’t empty, has LibMount perform a deduplication, ensures there’s a root filesystem, iterates over mounts to remove unwanted ones, performs another deduplication, iterates over the mount table again to open those directories & perform the FITRIM IOCTL with verbose output on them before cleaning up.

Otherwise (if -a, -A, or -I flags weren’t given) it validates the given path is to a directory & opens it to perform that FITRIM IOCTL.

After initializing internationalization & parsing/validating commandline flags flock opens given file, may configure a timeout flag, may retrieve monotonic time for human-legible output, calls the flock syscall with error handling, possibly cancels that timer, possibly outputs status, & possibly executes given subcommand with or without forking.

After initializing internationalization & parsing/validating few commandline flags choom opens the given procfs devicedir, & reads or writes a choice of files within it reformatting as human-legible text.

After initializing internationalization & parsing minimal commandline flags ctrlaltdel either reads /proc/sys/kernel/ctrl-alt-del into human output or runs appropriate variation of the reboot syscall.

After initializing internationalization & parsing commandline flags, with default values read from sysfs and preallocated for the CPU count, validating at most one additional arg remains, chcpu may before cleaning up either:

After initializing i18n & parsing commandline flags validating more than 1 additional arg remains blkdiscard opens & fstat-validates the given devicefile, running the BLKGETSIZE64 then BLKSSZGET IOCTLs, validates given parameters against these values, if compiled against LibBlkID runs its probe operation to detect whether data will be lost, retrieves the monotonic time via clock_gettime, iterates over trailing given args, outputs summary statistics, & closes the file. For each trailing arg blkdiscard runs the BLKZEROOUT, BLKSECDISCARD, or BLKDISCARD IOCTLs before gathering statistics & possibly outputting them once per second.

After initializing internationalization & parsing/validating commandline flags validating a single arg remains fallocate with the given devicefile open either:

After initializing internationalization & parsing commandline flags + path debug options + /sys/devices/system/memory pseudodir validating a single arg remains chmem reads sysfs “memory0/valid_zones”, converts given zone name to ID number, & before cleaning up either:

After initializing internationalization & parsing few commandline flags handling first arg specially validating 1 additional arg remains blkzone either:

A lookup table determines which variant of these 2 codepaths to take.

Kernelspace is a hostile programming environment for several reasons, not the least of which is that kernelspace lacks a full I/O stack (after all, it’s the kernel’s job to provide a large chunk of it!). This in turn makes printf-debugging challenging without some effort. The dmesg command combined with an in-kernel ringbuffer addresses this!

After initializing internationalization & parsing/normalizing several commandline flags validating no args remain dmesg possibly forks a pager (not cat, default less) to pipe into before either:

klogctl is a special syscall specifically for this ringbuffer.

I believe there’s a daemon which can copy from /dev/kmsg to the more persistent Syslog, hence why there’s options for dmesg to read from Syslog instead. The lack of persistence in /dev/kmsg would be due to the Linux developers not wanting their debugging utility to fail on them when they need it most!

Most of the effort involved is in reformatting Syslog records.

It’s getting less common now in the pursuit of thinness (or is that an excuse?), but it can be very convenient to plug external storage into your computer which can be moved to another device. Not too long ago this was often a shiny plastic disk, but I’m not complaining about USB sticks taking over! To safely remove these there’s eject.

After initializing internationalization & parsing several commandline flags ensuring a single arg remains, eject may output “default device: /dev/cdrom”.

Otherwise eject fills in a fallback device of /dev/cdrom’s mountpath via LibMount or locates the given devicefile & its mountpath complaining upon failure, outputs the result, consults LibMount to find its mountpoint outputting whether it was mounted, consults sysfs to get the devicefile, & consults sysfs for whether it can be ejected. Then it either:

  1. Shortcircuits upon the -n flag,
  2. Upon -i opens the devicefile & runs the CDROM_LOCKDOOR IOCTL,
  3. Opens the devicefile to run the CDROM_SET/CLEAR_OPTIONS IOCTL given CDO_AUTO_EJECT outputting whether enabling or disabling,
  4. Opens the devicefile to run CDROMCLOSE[TRAY] then CDROM_SELECT_SPEED, or checks CDROM_DRIVE_STATUS to decide whether to eject or close the tray first,
  5. Consults /proc/cdrom_info to output available speeds,
  6. Maybe runs CDROM_SELECT_SPEED followed by maybe mounting the storage device,
  7. Maybe runs the CDROM_SELECT_DISC or CDROMLOADFROMSLOT IOCTLs followed by CDROM_SELECT_SPEED, or
  8. Maybe enables all options if none were set then opens the devicefile, ejects using the selected choice of IOCTLs, & complains if all failed;

before cleaning up.

To keep track of the time even while off our computers come with an always-on (powered by an internal watch battery) periodically-incrementing counter circuit. This hardware can be configured via the kernel & the sophisticated hwclock command!

After retrieving its start time, initializing LibAudit if compiled against it, initializing internationalization, & parsing/validating several commandline flags validating no args remain hwclock may parse some options, open the first “RTC” devicefile it can find or the given devicefile, & call the RTC_PARAM_SET/GET IOCTL on it exiting immediately. Otherwise it may call the RTC_EPOCH_READ IOCTL possibly followed by RTC_EPOCH_SET reporting the result.

Otherwise it may output the current time & hardcoded version number, determines whether to use the CMOS or RTC logic possibly outputting which or an error, opens its adjust filename to read the adjustment in & possibly output it, checks whether the data it has read indicates the clock is in UTC, possibly tweaks some flags, possibly calculates an appropriate adjustment & displays what the result of applying it would be, or runs the settimeofday syscall, fails upon insufficient permissions, possibly runs the synchronize_to_clock_tick operation on the CMOS or RTC methodtables then reads the hardware clock & calculates adjustments to the current time, maybe exits displaying current time, then either:

The clock’s adjustment file may then be written to.

The trouble with synchronizing clocks is time passes while you’re synchronizing…

The CMOS clock directly interfaces via <sys/io.h> to a CMOS socket which you can plug a clock into.

A relatively massive amount of effort goes into parsing the given date via YACC, yielding a very flexible format.

hwclock may call RTC clock routines directly or via a methodtable. It operates via IOCTLs on /dev/rtc0, /dev/rtc, /dev/misc/rtc, or possibly /dev/[misc/]efirtc.

Terminal Utilities

“Terminal” here means the grid of text commandline tools render to, and these commands alter how that grid of text is displayed!

After initializing internationalization & parsing near-minimal commandline flags mesg checks which of stdin, stdout, or stderr is hooked up to a terminal, retrieves with fallback its name/filepath stored in-kernel, if no args stats that terminal to check its S_IWGRP or S_IWOTH bitflags, otherwise opens & stats the file branching upon the 1st arg. If the 1st mesg commandline arg past flags is affirmative it sets the S_IWGRP & compiletime-maybe S_IWOTH bitflags; if it’s negative it clears those bitflags. Otherwise it complains.

After initializing internationalization & parsing/normalizing commandline flags scriptreplay opens the given “timing” file validating its filetype, opens a handful of logfiles alongside it, configures type & crmode properties, maybe sets delay_div and/or delay_max, sets minimum delay, temporarily sets the ISIG bitflag on the terminal, repeatedly parses replayscript opcodes (including via seek GOTOs) sleeping for the desired amount of time then copying the specified data to the terminal, & cleans up whilst reporting any errors it has encountered.

After initializing internationalization + timers & parsing few commandline flags + replay debug options scriptlive configures a replayscript object just like scriptreplay sharing some code, runs your shell in a new (via mini shared library) pseudo-tty device, ensures stdout is flushed before forking the subprocess & executing the shell, evaluates the script like scriptreplay does, loops until a signal or timeout is received checking which it was, waits for subprocess to end, & cleans up with some possible verbosity.

After initializing internationalization & parsing minimal commandline flags write retrieves terminal name & branches over argcount.

write with 2 args parses a build-specified UTMP file checking corresponding devicefiles for which one to write to, with error reporting. Then performs the core logic of carefully writing who’s sending this message & the incoming text to that devicefile.

write with 3 args opens the given devicefile with validation including against that UTMP file before running the same core logic.

Otherwise complains.

After initializing internationalization & parsing commandline flags wall validates the given file exists, retrieves maximum line length, generates a message to output (including (a) possibly info on who’s sending when/where, a separator, text copied from commandline args or stdin, & a 2nd separator), then iterates over each terminal filtered by active users in an optionally-given group calling a shared function on each.

That shared function (is it shared? Why’s it in a separate file?) opens the given devicefile to repeatedly & carefully call writev on it until the message is fully written.

wall embeds utilities for handling groups & buffering the text to be written.

After initializing internationalization & parsing numerous commandline flags setterm fills in a fallback terminal name from the $TERM envvar, calls LibCurses’ setupterm with error reporting & checking the previous name to see if it was a virtual console (which nowadays it would be!), then it outputs a broad choice of controlcodes (looked up from LibCurses), fcntls, TIOCLINUX IOCTLs, & klogctls. Including save state to file.

After initializing internationalization & parsing/normalizing several commandline args into an on-stack structure + debug options script retrieves your configured shell, creates & configures a pseudoteletype devicefile, possibly outputs status info compiletime-possibly calling into LibUTempter, forks with error reporting a subprocess in which to run the shell, configures the subprocess as a child of this pseudoteletype, opens all given logfiles & writes metadata into them, … possibly writes yet more metadata to the logfiles, runs a mainloop in which it proxies & logs I/O (separate source code file), waits for the child to exit, & cleans up with yet more logging. That I/O logging is done via callbacks (where most of the code here is) with the aid of another sourcecode file shared with other terminal commands, supporting a choice of formats & incorporating timestamps.

After initializing internationalization + signal handlers, compiletime-possibly opening a debugfile, & parsing/normalizing numerous commandline args agetty compiletime-may iterate over the UTMP file to append an appropriately-altered line to it, possibly sleeps for a configured time, carefully opens & configures a new teletype devicefile, reconfigures signal handlers, configures process groups, compiletime-possibly interfaces to Plymouth bootscreens, configures locale, configures speed, retrieves the terminal’s windowsize & possibly updates it, flushes updates, gathers tweaked attributes & sets them, unsets the O_NONBLOCK bitflag, writes an initstring to stdout, possibly unsets O_NONBLOCK again, possibly configures the connection speed again, possibly configures a timeout for this process, waits for a newline, possibly reads options from a configuration file maybe asking whether to login, determines the username to use, disables the timeout, configures various properties or resets them, outputs a newline, reconfigures signal handlers, validates the username, sends the username, changes the root & current directories, changes the scheduling priority, cleans up, & executes the subcommand.

agetty is a bootup command configuring your shell when you SSH in or possibly when you turn on your computer (given you haven’t installed a graphical desktop).

Text Utilities

Possibly less in line with its Linux-specific focus, Util Linux provides a suite of text utilities.

After initializing internationalization & parsing minimal commandline flags line disables stdin buffering, repeatedly reads a char until it reaches EOF or newline echoing them to stdout & outputting a final newline.

After initializing internationalization, configuring signal handlers, & parsing minimal commandline flags rev allocates a buffer, & for each trailing commandline arg opens it & carefully reads each line into the buffer reversing it (via an iteration over half its chars) for output.

After initializing internationalization & parsing minimal commandline flags colrm parses a couple of integral args, runs a mainloop, & flushes results. This mainloop repeatedly reads chars from stdin up to a given width per line, fills in gaps with spaces, reads the given number of chars without echoing, & echoes the rest of the line.

After initializing internationalization & parsing few commandline flags with some preprocessing colcrt iterates over the trailing commandline args. With each given file opened in turn (falling back to stdin given no additional commandline args) it repeatedly reads chars counting column numbers. If we reach 132 columns colcrt splits lines. Upon reading escape it outputs nils & spaces into the buffer. Upon EOF it outputs the lines & exits. Upon newline it outputs the lines & resets the column counter. Upon tab it outputs the desired number of spaces. Underscores become underlines. Otherwise it’s echoed into buffers.

Trims trailing space in output.

After initializing internationalization & parsing commandline flags into a newly-allocated struct as format strings (with decomposing loop per format string) hexdump repeatedly parses each desired blocksize, normalizes each formatting option via nested loop & extra postprocessing loop, carefully opening the first given file as stdin skipping a given number of bytes, runs the display logic, & cleans up.

This display logic involves repeatedly reading a blocksize from stdin, opening the file given in the next commandline arg as stdin once the final block of the current file has been yielded, & outputting an asterisk line to mark duplicate blocks. For each read block hexdump iterates over the format strings & their components skipping those bitflagged to be ignored, possibly tweaks the format string to add padding, & outputs each appropriate field with the aid of printf.

Within that display logic is code to colourize each field, & a final loop to terminate the colourization. As well as text escaping in 2 formats. A lot of effort goes into parsing hexdump commandline flags.

hexdump is split across 4 source files + a header file.

After initializing internationalization & parsing commandline flags ensuring no args remain col repeatedly reads each Unicode char from stdin correcting Unicode errors & various other things, flushes reformatted data, & cleans up. Per character it may track line/column for layout chars, & tracks a linkedlist of lines taking into account character widths it occasionally outputs with significant reformatting.

The best I can make sense of col is that it is a partial terminal emulator outputting the final textgrid?

After initializing internationalization & parsing/validating several commandline flags column reads stdin or each given input, considers a couple fastpaths including immediate exit, & branches upon the mode. This reading involves iterating over each line in the file stripping surrounding whitespace, adding each to a LibSmartCols table or array depending on the mode.

In table mode column postprocesses the LibSmartCols table to apply various column-flags including a tree structure & sorting, then outputs the table.

In the fillcols mode it computes some layout parameters & iterates over the rows, then within an inner loop over columns, outputs the gathered chars.

In the fillrows mode it iterates over the rows outputting their text followed by a newline or tabs.

In simple mode it iterates over & outputs each entry with added newlines.

After initializing internationalization + signal handlers & parsing few commandline flags ul configures the terminal (via some LibCurses variant) from $TERM envvar with error reporting, configures various formatting controlcodes, allocates buffering, filters stdout or each given file, & cleans up. This filtering involves reading each char tracking column & mode reallocating line buffers as necessary, & flushing the buffered line when desired. Splitting overlines, super, & sub into own lines.

After initializing i18n & (with preprocessing whilst treating the page symlink specially) parsing commandline flags including from the $MORE envvar more disables SIGCHLD signals, configures the terminal via a LibCurses variant, initializes LibMagic if linked against it, allocates a line buffer, computes layout info, configures signal handlers to be polled, outputs a screenful of text possibly reprocessed, mainloops, & cleans up.

That mainloop loops over the given files carefully opening them with error reporting & possibly filetype sniffing. Beyond that (as for the initial screenful of output) more’s mainloop skips the given number of lines, possibly compiles & runs the given regexp over the file to determine where to begin, & possibly polls the input to adjust more’s state for different keypresses.

Based on this it outputs a char to go-home or clear-screen, to erase to a given column, erase a line, a “:” separator, and/or the filename; before updating lines-per-page & outputting a new screenful. Either processed-line by processed-line (whilst handling trailing chars) or a bufferfull at a time.

After initializing internationalization + signal handlers + terminal & parsing commandline flags then some additional args pg verbosely iterates over its trailing args with prompts & headers, or applies its core logic to stdin instead.

Outside the terminal this pg core logic merely copies the file to stdout.

In the terminal pg’s mainloop seeks to the input start switching to a tmpfile on failure, possibly compiles a given regexp, maybe reads each line with error handling copying data to stdout saving a bitflag for whether this branch was taken, maybe executes that regexp forwards, maybe clears the screen before possibly outputting the read data, & maybe prompts for user input to respond to.

In response to user input pg may amongst other things compile a new regexp, evaluate it backwards over the file via seeking, save the file, update state for next iteration, fork out to the shell, output metadata, or exit.

There’s a meagre reimplementation of LibReadLine here…


LibAttr provides APIs & commands exposing some of Linux’s filesystems’ support for attaching additional metadata to files. To my knowledge this is most frequently used to attach MIMEtypes.

LibAttr includes a LibMisc sublibrary providing:

As for LibAttr itself…

LibAttr bundles a couple commandline programs so you can readily access these features in the terminal.

After initializing internationalization & parsing commandline flags checking a single argument remains, depending on which flag was set attr may:

After initializing internationalization & parsing commandline flags validating more args remain & that the given regexp compiles, getfattr traverses the given directories. For each file it may iterate over each sorted & regexp-filtered attr from the listxattr syscall, or query a given attr.

The queried attribute value for each directly or indirectly given file is outputted in a selection of formats (including escape formats), with or without stripping preceding slashes.

After initializing internationalization & parsing commandline flags, setfattr for each given filepath decodes (in a selection of formats) the given value & calls either the setxattr or removexattr syscalls outputting any errors.

The -B flag instead parses each line in a file to determine which attrs to set.

Ext4FS Extended Attributes

Yesterday I described userspace libraries & commands for accessing additional metadata attached to files.

To do so it mainly calls additional syscalls Linux provides, and those syscalls wrap methods on the appropriate filesystem with access control, copying between virtual memories, mutexes, & data lookups. Each filesystem can provide multiple methodtables for different key prefixes.

Looking at Ext4’s implementation, the different methodtables mostly form additional access control.

Retrieving an extended-attribute on Ext4 filesystems locates the appropriate “inode” datastructure & its header off the disk. Then it validates a checksum & the xattr b_data array, & caches it in a doubly-linkedlist. Then it iterates over the array looking for the given key & copies the value to the appropriate destination. All of which is protected by validation & a mutex.

Similar code copies the attributes list ultimately to userspace, consulting the xattr methodtables array.

Setting extended-attributes in Ext4FS, meanwhile, is complicated by the need to memory-manage diskspace, aided by a “credits” system to ensure data’s nicely distributed; as well as, to a significantly lesser degree, by logging changes to aid crash recovery. This code is also fairly similar to the other 2 methods, with extensive memory management added.

Deletion presumably uses that same codepath.

In short: In Ext4FS xattrs are in an array on each file, but kernelspace is a hostile dev environment.


Here I’ll study LibACL, focusing on where it actually applies its doubly-linkedlists of permissions. Which can be serialized to a bytearray in a couple different ways.

The core logic appears to be one of those binary serializers, with a wrapper storing it in an extended-attribute. One which filesystems might treat specially.

There’s also a routine for parsing the ACL off an extended attribute, or wrappers for files.

There’s an ACL validator, iterator, sorting, calculating mask properties, & so much more upon the doubly-linkedlist. For brevity I’m largely ignoring this side.

There’s wrappers for transferring ACLs between files. With support routines approximating converting between ACLs & more traditional UNIX permissions “modes”.

The bulk of LibACL’s logic is in parsing & serializing between doubly-linkedlist ACLs & text. That is, this is where the ACL syntax is defined.

Commandline tools

LibACL has commandline tools exposing these kernel features to the terminal.

After initializing i18n & parsing commandline flags validating at least a couple (depending on flags) args remain chacl selects a codepath. -l iterates over the args outputting their “access” & “default” ACLs as concise text.

The -R, -D, or -B flags iterate over the commandline args mutating the ACL on each, deleting all entries for its “access” (unless -D) and/or “default” (unless -R) ACL. Absent any of the previous flags, or given -d, it parses & validates the ACL expression given on the commandline. For -b or -d it parses & validates a 2nd “destination” ACL. Then it iterates over commandline args applying those ACLs, recursively over directories or not.

After initializing internationalization then parsing & validating commandline flags checking the $POSIXLY_CORRECT envvar getfacl iterates over remaining commandline args. For each it reads a filepath from stdin or the arg & (using utils shared with LibAttr) traverses those folders.

For each file after validation it carefully retrieves the ACLs, does some configurable preprocessing/filtering, & outputs results in a selection of text formats. Tabular mode has several support routines.

After initializing internationalization & parsing commandline flags checking $POSIXLY_CORRECT setfacl meanwhile iterates over commandline args & their files like getfacl does (though this may also happen whilst parsing commandline flags), for each evaluating the parsed commands as described at the start of this thread.

Also during flag parsing it may carefully & repeatedly read an ACL file twice (why twice?) to evaluate those commands before applying more generic UNIX permissions.

setfacl iterates (in a separate file) over each command (in a linkedlist) parsed from the commandline flags or elsewhere. The parsing & managing of this commandlist is also defined in separate files. It then postprocesses the commands to form an Access Control List, validating those results if any with nice error messages, & applying any which have changed.

A support routine for this process is provided carefully wrapping acl_get_file. Or acl_copy_entry. Another iterates over the ACL looking for particular entries. Prior to applying changed ACLs, it might instead textually output them.

For each command it may…


When you sudo a command, there’s multiple privileges you might be requesting. To minimize your need to switch to an all-powerful account Linux supports granting these constituent privileges (or “capabilities”) to other accounts. LibCap conveniently exposes this functionality to userspace. Including providing a Go module.

LibCap Go

This Go module includes a validator for a ProcAttr’s system & for juggling mutexes, at its simplest.

There’s an enum of the different privileges which may be claimed, extensively documented.

There’s a parser & (via histograms) serializer describing privileges textually.

There’s a bitmask submodule.

It wraps a system-call library & the previously-mentioned mutex juggling submodule to expose various systemcalls.

There’s a testsuite.

There’s a binary parser & serializer from/to a given file. Or a file’s attributes.

LibCap for Go also defines a launcher object which performs a blocking fork&exec.

There’s another textual parser & serializer; tracking the inheritable, ambient, & bound privileges under a mutex. With corresponding binary ones.

And finally there’s more functions exposing syscalls, accessing those privilege components to/from kernel-space, with corresponding datastructures.

tl;dr binary & textual serializers around bitmasks & privilege-accessor syscalls.

LibCap Go Commands

Here I’ll finish studying the Go-centric aspects of LibCap with a bunch of commands!

psaux-signals wraps the prSetKeepCaps PRCTL syscall in a loop before pausing to wait for interrupts. psx-fd wraps the same PRCTL syscall with a pipe (no loop or pause).

try-launching runs a testsuite of commands to ensure LibCap Go’s launcher object works correctly. ok is one of those programs, a noop.

mismatch wraps SYS_GETTID syscall with standard out.

There’s a program which converts a plaintext file of names to a Go dataheader. Another extensively describes the differences between the current capabilities & the expected ones in human-readable text, to aid compatibility testing.

And there’s a couple regression tests.

Alongside these is a HTML file linking & redirecting to the docs. And a shell-script checking whether the Go syscall library supports AllThreadsSyscall.

Amongst the more complex of Go programs…

There’s an HTTP getter wrapper.

There’s a bpftrace wrapper reformatting the output.

setid on startup & close reads the /proc filesystem task & status files outputting to stdout & with or without exiting upon validation failures. Between that it retrieves user & group IDs before making the syscalls to set them in one of 2 codepaths.

gowns retrieves user & group IDs whilst parsing commandline flags before extensively wrapping the launcher API & waiting for the subprocess to end.

Finally captree reads the /proc filesystem extracting then sorting the process tree. Which it then iterates over to read & output relevant process capabilities, each followed by any of its threads with differing capabilities.

LibCap C

LibCap predominantly includes a C library, including a few trivial executables serving mainly to shout “THIS IS A LIBRARY!” with the data it knows. As well as build files, including a script for rephrasing macros listing capability names. And of course there’s headerfiles, mainly defining magic numbers.

There’s a testsuite.

There’s a bitmask implementation. Memory management for the different objects is implemented separately.

There’s accessors on capability objects alongside functions to read/write these objects from/to file attributes. Though mostly it operates on the unparsed data.

There’s Inherit-Ambient-Bound objects with their own comparators, accessors, & converters.

Textual parsers & serializers are implemented separately.

And there’s a Launcher object for configuring the capabilities of a forked subprocess using all the appropriate syscalls. Alongside accessors.

That’s all very similar to Go’s reimplementation.

Another subsystem (“kdebug”) provides files, shell-scripts, & a C command to sleep for 3s before exiting; for a minimal QEmu test environment.

LibCap Commands

Reading over the rest of LibCap…

There’s a few test commands including a noop. And ones iterating over test data. And checking for security regressions.

A “PSX” sublibrary language-binds the concept of systemcalls over from C to Go, whilst juggling Go’s threading. With testsuites.

There’s a handful of commands, including:

These have a test shellscript, and a C file holding documentation strings.

There’s a sample sudo config file declaring which accounts have which privileges.

There’s a command wrapping pam_sm_authenticate for testing.

There’s a library to be used to test how it’s dynamically linked.

There’s a PAM dynamically-loaded extension implementing capability inheritance, with it’s own testsuite.


Looking into the kernel-side of what LibCap exposes, the syscall copies & validates (mostly checking a magic number) the data over to kernel-space & dispatches to the authorization engines to act on the setter.

The “Common Cap” authorizer hooks callbacks up to most Linux syscalls to check the bitflags provided by LibCap, and of course provides accessors (as long as it’s a subset of enabled privileges). This also checks extended attributes for privileges.

Process Utilities psmisc

I’ll study some of the commands The PSMisc project provides.

After initializing internationalization & parsing a couple commandline flags validating more args remain prtstat checks that /proc/self/stat exists, retrieves sysconf(_SC_CLK_TCK), & iterates over those commandline args parsing them as ints. For each it opens the stat file for that process ID parsing it using sscanf & printing it nicely in one of 2 formats before freeing memory.

After validating at least 2 commandline args are given & regexp-parsing those args, pslog opens that process’s fd pseudo-directory to output all the .log files it has open.

After validating at least 2 commandline args are given, socket_test opens a UNIX socket & binds it to the given address before running the listen & accept systemcalls on that socket.

There’s a support library lightly abstracting stat syscalls.

After initializing internationalization & parsing commandline flags followed by the remaining args (parsed as integers, first is special), peekfd ptrace-attaches to the first given process-ID (with or without attaching to its threads listed in the process’s task pseudodir), counting the number of successes & reporting failures. If any succeed it proceeds to register a signal handler to detach processes, runs ptrace(PTRACE_SYSCALL) for each attached process, & runs a mainloop.

Each iteration of peekfd’s mainloop involves retrieving relevant info depending on the OS & CPU, whilst attaching to any new threads. If it’s a read or write syscall that triggered the syscall-breakpoint it retrieves the parameters & outputs that info, with or without dropping duplicates.

There’s a shared linkedlist library.

After initializing i18n, registering some exit handlers, & parsing commandline flags, fuser parses the given addresses, reads /proc/mounts removing the given one, tweaks/validates flags, locates the given connections in /proc/net, frees some linked-lists, might output some debug info, scans /proc to find relevant processes, frees some Cygwin data if running on that system, scans a couple more pseudofiles, frees some linkedlists, & outputs the data it decided to keep with several variations. This command involves plenty of parsing code!

After initializing internationalization & parsing commandline flags, validating more args remain, killall postprocesses those commandline args, parses in the process IDs, allocates some memory, & iterates over the given PIDs. For each it opens the process’s pseudodir, performs various checks (consulting its pseudofiles) for whether it’s the one we want, possibly prompts the user to confirm, & sends the configured kill signal. Then it cleans up & forces a kill of those processes.

Finally, after init’ing i18n, determining which output mode to use, & parsing commandline flags, pstree validates more args remain & iterates over the /proc pseudodir parsing each process’s status pseudofile, task subdir, & cmdline pseudofile, with a bit of post-processing to handle orphans & memory cleanup. If it found any it iterates over that data to find desired processes to flag for highlighting, simplifies the tree to focus on a given PID, sorts the tree by the given namespace outputting said namespaces OR directly serializing the tree OR grouping by user, & cleans up.