System Databases

A lot of computing, if not most (save communicating this information to humans), is dedicated to maintaining databases: updating and querying various collections of data. Some of the databases most core to making your computer function at all are discussed here.

Package Management (Apt)

To assemble all the software developed by Linux, GNU, FreeDesktop.Org, GNOME, KDE, etc, etc, etc into a functioning operating system, “distros” (as we often insist on calling them) create “package manager” tools for installing this software from their curated “package repositories”. Programming languages often have their own dedicated package managers, which I’d consider a great idea if those were decently curated.

For this, Debian has the Advanced Packaging Tool, “APT”.

apt-pkg

Most of Apt lives in its underlying library, apt-pkg, as described here.

There’s a pkgRecords C++ class which converts a cached array into a package mapping indexed by ID.

There’s a pretty-printer.

There’s a routine to fetch each queued package update with user interaction. There’s a more general routine which applies various queued changes with user interaction.

There’s a datafile listing package header keys in a standardized order.
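To illustrate what such an ordering is for, here’s a small Python sketch that rewrites a package stanza’s fields into a standardized order. The `CANONICAL_ORDER` list is a hypothetical, abbreviated stand-in rather than apt-pkg’s real datafile, and putting unknown fields last, alphabetically, is my own assumption.

```python
# Hypothetical, abbreviated field order; apt-pkg's real list is much longer.
CANONICAL_ORDER = ["Package", "Version", "Architecture", "Depends", "Description"]


def reorder_fields(fields: dict) -> list:
    """Return (key, value) pairs: known keys in canonical order, then unknown keys alphabetically."""
    rank = {key: i for i, key in enumerate(CANONICAL_ORDER)}
    known = sorted((k for k in fields if k in rank), key=lambda k: rank[k])
    unknown = sorted(k for k in fields if k not in rank)
    return [(k, fields[k]) for k in known + unknown]


stanza = {"Depends": "libc6", "X-Custom": "1", "Package": "hello", "Version": "2.10-3"}
for key, value in reorder_fields(stanza):
    print(f"{key}: {value}")
```

A fixed order like this keeps regenerated package files diff-friendly and predictable for humans reading them.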

There’s a class which parses a list of local files listing packages into an array.


There’s a dist-upgrade routine which flags all installed packages to be upgraded, checks whether to install all essential packages, reinstalls all previously installed packages to resolve conflicts, possibly flags held packages to be kept, & repairs any issues, whilst updating a progressbar.

Another routine goes through all packages selecting which ones are interesting to upgrade. There are also a couple of routines for upgrading packages with or without installing new ones, and another which chooses between these codepaths.
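Roughly, the difference between those two upgrade codepaths can be sketched like so, under heavy simplifying assumptions: `plan_upgrade` and its parameters are hypothetical, any candidate version differing from the installed one counts as newer, and only direct dependencies are checked. A plain upgrade keeps back any package whose new version would need something not yet installed, whereas the dist-upgrade-style path installs those too.

```python
def plan_upgrade(installed, candidates, candidate_deps, held, allow_new=False):
    """Split upgradable packages into (to_upgrade, to_install, kept_back).

    installed:      {name: installed version}
    candidates:     {name: candidate version}; any differing version counts as newer here
    candidate_deps: {name: [dependency names of the candidate version]}
    held:           names the user asked to keep at their current version
    allow_new:      False ~ a plain upgrade, True ~ dist-upgrade-style behaviour
    """
    to_upgrade, to_install, kept_back = [], set(), []
    for pkg, current in sorted(installed.items()):
        candidate = candidates.get(pkg, current)
        if candidate == current or pkg in held:
            continue  # nothing newer, or held in place
        missing = [d for d in candidate_deps.get(pkg, []) if d not in installed]
        if missing and not allow_new:
            kept_back.append(pkg)  # would need new packages, so hold it back
            continue
        to_upgrade.append(pkg)
        to_install.update(missing)
    return to_upgrade, sorted(to_install), kept_back


installed = {"app": "1.0", "lib": "1.0", "tool": "2.0"}
candidates = {"app": "1.1", "lib": "1.1", "tool": "2.1"}
deps = {"app": ["lib", "libnew"]}
print(plan_upgrade(installed, candidates, deps, held={"tool"}))
print(plan_upgrade(installed, candidates, deps, held={"tool"}, allow_new=True))
```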


There’s a class which parses a list of index files with error reporting, & exposes results.

There’s a pkgPolicy class which fills in various default properties, including pinnings & priorities.

There’s a class which parses & compares version numbers.
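Debian’s version-comparison rules are documented in deb-version(7), and a Python sketch of them might look like the following. It follows those documented rules rather than porting apt-pkg’s actual code, and the helper names are mine: versions split into `[epoch:]upstream[-revision]`, then alternating non-digit & digit runs are compared, with “~” sorting before everything (which is why pre-releases like `1.0~rc1` sort before `1.0`).

```python
def _isdigit(c: str) -> bool:
    return c.isascii() and c.isdigit()   # "" is not a digit


def _order(c: str) -> int:
    """Weight of one character in a non-digit run; "" means end of string."""
    if c == "" or _isdigit(c):
        return 0
    if c.isascii() and c.isalpha():
        return ord(c)          # letters sort before other punctuation
    if c == "~":
        return -1              # "~" sorts before everything, even end-of-string
    return ord(c) + 256        # other punctuation sorts after letters


def _verrevcmp(a: str, b: str) -> int:
    """Compare upstream versions (or revisions) run by run."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit run: compare character weights pairwise.
        while (i < len(a) and not _isdigit(a[i])) or (j < len(b) and not _isdigit(b[j])):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Digit run: compare numerically, so leading zeros don't matter.
        start_i, start_j = i, j
        while i < len(a) and _isdigit(a[i]):
            i += 1
        while j < len(b) and _isdigit(b[j]):
            j += 1
        na, nb = int(a[start_i:i] or 0), int(b[start_j:j] or 0)
        if na != nb:
            return na - nb
    return 0


def _split(version: str) -> tuple:
    """Split "[epoch:]upstream[-revision]" into its three parts."""
    epoch, upstream, revision = "0", version, ""
    if ":" in upstream:
        epoch, upstream = upstream.split(":", 1)
    if "-" in upstream:
        upstream, revision = upstream.rsplit("-", 1)
    return int(epoch), upstream, revision


def compare_versions(v1: str, v2: str) -> int:
    """Return -1, 0 or 1 as v1 sorts before, equal to, or after v2."""
    e1, u1, r1 = _split(v1)
    e2, u2, r2 = _split(v2)
    if e1 != e2:
        return -1 if e1 < e2 else 1
    result = _verrevcmp(u1, u2) or _verrevcmp(r1, r2)
    return (result > 0) - (result < 0)


print(compare_versions("1.0~rc1", "1.0"))   # -1
print(compare_versions("2.10-3", "2.9-1"))   # 1
print(compare_versions("1:0.9", "2.0"))      # 1
```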

I’m not clear on what the metaIndex class is doing.

There are routines for parsing configuration & system files to determine which packages can be installed & run on this computer.

There’s a class for tracking & reporting install progress.


There’s of course some (line-by-line) parsing code in a dedicated file, and a second one for RFC-822-style “tags” which includes seeking around the file.
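The RFC-822-style format itself is simple enough to sketch: `Field: value` lines, continuation lines starting with whitespace, and blank lines separating stanzas. The following Python is a minimal, hypothetical parser over an in-memory string; unlike apt-pkg it doesn’t seek around a file or handle comments, and it keeps continuation lines verbatim minus their first character.

```python
def parse_stanzas(text: str) -> list:
    """Split RFC-822-style text into a list of {field: value} stanzas."""
    stanzas, current, last_key = [], {}, None
    for line in text.splitlines():
        if not line.strip():                      # blank line ends a stanza
            if current:
                stanzas.append(current)
            current, last_key = {}, None
        elif line[0] in " \t":                    # continuation of the previous field
            if last_key is None:
                raise ValueError(f"continuation without a field: {line!r}")
            current[last_key] += "\n" + line[1:]
        else:
            key, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"malformed line: {line!r}")
            last_key = key.strip()
            current[last_key] = value.strip()
    if current:                                   # last stanza may lack a trailing blank line
        stanzas.append(current)
    return stanzas


sample = """\
Package: hello
Version: 2.10-3
Description: example package
 with a continuation line

Package: bye
"""
for stanza in parse_stanzas(sample):
    print(stanza)
```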

There’s an abstraction around “index files”.

There’s a routine iterating over Apt’s directories deleting irrelevant files. Apt implements its own directory iterator.

The prioritized partial orderings are computed by a dedicated pkgOrderList class. There’s a class holding & validating info parsed from the cache.
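A prioritized partial ordering of this kind can be illustrated with Kahn’s topological sort, picking the highest-priority ready package at each step. This is a stand-in technique, not necessarily what pkgOrderList actually does: `install_order`, the priority numbers, and treating dependencies outside the given set as already satisfied are all my assumptions, and Apt has to cope with dependency loops that this sketch simply rejects.

```python
import heapq


def install_order(deps: dict, priority: dict) -> list:
    """Order packages so each comes after its dependencies.

    Among packages whose dependencies are all placed, the highest priority
    goes first (ties broken by name). Dependencies not in `deps` are assumed
    to be installed already.
    """
    pending = {pkg: set(d) & set(deps) for pkg, d in deps.items()}
    dependents = {pkg: [] for pkg in deps}
    for pkg, ds in pending.items():
        for d in ds:
            dependents[d].append(pkg)
    # heapq is a min-heap, so negate priorities to pop the highest first.
    ready = [(-priority.get(p, 0), p) for p, ds in pending.items() if not ds]
    heapq.heapify(ready)
    order = []
    while ready:
        _, pkg = heapq.heappop(ready)
        order.append(pkg)
        for child in dependents[pkg]:
            pending[child].discard(pkg)
            if not pending[child]:
                heapq.heappush(ready, (-priority.get(child, 0), child))
    if len(order) != len(deps):
        raise ValueError("dependency cycle among: " + ", ".join(sorted(set(deps) - set(order))))
    return order


deps = {"app": {"lib", "libc"}, "lib": {"libc"}, "libc": set(), "doc": set()}
print(install_order(deps, {"libc": 10, "doc": 1}))
```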


There’s a pkgPackageManager class abstracting most of this, with methods to: download archives for given packages; mark missing packages to be kept; flag a package & its dependencies for immediate install; partially order dependencies; check whether dependencies are irrelevant; check a list of dependencies for conflicts; check which packages may get broken by an install; run “configuration” over all non-configured packages, with a couple of methods performing a wide choice of that “configuration” on individual packages (including a couple of repeated nested iterations & seeking out reverse-breakages as mentioned earlier); carefully flag packages to be removed, with user-reporting; carefully flag/run removal of a package (the actual removal is implemented elsewhere); unpack packages extremely carefully in a loop, iterating over versions, before running the actual install/configuration; & order dependencies once more, this time recovering from failures.


There’s a class abstracting file copies, typically off CD or USB, with careful validation (involving parsing, hashing, etc) & progress reporting, plus auxiliary methods to carefully compute filepaths.
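The “copy carefully, hashing as you go” idea might look like this in Python. SHA-256, the `copy_verified` name, and the progress callback are assumptions for illustration; the point is that a mismatched copy gets deleted rather than trusted.

```python
import hashlib
import os
import tempfile


def copy_verified(src, dst, expected_sha256, progress=None, chunk_size=64 * 1024):
    """Copy src to dst in chunks, hashing as we go.

    Calls progress(bytes_copied) after each chunk. If the SHA-256 digest
    doesn't match expected_sha256, deletes dst and raises ValueError.
    """
    digest = hashlib.sha256()
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            fout.write(chunk)
            copied += len(chunk)
            if progress is not None:
                progress(copied)
    if digest.hexdigest() != expected_sha256.lower():
        os.remove(dst)  # don't leave a corrupt copy behind
        raise ValueError(f"hash mismatch for {src}")


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "pkg.deb")
    data = b"example archive contents"
    with open(src, "wb") as f:
        f.write(data)
    good_copy = os.path.join(tmp, "copy.deb")
    copy_verified(src, good_copy, hashlib.sha256(data).hexdigest(),
                  progress=lambda n: print(f"{n} bytes copied"))
    good_ok = os.path.exists(good_copy)
    bad_copy = os.path.join(tmp, "bad.deb")
    try:
        copy_verified(src, bad_copy, "0" * 64)
        rejected = False
    except ValueError as err:
        print("rejected:", err)
        rejected = True
    bad_removed = not os.path.exists(bad_copy)
```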

The cache parser has a separate class build its abstract syntax tree, normalizing & simplifying results. There’s an interpreter for querying the package cache, and a separate class for Aptitude-style syntax.

There’s a class abstracting a cachefile, and related classes.


There’s a class abstracting the configuration files away further, combining various field checks.

There’s a superclass for retrieving packages that implements user-reporting itself.

Debian supports reading packages off a USB or CD as a datasource, traversing all its directories not blocklisted by a “.aptignr” file & treating an “i18n” directory specially. Results are scored or ignored based on keywords. It can also mount & eject the media verbosely.
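The directory traversal can be sketched with Python’s `os.walk`, pruning any subtree whose directory contains a `.aptignr` file. The index filenames in `INDEX_NAMES` and the way `i18n` directories are collected separately are assumptions standing in for Apt’s actual rules, and the keyword scoring is left out entirely.

```python
import os
import tempfile

# Assumed names of package index files worth recording.
INDEX_NAMES = {"Packages", "Packages.gz", "Packages.xz"}


def scan_media(root: str):
    """Return (package-index dirs, i18n dirs) found under root.

    A directory containing a ".aptignr" file is skipped, along with everything below it.
    """
    index_dirs, i18n_dirs = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        if ".aptignr" in filenames:
            dirnames[:] = []            # prune: os.walk won't descend further
            continue
        dirnames.sort()                 # deterministic traversal order
        if os.path.basename(dirpath) == "i18n":
            i18n_dirs.append(dirpath)   # translations are tracked separately
        elif INDEX_NAMES & set(filenames):
            index_dirs.append(dirpath)
    return index_dirs, i18n_dirs


with tempfile.TemporaryDirectory() as root:
    for rel in ["dists/stable/main/binary-amd64/Packages.gz",
                "dists/stable/main/i18n/Translation-en",
                "extras/.aptignr",
                "extras/pool/Packages"]:
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    found, translations = scan_media(root)
    found = [os.path.relpath(p, root) for p in found]
    translations = [os.path.relpath(p, root) for p in translations]
    print(found, translations)
```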


There’s a “worker” class applying configuration & running background commands specified there.

There’s a class handling various edgecases with error-recovery and verbose output, since you really want Apt not to fail on you because then your whole OS might fail on you!

There’s a class extracting various subsets from the package cache.

There’s a class providing parsing & serialization utilities for package descriptors.


There’s a class caching per-package dependency information & tracking package installation-state.

There’s a class maintaining a queue of local & (via subprocesses) remote files to open, as well as a class processing each individual item in that queue, including communicating with the subprocesses & logging.


Looking over some somewhat-auxiliary subsystems for apt-pkg I see:

apt-private

Here I see:

Utilities for commandline UI.

Apt Misc.

Looking through the rest of Apt, I see:

Man Pages

The first package Linux From Scratch has you install during the main userspace build is “Man-Pages”. Documentation is essential for software: can a feature truly be said to exist if no one knows how to use it?

This package mostly provides documentation for POSIX-standardized APIs, commands, etc implemented by GNU & Linux, written in a format where many lines start with a period & a formatting control code. The pages are organized into 8 numbered sections, each with an “intro” page.
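To give a feel for that format, here’s a tiny Python sketch rendering a handful of man(7) macros (`.TH`, `.SH`, `.B`, `.I`, paragraph breaks, and `.\"` comments) as plain text. Real pages are formatted by groff or mandoc, which understand far more than this; `render_man` is hypothetical and ignores every other macro, and every escape besides `\-`.

```python
def render_man(source: str) -> str:
    """Render a tiny subset of man(7) markup as plain text."""
    out = []
    for line in source.splitlines():
        line = line.replace("\\-", "-")           # roff escape for a minus sign
        if not line.startswith("."):
            out.append(line)                      # ordinary running text
            continue
        macro, _, args = line[1:].partition(" ")  # control line: .MACRO args
        if macro.startswith('\\"'):
            continue                              # .\" introduces a comment
        if macro == "TH":
            out.append(args.split()[0] if args else "")  # page title
        elif macro == "SH":
            out.extend(["", args.strip('"').upper()])   # section heading
        elif macro in ("B", "I"):
            out.append(args)                      # bold/italic: keep just the text
        elif macro in ("PP", "LP", "P"):
            out.append("")                        # paragraph break
        # Any other macro is silently ignored in this sketch.
    return "\n".join(out)


page = "\n".join([
    '.\\" A made-up example page',
    ".TH LS 1",
    ".SH NAME",
    "ls \\- list directory contents",
    ".SH SYNOPSIS",
    ".B ls",
    "[OPTION]...",
])
print(render_man(page))
```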


Man-Pages also provides some scripts they use (or have used?) to normalize the formatting of these man-pages, including:

These are for project management; they don’t need to be installed.

IANA

For compactness, IP, TCP, & UDP all include a number denoting which protocol (or connection, but that’s outside the scope of this thread) the rest of the packet speaks. To make these numbers more human-legible, IANA (who are charged with tracking the numbers prescribed by the IETF) provides a couple of datafiles (IP’s datafile is separate), each in both TSV & XML formats, 4 total, for you to preinstall into /etc, extensively listing these protocols.
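Those files are simple whitespace-separated tables. For instance, lines in the services(5) layout are `name port/protocol [aliases…]` with `#` comments, which this hypothetical Python sketch parses into a lookup table. In practice programs ask the C library instead, e.g. via getservbyname(3), which Python exposes as `socket.getservbyname`.

```python
def parse_services(text: str) -> dict:
    """Map (name or alias, protocol) -> port from services(5)-style text."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()      # "#" starts a comment
        if not line:
            continue
        name, port_proto, *aliases = line.split()
        port, proto = port_proto.split("/")
        for alias in (name, *aliases):
            table[(alias, proto)] = int(port)
    return table


sample = """\
# Excerpt in the style of /etc/services
ssh      22/tcp
domain   53/udp
http     80/tcp   www     # WorldWideWeb HTTP
"""
services = parse_services(sample)
print(services[("www", "tcp")])     # 80
print(services[("domain", "udp")])  # 53
```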

Naming is an interesting topic…


There’s this concept of “Zooko’s Triangle”: between global-uniqueness, human-legibility, & decentralization of names, any system can only achieve 2 (caveat: blockchains, don’t at-me…).

Today we typically embrace the global-uniqueness & human-legibility edge using federated systems, typically centred around IANA, though I’d certainly say the other edges are worth exploring (which GNU LibC can facilitate!).

IANA is by no means perfect, particularly with the privatization of the Internet.

Pkg-Config

When writing software we rely on reusing code from the operating system & others (in freesoftware this line is fuzzy…). To reuse various “libraries” in C code, there are different flags you need to hand GCC. Pkg-Config manages a flatfile database of those flags.
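Each `.pc` file mixes `name=value` variable definitions with `Field: value` entries such as `Cflags` & `Libs`, referencing variables as `${name}`. Here’s a minimal Python sketch of parsing one; `parse_pc` is hypothetical, only expands variables defined earlier in the file, and ignores details like escaped `#` characters and quoting.

```python
import re

# "name=value" defines a variable; "Name: value" defines a field.
_LINE = re.compile(r"([A-Za-z0-9_.]+)\s*([=:])\s*(.*)")
_VAR_REF = re.compile(r"\$\{([^}]+)\}")


def parse_pc(text: str):
    """Parse .pc-style text into (variables, fields), expanding ${var} references."""
    variables, fields = {}, {}

    def expand(value: str) -> str:
        # In this sketch only variables defined earlier can be referenced.
        return _VAR_REF.sub(lambda m: variables[m.group(1)], value)

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments and blank lines
        if not line:
            continue
        match = _LINE.fullmatch(line)
        if match is None:
            raise ValueError(f"unparseable line: {raw!r}")
        name, kind, value = match.groups()
        (variables if kind == "=" else fields)[name] = expand(value)
    return variables, fields


pc = """\
prefix=/usr
libdir=${prefix}/lib
includedir=${prefix}/include

Name: foo
Description: A hypothetical example library
Version: 1.2.3
Cflags: -I${includedir}/foo
Libs: -L${libdir} -lfoo
"""
variables, fields = parse_pc(pc)
print(fields["Cflags"])  # -I/usr/include/foo
print(fields["Libs"])    # -L/usr/lib -lfoo
```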

Pkg-Config in turn reuses GNOME’s extended C toolbox to do so, which it vendors. For GNOME this “GLib” toolbox primarily provides a mainloop & OO bindings; for Pkg-Config it primarily provides a “keyfile” parser.


To do so, Pkg-Config checks some envvars & builds a searchpath & globals (stored in GLib collections), handling Windows specially. It parses commandline flags (using GLib abstractions), possibly outputs debugging messages about which options were selected, copies them over to globals in other files, & possibly exits after printing a version number or comparing that version against an expected value.
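Per pkg-config’s documentation, `$PKG_CONFIG_PATH` directories are searched before the default ones, while `$PKG_CONFIG_LIBDIR` replaces the default directories. A Python sketch of that precedence might look like this; the `pc_search_path` name and the example default directories are assumptions (the real defaults are compiled in and vary by distro).

```python
import os


def pc_search_path(env, default_dirs):
    """Directories to search for .pc files, in priority order.

    $PKG_CONFIG_PATH entries come first; then $PKG_CONFIG_LIBDIR if it is set,
    otherwise the compiled-in default (passed in here as default_dirs).
    """
    def split(value):
        # os.pathsep is ":" on Unix and ";" on Windows.
        return [d for d in value.split(os.pathsep) if d]

    path = split(env.get("PKG_CONFIG_PATH", ""))
    libdir = env.get("PKG_CONFIG_LIBDIR")
    path += split(libdir) if libdir is not None else list(default_dirs)
    return path


print(pc_search_path(os.environ, ["/usr/lib/pkgconfig", "/usr/share/pkgconfig"]))
```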

Then it initializes hashtables, inserting itself & all .pc keyfiles in the searchpath, which gives it enough info to print them all.
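Building that hashtable amounts to listing each search directory for `*.pc` files. In this Python sketch (`scan_pc_dirs` is hypothetical), the first directory in the search path to provide a given package name wins, which fits `$PKG_CONFIG_PATH` being searched first, though I haven’t confirmed pkg-config resolves duplicates exactly this way. The entry it inserts for itself is omitted.

```python
import os
import tempfile


def scan_pc_dirs(search_path):
    """Map package name -> path of its .pc file; earlier directories take precedence."""
    locations = {}
    for directory in search_path:
        try:
            entries = sorted(os.listdir(directory))
        except OSError:
            continue                     # missing or unreadable directories are skipped
        for entry in entries:
            if entry.endswith(".pc"):
                # setdefault keeps the first location seen for each name.
                locations.setdefault(entry[:-len(".pc")], os.path.join(directory, entry))
    return locations


with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    for directory, name in [(first, "foo.pc"), (second, "foo.pc"), (second, "bar.pc")]:
        open(os.path.join(directory, name), "w").close()
    found = scan_pc_dirs([first, second, os.path.join(first, "missing")])
    foo_from_first = found["foo"] == os.path.join(first, "foo.pc")
    for name, path in sorted(found.items()):
        print(name, "->", path)
```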


Then it collects the remaining commandline args & checks the $PKG_CONFIG_LOG envvar before iterating over the reparsed commandline args. For each, it consults the hashtables to determine which keyfile to parse (it turns out not to use GLib’s keyfile parser after all), expanding globals from their hashmap as it goes. From there it can populate the relevant hashmaps & check whether the package satisfies the given requirements, recursing for any transitive dependencies.
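The recursion over transitive dependencies can be sketched as a depth-first walk. In this hypothetical Python version, packages are already-parsed dicts whose `Requires` are plain name lists (real `.pc` files allow version constraints), and flags are emitted dependents-first, the order a static linker wants, with duplicates dropped. Pkg-Config’s actual ordering & deduplication rules may differ.

```python
def resolve_libs(name, packages):
    """Return Libs flags for `name` plus everything it transitively Requires."""
    order, visiting = [], set()

    def visit(pkg):
        if pkg in order:
            return                              # already handled via another path
        if pkg not in packages:
            raise LookupError(f"Package {pkg} was not found")
        if pkg in visiting:
            raise ValueError(f"circular Requires involving {pkg}")
        visiting.add(pkg)
        for dep in packages[pkg].get("Requires", []):
            visit(dep)
        visiting.discard(pkg)
        order.append(pkg)                       # post-order: dependencies land first

    visit(name)
    flags = []
    for pkg in reversed(order):                 # dependents before their dependencies
        for flag in packages[pkg].get("Libs", "").split():
            if flag not in flags:               # keep only the first occurrence
                flags.append(flag)
    return flags


packages = {
    "foo": {"Requires": ["bar"], "Libs": "-L/usr/lib -lfoo"},
    "bar": {"Requires": ["baz"], "Libs": "-L/usr/lib -lbar"},
    "baz": {"Libs": "-L/usr/lib -lbaz"},
}
print(" ".join(resolve_libs("foo", packages)))
```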

Results are collected & possibly logged.


If that logic passes, that might be all the command needs to do. Otherwise it iterates over the collected packages, outputting sorted vars, checking whether any are uninstalled, outputting their version numbers, outputting package names with version numbers, outputting each of their dependencies (private or not), outputting their vars, and/or outputting their flags.

Finally it may need to output a newline before exiting.

It also includes comparators for version strings.
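As far as I know, Pkg-Config’s comparator is adapted from RPM’s `rpmvercmp()`. A simplified Python sketch of that style of comparison follows; `rpmvercmp_like` is a hypothetical name, and it glosses over some edge cases around trailing separators.

```python
import re

_RUN = re.compile(r"[0-9]+|[A-Za-z]+")   # anything else just separates runs


def rpmvercmp_like(a: str, b: str) -> int:
    """Return 1 if a is newer than b, -1 if older, 0 if they compare equal."""
    if a == b:
        return 0
    runs_a, runs_b = _RUN.findall(a), _RUN.findall(b)
    for x, y in zip(runs_a, runs_b):
        x_num, y_num = x.isdigit(), y.isdigit()
        if x_num != y_num:
            return 1 if x_num else -1          # a numeric run beats an alphabetic one
        # Numeric runs compare as integers, alphabetic runs as strings.
        kx, ky = (int(x), int(y)) if x_num else (x, y)
        if kx != ky:
            return 1 if kx > ky else -1
    # All shared runs are equal: whichever has runs left over is newer.
    return (len(runs_a) > len(runs_b)) - (len(runs_a) < len(runs_b))


print(rpmvercmp_like("1.10", "1.9"))   # 1
print(rpmvercmp_like("1.0", "1.0a"))   # -1
```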