A lot of computing, if not most (save communicating this information to humans), is dedicated to maintaining databases: updating and querying various collections of data. Some of the databases most core to making your computer function at all are discussed here.
Package Management (Apt)
To assemble all the software developed by Linux, GNU, FreeDesktop.Org, GNOME, KDE, etc, etc, etc into a functioning operating system, “distros” (as we often insist on calling them) create “package manager” tools for installing this software from their curated “package repositories”. Programming languages often have their own dedicated package managers, which I’d consider a great idea if their repositories were decently curated.
For this, Debian has the Advanced Packaging Tool, “APT”.
apt-pkg
Apt’s underlying library is predominantly apt-pkg, as described here.
There’s a pkgRecords C++ class which converts a cached array into a package mapping indexed by ID.
There’s a pretty-printer.
There’s a routine to fetch each queued package update with user interaction. There’s a more general routine which applies various queued changes with user interaction.
There’s a datafile listing package header keys in a standardized order.
There’s a class which parses a list of local files listing packages into an array.
There’s a dist-upgrade routine which flags all installed packages to be upgraded, checks whether to install all essential packages, reinstalls all previously installed packages to resolve conflicts, possibly flags held packages to be kept, & repairs any issues, whilst updating a progressbar.
Another routine goes through all packages selecting which ones are interesting to upgrade, and there are a couple of routines for upgrading packages with or without installing new ones. Another chooses between these codepaths.
There’s a class which parses a list of index files with error reporting, & exposes results.
There’s a pkgPolicy class which fills in various default properties, including pinnings & priorities.
There’s a class which parses & compares version numbers.
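Debian's version ordering is subtle enough to be worth a sketch. The following follows the algorithm documented in deb-version(7) and used by dpkg (Apt's own comparator implements the same ordering): split off the epoch and revision, then compare alternating non-digit and digit runs, with “~” sorting before everything, even the end of the string. The function and struct names are hypothetical, and the parser assumes well-formed input rather than validating it.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper names; a sketch of deb-version(7) ordering, not apt-pkg's API.
// Weight of a character within a non-digit run:
// '~' sorts before everything, then end-of-string, then letters, then other symbols.
int order(char c) {
    if (std::isdigit(static_cast<unsigned char>(c))) return 0;
    if (std::isalpha(static_cast<unsigned char>(c))) return c;
    if (c == '~') return -1;
    if (c) return static_cast<unsigned char>(c) + 256;
    return 0;  // end of string
}

// Compare one component (upstream version or revision); returns <0, 0 or >0.
int verrevcmp(const std::string& a, const std::string& b) {
    auto at = [](const std::string& s, std::size_t k) { return k < s.size() ? s[k] : '\0'; };
    auto digit = [](char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; };
    std::size_t i = 0, j = 0;
    while (at(a, i) || at(b, j)) {
        // Non-digit run: compare character weights one by one.
        while ((at(a, i) && !digit(at(a, i))) || (at(b, j) && !digit(at(b, j)))) {
            int ac = order(at(a, i)), bc = order(at(b, j));
            if (ac != bc) return ac - bc;
            ++i; ++j;
        }
        // Digit run: compare numerically, ignoring leading zeros.
        while (at(a, i) == '0') ++i;
        while (at(b, j) == '0') ++j;
        int first_diff = 0;
        while (digit(at(a, i)) && digit(at(b, j))) {
            if (!first_diff) first_diff = at(a, i) - at(b, j);
            ++i; ++j;
        }
        if (digit(at(a, i))) return 1;   // a's number has more digits
        if (digit(at(b, j))) return -1;
        if (first_diff) return first_diff;
    }
    return 0;
}

struct DebVersion { long epoch = 0; std::string upstream, revision; };

// Split "[epoch:]upstream[-revision]"; no validation in this sketch.
DebVersion parse_version(const std::string& v) {
    DebVersion r;
    r.upstream = v;
    if (auto colon = v.find(':'); colon != std::string::npos) {
        r.epoch = std::stol(v.substr(0, colon));
        r.upstream = v.substr(colon + 1);
    }
    if (auto dash = r.upstream.rfind('-'); dash != std::string::npos) {
        r.revision = r.upstream.substr(dash + 1);
        r.upstream.erase(dash);
    }
    return r;
}

int compare_versions(const std::string& a, const std::string& b) {
    DebVersion va = parse_version(a), vb = parse_version(b);
    if (va.epoch != vb.epoch) return va.epoch < vb.epoch ? -1 : 1;
    if (int r = verrevcmp(va.upstream, vb.upstream)) return r;
    return verrevcmp(va.revision, vb.revision);
}
```

For example, compare_versions("1.0~rc1", "1.0") is negative, which is why packagers use “~” to mark pre-releases.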
I’m not clear on what the metaIndex class is doing.
There are routines for parsing configuration & system files to determine which packages can be installed & run on this computer.
There’s a class for tracking & reporting install progress.
There’s of course some (line-by-line) parsing code in a dedicated file, and a 2nd one for RFC-822-style “tags”, which includes seeking around the file.
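Those RFC-822-style files (Packages, Sources, dpkg's status file) are sequences of “stanzas” separated by blank lines, each a list of “Field: value” lines where lines starting with whitespace continue the previous field. Here is a minimal sketch of parsing them from any std::istream. The names are hypothetical, and it skips much of what Apt's pkgTagFile/pkgTagSection classes handle, such as comments & seeking around the file.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names; a sketch, not apt-pkg's API.
// One stanza: fields in file order (duplicates are not merged in this sketch).
using Stanza = std::vector<std::pair<std::string, std::string>>;

std::vector<Stanza> parse_stanzas(std::istream& in) {
    std::vector<Stanza> stanzas;
    Stanza current;
    auto flush = [&] {
        if (!current.empty()) stanzas.push_back(std::move(current));
        current.clear();
    };
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.find_first_not_of(" \t") == std::string::npos) {  // blank line ends a stanza
            flush();
        } else if (line[0] == ' ' || line[0] == '\t') {             // continuation line
            if (!current.empty()) current.back().second += "\n" + line.substr(1);
        } else if (auto colon = line.find(':'); colon != std::string::npos) {
            std::string value = line.substr(colon + 1);
            value.erase(0, value.find_first_not_of(" \t"));          // trim leading blanks
            current.emplace_back(line.substr(0, colon), value);
        }  // other lines are malformed; silently skipped here
    }
    flush();
    return stanzas;
}

// Field names are case-insensitive in these files.
std::string field(const Stanza& s, const std::string& name) {
    auto same = [](char x, char y) {
        return std::tolower(static_cast<unsigned char>(x)) == std::tolower(static_cast<unsigned char>(y));
    };
    for (const auto& [key, value] : s)
        if (key.size() == name.size() && std::equal(key.begin(), key.end(), name.begin(), same))
            return value;
    return "";
}
```

Keeping each stanza as an ordered vector rather than a map preserves the field order, which matters if you want to write the file back out unchanged.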
There’s an abstraction around “index files”.
There’s a routine iterating over Apt’s directories deleting irrelevant files. Apt implements its own directory iterator.
The prioritized partial orderings are computed by a dedicated pkgOrderList class. There’s a class holding & validating info parsed from the cache.
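pkgOrderList handles considerably more than this, but the core idea of a prioritized partial ordering can be sketched as a topological sort (Kahn's algorithm). Among packages whose dependencies are all satisfied, higher-priority ones go first. The dependency map & priority numbers are hypothetical inputs, not Apt's data structures.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical input format: deps[p] lists the packages that must be installed before p.
// priority[p] breaks ties among packages that are ready at the same time (default 0).
std::vector<std::string> install_order(const std::map<std::string, std::vector<std::string>>& deps,
                                       const std::map<std::string, int>& priority) {
    std::map<std::string, int> pending;                        // unsatisfied dependency count
    std::map<std::string, std::vector<std::string>> dependents;
    for (const auto& [pkg, ds] : deps) {
        pending.try_emplace(pkg, 0);
        for (const auto& d : ds) {
            pending.try_emplace(d, 0);
            ++pending[pkg];
            dependents[d].push_back(pkg);
        }
    }
    auto prio = [&](const std::string& p) {
        auto it = priority.find(p);
        return it == priority.end() ? 0 : it->second;
    };
    // Ready set ordered by (-priority, name): highest priority first, then alphabetical.
    std::set<std::pair<int, std::string>> ready;
    for (const auto& [p, n] : pending)
        if (n == 0) ready.insert({-prio(p), p});

    std::vector<std::string> order;
    while (!ready.empty()) {
        std::string p = ready.begin()->second;
        ready.erase(ready.begin());
        order.push_back(p);
        for (const auto& q : dependents[p])
            if (--pending[q] == 0) ready.insert({-prio(q), q});
    }
    // Anything left over sits on a cycle; a real package manager must break it somehow.
    if (order.size() != pending.size()) throw std::runtime_error("dependency cycle");
    return order;
}
```

Throwing on a cycle is the simplest choice for a sketch; Apt cannot afford that, since dependency loops do occur in real archives.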
There’s a pkgPackageManager class abstracting most of this, with methods to:
- Download archives for given packages.
- Mark missing packages to be kept.
- Flag a package & its dependencies for immediate install.
- Partially order dependencies.
- Check whether dependencies are irrelevant.
- Check a list of dependencies for conflicts.
- Check which packages may get broken by an install.
- Run “configuration” over all non-configured packages.
- Perform a wide choice of that “configuration” on individual packages (a couple of methods), including a couple of repeated nested iterations & seeking out reverse-breakages as mentioned earlier.
- Carefully flag packages to be removed, with user-reporting.
- Carefully flag/run removal of a package (the actual removal is implemented elsewhere).
- Unarchive packages extremely carefully, in a loop with version iterations, before running the actual install/configuration.
- Order dependencies, this one recovering from failures.
There’s a class abstracting file copies, typically off CD or USB, with careful validation (involving parsing, hashing, etc.) & progress reporting, and auxiliary methods to carefully compute filepaths.
The cache parser has a separate class to build its abstract syntax tree, normalizing & simplifying results. There’s an interpreter for querying the package cache, and a separate class for Aptitude-style syntax.
There’s a class abstracting a cachefile, and related classes.
There’s a class abstracting the configuration files away further, combining various field checks.
There’s a superclass for retrieving packages that implements user-reporting itself.
Debian supports reading packages off a USB stick or CD as a datasource, traversing all its directories not blocklisted by an “.aptignr” file and treating an “i18n” directory specially. Results are scored or ignored based on keywords. It can mount & eject the disc verbosely.
There’s a “worker” class applying configuration & running background commands specified there.
There’s a class handling various edgecases with error-recovery and verbose output, since you really want Apt not to fail on you because then your whole OS might fail on you!
There’s a class extracting various subsets from the package cache.
There’s a class providing parsing & serialization utilities for package descriptors.
There’s a class caching per-package dependency information & tracking package installation-state.
There’s a class maintaining a queue of local & (via subprocesses) remote files to open, as well as a class processing each individual item in that queue, including communicating with the subprocesses & logging.
Looking over some somewhat-auxiliary subsystems of apt-pkg, I see:
- Debugging output for the EDSP planning, which can be fed back in as input.
- Additional EDSP parsing code.
- Merging these EDSP files.
- Parsing utilities for .debian files.
- Abstraction over parsed Debian records.
- Parsing utilities for Debian records’ fields, including, separately, version numbers.
- Utility for running “proxy” commands.
- Parsing SRV records.
- Parsing NetRC files.
- Running GPG.
- Render a progressbar.
- Running dpkg with external lockfiles.
- Tar unarchiving.
- Package-list parser.
- “AR” archive parser.
- Interacting with CDs.
- More interfacing to dpkg.
- Configuration file parser.
- Its own variation upon a standard C++ library.
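NetRC files (the format ftp & curl use for stored credentials) are just whitespace-separated tokens: “machine <host>” starts an entry, followed by “login” & “password” pairs, with “default” as a catch-all entry. Here is a minimal sketch of parsing them, assuming no “macdef” macros & no quoted tokens. The function & struct names are hypothetical, not apt-pkg's.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical names; a sketch of the netrc token format, not apt-pkg's parser.
struct NetrcCredentials { std::string login, password; };

// Maps machine name (or "default") to its credentials.
std::map<std::string, NetrcCredentials> parse_netrc(std::istream& in) {
    std::map<std::string, NetrcCredentials> entries;
    std::string token, machine;
    bool in_entry = false;
    while (in >> token) {
        if (token == "machine") {
            in_entry = static_cast<bool>(in >> machine);
        } else if (token == "default") {
            machine = "default";
            in_entry = true;
        } else if (token == "login" || token == "password" || token == "account") {
            std::string value;
            if (!(in >> value) || !in_entry) continue;  // the value is always consumed
            if (token == "login") entries[machine].login = value;
            else if (token == "password") entries[machine].password = value;
            // "account" values are read but ignored in this sketch
        }
        // anything else (e.g. "macdef") is not handled here
    }
    return entries;
}
```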
apt-private
Here I see:
- A routine which forks a command to display a given file in the configured (in one of a few ways) pager, and a similar one for editing files
- A routine to retrieve seconds-since-epoch from the time syscall or a clock devicefile
- Abstractions over system upgrades
- Verbosely download the package index, with caching; abstracting aptAcquireWithTextStatus
- Scan packages for unmet dependencies
- Routine for editing the sources.list configuration file & validating results
- More .debian parsing utilities
- Regexp-fulltext-search with localized results
- Textually-output a list of packages
- Initialize libcurses-like output
- Retrieve locale, ignore SIGPIPE, check if simulated, check if called-by-script
- Cow easter-egg
- Traverse & verbosely output dependencies
- Check available freespace
- Locate the source package
- Another regex-search, & routine to choose between them
- More extensive listing of packages, with or without listing all their versions
- Verbosely run an install, deferring to a pkgPackageManager object for core logic
- Verbosely download a file
- Comparators
- List installed locales & versions
- Class maintaining a sorted package list
- Class outputting progress messages
- Scan packages & autoremove relevant
- Retrieve changelog
- Minor variations on Apt’s cachefile & package-universe classes
- Check dependencies for a package
- Install a package’s dependencies
- Output broken packages
- Verbosely clean out the downloads directory
- Verbose wrapper over the Apt cache
- Retrieve commandline flags from configuration; wrapper also parses explicit commandline flags
- Retrieve package policy
- A couple of separate files implementing the most verbose package display
- Output various aspects of the update/install
- Verbosely run an install deferring to the cache, etc
- Utilities for commandline UI
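Checking for available freespace before downloading is simple enough to sketch portably with C++17's std::filesystem::space. I'd guess Apt's own check calls statvfs directly, so treat this as an equivalent rather than its code. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <system_error>

// Hypothetical helper; a portable sketch, not apt-private's implementation.
// True if `dir` has at least `needed` bytes available to unprivileged users.
// If the query fails (e.g. the directory doesn't exist) we conservatively say no.
bool enough_free_space(const std::filesystem::path& dir, std::uintmax_t needed) {
    std::error_code ec;
    const std::filesystem::space_info info = std::filesystem::space(dir, ec);
    if (ec) return false;
    return info.available >= needed;
}
```

Using “available” rather than “free” matters here: on most Linux filesystems some blocks are reserved for root, and a download running as an ordinary user can't use them.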
Apt Misc.
Looking through the rest of Apt, I see:
- Commandline programs choosing codepaths & implementing the remainder of the commandline interface; namely: apt-cdrom, apt-cache, apt (combines codepaths from all these other commands), apt-config (includes shared suffix between these 2 codepaths), apt-dump-solver (single codepath), apt-extracttemplates (single codepath with support class), apt-internal-planner (single codepath with output support class), apt-internal-solver (single codepath), apt-helper, apt-get (minor suffix handling simulation mode), apt-sort (single codepath with sortable struct), & apt-mark.
- Shellscripts for reporting given mirror failures to a URL, & another to manage GPG keys.
- I generally ignore buildscripts in my descriptions; they are & should be dull…
- Debian (meta!) & systemd packaging.
- Documentation files with localizations.
- Perlscripts to manage sources.list with localization & packaging.
- FTP downloading library.
- Suite of commands to perform the actual downloads, some with headerfiles.
- Its own getaddrinfo implementation.
- Localization.
- Testsuite.
- “Distribution”-specific config.
- Test server to test failure reporting upon.
- Project-management commands.
- Normal project metadata.
Man Pages
The first package Linux From Scratch has you install during the main userspace build is “Man-Pages”. Documentation is essential for software: can a feature truly be said to exist if no one knows how to use it?
This package mostly provides documentation for POSIX-standardized APIs/commands/etc implemented by GNU & Linux, written in a format where most lines start with a period & formatting control code. The pages are organized into 8 numbered sections, each with an “intro” documentation file.
Man-Pages also provides some scripts they use (or have used?) to normalize the formatting of these man-pages, including:
- Remove “COLOPHON” header.
- Ensure a space is inserted before a function’s parentheses.
- List man-page text encodings.
- Grep for invalid formatting syntax.
- Extract (via AWK) FIXMEs into dedicated tables in the man-pages for review.
- Output FIXME table.
- Ensure function names are paired with parens (via a preprocessing step locating substitutions to apply).
- Locate repeated consecutive words.
- Another script for adding parens to function names, being careful around ones that share a name with terminal commands.
- Convert all man-pages to UTF-8 text encoding where they’re not ASCII.
- Check for given macros complaining where they don’t pair up, via AWK.
- A few Bash utility functions.
- Third utility for adding parens to function names.
These are for project management; they don’t need to be installed.
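As an example of the kind of check those scripts perform, here's a sketch of the macro-pairing check. The original is AWK; I've written it in C++ to match the other examples here. It only recognizes control lines starting with a period, and the set of paired macros (.RS/.RE, .nf/.fi, .EX/.EE) is my own assumption about which ones matter, as is the function name.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch. Opening macro -> closing macro (an assumed set, not the script's).
const std::map<std::string, std::string> kPairedMacros = {{"RS", "RE"}, {"nf", "fi"}, {"EX", "EE"}};

std::vector<std::string> check_macro_pairs(std::istream& in) {
    std::vector<std::string> problems;
    std::vector<std::pair<std::string, int>> open;  // stack of (macro, line number)
    std::string line;
    for (int lineno = 1; std::getline(in, line); ++lineno) {
        if (line.empty() || line[0] != '.') continue;  // not a control line
        std::string macro = line.substr(1, line.find_first_of(" \t") - 1);
        if (kPairedMacros.count(macro)) {
            open.emplace_back(macro, lineno);
            continue;
        }
        auto opener = std::find_if(kPairedMacros.begin(), kPairedMacros.end(),
                                   [&](const auto& p) { return p.second == macro; });
        if (opener == kPairedMacros.end()) continue;  // a macro we don't track
        if (!open.empty() && open.back().first == opener->first) {
            open.pop_back();
        } else {
            problems.push_back("line " + std::to_string(lineno) + ": ." + macro +
                               " without matching ." + opener->first);
        }
    }
    for (const auto& [macro, lineno] : open)
        problems.push_back("line " + std::to_string(lineno) + ": ." + macro + " never closed");
    return problems;
}
```

A stack (rather than a simple counter per macro) also catches interleaving like .RS, .nf, .RE, where each macro is balanced on its own but they close out of order.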
IANA
For compactness, IP, TCP, & UDP all include a number denoting which protocol (or connection, but that’s outside the scope of this thread) the rest of the packet speaks. To make this more human-legible, IANA (who are charged with tracking these numbers as prescribed by the IETF) provides a couple of datafiles (IP’s datafile is separate) extensively listing these protocols, in both TSV & XML formats (4 total), for you to preinstall into /etc.
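Once installed as /etc/protocols, the protocol list is a whitespace-separated table: a name, its number, optional aliases, and an optional “#” comment. GNU LibC's getprotobyname() & getprotobynumber() already read it for you, so the following is only a sketch of the format, parsing a single line. The struct & function names are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical names; libc's getprotobyname() is the real interface to this file.
struct ProtocolEntry {
    std::string name;
    int number = 0;
    std::vector<std::string> aliases;
};

// Parse one line of /etc/protocols; returns nothing for blank or comment-only lines.
std::optional<ProtocolEntry> parse_protocols_line(const std::string& line) {
    std::istringstream fields(line.substr(0, line.find('#')));  // drop the comment
    ProtocolEntry entry;
    if (!(fields >> entry.name >> entry.number)) return std::nullopt;
    for (std::string alias; fields >> alias;) entry.aliases.push_back(alias);
    return entry;
}
```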
Naming is an interesting topic…
There’s this concept of “Zooko’s Triangle”: of global uniqueness, human legibility, & decentralization, any naming system can only achieve 2 (caveat: blockchains, don’t at-me…).
Today we typically embrace the global-uniqueness & human-legibility edge, using federated systems typically centred around IANA, though I’d certainly say the other edges are worth exploring (which GNU LibC can facilitate!).
IANA is by no means perfect, especially given the privatization of the Internet.
Pkg-Config
When writing software, we rely on reusing code from the operating system & others (in free software this line is fuzzy…). To reuse various “libraries” in C code, there are different flags you need to hand GCC. Pkg-Config manages a flatfile database of those flags.
Pkg-Config in turn reuses GNOME’s extended C toolbox, which it vendors. For GNOME this “GLib” toolbox primarily provides a mainloop & OO bindings; for Pkg-Config it primarily provides a “keyfile” parser.
To do so, Pkg-Config checks some envvars & builds a searchpath & globals (stored in GLib collections), handling Windows specially. It parses commandline flags (using GLib abstractions), possibly outputting debugging messages of which options were selected, copies them over to globals in other files, & possibly exits, either printing a version number or comparing that version against an expected value.
Then it initializes hashtables, inserting itself & all .pc keyfiles in the searchpath, giving enough info to list them all.
Then it collects the remaining commandline args & checks the $PKG_CONFIG_LOG envvar before iterating over the reparsed commandline args. For each, it consults the hashtables to determine which keyfile to parse (turns out it doesn’t use GLib’s keyfile parser), including expanding globals from their hashmap. From there it can populate the relevant hashmaps & check whether it satisfies the given requirements, recursing for any transitive dependencies.
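The searchpath-building step can be sketched as follows, assuming the behaviour documented in pkg-config(1). Directories from PKG_CONFIG_PATH are searched first, followed by PKG_CONFIG_LIBDIR if set, otherwise the compiled-in default. The separator is “;” on Windows & “:” elsewhere. The env values are passed in (rather than read with std::getenv) so the sketch is testable, and the function names are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

#ifdef _WIN32
constexpr char kPathSeparator = ';';
#else
constexpr char kPathSeparator = ':';
#endif

// Hypothetical helper: split a PATH-like string, dropping empty components.
std::vector<std::string> split_search_path(const std::string& s) {
    std::vector<std::string> dirs;
    std::istringstream in(s);
    for (std::string dir; std::getline(in, dir, kPathSeparator);)
        if (!dir.empty()) dirs.push_back(dir);
    return dirs;
}

// env_path / env_libdir: values of PKG_CONFIG_PATH / PKG_CONFIG_LIBDIR, or nullptr if unset.
std::vector<std::string> build_search_path(const char* env_path, const char* env_libdir,
                                           const std::string& builtin_default) {
    std::vector<std::string> dirs = split_search_path(env_path ? env_path : "");
    std::vector<std::string> rest =
        split_search_path(env_libdir ? std::string(env_libdir) : builtin_default);
    dirs.insert(dirs.end(), rest.begin(), rest.end());
    return dirs;
}
```

In real use you'd call it as build_search_path(std::getenv("PKG_CONFIG_PATH"), std::getenv("PKG_CONFIG_LIBDIR"), "/usr/lib/pkgconfig:/usr/share/pkgconfig"), where that default is just an illustrative value rather than any particular distro's.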
Results are collected & possibly logged.
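The .pc keyfile format being parsed is simple: “name=value” lines define variables, “Key: value” lines define fields, “#” starts a comment, and “${name}” expands a previously-defined variable. Here is a minimal sketch, assuming variables are defined before use & ignoring any escaping. Unlike pkg-config, which reports undefined variables, it silently expands them to nothing. The names are hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical names; a sketch of the .pc format, not pkg-config's parser.
struct PcFile {
    std::map<std::string, std::string> variables;  // from "name=value" lines
    std::map<std::string, std::string> fields;     // from "Key: value" lines
};

std::string trim(const std::string& s) {
    auto begin = s.find_first_not_of(" \t\r");
    if (begin == std::string::npos) return "";
    return s.substr(begin, s.find_last_not_of(" \t\r") - begin + 1);
}

// Replace each ${name} with its value; unknown names expand to "" in this sketch.
std::string expand(const std::string& s, const std::map<std::string, std::string>& vars) {
    std::string out;
    for (std::size_t i = 0; i < s.size();) {
        if (s.compare(i, 2, "${") == 0) {
            std::size_t end = s.find('}', i + 2);
            if (end == std::string::npos) { out += s.substr(i); break; }  // unterminated: keep as-is
            auto it = vars.find(s.substr(i + 2, end - i - 2));
            if (it != vars.end()) out += it->second;
            i = end + 1;
        } else {
            out += s[i++];
        }
    }
    return out;
}

PcFile parse_pc(std::istream& in) {
    PcFile pc;
    std::string line;
    while (std::getline(in, line)) {
        if (auto hash = line.find('#'); hash != std::string::npos) line.erase(hash);  // strip comments
        std::size_t sep = line.find_first_of("=:");  // whichever comes first decides the kind
        if (sep == std::string::npos) continue;
        std::string key = trim(line.substr(0, sep));
        if (key.empty()) continue;
        std::string value = expand(trim(line.substr(sep + 1)), pc.variables);
        (line[sep] == '=' ? pc.variables : pc.fields)[key] = value;
    }
    return pc;
}
```

Expanding at definition time means a variable can only refer to ones defined above it, which matches how .pc files are conventionally written (prefix first, then libdir & includedir in terms of it).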
If that logic passes, that might be all the command needs to do. Otherwise it iterates over the collected packages, outputting sorted vars, checking whether any are uninstalled, outputting their version numbers, outputting pkg + version numbers, outputting each of their dependencies (private or not), outputting their vars, and/or outputting their flags.
Finally it may need to output a newline before exiting.
It also includes comparators on version strings.
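As far as I know, pkg-config's version comparator is adapted from RPM's classic rpmvercmp. Here is a sketch in that spirit. It skips separators, then compares alternating runs of digits (numerically) or letters (lexically). A numeric run beats an alphabetic one, and when one version runs out, the longer one wins. Treat the edge-case behaviour as approximate rather than a faithful copy; the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical name; an rpmvercmp-style sketch, not pkg-config's exact code.
// Returns -1, 0 or 1 comparing versions a and b.
int rpm_style_vercmp(const std::string& a, const std::string& b) {
    if (a == b) return 0;
    auto alnum = [](char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0; };
    auto digit = [](char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; };
    auto alpha = [](char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; };
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        while (i < a.size() && !alnum(a[i])) ++i;  // skip separators such as '.' or '-'
        while (j < b.size() && !alnum(b[j])) ++j;
        if (i == a.size() || j == b.size()) break;

        // Take a run of the same kind (digits or letters) as a's next character.
        const bool numeric = digit(a[i]);
        auto same_kind = [&](char c) { return numeric ? digit(c) : alpha(c); };
        std::size_t si = i, sj = j;
        while (i < a.size() && same_kind(a[i])) ++i;
        while (j < b.size() && same_kind(b[j])) ++j;
        std::string ra = a.substr(si, i - si), rb = b.substr(sj, j - sj);
        if (rb.empty()) return numeric ? 1 : -1;  // kinds differ: the numeric run wins

        if (numeric) {
            // Strip leading zeros; then the longer number is the larger.
            ra.erase(0, std::min(ra.find_first_not_of('0'), ra.size()));
            rb.erase(0, std::min(rb.find_first_not_of('0'), rb.size()));
            if (ra.size() != rb.size()) return ra.size() < rb.size() ? -1 : 1;
        }
        if (int c = ra.compare(rb); c != 0) return c < 0 ? -1 : 1;
    }
    // One string ran out of runs; whichever still has characters left is newer.
    if (i >= a.size() && j >= b.size()) return 0;
    return i < a.size() ? 1 : -1;
}
```

Comparing digit runs by length after stripping zeros avoids converting them to integers, so arbitrarily long version components can't overflow.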