GIO GContentType

When downloading a file in Odysseus, it shows you an icon for the server-specified MIMEtype with a tooltip showing its human-readable name. This is looked up from an XML database maintained by FreeDesktop.Org with patches optionally provided by applications.

I will be describing how GIO implements that lookup this morning, and it will be my final toot thread inspired by Odysseus’s usage.

When it comes to icons, it has GTK look up the one specified in the database, falling back to:

  1. The MIMEtype with “/” replaced by “-”.
  2. The generic icon specified by the XML database or, failing that, the “base type” before the “/” with “-x-generic” appended.

And if a symbolic icon is requested, it’ll try the “-symbolic” variant of each icon first.
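As a rough illustration of that fallback chain, here is a minimal Python sketch (GIO itself is C). The function name and parameters are hypothetical, and the exact ordering of symbolic versus plain names is an assumption rather than something taken from GIO's source.

```python
def icon_name_candidates(mime_type, xdg_icon=None, xdg_generic_icon=None,
                         symbolic=False):
    """Return icon names to try, most preferred first (illustrative only)."""
    base_type = mime_type.split("/", 1)[0]
    names = []
    if xdg_icon:
        # Icon explicitly named by the database, if any.
        names.append(xdg_icon)
    # Fallback 1: the MIME type with "/" replaced by "-".
    names.append(mime_type.replace("/", "-"))
    # Fallback 2: the database's generic icon, else "<base type>-x-generic".
    names.append(xdg_generic_icon or base_type + "-x-generic")
    if symbolic:
        # Assumption: all "-symbolic" variants are tried before the plain names.
        names = [name + "-symbolic" for name in names] + names
    return names
```

The resulting list could then be handed to an icon theme, which picks the first name it can actually resolve.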

Once the database is initialized it’ll use a binary search to look the icon up in each cache and in the parsed icon list.
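The binary search over a parsed icon list might look something like the following sketch. It assumes the list is a sequence of (MIMEtype, icon name) pairs kept sorted by MIMEtype; that layout and the function name are illustrative, not GIO's actual data structures.

```python
def lookup_icon(entries, mime_type):
    """Binary-search a list of (mime_type, icon) pairs sorted by mime_type."""
    lo, hi = 0, len(entries) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key, icon = entries[mid]
        if key == mime_type:
            return icon
        if key < mime_type:
            lo = mid + 1   # target sorts after the midpoint
        else:
            hi = mid - 1   # target sorts before the midpoint
    return None  # not listed; the caller falls back to generated names
```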

To parse this XML database, it starts by checking the mtime of the well-known directories (within $XDG_DATA_HOME/mime/ & $XDG_DATA_DIRS/mime/) to see if it needs to reparse them.
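A staleness check along these lines can be sketched in Python as below. The helper names and the dictionary of recorded mtimes are hypothetical; the default paths are the ones the XDG Base Directory specification gives for unset variables.

```python
import os

def mime_dirs(environ=os.environ):
    """List the mime/ directories under $XDG_DATA_HOME and $XDG_DATA_DIRS."""
    home = environ.get("XDG_DATA_HOME") or os.path.expanduser("~/.local/share")
    data_dirs = environ.get("XDG_DATA_DIRS") or "/usr/local/share:/usr/share"
    return [os.path.join(d, "mime") for d in [home] + data_dirs.split(":") if d]

def needs_reparse(dirs, last_mtimes):
    """True if any directory's mtime differs from the one recorded earlier."""
    for d in dirs:
        try:
            mtime = os.stat(d).st_mtime
        except OSError:
            mtime = None  # directory is missing or unreadable
        if last_mtimes.get(d) != mtime:
            return True
    return False
```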

If anything has changed, it allocates all the in-memory tables and iterates over those same directories. For each it:

  1. mmaps the mime.cache file if it exists.
  2. Parses the “globs2” colon-separated-values file, inserting each row into a list or trie appropriate to its pattern’s complexity. If that doesn’t exist it tries the simpler “globs” file. This is usually used to match file extensions.
  3. Parses the custom-syntax “magic” file describing text patterns to look for in the first n bytes of the file to determine the MIMEtype.
  4. Parses the “aliases” then “subclasses” space-separated-values files to an array, for normalizing MIMEtypes.
  5. Parses the “icons” & “generic-icons” colon-separated-values files each to their own array.
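To make step 2 more concrete, here is a hedged Python sketch of parsing “globs2” rows (weight:MIMEtype:pattern, with “#” comment lines) and sorting them by pattern complexity. It uses plain dictionaries where the text above describes a trie, and the function name is made up for illustration.

```python
def parse_globs2(text):
    """Split globs2 rows into literal names, simple suffixes and full globs."""
    literals, suffixes, full_globs = {}, {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":")
        if len(fields) < 3:
            continue  # malformed row
        weight, mime_type, pattern = int(fields[0]), fields[1], fields[2]
        rest = pattern[1:]
        if not any(c in pattern for c in "*?["):
            # No wildcards at all: an exact filename such as "makefile".
            literals[pattern] = (weight, mime_type)
        elif pattern.startswith("*") and not any(c in rest for c in "*?["):
            # "*.ext" style: only the suffix needs matching.
            suffixes[rest] = (weight, mime_type)
        else:
            # Anything more complex needs real glob matching.
            full_globs.append((weight, mime_type, pattern))
    return literals, suffixes, full_globs
```

Matching a filename would then check the literal table first, then the suffixes, and only then walk the full globs, preferring higher weights.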

FreeDesktop.Org provides a tool (update-mime-database) to generate these index files from its XML, written in C with GLib.

To find the human-readable label for a MIMEtype, it first unaliases that MIMEtype (using binary search on each mmap’d mime.cache and the alias array), before attempting a cache implemented using GHashTable.
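The unalias-then-cache flow can be sketched as follows. A Python dict stands in for both the alias array and the hash table, and `load_comment` is a hypothetical callback for whatever slow path reads the label from disk.

```python
_comment_cache = {}  # canonical MIME type -> human-readable label

def unalias(mime_type, aliases):
    """Map an alias to its canonical MIME type (unchanged if not an alias)."""
    return aliases.get(mime_type, mime_type)

def get_description(mime_type, aliases, load_comment):
    """Return the label for mime_type, consulting the cache before loading."""
    mime_type = unalias(mime_type, aliases)
    if mime_type not in _comment_cache:
        # Cache miss: take the slow path once and remember the result.
        _comment_cache[mime_type] = load_comment(mime_type)
    return _comment_cache[mime_type]
```

Unaliasing before the cache lookup means an alias and its canonical type share a single cache entry.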

If it’s not in the cache, GIO uses GLib’s GMarkup to parse out the text within the “comment” tag with the highest-priority language (as determined by the locale environment variables) from the appropriate XML database entry. This will then be added to the cache before being returned.
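Picking the best-language label out of a database entry could look roughly like this. The sketch uses Python's ElementTree rather than GMarkup, takes the language priority list as a parameter instead of reading the locale, and treats a comment without xml:lang as language “C”; all of those are simplifying assumptions.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def best_comment(xml_text, languages):
    """Return the <comment> text whose language ranks first in `languages`.

    `languages` is ordered most preferred first, e.g. ["fr_FR", "fr", "C"].
    """
    root = ET.fromstring(xml_text)
    best_text, best_rank = None, len(languages)
    for elem in root.iter():
        # Strip any "{namespace}" prefix before comparing the tag name.
        if elem.tag.rsplit("}", 1)[-1] != "comment":
            continue
        lang = elem.get(XML_LANG, "C")  # untranslated comments count as "C"
        if lang in languages and languages.index(lang) < best_rank:
            best_rank = languages.index(lang)
            best_text = (elem.text or "").strip()
    return best_text
```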