GStreamer audio/video

GStreamer is the software I use (with Vocal, elementary Music, or elementary Videos as a UI) to enjoy the shows I’ve been reviewing for you all, but it’s capable of pretty much all your multimedia-processing needs.

At its heart is an in-program (push or pull) pipeline for processing binary (multimedia) formats, though it also includes a dynamic type system (which I hope to cover tomorrow) and other utilities.
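To give a feel for the push model, here's a toy sketch: each element transforms a buffer and pushes it to the next element downstream, loosely mirroring how GStreamer's (far richer) element/pad machinery moves data. All names here are illustrative, not GStreamer API.

```python
# A toy push-model pipeline: each element transforms a buffer and
# pushes the result downstream until it reaches the sink.
class Element:
    def __init__(self, transform):
        self.transform = transform  # buffer -> buffer
        self.downstream = None      # next element, if any

    def link(self, other):
        self.downstream = other
        return other                # allows chaining link() calls

    def push(self, buf):
        out = self.transform(buf)
        if self.downstream is not None:
            return self.downstream.push(out)
        return out                  # sink: final result

# source -> "decoder" (upper-cases) -> sink (collects results)
received = []
src = Element(lambda b: b)
dec = Element(lambda b: b.upper())
sink = Element(lambda b: received.append(b) or b)
src.link(dec).link(sink)
src.push("pcm data")
```

The pull model inverts this: the sink asks upstream for data instead of the source driving it.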


I’m sure VLC works very similarly to GStreamer.

Having skimmed Google’s libwebrtc I get the impression that much of GStreamer’s larger scope is necessary for a good videoconferencing experience.

I will not be covering many GStreamer codecs at all.

The hypothetical processor I designed for Rhapsode should be well-suited to this task.

GStreamer Core

GStreamer’s processing “elements” are connected by “pads”, which provide methods and state tracking for the data flowing between them.

The underlying logic mostly focuses on attaching “probes” for debugging, checking rules upon linking, and buffering events to be read later.

These pads are constructed from a “pad template” specifying a name, direction, whether it can be expected to be present, whether there can be multiple, and the (parsed) supported MIMEtypes + dynamically-typed format parameters.
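A minimal sketch of that link checking, assuming a made-up PadTemplate shape: a link is only allowed between a source and a sink pad whose supported MIMEtypes overlap. The field names are illustrative; GStreamer's real GstPadTemplate carries more (presence, format parameters, etc.).

```python
from dataclasses import dataclass

@dataclass
class PadTemplate:
    name: str          # e.g. "src_%u" for request pads
    direction: str     # "src" or "sink"
    caps: frozenset    # supported MIME types

def can_link(src: PadTemplate, sink: PadTemplate):
    """Return the negotiated caps, or None if the pads can't link."""
    if src.direction != "src" or sink.direction != "sink":
        return None
    common = src.caps & sink.caps
    return common or None   # empty intersection means incompatible

demux_out = PadTemplate("src_%u", "src",
                        frozenset({"audio/x-vorbis", "audio/mpeg"}))
dec_in = PadTemplate("sink", "sink", frozenset({"audio/x-vorbis"}))
negotiated = can_link(demux_out, dec_in)
```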

Once it’s set up, data is sent through the pads as timestamped “buffers”, which implement an array with extensive security checks, though Rust is increasingly used to move these checks to compiletime.

The events & queries are also dynamically typed.

I’ve briefly mentioned the concept of “elements”, which are what communicate over those “pads”; they serve mostly to hold pads & transition between null/ready/paused/playing states, whilst holding pad templates, amongst other metadata, in class properties.
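That state ladder can be sketched as follows: elements step one rung at a time through NULL, READY, PAUSED, & PLAYING (and back down), with each hop a chance to do per-state setup or teardown. This is illustrative only; the real GstElement also handles asynchronous transitions.

```python
# The four states, in order; an element walks them one hop at a time.
STATES = ["NULL", "READY", "PAUSED", "PLAYING"]

class ToyElement:
    def __init__(self):
        self.state = "NULL"

    def set_state(self, target):
        """Walk one rung at a time toward the target state."""
        cur = STATES.index(self.state)
        goal = STATES.index(target)
        step = 1 if goal > cur else -1
        while cur != goal:
            cur += step
            self.state = STATES[cur]  # each hop could allocate or free resources
        return self.state

e = ToyElement()
e.set_state("PLAYING")   # NULL -> READY -> PAUSED -> PLAYING
```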

Loading Elements

GStreamer’s Elements are constructed by Element Factories, which integrate a Registry singleton into the process & hold the pad templates, URI schemes, & interfaces used to determine which elements to construct.

The Registry lists all plugins, features, Element Factories, type find factories, & device provider factories with hashmap indices to speed up feature & basename lookups, & “cookies” to determine when it needs to reload these lists. A cached list is saved to disk to speed up initialization.
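The cookie idea is worth a sketch: a counter is bumped on every modification, so any cached view of the registry can cheaply detect staleness and rebuild itself. All names here are made up for illustration.

```python
class Registry:
    def __init__(self):
        self.features = {}   # name -> feature object (the hashmap index)
        self.cookie = 0      # bumped on every modification

    def add_feature(self, name, feature):
        self.features[name] = feature
        self.cookie += 1

    def lookup(self, name):
        return self.features.get(name)

class CachedFeatureList:
    """A snapshot of the registry, revalidated via the cookie."""
    def __init__(self, registry):
        self.registry = registry
        self.cookie = -1
        self.names = []

    def get(self):
        if self.cookie != self.registry.cookie:   # stale: rebuild the list
            self.names = sorted(self.registry.features)
            self.cookie = self.registry.cookie
        return self.names

reg = Registry()
cache = CachedFeatureList(reg)
reg.add_feature("vorbisdec", object())
cache.get()                          # rebuilt once
reg.add_feature("mpegaudioparse", object())
names = cache.get()                  # cookie changed, rebuilt again
```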

This Registry is initialized alongside the rest of GStreamer by reading a binary cache format, traversing environment-variable-specified dirs for dynamically loadable plugin files if they have changed, and writing out an updated cache.

That binary cache is (de)serialized in two separate source code files, with reads via mmap() & writes via a linked-list of GLib Slice objects.

A Plugin class handles loading each individual dynamically loaded plugin file.

That Plugin class in turn wraps the GModule libraries to import a dynamically-loaded library at runtime, surrounding it with licensing/compatibility checks and loading of its dependencies. If those dependencies aren’t available the plugin won’t load.

GStreamer can optionally load those plugins in a separate thread as a further optimization.

Upon being loaded each Plugin will then register one or more “Feature” subclasses, usually including an Element Factory.

Extended pipelines

A GStreamer bin is an element which holds a linked-list of other elements (which it iterates over in a thread-safe way, possibly prioritising and/or recursing into the (grand)child elements), maintains common state between them, and handles various messages & requests for them.

Configuring a bin’s children is implemented in a “Child Proxy” superclass, with “Ghost Pads” proxying internal pads to the external world.

A pipeline is a subclass of bin which tracks additional state (mostly around timing) to make it fully self-contained, & exposes messages to the caller via a “bus” atomic-ringbuffer.
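A sketch of that bus, assuming a simplified message shape: elements post messages from their streaming threads, and the application polls them from its mainloop. A bounded deque stands in for the real atomic ringbuffer; illustrative only.

```python
from collections import deque

class Bus:
    def __init__(self, capacity=64):
        self.queue = deque(maxlen=capacity)  # oldest dropped when full

    def post(self, msg_type, source, payload=None):
        """Called by elements from their streaming threads."""
        self.queue.append({"type": msg_type, "src": source, "payload": payload})

    def pop(self):
        """Called by the application from its mainloop."""
        return self.queue.popleft() if self.queue else None

bus = Bus()
bus.post("state-changed", "decoder", ("READY", "PAUSED"))
bus.post("eos", "sink")
msg = bus.pop()   # the state-changed message comes out first
```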

For efficiency (avoiding memory copies & profiling) GStreamer implements Allocator, Clock, & other shared Context objects to be propagated through the pipeline, which are implemented pretty much as you’d expect. And those timestamped buffers will make use of these.

Dynamic Type System

Prominently featured inside the events, messages, & requests which traverse GStreamer pipelines alongside the raw data is a dynamically-typed Structure which GStreamer needs to compare. To a lesser extent Structures also occur within the negotiated MIMEtypes.

A structure is just a parallel array mapping from deduplicated (via a global hashmap) strings to (heavily extended) GLib Value objects. GLib Value objects hold type information at runtime rather than compile-time, and are extended with methods to compute unions, intersections, & subtractions, with other methods filled in as well.
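A sketch of both ideas together, with made-up names standing in for the GstStructure API: field names are interned through a global table (the deduplicated strings), and set-valued fields can be intersected during negotiation.

```python
_interned = {}

def intern(s):
    """Return one canonical object per distinct string (dedup table)."""
    return _interned.setdefault(s, s)

class Structure:
    def __init__(self, name, **fields):
        self.name = intern(name)
        self.fields = {intern(k): v for k, v in fields.items()}

    def intersect(self, other):
        """Keep only the values both structures agree on, or None."""
        if self.name != other.name:
            return None
        out = {}
        for k, v in self.fields.items():
            if k in other.fields:
                common = v & other.fields[k]
                if not common:
                    return None   # a field with no overlap kills the match
                out[k] = common
        return Structure(self.name, **out)

src = Structure("audio/x-raw", rate={44100, 48000}, channels={1, 2})
sink = Structure("audio/x-raw", rate={48000}, channels={2})
negotiated = src.intersect(sink)   # rate narrowed to {48000}, channels to {2}
```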

It has variadic convenience methods, (de)serialization to/from strings, & type conversion methods.

GValue range types are introduced.

That mostly covers it, though it’s worth noting the Clock & Allocator types I mentioned yesterday are defined to be held in these structures, alongside Context, ToC, & split “protection” information for later decryption.

Additional type information can also be added to an element’s properties (GObject already provides some), along with a means of boxing those properties with their accessors.

Device Discovery

To discover input/output Devices (which know how to create or reconfigure a corresponding Element to communicate to/from them) applications instantiate a Device Monitor to aggregate results from the Device Providers’ Buses. These Device Providers are instantiated by Device Provider Factories provided by the loaded Plugins.

(Capitals indicate GObject classes)


Plugins can also provide factories for objects which output debugging records.


And/or they may provide factories for objects which read the first several bytes of the input stream to determine which MIMEtype(s) it could be. These will be used by a special Element which feeds them as much data as needed to determine the singular MIMEtype, before handing off to the appropriate Element.
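That narrowing process can be sketched like so: feed the stream's first bytes to a set of detectors until exactly one MIMEtype remains plausible. The magic numbers below are real file signatures, but the detector API is made up.

```python
import io

# magic-byte prefixes for a few container formats
DETECTORS = {
    "application/ogg":  b"OggS",
    "audio/x-flac":     b"fLaC",
    "video/x-matroska": b"\x1a\x45\xdf\xa3",
}

def typefind(read, chunk=4):
    """read(n) -> bytes; return the single matching MIME type or None."""
    data = b""
    candidates = set(DETECTORS)
    while len(candidates) > 1:
        more = read(chunk)
        if not more:
            break                      # stream ended while still ambiguous
        data += more
        # keep only types whose magic is still consistent with the data so far
        candidates = {m for m in candidates
                      if DETECTORS[m].startswith(data)
                      or data.startswith(DETECTORS[m])}
    return candidates.pop() if len(candidates) == 1 else None

stream = io.BytesIO(b"OggS\x00\x02rest-of-ogg-page...")
mime = typefind(stream.read)
```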


GStreamer also implements its own specially-optimized URI parser, class infrastructure, thread pools, mainloop, & object pools (for buffers), alongside a basic datamodel for audio samples/streams & their metadata.

Automatic pipeline construction

Instead of manually constructing GStreamer pipelines, applications usually put most of that work on GStreamer Base’s bins.

Decode Bin

To oversimplify: the decode bin starts with a typefind element & a list of codecs (optionally excluding hardware optimizations) gathered from all the loaded plugins. Then it analyzes each pad as it’s added to the bin to read its MIMEtype & instantiate an appropriate element for it, until a compatible output type is reached.
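That chaining can be sketched as a shortest-path search: starting from the detected type, keep picking a factory whose input matches the current type until a raw output type is reached. The factory table below is made up for illustration; real autoplugging also ranks factories and handles multiple pads.

```python
from collections import deque

# (factory name, input MIME type, output MIME type) - illustrative entries
FACTORIES = [
    ("oggdemux",  "application/ogg", "audio/x-vorbis"),
    ("vorbisdec", "audio/x-vorbis",  "audio/x-raw"),
    ("flacdec",   "audio/x-flac",    "audio/x-raw"),
]

def autoplug(detected, target="audio/x-raw"):
    """Breadth-first search for the shortest chain of factories."""
    frontier = deque([(detected, [])])
    seen = {detected}
    while frontier:
        mime, chain = frontier.popleft()
        if mime == target:
            return chain               # reached a compatible output type
        for name, sink, src in FACTORIES:
            if sink == mime and src not in seen:
                seen.add(src)
                frontier.append((src, chain + [name]))
    return None                        # no decoder chain available

chain = autoplug("application/ogg")    # ["oggdemux", "vorbisdec"]
```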

Additional pads are wrapped in an element to parallelise their decoding in a new thread, and it does take a lot of logic to keep up with the changing subpipeline.

Play Bin

For the common-case of playing back local or remote audio/video files GStreamer provides the Play Bin (version 2 described here).

This wraps a URI Decode Bin, sub-URI Decode Bin, & Input Selector elements, whilst filling in missing output elements & figuring out which outputs are the text (subtitles), audio, and imagery. Mostly though the code responds to configuration of various properties.

The Playbin’s default sinks are a wrapper around “auto-sink” that loads all the appropriate elements to fine-tune the format conversion.

And the play bin (version 2) uses a different decode bin implementation that instead wraps an element looked up for the appropriate URI scheme (rather than a format detector) for its initial element.

Pulse Audio

PulseAudio serves to cover auditory edge-cases, extending GStreamer’s pipeline into a central daemon whilst pushing as much work as it can off onto the application (which can optimize it better) or the soundcard hardware.

GStreamer bindings

GStreamer provides a Device Provider & elements to call the corresponding PulseAudio APIs. I’m not seeing much more to comment on so far…

Client library

The PulseAudio client meanwhile reads/writes the audio & control commands from/to a DBus socket, with its own “tagstruct”/“proplist” dynamic type system & mainloop.

But between GStreamer and the socket is a ringbuffer.
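That ring can be sketched as a fixed-size byte buffer with wrapping read & write positions, the writer filling it from one side whilst the reader drains the other. Illustrative only; PulseAudio's real buffering is considerably richer.

```python
class RingBuffer:
    def __init__(self, size):
        self.buf = bytearray(size)
        self.size = size
        self.rpos = 0          # read position
        self.wpos = 0          # write position
        self.fill = 0          # bytes currently stored

    def write(self, data):
        n = min(len(data), self.size - self.fill)  # drop what doesn't fit
        for b in data[:n]:
            self.buf[self.wpos] = b
            self.wpos = (self.wpos + 1) % self.size
        self.fill += n
        return n

    def read(self, n):
        n = min(n, self.fill)
        out = bytearray()
        for _ in range(n):
            out.append(self.buf[self.rpos])
            self.rpos = (self.rpos + 1) % self.size
        self.fill -= n
        return bytes(out)

rb = RingBuffer(8)
rb.write(b"abcdef")
first = rb.read(4)      # b"abcd"
rb.write(b"ghij")       # wraps around the end of the buffer
rest = rb.read(6)       # b"efghij"
```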


PulseAudio includes many “modules” it can load, providing various features.

Each of these individually are very lightweight (or calls in an external library), and are decoupled from all the other “modules”.


Today I’m interested in describing how the PulseAudio daemon which contains all the “modules” I slowly listed yesterday works.

Those modules depend on a pulsecore module (also underlying the client library I briefly described the previous day) in order to interact with this daemon, exposing datatypes it understands.

It starts by looking for any file descriptors provided by systemd, before extensively configuring itself to run in realtime, thereby minimizing added latency.

Next it loads/parses its configuration file (according to a key/target/parser table), environment variables, & commandline arguments. Relevant configuration will be copied over to the logging subsystem via global variables.
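The table-driven parsing can be sketched as follows: each known key maps to a target field and a parser function. The keys and defaults here are made up for illustration, not PulseAudio's actual option names.

```python
def parse_bool(s):
    return s.strip().lower() in ("1", "yes", "true", "on")

# key -> (target field, parser); illustrative entries only
CONF_TABLE = {
    "daemonize":         ("daemonize",   parse_bool),
    "realtime-priority": ("rt_priority", int),
    "log-target":        ("log_target",  str.strip),
}

def load_conf(text):
    conf = {"daemonize": False, "rt_priority": 5, "log_target": "syslog"}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                     # skip blanks & comments
        key, _, value = line.partition("=")
        entry = CONF_TABLE.get(key.strip())
        if entry is None:
            continue                     # unknown keys ignored in this sketch
        target, parser = entry
        conf[target] = parser(value)
    return conf

conf = load_conf("""
# example config
daemonize = yes
realtime-priority = 9
""")
```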

Then it initializes LibTool & DBus, and evaluates the configured command.

Knowing that it’s now running a “daemon” or “start” command, it checks it’s configuration whilst outputting debugging info.

Then for the start command it initializes a refcounted pipe for autospawning, and writes a file indicating the daemon’s presence.

Then if configured to run as a daemon it forks with a new pipe & rechecks that file, before setting the SID, forking again, & configuring signals.

From there it configures the PULSE_INTERNAL environment variable & further sets up realtime execution.

Then additional debugging is written out, and the mainloop is initialized.

If a script is provided to the start command, it is parsed and evaluated (or a module may hook up stdin to feed commands to the same subsystem).

It finishes configuring DBus.

Then finally it notifies the init system that it has started, runs the mainloop, and cleans up on shutdown.

Serial U16550 audio driver

Upon probe (testing a new connection) it determines/validates the inputs & outputs based on the provided ID, before allocating a new “card” with its underlying “device”, its lazily-registered control device file, and procfs debugging info/files.

It’ll then proceed to initialize the device-specific driver, testing if it’s present with some IO reads/writes, allocating a ringbuffer, & registering an interrupt & timer to send/receive queued audio data in that ringbuffer, before inserting it into the list of sound devices & sending more initialization signals.

Then it allocs/inits a corresponding MIDI device with it’s substreams, with hardware acceleration.

And upon remove it just frees all relevant memory.

Upon reading that control file it copies data from a linked queue of events into userspace, waiting on an atomic condition as needed.
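That read path can be sketched with Python threading standing in for the kernel's wait queue: events are queued from "interrupt" context, and a reader blocks on a condition until one arrives. All names are illustrative.

```python
import threading

class ControlDev:
    def __init__(self):
        self.events = []
        self.cond = threading.Condition()

    def post(self, event):            # called from "interrupt" context
        with self.cond:
            self.events.append(event)
            self.cond.notify()        # wake any blocked reader

    def read(self, timeout=1.0):      # called from a userspace read()
        with self.cond:
            while not self.events:
                if not self.cond.wait(timeout):
                    return None       # timed out with nothing queued
            return self.events.pop(0)

dev = ControlDev()
# simulate an interrupt delivering an event 50ms from now
t = threading.Timer(0.05, dev.post, args=({"type": "elem-change"},))
t.start()
ev = dev.read()                       # blocks until the event arrives
t.join()
```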

Upon open it forwards the request to a filesystem stream, looks up hardware version numbers from memory, allocs/inits/registers the sound data, makes sure the module is loaded, and allocs/inits reader data, which all gets freed on release.

llseek is not supported.

Upon poll it waits on that atomic condition and/or returns flags for whether any events are queued.

Upon ioctl, as per usual, it copies the specified property into userspace or vice versa.

And upon fasync it uses the helper function to trigger a synchronization primitive stored in the reader data.