How does digital audio work?
A speaker or microphone is a magnet(s) moving relative to a coil(s) of wire, producing or receiving sound waves. Resistors inside a digital-to-analog converter (DAC) can convert digital numbers into voltages for a transistor to amplify. Conversely, we can wrap a DAC with a circuit that searches for the digital number which matches an incoming analog voltage, giving us an analog-to-digital converter (ADC).
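One common way to "wrap a DAC" like this is successive approximation: try each bit from most to least significant, keeping it only if the DAC's output stays at or below the input voltage. The sketch below assumes that technique and an idealised 16-bit DAC; the names `dac`, `sar_adc` & `v_ref` are made up for illustration, and real converters vary.

```python
def dac(code: int, bits: int = 16, v_ref: float = 1.0) -> float:
    """Idealised DAC: map an unsigned code to a voltage in [0, v_ref)."""
    return v_ref * code / (1 << bits)


def sar_adc(voltage: float, bits: int = 16, v_ref: float = 1.0) -> int:
    """Find the largest code whose DAC output does not exceed `voltage`."""
    code = 0
    for bit in reversed(range(bits)):           # most significant bit first
        trial = code | (1 << bit)               # tentatively set this bit
        if dac(trial, bits, v_ref) <= voltage:  # the comparator's job
            code = trial                        # keep the bit
    return code


half_scale = sar_adc(0.5)  # halfway up the voltage range
```

Each conversion takes one comparison per bit, so 16 steps here rather than a search through all 65,536 codes.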
We tend to “sample” audio as 16-bit numbers at 44.1kHz, i.e. 44,100 samples per second.
Getting this timing perfect is critical to avoid audio artifacts; our ears are painfully good at noticing missed samples. Usually this is achieved using one or more “ringbuffers” where new audio samples overwrite outdated samples, allowing a real-time thread to dequeue the appropriate sample right when the sound card requests it.
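A ringbuffer can be sketched as a fixed-size list plus a read position & a count. This is a single-threaded illustration only: real audio code is usually a lock-free structure in C or similar. The class name & the choice to return silence on underrun are assumptions.

```python
class RingBuffer:
    """Fixed-capacity queue where new samples overwrite the oldest ones."""

    def __init__(self, capacity: int) -> None:
        self._data = [0] * capacity
        self._capacity = capacity
        self._read = 0   # index of the next sample to play
        self._count = 0  # number of unread samples currently stored

    def write(self, sample: int) -> None:
        """Store a sample; when full, overwrite the oldest unread one."""
        write_pos = (self._read + self._count) % self._capacity
        self._data[write_pos] = sample
        if self._count == self._capacity:
            # Buffer was full: the oldest sample is gone, skip past it.
            self._read = (self._read + 1) % self._capacity
        else:
            self._count += 1

    def read(self) -> int:
        """Return the next sample, or silence (0) if none is available."""
        if self._count == 0:
            return 0
        sample = self._data[self._read]
        self._read = (self._read + 1) % self._capacity
        self._count -= 1
        return sample


rb = RingBuffer(3)
for s in (1, 2, 3, 4):   # the fourth write overwrites the first sample
    rb.write(s)
played = [rb.read() for _ in range(4)]
```

An underrun (reading from an empty buffer) is exactly the kind of missed sample our ears pick up as a click or dropout.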
Any discrepancy between a 16-bit digital number & the analog voltage it stands for shows up as something akin to tape hiss; we call it “discretization noise” (more commonly “quantization noise”). The rate at which we sample, as proven by digital circuitry pioneer Claude Shannon amongst others, must be at least twice the maximum frequency we want to store.
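To put rough numbers on both claims, the sketch below rounds a 1kHz sine wave to 16-bit codes & measures the worst rounding error, then computes the highest storable frequency for a 44.1kHz sample rate. The helper names & the 1kHz test tone are arbitrary choices for illustration.

```python
import math


def quantize(x: float, bits: int = 16) -> int:
    """Round a value in [-1.0, 1.0] to the nearest signed integer code."""
    full_scale = (1 << (bits - 1)) - 1   # 32767 for 16 bits
    return round(x * full_scale)


def dequantize(code: int, bits: int = 16) -> float:
    """Map a signed integer code back to a value in [-1.0, 1.0]."""
    full_scale = (1 << (bits - 1)) - 1
    return code / full_scale


sample_rate = 44_100
tone_hz = 1_000

# 10 milliseconds of a 1kHz sine wave.
signal = [math.sin(2 * math.pi * tone_hz * n / sample_rate)
          for n in range(sample_rate // 100)]

# The difference between the original & its 16-bit version is the noise.
errors = [s - dequantize(quantize(s)) for s in signal]
worst_error = max(abs(e) for e in errors)
half_step = 0.5 / 32767          # rounding can never be off by more than this

nyquist_hz = sample_rate / 2     # highest frequency this rate can store
```

The error never exceeds half a quantization step, which is why 16 bits sounds like a very faint hiss rather than obvious distortion.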
Lowpass filters are added to the analog circuitry to discard frequencies the computer can’t store/process & that we can’t hear.
44.1kHz was chosen because it can store sounds up to 22.05kHz, a little beyond the roughly 20kHz limit of human hearing, & easily fits on videotape.
Summing audio samples mixes them. Multiplying audio samples adjusts their volume. If we want to adjust the frequencies we either use a fast Fourier transform (FFT) to determine which frequencies are in the audio, adjust those, & convert it back, or we adjust the sound’s speed, necessitating resampling using some form of interpolation. TODO: How do we adjust audio rate without impacting frequency?
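The mixing, volume & speed-change operations can be sketched on plain lists of 16-bit samples. Clamping to the 16-bit range & using linear interpolation are assumptions made here for simplicity; real mixers & resamplers often use better-sounding methods.

```python
def clamp16(value: int) -> int:
    """Keep a sample inside the signed 16-bit range."""
    return max(-32768, min(32767, value))


def mix(a: list[int], b: list[int]) -> list[int]:
    """Mix two clips by summing them sample by sample."""
    return [clamp16(x + y) for x, y in zip(a, b)]


def gain(clip: list[int], factor: float) -> list[int]:
    """Adjust volume by multiplying every sample."""
    return [clamp16(round(s * factor)) for s in clip]


def resample_linear(clip: list[int], speed: float) -> list[int]:
    """Play `clip` back `speed` times faster, which also shifts its pitch."""
    out = []
    n = 0
    while n * speed <= len(clip) - 1:
        pos = n * speed                  # fractional position in the input
        i = int(pos)
        frac = pos - i
        nxt = clip[min(i + 1, len(clip) - 1)]
        # Blend the two neighbouring samples according to `frac`.
        out.append(round(clip[i] * (1 - frac) + nxt * frac))
        n += 1
    return out


mixed = mix([1000, 30000], [2000, 30000])       # second sample clips
quieter = gain([1000, -1000], 0.5)
faster = resample_linear([0, 100, 200, 300], 2.0)
slower = resample_linear([0, 100, 200], 0.5)
```

Doubling the speed halves the clip's length & raises every frequency by an octave, which is the coupling the TODO above asks about.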
Digital music often involves repeating the same, possibly mutated, audio clips with particular timing.
Audio CDs store uncompressed 16-bit 44.1kHz digital audio, with metadata & “Cross-Interleaved Reed-Solomon Coding” for error correction (another topic Shannon pioneered).
WAV files are uncompressed audio files with a small header for metadata.
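Python's standard `wave` module can show how little there is to a WAV file. This sketch builds a short mono sine tone in memory & wraps it in a WAV header; the 440Hz tone, the half-volume amplitude & the helper names are arbitrary choices for illustration.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44_100


def sine_clip(freq_hz: float, count: int, amplitude: float = 0.5) -> list[int]:
    """Generate `count` 16-bit samples of a sine wave."""
    return [round(amplitude * 32767 *
                  math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE))
            for n in range(count)]


def to_wav_bytes(samples: list[int]) -> bytes:
    """Wrap mono 16-bit samples in a WAV header, entirely in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wav.setframerate(SAMPLE_RATE)
        # Samples are stored as little-endian signed 16-bit integers.
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buffer.getvalue()


clip = sine_clip(440, 441)             # 10 milliseconds of audio
data = to_wav_bytes(clip)
header_size = len(data) - 2 * len(clip)
```

Writing `data` to a file with a `.wav` extension should give something most audio players can open.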
MPEG stores extensive (and typically unused) metadata about how the streams it multiplexes are encoded. Xiph’s OGG multiplexes streams without all that metadata, instead relying on magic numbers identifying each contained filetype.
FLAC can be used standalone or within OGG to losslessly compress audio by storing a cheap approximating formula for its wave, then compactly storing adjustments using Rice codes. Its audio-specific compression is extremely effective!
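The predict-then-store-adjustments idea can be sketched in a few lines. This is a simplified illustration, not FLAC's actual bitstream: it assumes one fixed predictor (extend the line through the previous two samples) & writes Rice codes as text strings of bits, with a bit layout chosen for readability.

```python
def residuals(samples: list[int]) -> list[int]:
    """Replace each sample with its difference from a cheap prediction."""
    out = list(samples[:2])          # the first two samples are kept as-is
    for i in range(2, len(samples)):
        # Predict by extending the line through the previous two samples.
        prediction = 2 * samples[i - 1] - samples[i - 2]
        out.append(samples[i] - prediction)
    return out


def zigzag(n: int) -> int:
    """Fold signed numbers into unsigned: 0, -1, 1, -2... -> 0, 1, 2, 3..."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def rice_encode(value: int, k: int) -> str:
    """Rice code: quotient in unary, then the remainder in k binary bits."""
    u = zigzag(value)
    quotient, remainder = u >> k, u & ((1 << k) - 1)
    bits = "1" * quotient + "0"      # unary part, terminated by a 0
    if k:
        bits += format(remainder, f"0{k}b")
    return bits


smooth = [0, 10, 20, 30, 41]
adjustments = residuals(smooth)
encoded = [rice_encode(r, 2) for r in adjustments[2:]]
```

Smooth waves leave adjustments close to zero, & Rice codes spend very few bits on small numbers, which is where the savings over 16 bits per sample come from.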