Digital Audio & Compression
An introduction to audio format technology and compression.
Sound in the real world is a wave of varying air pressure, a continuously changing signal. A microphone or similar transducer can convert this air pressure variation into an electrical signal whose voltage varies in proportion to the air pressure picked up by the microphone. This continuous signal is called an analog signal. The development of early microphones, amplifiers (to proportionally increase the power of the electrical signal), and speaker transducers (to convert this electrical signal back into sound) led to inventions such as the telephone and radio.
The earliest forms of recorded sound converted the electrical signal into a groove on a wax cylinder, using a sharp needle. Later, the design was improved to carve a groove into a flat disc (with a machine called a record lathe). These early devices were monophonic (or monaural), recording and playing back only a single channel of sound. In the late 1950s, two-channel stereophonic LP records were produced. Technical advances in recording, manufacturing, and playback were made over the next decade, leading to stereo LP records becoming the dominant format for music distribution.
As electronic and semiconductor technology advanced, it became possible to convert an analog sound signal into a digital representation. This analog-to-digital conversion is done by sampling the voltage level of the analog signal thousands of times per second, at very precise time intervals, and storing each sample as a binary number. This technique is known as Pulse Code Modulation, or PCM. The more bits used to quantize the voltage level of each sample, the more accurate the digital representation, because each additional bit doubles the number of possible values. For instance, if 8 bits are used to quantize the voltage of the signal, there are only 2⁸ = 256 possible values, but if 16 bits are used, there are 2¹⁶ = 65,536 possible values. Because an analog signal can take any value within its range, while a digital representation has only a fixed number of possible values, there is always some slight difference between the actual value of the analog signal and its digital representation. This difference is called the quantization error. Assuming the Analog-to-Digital Converter (ADC) chip that performs the conversion is accurate, a 16-bit sample has far lower quantization error than an 8-bit sample.
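The effect of bit depth on quantization error can be sketched in a few lines of Python. This sketch assumes a normalized input level between -1.0 and 1.0 standing in for the analog voltage, and signed integer PCM codes like those used for 16-bit CD audio; the helper names are hypothetical, and a real ADC also adds its own analog noise and nonlinearity.

```python
def full_scale(bits: int) -> int:
    """Largest positive code for signed PCM: 127 for 8-bit, 32,767 for 16-bit."""
    return 2 ** (bits - 1) - 1

def quantize(value: float, bits: int) -> int:
    """Map a normalized level in [-1.0, 1.0] to the nearest signed integer PCM code."""
    code = round(value * full_scale(bits))
    # Clamp to the signed range -2**(bits-1) .. 2**(bits-1) - 1 (guards against overshoot).
    return max(-full_scale(bits) - 1, min(full_scale(bits), code))

def dequantize(code: int, bits: int) -> float:
    """Convert a PCM code back to a normalized level."""
    return code / full_scale(bits)

x = 0.123456789  # hypothetical instantaneous voltage, normalized to full scale
for bits in (8, 16):
    code = quantize(x, bits)
    error = abs(x - dequantize(code, bits))
    # For inputs in [-1.0, 1.0], rounding error never exceeds half of one step.
    print(f"{bits:2d} bits: {2 ** bits:,} levels, code {code}, "
          f"error {error:.2e}, max error {0.5 / full_scale(bits):.2e}")
```

Going from 8 to 16 bits shrinks the maximum error by a factor of roughly 258 (the ratio of the two full-scale values), which is why 16-bit samples sound so much quieter in the background.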
In addition to using enough bits per sample to minimize quantization error (which manifests itself as noise), an Analog-to-Digital Converter must use a sampling frequency high enough to represent all of the frequencies in the analog signal. The Nyquist Theorem states that the sampling frequency must be more than twice the highest frequency contained in the analog signal. To cover the full range of human hearing (generally given as 20 Hz to 20,000 Hz), the ADC must therefore sample at more than 40,000 Hz. The Compact Disc Digital Audio format was the first widely adopted consumer digital audio format, and it uses 16-bit PCM sampled at 44,100 Hz. This gives a frequency response of 20 Hz to 20,000 Hz and a signal-to-noise (S/N) ratio of roughly 90 dB, about twice the decibel figure of LP records, compact cassette, or 8-track tape (and, because the decibel scale is logarithmic, a far larger reduction in actual noise power).
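The Nyquist requirement is easiest to appreciate by seeing what goes wrong without it. The following Python sketch (function names and the 25 kHz test tone are hypothetical) shows that, at a 44,100 Hz sampling frequency, a 25,000 Hz tone yields the same samples as a 19,100 Hz tone, an effect called aliasing. Once sampled, the two cannot be told apart, which is why content above half the sampling frequency has to be filtered out before conversion.

```python
import math

def alias_frequency(f: float, fs: float) -> float:
    """Apparent frequency of a tone at f after sampling at fs (folded into 0..fs/2)."""
    f = f % fs                          # sampling cannot tell f apart from f + k*fs
    return fs - f if f > fs / 2 else f  # anything above fs/2 folds back below it

def sample_cosine(f: float, fs: float, n: int) -> list[float]:
    """Take n samples of a unit-amplitude cosine tone of frequency f at rate fs."""
    return [math.cos(2 * math.pi * f * k / fs) for k in range(n)]

fs = 44_100
print(alias_frequency(25_000, fs))  # 19100: the 25 kHz tone masquerades as 19.1 kHz
too_high = sample_cosine(25_000, fs, 32)
in_band = sample_cosine(19_100, fs, 32)
# The two sample sequences agree to within floating-point rounding.
print(max(abs(a - b) for a, b in zip(too_high, in_band)))
```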
The biggest advantage of digital audio over analog audio is the ability to make perfect copies of a recording. Analog audio always suffered from “generation loss,” meaning that a copy of an audio recording was always somewhat lower in quality than the original. Digital audio is stored as digital data (ones and zeros), and with the appropriate techniques to preserve and verify the integrity of the digital data, a copy can be made that is absolutely identical to the original.
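The integrity check mentioned above can be as simple as comparing cryptographic hashes of the original and the copy. A minimal sketch using Python's standard `hashlib`, `shutil`, and `tempfile` modules; the file names and stand-in data are hypothetical, and real archiving workflows may use other checksums or dedicated verification tools.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def copy_and_verify(src: Path, dst: Path) -> bool:
    """Copy src to dst, then check that both files hash to the same digest."""
    shutil.copyfile(src, dst)
    # Matching SHA-256 digests mean the files are identical with overwhelming probability.
    return sha256_of(src) == sha256_of(dst)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "original.pcm"            # hypothetical audio file
        original.write_bytes(bytes(range(256)) * 1000)   # stand-in for PCM data
        print(copy_and_verify(original, Path(tmp) / "copy.pcm"))  # True
```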
Digital Audio Compression
While uncompressed Pulse Code Modulation (PCM) digital audio was a big advancement for recorded sound, it takes a lot of storage: a full compact disc (about 75 minutes of stereo audio) holds roughly 790 megabytes of PCM audio data, a rate of about 1.4 megabits per second. To make it practical to store digital audio on hard drives or flash memory devices, or to transfer it through networks, digital audio compression standards were developed.
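The uncompressed size follows directly from the CD parameters: 44,100 samples per second, 2 bytes (16 bits) per sample, and 2 channels. A quick arithmetic check in Python (the names are illustrative). This counts raw PCM samples only; the often-quoted 650 MB figure is the data capacity of a 74-minute CD-ROM, which reserves part of each sector for headers and additional error correction.

```python
SAMPLE_RATE_HZ = 44_100  # CD audio sampling frequency
BYTES_PER_SAMPLE = 2     # 16-bit PCM
CHANNELS = 2             # stereo

bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS  # 176,400
bit_rate_kbps = bytes_per_second * 8 / 1000                      # 1,411.2 kbit/s

def pcm_bytes(minutes: float) -> int:
    """Uncompressed CD-format PCM size in bytes for a given playing time."""
    return round(minutes * 60 * bytes_per_second)

print(f"{bytes_per_second:,} bytes/s = {bit_rate_kbps:,} kbit/s")
print(f"75 minutes = {pcm_bytes(75) / 1e6:,.1f} MB")  # 793.8 MB
```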
Compression standards that do not decode back to the original source data are called lossy, because some information is lost when the audio is encoded into that format. The idea behind these encoding standards is to reduce the bit rate by encoding only the most important information. In general, lossy audio compression works by analyzing a digital audio signal in the frequency domain (as a set of frequencies, like all of the different notes being played by an orchestra at any given moment). Only the most significant frequencies are encoded, while the frequencies that are lower in amplitude (volume) are discarded. This technique is known as perceptual coding, because the most perceptible sound frequencies are encoded, while frequencies that are less likely to be heard are discarded. The amount of information discarded can be increased to lower the bit rate of the compressed signal, or decreased to preserve more detail at the cost of a higher bit rate.
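To make the frequency-domain idea concrete, here is a deliberately crude Python sketch. It transforms a short block of samples with a naive discrete Fourier transform, keeps only the k strongest frequency bins, and zeroes the rest before transforming back. This is only a stand-in for perceptual coding: real codecs such as MP3 and AAC use MDCT-based filter banks, a psychoacoustic model (including masking between nearby frequencies), and quantization of the retained coefficients. The function names and the two-tone test block are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform of a block of samples."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT, returning the real part since the original block was real."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

def keep_strongest(x, k):
    """Keep the k largest-magnitude frequency bins, zero the rest, and resynthesize."""
    spectrum = dft(x)
    ranked = sorted(range(len(spectrum)), key=lambda i: abs(spectrum[i]), reverse=True)
    kept = set(ranked[:k])
    pruned = [spectrum[i] if i in kept else 0 for i in range(len(spectrum))]
    return idft(pruned), kept

# Hypothetical 64-sample block: a loud tone (bin 4) plus a much quieter one (bin 19).
n = 64
loud = [math.cos(2 * math.pi * 4 * t / n) for t in range(n)]
quiet = [0.05 * math.cos(2 * math.pi * 19 * t / n) for t in range(n)]
block = [a + b for a, b in zip(loud, quiet)]

# A real tone occupies two mirror-image bins (i and n - i), so k=2 keeps only the loud tone.
approx, kept = keep_strongest(block, k=2)
print(sorted(kept))  # [4, 60]
print(max(abs(a - b) for a, b in zip(approx, loud)))  # tiny: the quiet tone was dropped
```

Raising k keeps more bins (more detail, more data to store); lowering it discards more, which loosely mirrors the bit-rate trade-off described above.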
Lossless compression reduces the file size (and bit rate) of a digital audio signal without affecting the quality. In principle, lossless audio compression works like the data compression used to create a zip file: the compressed file is smaller than the original, and when it is decoded, the result is an exact bit-for-bit copy of the original uncompressed data.
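A minimal Python sketch of a lossless round trip on 16-bit PCM, using the standard `zlib` module (the DEFLATE method most zip files use). General-purpose compressors tend to do poorly on raw audio, so the sketch first replaces each sample with its difference from the previous sample, a first-order predictor loosely in the spirit of, but much simpler than, the linear prediction used by lossless audio codecs such as FLAC. The tone generator, function names, and parameters are hypothetical.

```python
import math
import struct
import zlib

def to_int16(v: int) -> int:
    """Wrap an integer into the signed 16-bit range (two's-complement arithmetic)."""
    return ((v + 32768) % 65536) - 32768

def make_tone(n: int = 20_000, rate: int = 44_100, freq: float = 233.08) -> list[int]:
    """Hypothetical test signal: 16-bit mono sine whose period is not a whole number of samples."""
    return [int(8000 * math.sin(2 * math.pi * freq * t / rate)) for t in range(n)]

def encode(samples: list[int]) -> bytes:
    # Predict each sample from the previous one and keep only the residual.
    # Wrapping to 16 bits keeps every residual storable in 2 bytes and stays reversible.
    prev, residuals = 0, []
    for s in samples:
        residuals.append(to_int16(s - prev))
        prev = s
    return zlib.compress(struct.pack(f"<{len(residuals)}h", *residuals), 9)

def decode(blob: bytes) -> list[int]:
    raw = zlib.decompress(blob)
    prev, samples = 0, []
    for r in struct.unpack(f"<{len(raw) // 2}h", raw):
        prev = to_int16(prev + r)  # undo the prediction exactly
        samples.append(prev)
    return samples

samples = make_tone()
blob = encode(samples)
assert decode(blob) == samples  # bit-for-bit identical after decoding
print(f"{2 * len(samples):,} bytes of PCM -> {len(blob):,} bytes compressed")
```

Whatever the compression ratio on a given signal, the decoded samples always equal the input, which is the defining property of lossless compression.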
Popular Audio Compression Standards (Codecs)
MP3: MPEG-1 Audio Layer III. A lossy digital audio compression standard developed primarily by the Fraunhofer Society around 1991 and standardized as part of MPEG-1.
AAC: Advanced Audio Coding. A more advanced lossy digital audio compression standard, developed in 1997 by the Moving Picture Experts Group (MPEG) as the successor to MP3.
Dolby Digital AC-3: A lossy digital audio compression standard developed by Dolby, based on the Modified Discrete Cosine Transform (MDCT). It is a mandatory audio codec for DVD-Video players and for ATSC digital television receivers.
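Dolby Digital Plus: Enhanced AC-3 (E-AC-3). A lossy successor to Dolby Digital AC-3, developed by Dolby, that supports higher bit rates and more channels.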
Dolby TrueHD: A lossless digital audio compression standard, developed by Dolby, based on the Meridian Lossless Packing (MLP) lossless compression algorithm, which was developed by Meridian Audio.
DTS: Short for DTS Coherent Acoustics (DCA), a lossy digital audio compression standard developed by DTS, based on Adaptive Differential Pulse-Code Modulation (ADPCM).
DTS-HD High Resolution Audio: A lossy extension to the DTS format, consisting of a core DTS signal plus an extension stream that adds detail up to 24-bit resolution and a 96 kHz sampling frequency.
DTS-HD MA: DTS-HD Master Audio. A lossless compression standard developed by DTS that supports audio up to 24-bit, 192 kHz.
Don’t confuse spatial audio formats like Dolby Atmos and DTS:X with audio codecs. These spatial audio formats define how audio is mixed and, for object audio channels, the 3D position, size, and diffusion of each sound object. But the audio itself is still encoded with one of the above audio compression standards. So although streaming services and streaming devices claim to support Dolby Atmos and DTS:X, they don’t deliver the lossless versions of these formats (Dolby Atmos carried in Dolby TrueHD, or DTS:X carried in DTS-HD Master Audio), which are available from sources such as Kaleidescape and 4K Blu-ray Disc.