Audio is arguably the most complex aspect of broadcast television. The human auditory system is extremely sensitive to distortion and noise. For IT engineers to progress in broadcast television, they must understand the sampling rates and formats of sound, and in this article, we delve into digital audio.
Analogue audio was prevalent in broadcast television until the 1990s, when digital audio started to emerge in the professional arena. A combination of integrated circuit innovation, the adoption of digital audio in telecommunications, and the real advantages of digital processing drove its adoption.
Noise, distortion, and interference are the enemies of any audio system. Noise manifests itself as hiss. Distortion makes the audio sound “buzzy” or “crackly”. Interference can be anything from low frequency hum caused by electrical mains feeds to high frequency spikes caused by a faulty fluorescent lighting circuit.
Digital Audio Benefits
Distributing and processing audio digitally eliminates many of these problems, especially interference caused by external factors. In the perfect digital audio chain, the signal is kept in its digital format wherever possible. The only places the audio should be analogue are the microphone and the loudspeaker.
The two most common microphones used in broadcasting are moving coil and condenser. Both generate analogue signals, which need to be converted to digital at the earliest possible opportunity. The device that amplifies the microphone signal and converts it to digital is the ADC (Analogue to Digital Converter).
ADCs are available in various formats with an array of functionality. Single channel units take one or two mic inputs and provide a digital output that connects directly to the sound console. Others have up to sixty-four mic inputs and provide a time-division-multiplexed data stream called MADI (Multichannel Audio Digital Interface).
Diagram 1 – Analogue audio signals are converted to digital samples and synchronized as soon as possible using the master clock, enabling time division multiplexing into AES3 format.
Converting an analogue signal to digital using an ADC produces PCM (Pulse Code Modulation). The analogue audio is sampled at regular intervals, and each sample becomes a digital number proportional to the voltage amplitude of the audio signal at that instant.
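The sampling and quantizing steps can be sketched in a few lines of Python. This is a minimal illustration rather than a model of a real ADC: it assumes the input is a mathematical function of time normalized to ±1.0 full scale, and `pcm_encode` is a hypothetical helper name.

```python
import math


def pcm_encode(signal, sample_rate_hz, n_samples, bit_depth):
    """Sample `signal` (a function of time, nominally within +/-1.0) at regular
    intervals and quantize each sample to a signed integer of `bit_depth` bits."""
    full_scale = 2 ** (bit_depth - 1) - 1  # e.g. 32767 for 16-bit
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                 # sampling instant in seconds
        v = max(-1.0, min(1.0, signal(t)))     # clip anything beyond full scale
        samples.append(round(v * full_scale))  # integer proportional to amplitude
    return samples


def tone(t):
    # A 1 kHz sine wave at half of full scale.
    return 0.5 * math.sin(2 * math.pi * 1000 * t)


# 48 samples at 48 kHz cover exactly one cycle of the 1 kHz tone.
pcm = pcm_encode(tone, 48000, 48, 16)
print(pcm[:13])
```

Each printed value is the integer the ADC would output at that sampling instant; the peak of the half-scale tone lands at roughly half of 32767.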
Digital audio is normally distributed as discrete integer values, which is why PCM is used for interconnection. However, audio processing equipment uses long floating point values internally to reduce the build-up of rounding errors when many processing stages are concatenated.
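To illustrate why floating point helps, the sketch below assumes the errors in question are rounding errors that accumulate when processing stages are chained. The 40 dB cut followed by a 40 dB boost is a hypothetical example, and `apply_gains` is an illustrative name, not part of any real console's processing.

```python
def apply_gains(sample, gains, requantize):
    """Pass one sample through a chain of gain stages.

    If `requantize` is True, the value is rounded back to an integer after
    every stage, as an integer-only pipeline would do. Otherwise the value is
    kept in floating point until the end of the chain.
    """
    value = float(sample)
    for gain in gains:
        value *= gain
        if requantize:
            value = float(round(value))  # integer pipeline: discard the fraction
    return value


chain = [0.01, 100.0]  # a 40 dB cut followed by a 40 dB boost
print(apply_gains(12345, chain, requantize=True))   # integer stages
print(apply_gains(12345, chain, requantize=False))  # floating point stages
```

With integer stages the sample comes back as 12300, because the 0.45 discarded after the first stage can never be recovered; the floating point chain returns to within a tiny fraction of 12345.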
Two parameters are used to describe the ADC function: sampling rate and bit depth.
Sampling rate is the number of times per second that the audio signal is measured to produce the PCM output. Harry Nyquist (1889 – 1976) was a Swedish-born American engineer who made outstanding contributions to communications theory. One of these became known as the “Nyquist sampling rate”, which defines the minimum rate at which a signal can be sampled to produce PCM output with no aliasing. That is, the samples can be turned back into analogue without error or distortion.
Nyquist determined that the sampling rate must be more than twice the highest frequency being sampled. The human auditory system has a maximum frequency range of approximately 20 kHz (this is an average and varies according to the age and physical condition of the listener). Television assumes 20 kHz as the upper limit of the human hearing range, and so two rates are commonly used: 44.1 kHz and 48 kHz.
Diagram 2 – the top diagram shows an audio sine wave sampled in accordance with Nyquist’s theorem; the bottom diagram shows what happens when Nyquist isn’t obeyed: the blue signal is aliased, terribly distorted, and unusable.
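The Nyquist condition and its failure mode can be checked numerically. This sketch assumes ideal point sampling with no anti-aliasing filter; `meets_nyquist` and `alias_frequency` are hypothetical helper names. A 30 kHz tone sampled at 48 kHz produces exactly the same samples as an 18 kHz tone of opposite polarity, so after conversion it cannot be told apart from an audible 18 kHz signal.

```python
import math


def meets_nyquist(max_freq_hz, sample_rate_hz):
    """True if the sample rate is more than twice the highest frequency."""
    return sample_rate_hz > 2 * max_freq_hz


def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency that an ideal sampler reports for a pure tone of freq_hz."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


fs = 48000
print(meets_nyquist(20000, fs), meets_nyquist(20000, 44100))
print(alias_frequency(30000, fs))  # a 30 kHz tone folds down into the audio band

# The samples of the 30 kHz tone match those of an inverted 18 kHz tone.
above = [math.sin(2 * math.pi * 30000 * n / fs) for n in range(48)]
alias = [-math.sin(2 * math.pi * 18000 * n / fs) for n in range(48)]
print(max(abs(a - b) for a, b in zip(above, alias)))  # effectively zero
```

This is why real ADCs filter out content above half the sample rate before sampling: once aliased, the unwanted tone is indistinguishable from genuine audio.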
CDs use 44.1 kHz. This rate was chosen because it relates to both 30 fps (monochrome NTSC) and 25 fps video. However, when NTSC color was broadcast, the frame rate was reduced to 30/1.001 fps, and 48 kHz was chosen because it relates to both European and US television frame rates by simple whole-number ratios. Using 48 kHz also meets the Nyquist criterion, as a 48 kHz sample rate is greater than twice 20 kHz.
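The frame-rate relationship can be checked with exact rational arithmetic. The sketch assumes NTSC runs at exactly 30000/1001 fps and PAL at 25 fps; `frames_until_aligned` is a hypothetical name for the smallest number of frames that contains a whole number of samples.

```python
from fractions import Fraction

FRAME_RATES = {
    "PAL 25 fps": Fraction(25),
    "NTSC 30/1.001 fps": Fraction(30000, 1001),
}


def samples_per_frame(sample_rate_hz, frame_rate):
    """Exact number of audio samples in one video frame."""
    return Fraction(sample_rate_hz) / frame_rate


def frames_until_aligned(sample_rate_hz, frame_rate):
    """Smallest number of frames that holds a whole number of samples."""
    return samples_per_frame(sample_rate_hz, frame_rate).denominator


for rate in (44100, 48000):
    for name, fps in FRAME_RATES.items():
        spf = samples_per_frame(rate, fps)
        n = frames_until_aligned(rate, fps)
        print(f"{rate} Hz, {name}: {float(spf):.2f} samples/frame, "
              f"whole samples every {n} frame(s) ({spf * n} samples)")
```

At 48 kHz a PAL frame holds exactly 1920 samples and five NTSC frames hold exactly 8008 samples, which is the five-frame sequence referred to in Diagram 3, whereas 44.1 kHz only realigns with NTSC every 100 frames.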
Greater Bit Depth Needed
Bit depth describes the resolution of the data used to represent each converted audio sample. Each extra bit adds roughly 6 dB of theoretical dynamic range between full scale and the quantization noise floor: a bit depth of 16 bits gives about 96 dB, 20 bits gives 120 dB, and 24 bits gives 144 dB. Professional studios use depths of 16, 20, or 24 bits.
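The theoretical dynamic range of an ideal N-bit converter is 20·log10(2^N) dB, or about 6.02 dB per bit. The sketch below reproduces the figures quoted above; real converters fall short of these values because of analogue noise, so treat them as upper bounds. `dynamic_range_db` is an illustrative name.

```python
import math


def dynamic_range_db(bit_depth):
    """Ratio of the full-scale range to one quantization step, in decibels."""
    return 20 * math.log10(2 ** bit_depth)


for bits in (16, 20, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
```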
There is always a compromise between quality and cost of implementation. The higher the bit depth, the better the sound resolution and signal to noise ratio. However, as bit depths and sampling rates increase, so do the bandwidth required to distribute the audio and the capacity needed to store it.
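The bandwidth and storage cost of uncompressed PCM grows linearly with sample rate, bit depth, and channel count. This sketch ignores interface framing and file-format overhead, so real links and files need somewhat more than these raw figures; the function names are illustrative.

```python
def pcm_bit_rate(sample_rate_hz, bit_depth, channels):
    """Raw PCM payload in bits per second."""
    return sample_rate_hz * bit_depth * channels


def bytes_per_hour(sample_rate_hz, bit_depth, channels):
    """Raw PCM storage for one hour of audio, in bytes."""
    return pcm_bit_rate(sample_rate_hz, bit_depth, channels) * 3600 // 8


for rate, bits in ((44100, 16), (48000, 24), (96000, 24)):
    bps = pcm_bit_rate(rate, bits, channels=2)
    gb = bytes_per_hour(rate, bits, channels=2) / 1e9
    print(f"stereo {rate} Hz / {bits}-bit: {bps / 1e6:.3f} Mbit/s, {gb:.2f} GB per hour")
```

Moving from CD-style 44.1 kHz/16-bit stereo to 48 kHz/24-bit stereo raises the raw rate from about 1.4 to about 2.3 Mbit/s, roughly 1 GB per hour.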
AES3 and MADI Distribution
The two fundamental digital distribution systems used in the audio control room of a television studio are AES3 (defined by the Audio Engineering Society) and MADI (Multichannel Audio Digital Interface). A third method uses SDI (Serial Digital Interface), where the audio is embedded into the HANC (Horizontal Ancillary Data) space of the SDI signal alongside the video.
MADI and AES3 are usually used to distribute digital audio within studios, whereas SDI is generally used to distribute audio with video to remote studios or playout centers.
AES3 defines the data format, electrical interface, and physical connectivity. Two channels are defined, Channel A and Channel B. Each sample is 16, 20, or 24 bits and is carried in a subframe of thirty-two time slots, together with a preamble and validity, user, channel-status, and parity bits. A subframe for channel A and a subframe for channel B are combined to make one frame, and 192 frames make up a block.
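The subframe layout can be sketched as a list of time slots. This is an illustration only, assuming a 24-bit sample carried least significant bit first in slots 4 to 27, followed by the validity, user, channel-status, and parity bits; the four-slot preamble is a deliberate violation of the biphase-mark line coding and is not modelled. `aes3_subframe_slots` and `aes3_bit_rate` are hypothetical names.

```python
def aes3_subframe_slots(sample, validity=0, user=0, channel_status=0):
    """Return time slots 4-31 of one AES3 subframe as a list of 28 bits."""
    if not -(1 << 23) <= sample < (1 << 23):
        raise ValueError("sample does not fit in 24 bits")
    word = sample & 0xFFFFFF                      # 24-bit two's complement
    slots = [(word >> i) & 1 for i in range(24)]  # audio, LSB first
    slots += [validity, user, channel_status]     # V (0 = valid), U, C
    slots.append(sum(slots) % 2)                  # P: even parity over slots 4-31
    return slots


def aes3_bit_rate(frame_rate_hz):
    """Two subframes of 32 slots each are sent per frame (one frame per sample period)."""
    return frame_rate_hz * 2 * 32


print(len(aes3_subframe_slots(1)), aes3_bit_rate(48000))
```

At 48 kHz this gives 3.072 Mbit/s before line coding. When only 20-bit audio is carried, slots 4 to 7 are freed as auxiliary data.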
Diagram 3 – The 48 kHz sample rate was chosen for professional audio as it can be synchronized to both NTSC and PAL frame rates: the samples align with NTSC frames every five frames and with PAL every frame.
The data signals can be distributed over unbalanced, balanced, and optical cabling.
More Than Just Audio
The frames also carry auxiliary, user, and channel-status data, which can be used to distribute private data and timecode. This allows frame-accurate information to be sent with the audio samples.
One of the challenges of AES3 cabling is that it is bulky and inefficient: a 64-input desk would need between 64 and 128 cables, which take up a lot of space and are heavy and expensive.
The introduction of MADI maintained high quality 48 kHz sampling and depths of up to 24 bits, but significantly increased the number of channels that can be distributed across one cable, to 64. Coax and fiber optic cables are the main distribution media for MADI, which is particularly useful when many audio channels must be carried on a single cable.
Even Higher Sampling
MADI also supports a 96 kHz sampling rate at the expense of reducing the number of channels carried. This is useful when exceptionally high precision is required, often when significant post processing is needed.
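Why the higher sample rate costs channels can be seen from the link budget. This sketch assumes a usable MADI payload of 100 Mbit/s (the 125 Mbit/s line rate after 4B5B coding) and 32-bit channel slots, as in AES3 subframes; `madi_payload_bps` and `fits_on_madi` are hypothetical names.

```python
MADI_PAYLOAD_BPS = 100_000_000  # assumed usable data rate after 4B5B coding


def madi_payload_bps(channels, sample_rate_hz, slot_bits=32):
    """Bits per second needed to carry one 32-bit slot per channel per sample period."""
    return channels * slot_bits * sample_rate_hz


def fits_on_madi(channels, sample_rate_hz):
    """True if the channels fit within the assumed MADI payload."""
    return madi_payload_bps(channels, sample_rate_hz) <= MADI_PAYLOAD_BPS


for channels, rate in ((64, 48000), (64, 96000), (32, 96000)):
    print(channels, rate, madi_payload_bps(channels, rate) / 1e6, fits_on_madi(channels, rate))
```

Sixty-four channels at 48 kHz need about 98.3 Mbit/s, just inside the link. Doubling the sample rate doubles the bits per channel, so only 32 channels fit at 96 kHz.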
It’s important to note that both AES3 and MADI are synchronous distribution systems. They require their own dedicated networks and specialist equipment to process and distribute the audio. Taking one audio channel from an AES3 circuit and inserting it into a MADI circuit is difficult and requires a great deal of specialist knowledge from the sound engineers operating the systems.
Furthermore, AES3 and MADI networks require a master sync pulse generator to keep the networks, and the terminal equipment attached to them, synchronous. Without it, devices drift relative to one another, resulting in dropped or repeated samples that are heard as clicks and distortion.
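To see why a shared master clock matters, consider two devices that are each nominally running at 48 kHz but whose free-running oscillators differ slightly. The 10 ppm offset below is an assumed figure for illustration, and `slip_interval_s` is a hypothetical name; the sketch simply computes how often the receiver must drop or repeat a sample.

```python
def slip_interval_s(sample_rate_hz, offset_ppm):
    """Seconds between sample slips for two clocks that differ by offset_ppm."""
    drift_samples_per_s = sample_rate_hz * offset_ppm * 1e-6
    return 1.0 / drift_samples_per_s


interval = slip_interval_s(48000, 10)
print(f"one slip every {interval:.2f} s, about {3600 / interval:.0f} per hour")
```

Even this small mismatch forces a slip roughly every two seconds, which is why every device is locked to the same reference rather than to its own oscillator.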
Digital audio is a complex subject to master, and even the smallest error or loss of timing can result in lost samples, leading to distortion. One of the challenges to overcome as we move to IP is distributing synchronous audio and video streams over asynchronous IP networks with no packet loss.