Digital audio interfaces were developed as a way of avoiding generation loss between devices.
Going way back to the beginnings of digital audio, the technology of microphones and mixing consoles was pretty good, but there was no analog recording technology that could match it. Discs would crackle, tapes would hiss and drop out and optical film would distort.
Unsurprisingly, recorders were among the first successful digital audio products, either using stationary heads like analog machines or adopting the rotary heads of video recorders. With error correction, the numbers recovered from the tape were identical to the numbers recorded, and time base correction put them back in the correct time frame, so the recording medium no longer had a quality of its own.
Instead the quality was bounded at the original conversion to digital by the word length and the sampling rate. The recorder simply delayed the data.
Arriving in an analog world, the first digital audio recorders incorporated ADCs and DACs so that they could become plug-in replacements for analog recorders in existing systems, but it was soon realized that a digital tape copied via the analog domain was suffering two unnecessary conversions, resulting in a generation loss. The solution was to transfer the actual numbers between the two machines, so that the second tape would be a clone of the first.
That led to a requirement for some kind of digital interface that would allow for reliable transfer of the audio data and timing between two machines. The difficulty was that each manufacturer of hardware went about it in a slightly different way so that the signals were not compatible.
The AES/EBU digital audio interface was an attempt to standardize the process, so that equipment could be connected together irrespective of the manufacturer. The attempt succeeded and the AES/EBU interface was adopted worldwide and is still in service.
At the time of the invention of pulse code modulation (PCM) it was obvious that the original analog signals had become data that differed from generic data only in the requirement to be reproduced with a specific time base. It has always been the case that digital audio and information technology (IT) had a substantial overlap.
Fig.1 - The AES/EBU interface uses the same chips as RS-422, with transformers for precise balancing.
Today we seek to solve problems by adopting low-cost mass market IT hardware and writing appropriate software to make it go. When the AES/EBU interface was developed, that simply wasn't possible. Computers in those days (the term IT had yet to be thought up) were either not fast enough or were too expensive for everyday audio use. The overlap with digital audio would remain theoretical for a while.
The analog audio system of the 1980s relied heavily on the screened twisted pair to convey balanced signals around the place. Such cables were terminated with the three-pin Cannon XLR connector, which has the important feature of being practically indestructible. It was soon found that two channels of digital audio could be transmitted down such analog cables without a problem, so the cost of implementing digital interfacing would be reduced.
A suitable electrical interface was found by adopting the chips developed for the RS-422 data interface. Fig. 1 shows that the system uses differential signaling for noise rejection. The impedance of the cabling was the subject of confusion, which simply carried over the confusion that already existed in analog audio.
String copper wires on poles between towns and they will be found to have a characteristic impedance of 600 Ohms. The 0dB references of analog audio were all based on delivering one milliwatt into that impedance. Yet the cabling in an analog audio installation was too short to act as a transmission line and the impedance seen by the driver was essentially whatever was connected at the other end. Analog drivers had a low output impedance and receivers had high impedance so one driver could work into several loads.
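The 600 Ohm reference can be turned into a voltage with a line of arithmetic. The following is a minimal sketch; the function names are illustrative and not taken from any standard or library.

```python
import math


def reference_voltage(power_watts=1e-3, impedance_ohms=600.0):
    """RMS voltage that delivers the given power into the given impedance."""
    return math.sqrt(power_watts * impedance_ohms)


def level_to_volts(level_db, impedance_ohms=600.0):
    """Convert a level in dB relative to one milliwatt into RMS volts."""
    power_watts = 1e-3 * 10 ** (level_db / 10)
    return math.sqrt(power_watts * impedance_ohms)


print(round(reference_voltage(), 4))  # about 0.7746 V RMS for 1 mW into 600 Ohms
```

One milliwatt into 600 Ohms therefore corresponds to roughly 0.775 V RMS, which is the voltage the analog 0dB reference implies.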
This approach almost worked with digital audio. Although the frequencies in digital audio are higher than in analog audio, they are not that high, so with short cables impedance matching wasn't necessary and one driver could drive multiple loads. It was only with long cables that it became necessary to obey the rules and apply termination.
Ultimately the choice came down to using terminators as in analog video practice, or making the interface a point-to-point interface with fixed termination. The latter approach won out and now the AES/EBU interface is standardized at 110 Ohm impedance.
Fig.2 - The FM code always has a transition between bits and if the data bit is a one, there is an extra transition in the center of the bit cell.
In television systems it is possible to send an AES/EBU signal down analog video coaxial cable using appropriate impedance converting transformers, giving the system even more flexibility. There is also a consumer version of the interface intended for short distances and using unbalanced signals on coaxial cable fitted with phono plugs.
The use of transformers for true balancing mandates a DC-free channel code, and AES/EBU adopted the FM code that was already proven in the time code application. The FM code is simple and robust. Fig. 2 shows that there is always a transition between bits and if the bit is a one there is an extra transition in the center of the bit. Whatever the bit pattern, the waveform spends as much time high as it does low and so is DC-free and will pass through capacitors and transformers.
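The FM rule described above can be sketched in a few lines. This is an illustration only: the waveform is represented as a list of half-cell levels (two per bit), and the function name and the `start_level` parameter are assumptions made for the example.

```python
def fm_encode(bits, start_level=0):
    """Encode data bits as FM half-cell levels (two half-cells per bit).

    The level always toggles at the start of a bit cell. A one toggles
    the level again in the center of the cell; a zero does not.
    """
    level = start_level
    half_cells = []
    for bit in bits:
        level ^= 1               # transition between bits, always present
        half_cells.append(level)
        if bit:
            level ^= 1           # extra transition in the center for a one
        half_cells.append(level)
    return half_cells


print(fm_encode([1, 0, 1, 1, 0, 0]))
# [1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1]
```

A one always spends half its cell at each level, and successive zeros alternate between all-high and all-low cells, so the running imbalance never grows beyond one bit cell. That bounded imbalance is what makes the code DC-free.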
Decoding the FM signal requires a phase-locked loop in the receiver that locks to the transitions between the bits to create a time window in the center of each bit cell in which there either will or won't be a further transition. It follows that the AES/EBU interface is synchronous: the receiver has to run at exactly the same speed as the transmitter or data will be lost. Transmission is in real time at the actual sampling rate.
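Once the cell boundaries are known, the decision made in the time window is very simple. The sketch below assumes the clock has already been recovered, so the input is a list of half-cell levels aligned to cell boundaries; it does not model the phase-locked loop itself, and the function name is illustrative.

```python
def fm_decode(half_cells):
    """Recover data bits from FM half-cell levels aligned to cell boundaries.

    A bit is a one if the level changes in the center of its cell,
    otherwise it is a zero. Only transitions matter, not absolute levels.
    """
    if len(half_cells) % 2:
        raise ValueError("expected two half-cells per bit")
    return [
        int(half_cells[i] != half_cells[i + 1])
        for i in range(0, len(half_cells), 2)
    ]


waveform = [1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1]
inverted = [1 - level for level in waveform]

print(fm_decode(waveform))   # [1, 0, 1, 1, 0, 0]
print(fm_decode(inverted))   # [1, 0, 1, 1, 0, 0]
```

Because the data are carried by transitions rather than levels, an inverted waveform decodes to the same bits, so swapping the two wires of the balanced pair does no harm to the data.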
A second recorder making a copy from a first recorder must genlock to the timing of the AES/EBU interface so that the two machines run in step. In large systems a central accurate generator could send a muted AES/EBU signal (sometimes called digital silence) to every relevant piece of equipment in the same way that video genlocking operated.
The protocol of AES/EBU is also simple and robust and is based on subframes of 32 bits. The system alternates between two types of subframe in order to carry two audio channels. Although the two channels alternate, the samples are considered to have been taken at the same time. The channels can be told apart because they have different synchronizing patterns. A pair of subframes forms a frame and the frame rate is the same as the sampling rate.
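Since a frame holds two 32-bit subframes and the frame rate equals the sampling rate, the interface bit rate follows directly. The short calculation below is a sketch; the constant and function names are chosen for the example.

```python
BITS_PER_SUBFRAME = 32
SUBFRAMES_PER_FRAME = 2


def interface_rates(sample_rate_hz):
    """Return (bit rate, FM half-cell rate) for a given sampling rate."""
    bit_rate = sample_rate_hz * SUBFRAMES_PER_FRAME * BITS_PER_SUBFRAME
    half_cell_rate = 2 * bit_rate   # the FM code uses two half-cells per bit
    return bit_rate, half_cell_rate


for fs in (44_100, 48_000):
    bit_rate, half_cell_rate = interface_rates(fs)
    print(fs, bit_rate, half_cell_rate)
# 44100 2822400 5644800
# 48000 3072000 6144000
```

At 48 kHz the interface carries 3.072 Mbit/s, and the FM code can change state up to 6.144 million times per second.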
Fig.3 shows the subframe structure. Following the synchronizing pattern there is space for the audio sample. This can have any wordlength up to 24 bits, subject to the requirements that the LSB is sent first and the MSB is always placed in bit 27. Unused bits must be set to zero.
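The alignment rule can be illustrated as follows. This sketch assumes the bits of a subframe are numbered 0 to 31 in transmission order, so that the 24-bit audio field occupies bits 4 to 27; the sample is treated as an unsigned bit pattern and the function name is hypothetical.

```python
AUDIO_FIELD_BITS = 24   # bits 4..27 of the subframe (assumed numbering)


def audio_field(sample, wordlength):
    """Return the 24 audio-field bits in transmission order (LSB first).

    The sample is shifted up so that its MSB lands in the last position
    of the field (bit 27); any unused low-order positions are zero.
    """
    if not 1 <= wordlength <= AUDIO_FIELD_BITS:
        raise ValueError("wordlength must be between 1 and 24 bits")
    if not 0 <= sample < (1 << wordlength):
        raise ValueError("sample does not fit in the stated wordlength")
    aligned = sample << (AUDIO_FIELD_BITS - wordlength)
    return [(aligned >> i) & 1 for i in range(AUDIO_FIELD_BITS)]


field = audio_field(0x8001, wordlength=16)
print(field[:8])    # eight unused bits, all zero
print(field[8])     # LSB of the 16-bit sample
print(field[23])    # MSB of the sample, sent in bit 27
```

A receiver with a shorter wordlength can then simply ignore the low-order bits, because the MSB is always in the same place.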
The last four bits of each subframe are status bits. The valid bit indicates that the sample is suitable for conversion to analog. It is seldom set to anything but the default condition. The user bit is also seldom used. The channel status bit allows a 24-byte metadata message describing the audio to be built up over 192 frames, with each channel contributing one bit per subframe to its own message. A third synchronizing pattern denotes the beginning of the channel status message.
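Collecting the channel status message amounts to gathering one bit per frame until 192 bits have arrived. The sketch below shows the bookkeeping only. The bit order within each byte (first bit received becomes the least significant bit) is an assumption made for the example, and the meaning of the individual bytes is not modeled.

```python
BLOCK_BITS = 192    # one channel status bit per frame, per channel
BLOCK_BYTES = 24


def assemble_channel_status(c_bits):
    """Pack 192 channel status bits into a 24-byte message.

    Assumption: bits are packed in arrival order, least significant
    bit of each byte first.
    """
    if len(c_bits) != BLOCK_BITS:
        raise ValueError("a channel status block needs exactly 192 bits")
    block = bytearray(BLOCK_BYTES)
    for index, bit in enumerate(c_bits):
        if bit:
            block[index // 8] |= 1 << (index % 8)
    return bytes(block)


bits = [0] * BLOCK_BITS
bits[0] = 1     # first bit after the block-start sync pattern
bits[9] = 1
message = assemble_channel_status(bits)
print(len(message), message[0], message[1])   # 24 1 2
```

At a 48 kHz sampling rate a complete message therefore takes 192 / 48000 = 4 ms to arrive.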
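The parity bit is designed to give the subframe an even parity characteristic, which allows single bit errors to be detected, but not corrected. Detection of errors gives some warning of deterioration in the system, but does not prevent the errors being audible. In practice the real benefit of the parity system is that it ensures that the sync patterns at the beginning of the subframes always have the same polarity. In the FM code every bit cell begins with a transition and every one adds a further transition, so an even number of ones in the 28 bits that follow the sync pattern means an even number of transitions, and the waveform ends each subframe at the level from which it started. That gives the sync patterns a slightly higher probability of correct detection.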
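Both effects of the parity bit can be checked numerically. The sketch below assumes that parity is computed over the 28 bits that follow the sync pattern (24 audio bits plus valid, user, channel status and the parity bit itself); the function names and the example payload are illustrative.

```python
def parity_bit(payload_bits):
    """Return P so that the 27 payload bits plus P hold an even number of ones."""
    return sum(payload_bits) % 2


def level_after_fm(bits, start_level):
    """Track the FM waveform level across a run of bit cells."""
    level = start_level
    for bit in bits:
        level ^= 1          # transition at the start of every cell
        if bit:
            level ^= 1      # extra transition in the center for a one
    return level


# 24 audio bits followed by V, U and C (example values only).
payload = [0] * 8 + [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1] + [0, 0, 1]
slots = payload + [parity_bit(payload)]

print(sum(slots) % 2)                         # 0: even number of ones
print(level_after_fm(slots, start_level=0))   # 0: level unchanged
print(level_after_fm(slots, start_level=1))   # 1: level unchanged
```

Whatever the payload, the level at the end of the subframe equals the level before its first data bit, so the next sync pattern always starts from the same polarity.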
Fig.4 - In SDI having embedded audio, three video words are used to carry most of an AES/EBU subframe.
Alongside digital audio, the digitization of video was also making strides and the serial digital interface (SDI) was developed. This maintained the timing of analog video, but the lengthy sync pulses were replaced by short digital codes, which meant that there was quite a bit of time in the interface where the video signal was blanked. It was an obvious development to convey digital audio in the blanking periods.
Embedded audio data use unique synchronizing patterns, which video devices cannot see. SDI is based on ten-bit data, so three video sample periods are required to carry an AES/EBU audio subframe. In SDI, certain bit patterns are reserved, whereas audio data are unconstrained, and ten bits taken from an audio sample could accidentally replicate an SDI sync word and cause havoc. The solution is that each ten-bit video word contains only nine audio bits, and bit 9 is the inverse of bit 8.
The use of inversion ensures that a ten-bit word carrying audio data can never replicate a video sync pattern. The result is that there are 27 bits available to carry an audio subframe. Fig. 4 shows how the audio data are distributed over three 10-bit SDI words. It is only possible to carry 20-bit audio samples in the default packing system.
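The inversion trick is easy to demonstrate. In the sketch below a 27-bit payload is split into three nine-bit groups, low bits first, and bit 9 of each word is set to the inverse of bit 8. The order in which the payload bits are assigned to the three words is an assumption made for the example and does not reproduce the layout of Fig. 4. The reserved ranges checked at the end (0x000 to 0x003 and 0x3FC to 0x3FF) are the values excluded from ten-bit SDI data.

```python
def pack_sdi_words(payload27):
    """Split a 27-bit payload into three 10-bit words with bit 9 = NOT bit 8."""
    if not 0 <= payload27 < (1 << 27):
        raise ValueError("payload must fit in 27 bits")
    words = []
    for i in range(3):
        nine_bits = (payload27 >> (9 * i)) & 0x1FF
        bit8 = (nine_bits >> 8) & 1
        words.append(nine_bits | ((bit8 ^ 1) << 9))
    return words


def unpack_sdi_words(words):
    """Recover the 27-bit payload, checking the inversion rule on each word."""
    payload = 0
    for i, word in enumerate(words):
        if ((word >> 9) & 1) == ((word >> 8) & 1):
            raise ValueError("bit 9 is not the inverse of bit 8")
        payload |= (word & 0x1FF) << (9 * i)
    return payload


def is_reserved(word):
    """True for the 10-bit values that are reserved in SDI."""
    return word <= 0x003 or word >= 0x3FC


for payload in (0, 0x7FFFFFF, 0x1234567):
    words = pack_sdi_words(payload)
    assert not any(is_reserved(w) for w in words)
    assert unpack_sdi_words(words) == payload
    print([hex(w) for w in words])
```

Because bits 9 and 8 always differ, every word lies between 0x100 and 0x2FF, well clear of the reserved values at either end of the ten-bit range.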