Our auditory system is incredibly sensitive to the smallest sound distortion or discontinuity. Even the slightest audio pop, stutter, or level clip grabs our attention and distracts us from the television or radio program. Consequently, vendors working in the audio space, especially in IP, have spent years refining their Audio over IP solutions to make the sound clear and distortion free, as well as easy to use.
This article was first published as part of Essential Guide: Audio Over IP Primer For Broadcast
The ratio of audio channels to video in a broadcast service can be very high. Multilingual localization, audio description, and immersive audio are all contributing to increased demand for more audio channels. Intercom further inflates this requirement, especially when we introduce clean-feeds for outside broadcasts.
Over the years, a plethora of standards have emerged to increase the number of audio channels that can be distributed over single cables. This is especially important for outside broadcast vehicles where weight restrictions place limitations on the amount of heavy copper cable that can be installed.
As the number of channels available in a cable increased, so did the complexity of the infrastructure. MADI is a prime example of this. Although 64 channels can be accommodated in a single cable, the system is time-division multiplexed, so broadcast-specific equipment is needed to embed and de-embed individual audio channels, and complex switching matrices are required to route the signals.
SDI is also capable of embedding audio in its transport stream, but the same embedding and de-embedding challenge arises. Furthermore, system designers are restricted to sending the audio to the same destination as the video. Again, complexity and costs soon escalate due to the bespoke switching, embedding, and de-embedding requirements.
All this results in restricted operation and increasing costs.
IP helps to overcome these challenges as it is much more flexible and scalable than SDI, AES, or MADI. Although IP also multiplexes many signals onto a single link, it does so with packets and assumes asynchronous operation, so packets can be inserted into and extracted from the stream without the tight timing tolerances imposed by a synchronous distribution system.
One of the major benefits of IP is that it is essentially a software protocol definition that assumes nothing about the underlying hardware distribution system. Many broadcasters use Ethernet to transport IP, but it is not mandatory: IP is agnostic of the physical layer and can be distributed over many different types of network.
Furthermore, the IT industry has been using IP and Ethernet for decades, even though early adopters had data rates of only a few megabits per second. Ethernet itself dates back to the early 1970s, so researchers and IT professionals have had the best part of fifty years to understand, improve, and design faster and more reliable networks.
Thousands of scientific papers currently exist demonstrating the massive amount of research that has gone into IP and Ethernet network optimization and improvement.
Table 1 – Calculations showing the audio capacity of a 1Gbps Ethernet link. As Ethernet is bi-directional, 1,042 audio streams can flow in each direction; this is equivalent to 16 send and 16 receive coaxial cables (32 in total) for MADI.
This has further led to vendors outside the broadcast industry designing and building improved networks and connectivity. In 1983, Ethernet speeds of 10Mbps were the norm, but now the IEEE expects to release the 400Gbps IEEE 802.3cu standard by 2021.
AoIP started to make massive inroads into broadcasting about twenty years ago as network data throughput and reliability improved. Although uncompressed audio has a much lower data-rate requirement than video, even the loss of a single sample can be detected by the human ear, so networks had a very high performance bar to meet.
The vast majority of transport standards used with IP are bi-directional; that is, the equipment at either end can send and receive simultaneously. This is a big change for broadcasters, as traditional video and audio systems operate in one direction with point-to-point connectivity.
Networked bi-directional signal flows open up a fantastic array of opportunities for broadcasters, as they can now maintain full-duplex signal exchange and control between devices. For example, a microphone can be muted from a remote location using a control system that communicates over TCP/IP.
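To make the remote-mute example concrete, here is a minimal Python sketch of a command and its acknowledgement travelling in opposite directions over one TCP connection. The plain-text command format (`MUTE <device>`, answered by `OK <device> muted`) and the function names are hypothetical, invented for illustration rather than taken from any vendor's control protocol; both ends run on the local machine so the sketch can be executed as-is.

```python
import socket
import threading

# Hypothetical text protocol, for illustration only:
#   controller -> device:  "MUTE <device>\n"
#   device -> controller:  "OK <device> muted\n"  (or "ERR unknown command\n")
muted = set()  # names of devices the simulated endpoint has muted


def handle_one_command(server: socket.socket) -> None:
    """Accept one connection, apply one command and send an acknowledgement back."""
    conn, _addr = server.accept()
    with conn, conn.makefile("r") as reader:
        verb, _, device = reader.readline().strip().partition(" ")
        if verb == "MUTE" and device:
            muted.add(device)
            reply = f"OK {device} muted\n"
        else:
            reply = "ERR unknown command\n"
        conn.sendall(reply.encode())


def send_command(host: str, port: int, command: str) -> str:
    """Send one command over TCP and return the device's one-line reply."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall((command + "\n").encode())
        with sock.makefile("r") as reader:
            return reader.readline().strip()


# Port 0 asks the operating system for any free port on the loopback interface.
server = socket.create_server(("127.0.0.1", 0))
port = server.getsockname()[1]
worker = threading.Thread(target=handle_one_command, args=(server,))
worker.start()
print(send_command("127.0.0.1", port, "MUTE mic1"))  # OK mic1 muted
worker.join()
server.close()
```

Because the reply comes back over the same connection, the control system learns that the mute was actually applied rather than assuming it, which is the practical value of a bi-directional link.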
A single 1Gbps Ethernet link can transfer in excess of 1,000 audio channels (assuming 80% of the link is available for audio payload, with the remainder taken up by frame and packet headers and preambles, and 16-bit audio sampled at 48kHz).
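The arithmetic behind Table 1 and the figure above can be reproduced in a few lines of Python. This is a sketch of the calculation only: it treats the overhead allowance as a flat 20% deduction and counts each channel as a mono 16-bit, 48kHz signal, which are the assumptions stated above; the function name is our own, and `MADI_CHANNELS_PER_CABLE` reflects the 64 channels per MADI cable mentioned earlier.

```python
def aoip_channel_capacity(link_bps: float, utilization: float,
                          bit_depth: int, sample_rate_hz: int) -> float:
    """Approximate one-way count of uncompressed mono audio channels on a link.

    `utilization` is the fraction of the link left for audio payload after
    frame and packet headers and preambles (0.80 in the article's example).
    """
    payload_bps = link_bps * utilization
    channel_bps = bit_depth * sample_rate_hz  # 16 * 48,000 = 768 kbps
    return payload_bps / channel_bps


MADI_CHANNELS_PER_CABLE = 64

channels = aoip_channel_capacity(1e9, 0.80, 16, 48_000)
madi_cables = int(channels // MADI_CHANNELS_PER_CABLE)
print(f"{channels:.2f} channels in each direction")    # 1041.67 channels in each direction
print(f"{madi_cables} MADI cables in each direction")  # 16 MADI cables in each direction
```

800Mbps of payload divided by 768kbps per channel gives just under 1,042 channels, the figure quoted in Table 1 after rounding, and dividing by 64 gives the 16 MADI cables per direction.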
One of the challenges AoIP pioneers had to overcome in the early days of its adoption was signal interoperability. The big advantage of synchronous networks such as MADI, AES, and SDI is that the audio signal specification is well defined: the sender and receiver know exactly what types of signal are being exchanged.
Synchronous, rigid transport streams have served the broadcast industry well for many years, but they lack flexibility and scalability. If we are to get the best out of IP, we cannot be constrained by these inflexible standards and must think beyond them.
If a microphone is configured to send 24-bit audio at 96kHz sampling, we cannot assume the receiver knows this. If the receiver is configured to accept 16-bit audio at 48kHz sampling, its receive engine will probably not sync to the audio, and even if it does (by chance), the resultant signal will be a cacophony of acoustic distortion.
Furthermore, how do we know how many audio channels are associated with a specific stream? In the IP world, streams are identified by their source and destination IP addresses, protocol specifier, and stream type (multicast or unicast, for example), none of which describes the audio format or channel count.
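The mismatch described above becomes explicit once we write down what a receiver needs to know about a stream. The `StreamFormat` fields and the `format_mismatches` helper below are hypothetical, chosen only to illustrate that sample rate, bit depth, and channel count must be agreed separately, because none of them appears in the IP addressing that identifies the stream.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StreamFormat:
    """Hypothetical description of an AoIP stream; no IP header carries these fields."""
    sample_rate_hz: int
    bit_depth: int
    channels: int


def format_mismatches(sender: StreamFormat, receiver: StreamFormat) -> list[str]:
    """Return the fields on which sender and receiver disagree (empty if compatible)."""
    problems = []
    if sender.sample_rate_hz != receiver.sample_rate_hz:
        problems.append(f"sample rate {sender.sample_rate_hz} != {receiver.sample_rate_hz}")
    if sender.bit_depth != receiver.bit_depth:
        problems.append(f"bit depth {sender.bit_depth} != {receiver.bit_depth}")
    if sender.channels != receiver.channels:
        problems.append(f"channels {sender.channels} != {receiver.channels}")
    return problems


# The microphone and receiver configurations from the example above.
mic = StreamFormat(sample_rate_hz=96_000, bit_depth=24, channels=1)
console_input = StreamFormat(sample_rate_hz=48_000, bit_depth=16, channels=1)
for problem in format_mismatches(mic, console_input):
    print(problem)  # sample rate 96000 != 48000, then bit depth 24 != 16
```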
It is possible for a broadcaster to configure a system manually, as the IP addresses, bit depths, and sample rates will be known when the system is installed. However, manual configuration is almost impossible to maintain for any length of time. Soon after the infrastructure is built, operational requirements will demand that configurations and signal routings change. With potentially hundreds of audio streams available on each Ethernet link, this is a daunting task with significant potential for human error.
Audio over IP distribution can use either unicast or multicast. Audio streams tend to be of the order of one or two megabits per second, as opposed to several gigabits per second for uncompressed video, so sending a separate copy of a stream to each destination is achievable.
Audio unicast is easier to implement than multicast and is often used in AoIP where adequate bandwidth headroom is available and predictable.
Multicast is the IP equivalent of the distribution amplifier and is a bandwidth-efficient method of providing a one-to-many mapping of a single source to multiple destinations. Downstream devices opt in and out of the multicast stream, and the Ethernet switch duplicates the frames and sends them to the appropriate ports.
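A rough way to see why unicast is workable for audio, and why multicast is still more efficient, is to compare the bandwidth leaving the sender. The sketch below assumes a stereo, 16-bit, 48kHz stream and ten receivers, both illustrative figures, and ignores packet header overhead; the function names are our own.

```python
def stream_bitrate_bps(channels: int, bit_depth: int, sample_rate_hz: int) -> int:
    """Raw audio payload rate of one stream, excluding packet overhead."""
    return channels * bit_depth * sample_rate_hz


def sender_load_bps(stream_bps: int, receivers: int, multicast: bool) -> int:
    """Bandwidth leaving the sender for one stream.

    Unicast sends a separate copy to every receiver; multicast sends a single
    copy and the Ethernet switch replicates it towards each subscribed port.
    """
    return stream_bps if multicast else stream_bps * receivers


stereo = stream_bitrate_bps(channels=2, bit_depth=16, sample_rate_hz=48_000)
print(stereo / 1e6)                                        # 1.536
print(sender_load_bps(stereo, 10, multicast=False) / 1e6)  # 15.36
print(sender_load_bps(stereo, 10, multicast=True) / 1e6)   # 1.536
```

Even ten unicast copies amount to only about 15Mbps, a small fraction of a 1Gbps link, whereas the same approach with uncompressed video would quickly exhaust it.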
Diagram 1 – IP multicast addresses in the range 224.0.0.0 to 239.255.255.255 are mapped into the reserved Ethernet MAC address range 01-00-5E-00-00-00 to 01-00-5E-7F-FF-FF.
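The mapping described in Diagram 1 is simple to express in code: the fixed 01-00-5E prefix is followed by the low-order 23 bits of the IPv4 multicast address. The function name below is our own; the sketch uses only Python's standard `ipaddress` module.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet MAC address.

    The MAC is the reserved prefix 01-00-5E with the top bit of the fourth
    byte cleared, followed by the low-order 23 bits of the IP address.
    """
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF           # keep only the low 23 bits
    mac = 0x01005E000000 | low23           # insert them under the fixed prefix
    return "-".join(f"{(mac >> shift) & 0xFF:02X}" for shift in range(40, -1, -8))


print(multicast_mac("224.0.0.0"))        # 01-00-5E-00-00-00
print(multicast_mac("239.255.255.255"))  # 01-00-5E-7F-FF-FF
print(multicast_mac("239.1.2.3"))        # 01-00-5E-01-02-03
print(multicast_mac("224.129.2.3"))      # 01-00-5E-01-02-03
```

An IPv4 multicast address has 28 variable bits but only 23 are copied into the MAC address, so 32 multicast groups share each MAC address, as the last two lines show. A network interface may therefore accept frames for groups it never joined, and the IP layer has to discard them.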
Multicasting soon becomes complex and difficult to manage when a typical facility provides thousands of streams.
The good news is, the pioneers of broadcast AoIP have found a solution to managing these systems and we discuss this further in Part 2.
Synchronous distribution transport streams have the signal timing built into them. For example, AES3 embeds the sample clock in the transport layer using the “bi-phase mark” encoding method. This allows the receiving device to lock to the sender’s sample and bit clocks, guaranteeing full signal reconstruction. However, the price we pay for this is a lack of flexibility and scalability, as the facility is limited to a very narrow subset of audio formats.
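A short sketch shows why bi-phase mark coding lets a receiver recover the sender’s clock. Each bit cell is represented here as two half-cell signal levels: every cell starts with a transition, and a 1 adds a second transition in the middle of the cell. This illustrates the line code only; it omits the AES3 preambles, which deliberately break these rules to mark subframe and block boundaries, and the function names are hypothetical.

```python
def biphase_mark_encode(bits: list, start_level: int = 0) -> list:
    """Return two half-cell levels (0 or 1) for each input bit."""
    level = start_level
    halves = []
    for bit in bits:
        level ^= 1          # transition at the start of every bit cell
        halves.append(level)
        if bit:
            level ^= 1      # extra mid-cell transition encodes a 1
        halves.append(level)
    return halves


def biphase_mark_decode(halves: list) -> list:
    """A cell decodes as 1 if its two halves differ, otherwise as 0."""
    return [int(halves[i] != halves[i + 1]) for i in range(0, len(halves), 2)]


line_signal = biphase_mark_encode([1, 0, 1, 1, 0])
print(line_signal)                       # [1, 0, 1, 1, 0, 1, 0, 1, 0, 0]
print(biphase_mark_decode(line_signal))  # [1, 0, 1, 1, 0]
```

Because the decoder only asks whether the two halves of a cell differ, it works even if the signal is inverted, and the guaranteed transition at every cell boundary gives the receiver a continuous timing reference to lock to.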
Asynchronous distribution using Ethernet, or similar, strips away the underlying clocking system found in transport layers such as AES3, SDI, and MADI. But we must still synchronize the receiver’s sample clock to the sender’s, otherwise samples will be lost or duplicated. Early audio streaming solutions used RTP (Real-time Transport Protocol) to achieve receiver synchronization. PTP (Precision Time Protocol) was later adopted to maintain higher levels of synchronization, leading to high standards of signal reconstruction.
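To put a number on why the receiver’s sample clock must follow the sender’s, the sketch below estimates how many samples slip when the two clocks differ by a small frequency offset. The 10 parts-per-million offset is an assumption chosen for illustration, not a measured value for any particular device, and the function name is our own.

```python
def slipped_samples(sample_rate_hz: int, ppm_offset: float, seconds: float) -> float:
    """Samples gained or lost when the receiver clock runs `ppm_offset` parts
    per million away from the sender clock for `seconds` seconds."""
    return sample_rate_hz * ppm_offset * seconds / 1e6


per_second = slipped_samples(48_000, 10, 1)
per_hour = slipped_samples(48_000, 10, 3600)
print(per_second)  # 0.48
print(per_hour)    # 1728.0
print(f"one slip roughly every {1 / per_second:.1f} s")  # one slip roughly every 2.1 s
```

If the receiver runs fast it drains its buffer and must repeat samples; if it runs slow the buffer fills and samples must be dropped. Either way the result is an audible discontinuity, which is what locking both ends to a common clock avoids.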
PTP requires a master clock generator to allow send and receive devices to synchronize. As far as PTP is concerned, the connected devices, such as a microphone and sound console, are slaves to the PTP master. In AoIP systems with up to one hundred devices, one of the sound devices can act as the PTP master, so we don’t need a separate master generator.
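At the heart of PTP synchronization is a timestamp exchange between the master and each slave. In the delay request-response mechanism defined in IEEE 1588, the master sends a Sync message at time t1, the slave receives it at t2, the slave sends a Delay_Req message at t3, and the master receives it at t4. Assuming the network path delay is the same in both directions, the slave’s clock offset and the mean path delay follow directly, as this sketch shows (the function name and example timestamps are illustrative).

```python
def ptp_offset_and_delay(t1: float, t2: float, t3: float, t4: float) -> tuple:
    """Return (offset_from_master, mean_path_delay) for one PTP exchange.

    t1 and t4 are read from the master's clock; t2 and t3 from the slave's.
    Assumes a symmetric path, as basic PTP does.
    """
    master_to_slave = t2 - t1  # path delay + offset
    slave_to_master = t4 - t3  # path delay - offset
    offset = (master_to_slave - slave_to_master) / 2
    delay = (master_to_slave + slave_to_master) / 2
    return offset, delay


# Illustrative timestamps in nanoseconds: the slave clock is 500 ns ahead of
# the master and the one-way network delay is 2,000 ns.
offset, delay = ptp_offset_and_delay(t1=0, t2=2_500, t3=10_000, t4=11_500)
print(offset, delay)  # 500.0 2000.0
```

The slave then steers its clock to remove the measured offset. A real implementation filters many such measurements before adjusting its clock; this sketch shows only a single exchange.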
Buffers are a fundamental requirement of any asynchronous system. The assumption is that the sender and receiver sample clocks are frequency and phase synchronous in the long term, and PTP provides this for us. However, anomalies in the network can lead to packet reordering and variations in packet arrival times, known as packet jitter. Buffers absorb packet jitter, but they introduce excessive latency if they are too big or incorrectly configured.
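A minimal playout (jitter) buffer illustrates this trade-off. The sketch below holds packets in sequence-number order and releases one only when more than `depth` packets are queued, so packets that overtake each other on the network are put back in order; the cost is roughly `depth` packet times of added latency. The class name, the 1 ms packet time in the example, and the simplifications (no sequence-number wraparound, no concealment of lost packets) are assumptions made for illustration.

```python
import heapq


class JitterBuffer:
    """Reorders packets by sequence number before playout (illustrative sketch)."""

    def __init__(self, depth_packets: int):
        self.depth = depth_packets
        self._heap = []          # min-heap of (sequence number, payload)
        self._last_played = -1   # highest sequence number already released

    def push(self, seq: int, payload: str) -> None:
        if seq <= self._last_played:
            return  # too late: its playout slot has passed, so discard it
        heapq.heappush(self._heap, (seq, payload))

    def pop_ready(self) -> list:
        """Release packets in sequence order while more than `depth` are held."""
        released = []
        while len(self._heap) > self.depth:
            seq, payload = heapq.heappop(self._heap)
            self._last_played = seq
            released.append(payload)
        return released

    def added_latency_ms(self, packet_time_ms: float) -> float:
        """Approximate extra delay introduced by holding `depth` packets."""
        return self.depth * packet_time_ms


buf = JitterBuffer(depth_packets=2)
played = []
for seq in [0, 2, 1, 3, 4]:  # packet 2 overtakes packet 1 on the network
    buf.push(seq, f"pkt{seq}")
    played += buf.pop_ready()
print(played)                                    # ['pkt0', 'pkt1', 'pkt2']
print(buf.added_latency_ms(packet_time_ms=1.0))  # 2.0
```

A deeper buffer tolerates more reordering and arrival-time variation, but every extra packet of depth adds one packet time of latency, which is why buffer sizing matters.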
Again, PTP and buffers can be configured manually, but systems become incredibly complex very quickly, and an automated approach to connectivity better serves the facility. We discuss solutions for this in Part 2.
Vendors working in the AoIP field have been developing solutions to address the challenges discussed here for twenty years. Consequently, they’ve had a great deal of opportunity to solve many of the challenges broadcasters face when moving to IP infrastructures. In Part 2, we investigate the tools needed to automate interoperability for seamless connectivity.