Audio For Broadcast: Latency & Delay Compensation

Latency is a fact of life in everyday broadcast production. Understanding why it happens is fundamental to establishing the right processes to bring everything into harmony.

All 16 articles in this series are now available in our free 78 page eBook ‘Audio For Broadcast’.

As the broadcast industry continues its migration to remote, distributed and cloud infrastructures, latency is problematic. Latency is often cited as the reason why these distributed workflows are difficult to do at scale, but while it can be challenging it is seldom insurmountable. The fact is, while keeping latency to a minimum is important, latency isn’t an audio thing at all; it’s a life thing.

Latency is mostly just physics. We deal with it constantly and most of the time we don’t even realize we’re doing it. If we’re trying to hear someone shouting on the other side of a valley or in a football stadium, we accept there will be a delay before the sound reaches us. We don’t think about it, we just deal with it.

In the same way, it is possible to deal with most latency challenges in broadcast environments. The challenge is less about the actual delays and more about recognizing where they exist. It is the time alignment of all the elements of a broadcast which is critical, and delay compensation is the negotiation employed to deal with it.

Delay Compensation

In a complex infrastructure where multiple signals are brought together from a variety of sources, these signals are easily displaced in time, especially when production workflows are geographically diverse. Re-synchronizing all of the elements in a broadcast is essentially about identifying the element with the worst latency, i.e. the one that takes the longest to be ready for broadcast, and delaying all of the other elements until it is ready.
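As a minimal sketch of that rule, the Python below finds the slowest source and works out how much extra delay every other source needs to match it. The source names and latency figures are invented for illustration; in practice they would be measured on the actual system.

```python
def compensation_delays(latencies_ms):
    """Return the extra delay (ms) each source needs to line up with the slowest one."""
    worst = max(latencies_ms.values())
    return {name: worst - latency for name, latency in latencies_ms.items()}


# Hypothetical measured latencies, in milliseconds.
sources = {
    "camera_1_video": 80.0,
    "presenter_mic": 5.0,
    "remote_feed_audio": 45.0,
}

delays = compensation_delays(sources)
for name, delay in delays.items():
    print(f"{name}: add {delay:.1f} ms")
```

The slowest element always gets zero added delay; everything else is held back by the difference.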

Due to its processing time, video generally lags behind the audio, so specialized broadcast audio equipment has significant digital delay processor resources to compensate for such issues. A mixing console, for example, has multiple points to insert artificial delay into the signal path to bring audio tracks back into line with processed video. All you need to do is ascertain how much delay is required to align each source.

When a broadcast mix engineer is dealing with multiple camera and audio feeds from different places, each can take a different route and undergo different processing. A broadcast console will have input delay as well as output delay to help line up sources depending on where the latencies occur; they may be applied on individual channels, or across the entire board, and can often toggle between milliseconds, frames, or samples.
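Switching between milliseconds, frames and samples is simple arithmetic once the sample rate and frame rate are known. The sketch below assumes 48 kHz audio and 25 frames per second video purely as example values; the function names are hypothetical and not taken from any console.

```python
SAMPLE_RATE_HZ = 48_000  # assumed audio sample rate
FRAME_RATE_FPS = 25      # assumed video frame rate


def ms_to_samples(ms, sample_rate=SAMPLE_RATE_HZ):
    """Convert a delay in milliseconds to the nearest whole number of samples."""
    return round(ms * sample_rate / 1000)


def frames_to_ms(frames, fps=FRAME_RATE_FPS):
    """Convert a delay in video frames to milliseconds."""
    return frames * 1000 / fps


def frames_to_samples(frames, fps=FRAME_RATE_FPS, sample_rate=SAMPLE_RATE_HZ):
    """Convert a delay in video frames to audio samples."""
    return ms_to_samples(frames_to_ms(frames, fps), sample_rate)


two_frames_ms = frames_to_ms(2)
two_frames_samples = frames_to_samples(2)
print(f"2 frames = {two_frames_ms} ms = {two_frames_samples} samples")
```

At these assumed rates one video frame lasts 40 ms, so a two-frame video delay needs 80 ms, or 3840 samples, of audio delay to match.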

Where To Find Latency

Thankfully there are lots of delay compensation features built into broadcast equipment to help sync everything up. Latency and delay compensation go hand in hand, and understanding the causes of latency isn’t rocket science so much as it is just normal science.

While a single device might introduce latency, system latency is a combination of many things. Equipment processing speeds; diverse signal paths; distributed working; analog to digital conversion; compression codecs; multiple transports; external effects units; all these things have a cumulative effect on total system latency.

We’ve already looked at the importance of sync, and we’ve talked about how IP creates asynchronous workflows which need to be rigorously monitored, but all this can be minimized by thoughtful system design.

Let’s look in more detail at ways latency can be introduced into a broadcast infrastructure.

The Difference Between Audio & Video

Video latency is the amount of time it takes for a frame of video to transfer from a camera to wherever it is being processed. Audio latency is how long it takes for an audio signal to travel from a microphone or line-level feed to the same place.

While even analog audio has some inherent latency such as the natural acoustic delay of sound waves traveling through the air from a source to a mic, or down long copper cables, most latency is created by processing those signals.
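To put a number on the acoustic part, the delay is simply distance divided by the speed of sound. This sketch assumes roughly 343 meters per second, which is a typical figure for air at around 20°C; the distances are examples only.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: approximate value for air at about 20 °C


def acoustic_delay_ms(distance_m, speed=SPEED_OF_SOUND_M_PER_S):
    """Time in milliseconds for sound to travel distance_m through air."""
    return distance_m / speed * 1000


# Example distances in meters: a close mic, a stage, a stadium.
for distance in (0.3, 10.0, 100.0):
    print(f"{distance:6.1f} m -> {acoustic_delay_ms(distance):6.1f} ms")
```

A source about 3.4 meters from a microphone is already around 10 ms late before any processing has happened.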

The more processing that is required to complete a task, the more latency we can expect. With audio and video signals, one usually has more inherent latency than the other and it’s almost always video. The reason is simple: video frames contain more data, so they take longer to process.

Analog To Digital Conversion

As we covered in part one of this series, all sound is analog and all digital systems are binary. Because digital television requires the conversion of analog sources to binary signals, all digital equipment requires analog to digital (A/D) conversion to transport the signal, as well as digital to analog (D/A) conversion for final delivery to the viewer. Both of these processes introduce latency.

The total delay will depend on the specific combination of equipment and processes in use, and while each conversion may be imperceptible to the naked ear, total system latency in complex networks can be significant.

But this too can be managed. The length of the binary word used for each sample determines the total amount of information that can be stored for that sample, and because digital signal processing is very adaptable, once the signal is in the system, firmware can be adapted to do specific jobs.

Data Packets & IP Networks

SMPTE 2110 IP broadcast infrastructures require both video and audio media streams to be broken down into data packets, transported across the network and re-synchronized at their destination. There are various ways to manage this to fit production requirements but there is always some degree of latency involved. The Broadcast Bridge book Understanding IP Broadcast Networks is a good place to start exploring how this works in more detail.
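One contribution to this latency is easy to estimate: a packet cannot be sent until all of its samples have been captured, and a receiver typically buffers a few packets before playing them out. The sketch below uses illustrative figures that are not taken from this article (48 samples per packet at 48 kHz, and a three-packet receive buffer), and the function names are hypothetical.

```python
def packet_time_ms(samples_per_packet, sample_rate_hz):
    """Duration of audio carried in one packet, in milliseconds."""
    return samples_per_packet * 1000 / sample_rate_hz


def buffering_latency_ms(packets_buffered, samples_per_packet, sample_rate_hz):
    """Latency added by holding a number of packets in a receive buffer."""
    return packets_buffered * packet_time_ms(samples_per_packet, sample_rate_hz)


# Assumed example: 48 samples per packet at 48 kHz, 3 packets buffered on receive.
one_packet = packet_time_ms(48, 48_000)
receive_buffer = buffering_latency_ms(3, 48, 48_000)
print(f"packet time: {one_packet} ms, receive buffer: {receive_buffer} ms")
```

Smaller packets reduce this delay at the cost of more packets per second on the network, which is one of the trade-offs made to fit production requirements.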

Remote Contribution

Remote signals may use a variety of connectivity paths, from dedicated fiber to public Internet, to aggregated connectivity over a combination of Wi-Fi, satellite and cellular data. All this means that the backhaul transport links in a remote and distributed infrastructure will differ significantly from a relatively static local area network (LAN). Their latency may not be static: it may drift depending on the route taken, and every switch hop increases latency and creates more buffering. Keeping switch hops to a minimum helps.

This is why dedicated fiber connections can make a big difference, especially over longer distances, and the reason why many broadcasters either lease or invest in dark fiber networks to enable them to work in a remote or distributed way. Although more expensive, they provide reliable and stable connectivity.

Distance, Remote Production & In-ear Monitoring

Distance is the big one, and changes in how live content is created are where the biggest challenges lie. As workflows become more geographically diverse and production teams become increasingly distributed, latency becomes more complex.

It also changes how we deal with latency, because in these environments buffering is not enough. There is one very specific audio use case which requires special attention: in-ear monitoring.

In remote production workflows the ability to provide ultra-low latency in-ear monitoring is paramount. When the talent is on location and the production is somewhere else, personnel on location need to hear themselves and production comms in real time. They also might need interruptible foldback (IFB), which is a mix-minus feed from the studio: the full programme mix is sent to multiple listeners, each receiving it minus their own input.
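The mix-minus idea can be sketched as follows: build the full mix once, then subtract each listener’s own input from the copy they receive. This is a simplified illustration using short lists of integer sample values with invented names; a real console also applies per-channel gain and processing.

```python
def mix_minus(inputs):
    """For each listener, return the full mix minus that listener's own input.

    inputs maps a listener name to a list of sample values of equal length.
    """
    length = len(next(iter(inputs.values())))
    full_mix = [sum(samples[i] for samples in inputs.values()) for i in range(length)]
    return {
        name: [mixed - own for mixed, own in zip(full_mix, samples)]
        for name, samples in inputs.items()
    }


# Hypothetical three-sample snippets from three contributors.
feeds = mix_minus({
    "presenter": [100, 200, 300],
    "guest": [10, 20, 30],
    "studio": [1, 2, 3],
})
print(feeds["presenter"])  # the presenter hears guest + studio only
```

Because the presenter’s own voice is excluded, a delayed copy of it never arrives back in their ear from the studio.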

At a basic level, parties in both locations may also need to speak with each other in real time, such as when conducting an on-air interview.

Geography makes this difficult because when mixing a live event from a remote location, it takes time to move those signals around.

Let’s look at an example: imagine a sports presenter is in Tokyo for an event which is being mixed in New York. The audio is captured on a mic in Tokyo, converted to a digital signal, compressed, packetized and sent to the studio in New York to be combined with other sources. The presenter wants to hear themselves in their own ear, so the processed signal has to make the return journey from the control room, along with any additional foldback from the studio, back to Tokyo.

For in-ear applications, between five and ten milliseconds of delay is noticeable and anything over ten milliseconds is generally regarded as likely to compromise performance. This isn’t an issue in traditional outside broadcast environments because that signal processing is done in the same location as the talent – there is no round trip.
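A rough calculation shows why the round trip is the problem. The sketch below assumes a great-circle distance of about 10,850 km between Tokyo and New York and a signal speed in optical fiber of about 200,000 km per second (roughly two-thirds of the speed of light). Both figures are approximations, real cable routes are longer, and conversion, compression and buffering all add more on top.

```python
FIBER_SPEED_KM_PER_S = 200_000.0  # assumed: approximate speed of light in optical fiber
IN_EAR_LIMIT_MS = 10.0            # threshold quoted in the text above


def round_trip_ms(distance_km, speed=FIBER_SPEED_KM_PER_S):
    """Propagation delay there and back, ignoring all processing and buffering."""
    return 2 * distance_km / speed * 1000


def max_one_way_km(limit_ms, speed=FIBER_SPEED_KM_PER_S):
    """Longest one-way distance whose round trip still fits inside limit_ms."""
    return limit_ms / 1000 * speed / 2


tokyo_new_york_km = 10_850.0  # assumed approximate great-circle distance
rtt = round_trip_ms(tokyo_new_york_km)
print(f"round trip: about {rtt:.0f} ms (in-ear limit {IN_EAR_LIMIT_MS:.0f} ms)")
print(f"distance that fits the limit: about {max_one_way_km(IN_EAR_LIMIT_MS):.0f} km")
```

Under these assumptions propagation alone takes over 100 ms, roughly ten times the in-ear limit, and even a perfect link could only keep the round trip under 10 ms across about 1,000 km.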

The way many broadcasters get around this issue for remote broadcasting is by employing the same approach and keeping the processing where the talent is. Edge processing cores can be placed on location to process the on-site audio locally while providing remote control of those sources from the control room. While there is still some latency in the remote control of the core, this is generally manageable, and the round-trip delay is removed from local monitoring.

Dealing With It

All networks, like all things, have embedded latency which cannot be removed, so latency should always be front of mind.

As broadcast workflows get more complex with the introduction of cloud environments, technology vendors are constantly developing individual workflow elements which help deal with these causes before they become a problem, and identifying the causes is a good start.
