How To Achieve Broadcast-Grade Latency For Live Video Streaming - Part 1

For a serious discussion about “making streaming broadcast-grade” we must address latency. Our benchmark is 5 seconds from encoder input to device, but we cannot compromise on quality. It’s not easy, and leaders in the field grapple with the trade-offs to ensure the best possible customer experience.

The consumer expectation of “broadcast-grade streaming” is a simple target to aim for – 5-second latency (or less) at a sustained (i.e., no buffering) highest bitrate for the device, and negligible bitrate ladder changes to accommodate network conditions.

Perfect bandwidth between encoder and device would deliver perfect video quality, but on its own it does not deliver broadcast-grade latency. Reducing latency requires reengineering how we process the video and how the video player decodes it. Once that reengineering is done, we still need to achieve perfect quality, and this becomes harder because reducing latency removes the margin for error. Broadcast-grade video streaming is not easy.

Therefore, we talk about “safely” reaching 8 seconds latency as a reasonable target for today, improving from the current norm of about 30-60 seconds. It is possible to reach 8 seconds with an appropriate amount of safety built into the stream, using standard HLS and DASH protocols. Beyond this, moving to the 5 second range, and onward to the 2-4 second range, there are the less-tested options related to CMAF media formats, and pioneering initiatives like HESP (High Efficiency Streaming Protocol). This 3-part article looks at what can be achieved with these approaches.

The First Low Latency Zone – 8 Seconds

In latency terms, 8 seconds is considered by experts to be a relatively safe level to reach. 8 seconds includes allowing enough time to correct stream errors so that the viewing experience is not negatively affected. This uses DASH and HLS standards and shortens the all-important segment size.

Latency in this article is broken down into components, with typical timings for 6-second segments, as described in the diagram and table below from the Streaming Video Alliance (SVA).

Figure 1: Live streaming workflow and the “30-second” latency norm.

Clearly, the biggest driver of latency is in the packager and player components, which are intrinsically linked. Once a segment size (typically aligned to a GOP, or Group of Pictures) has been defined by the packager, it will typically be multiplied by 3 to give the player a buffer window that helps it to negotiate network conditions that slow down the stream or cause it to drop packets. 3 segments of 6 seconds each is an HLS recommendation. This 18-second buffer, added to the time taken by the other components, gives our industry-famous “30 seconds” of latency.

The “safety-zone” of about 8 seconds, which is only a small way behind our customer experience target for broadcast-grade live viewing experiences, is achieved by taking two important process reengineering steps.

First, we need to reduce the size of each segment to 2 seconds. HLS and DASH both allow for this change in their specifications. This change is made at the encoder stage, but the latency benefits are really seen at the packager and player stages. The SVA note that the trade-offs between latency, start-up delay, and quality are best optimized at the 2-second segment size. Even though the HLS specification allows for 1 second segments, and the DASH specification allows for 200ms segments, the smaller segment sizes can lead to encoding efficiency problems in terms of cost increase or a reduction in quality for the same cost.

Second, we need to reduce the number of segments to 2 from the typical standard of 3. This is a simple choice that makes the biggest single improvement to latency although it is risky from a quality perspective, even with ABR options. This subject is covered in more detail later in this article.

Figure 2: An example of end-to-end latency using a two-second segment duration (DASH or HLS).
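
The effect of the two steps on the player buffer can be checked with simple arithmetic. The sketch below is only an illustration: the function name is hypothetical, and it models the player buffer alone. Encoding, packaging and CDN delivery add further latency that depends on the workflow and is not modelled here.

```python
def player_buffer_seconds(segment_seconds: float, segments_buffered: int) -> float:
    """Latency contributed by the player buffer alone (other components excluded)."""
    return segment_seconds * segments_buffered


# The "30-second norm" configuration: 3 segments of 6 seconds each.
legacy_buffer = player_buffer_seconds(6, 3)   # 18 seconds of buffer

# The "safety-zone" configuration: 2 segments of 2 seconds each.
reduced_buffer = player_buffer_seconds(2, 2)  # 4 seconds of buffer

print(f"Legacy buffer:  {legacy_buffer} s")
print(f"Reduced buffer: {reduced_buffer} s")
print(f"Saved in the player buffer: {legacy_buffer - reduced_buffer} s")
```

Most of the saving comes from the shorter segment, but dropping from 3 segments to 2 also removes a full segment of safety margin, which is why the second step carries the quality risk described above.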

Third, there are small tweaks that can be made to reengineer or optimise for further latency reduction. Applied in isolation from the previous two points, these can remove c. 3-4 seconds. That does not get us close to broadcast-grade on its own, but these may be important details to address to optimise customer experience. In most situations, they will be useful only when the first two steps have been taken. For example, if a decision is made to stick with 3 segments in the buffer, then these extra optimisations could save an equivalent amount of time and keep latency and quality jointly optimised. The tweaks include:

  • Minimise “waste” in the video processing infrastructure by optimising TCP settings and streaming over UDP between the encoder and packager when they are separate.
  • Locate VOD storage immediately next to the Origin, using local storage rather than network-attached storage.
  • If using network-attached storage, use lower storage replication factors.
  • Ensure the CDN Edge serves stream requests as close as possible to the consumer.
  • Optimise for Cache Hit Efficiency, to reduce round-trip request times between CDN layers or between CDN and Origin.
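
To see why Cache Hit Efficiency matters for latency, the last tweak in the list can be sketched as a weighted average. This is a simplified model under stated assumptions: the function name and the round-trip figures are hypothetical, a cache hit is assumed to cost one Edge round trip, and a miss is assumed to cost one additional round trip to the Origin.

```python
def expected_fetch_ms(hit_ratio: float, edge_rtt_ms: float, origin_rtt_ms: float) -> float:
    """Average request time when a cache miss adds one Origin round trip."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    miss_ratio = 1.0 - hit_ratio
    # Every request pays the Edge round trip; only misses pay the Origin one.
    return edge_rtt_ms + miss_ratio * origin_rtt_ms


# Hypothetical round-trip times, for illustration only.
for ratio in (0.5, 0.9, 0.99):
    print(f"hit ratio {ratio:.2f}: {expected_fetch_ms(ratio, 20, 100):.1f} ms on average")
```

In a multi-layer CDN each additional layer would add its own term, but the shape of the trade-off stays the same.
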

How ABR Complicates Latency Management

ABR (Adaptive Bitrate) was invented to optimise the streaming viewing experience when there are unpredictable and inconsistent network conditions (which can be almost all the time on the internet). The primary goal is to maintain the stream to the consumer and stop the dreaded video freeze from occurring. It has been a very important development in the evolution of video streaming.

Like low latency, ABR depends on the buffer size, but for a different reason. Latency is heavily affected by the size of the segments and the number of segments held in the buffer. ABR is heavily affected by how much time is available to measure network performance and switch between bitrate profiles. ABR needs time to identify a network issue through network measurements and choose to switch to a lower bitrate that can be delivered in time to reach the player and provide stream continuity. When the buffer is longer there is simply more time available to take action.
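
The switching decision itself can be sketched with a generic throughput-based rule. This is not the algorithm of any particular player: the function name, the bitrate ladder values and the safety factor are all assumptions made for illustration.

```python
from typing import Sequence


def choose_bitrate(ladder_kbps: Sequence[int], measured_kbps: float,
                   safety_factor: float = 0.8) -> int:
    """Pick the highest ladder rung that fits within a discounted throughput."""
    if not ladder_kbps:
        raise ValueError("bitrate ladder must not be empty")
    # Keep some headroom because the measurement may already be out of date.
    budget = measured_kbps * safety_factor
    affordable = [rung for rung in ladder_kbps if rung <= budget]
    # If nothing fits, fall back to the lowest rung to keep the stream alive.
    return max(affordable) if affordable else min(ladder_kbps)


ladder = [800, 1600, 3200, 6000]      # hypothetical bitrate ladder in kbps
print(choose_bitrate(ladder, 5000))   # 3200
print(choose_bitrate(ladder, 500))    # 800
```

The rule is only as good as the throughput measurement fed into it, and a shorter buffer leaves less time to collect that measurement and act on it.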

As an example, a segment of 2 seconds will take 2 seconds to download across the network to the player from the upstream CDN. If there are 3 segments in the buffer, this means that there are 6 seconds queued up. If the 4th segment has an issue being downloaded in time when the 1st segment ends, the player can request a new 4th segment. To arrive in time, the request must be made and the download must begin within the 2nd 2-second segment, so that it can be downloaded while the 3rd segment plays and be ready to play when the 3rd segment ends.

But if there are network issues, like capacity problems, round-trip delays, and network drops, then there is also a higher chance that other issues can occur, such as a restart or retry. There needs to be enough margin for error to manage the segment download. This is where the 3-segment standard has come from. But low latency reduces this margin for error, so sensitivity to network conditions increases, and even ABR’s ability to help is reduced.
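
The margin for error in the example above can be put into numbers. The sketch below is a simplified model: the function name is hypothetical, and it assumes the buffer is not being refilled while the player retries and that every download attempt takes the same time.

```python
import math
from typing import Tuple


def retry_margin(segment_seconds: float, segments_buffered: int,
                 download_seconds: float) -> Tuple[float, int]:
    """Return (latest start time in seconds, number of sequential attempts that fit)."""
    buffered = segment_seconds * segments_buffered
    # The replacement must finish downloading before the buffer runs dry.
    latest_start = buffered - download_seconds
    # How many full download attempts fit into the buffered playback time.
    attempts = math.floor(buffered / download_seconds)
    return latest_start, attempts


print(retry_margin(2, 3, 2))  # 3 segments buffered: (4, 3)
print(retry_margin(2, 2, 2))  # 2 segments buffered: (2, 2)
```

With 3 segments buffered, the replacement download must start within 4 seconds, which matches the end of the 2nd segment in the example. With 2 segments buffered, that deadline shrinks to 2 seconds and one retry opportunity is lost.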

How Smart TVs Can Complicate Latency Management

Ironically, the type of content where low latency is important, like live events, is also the type of content that has high potential to create network overload because of large audiences viewing concurrently. And a new issue is emerging with low latency delivery to Smart TVs, which again is ironically the environment where big live events are being increasingly viewed for best quality on the big screen.

The issue is that HTML5 browsers, which many Smart TVs also use for playback, do not measure bandwidth availability for ABR calculations in the same way that players have traditionally measured it. In low latency mode, it is critical to receive data into the decoder’s buffer as quickly as possible. In HTML5 this uses the Fetch API combined with HTTP’s Transfer-Encoding: chunked header, which allows data to be transmitted as it becomes available. But the Fetch API cannot measure the idle time between chunks of data, so the player will calculate a shorter transfer duration than reality, which makes it difficult to estimate accurately whether it should increase the video quality.
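
To illustrate why idle time matters, the sketch below computes throughput for the same chunked transfer in two ways: over the whole wall-clock duration, and over only the time spent receiving data. The chunk timings and sizes are hypothetical, the function name is an assumption, and the code does not reproduce how any browser or player actually measures.

```python
from typing import List, Tuple

# Each chunk: (receive start in seconds, receive end in seconds, size in bytes).
Chunk = Tuple[float, float, int]


def throughput_estimates_kbps(chunks: List[Chunk]) -> Tuple[float, float]:
    """Return (wall-clock estimate, active-time estimate) in kbps."""
    total_bits = sum(size for _, _, size in chunks) * 8
    # Wall-clock duration includes the idle gaps between chunks.
    wall_clock_seconds = chunks[-1][1] - chunks[0][0]
    # Active duration counts only the time spent receiving bytes.
    active_seconds = sum(end - start for start, end, _ in chunks)
    return total_bits / wall_clock_seconds / 1000, total_bits / active_seconds / 1000


# Four hypothetical chunks, each received in 0.25 s, with 0.25 s idle in between.
chunks = [(0.0, 0.25, 125_000), (0.5, 0.75, 125_000),
          (1.0, 1.25, 125_000), (1.5, 1.75, 125_000)]
wall_clock_kbps, active_kbps = throughput_estimates_kbps(chunks)
print(f"wall-clock estimate:  {wall_clock_kbps:.0f} kbps")
print(f"active-time estimate: {active_kbps:.0f} kbps")
```

The two figures diverge as soon as there are gaps between chunks, so a player that cannot see those gaps has no reliable way to tell which figure reflects the capacity of the network.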

Working with low latency increases the importance of measuring upstream network performance to make fast decisions. So, if network measurement becomes less accurate or takes longer to get a measurement, then potentially we may need to change how ABR works.

To Overengineer Or Not To Overengineer?

If you want to overengineer your stream management approach to avoid these problems, you don’t really have a good choice.

The first point to make is that it is not possible to know when networks will go down or become non-performant for the video that is trying to traverse them, so when should you choose to overengineer? ABR is the solution we have today to maintain stream delivery rather than freeze the video. But other than this, we have few choices to fix performance problems.

We could overengineer our delivery by sending multiple streams simultaneously to each player to give a real-time failover option to the player. But this brute-force approach leads to problems with bandwidth utilisation and the cost of delivery for broadcasters, potentially doubling costs, or worse. The goal is not to increase CDN costs, so we must be clever.

Leading distribution and player business THEO Technologies have a lot of internal discussions about this. They conclude that the first principle is to avoid needing to redownload a segment. As THEO CTO Pieter-Jan Speelmans says, “We believe that aborting a download is the start of a chain-reaction of problems for the viewer experience. It can throw off critical timing deadlines in software algorithms that are running many simultaneous calculations to make stream request decisions. We do have customers working with multiple streams, or preloading channels based on customer behaviours within their EPG, but these are custom projects for very specific environmental conditions or specific types of content. The real solution for low latency is to engineer more cleverly to make best use of the video segments and chunks we have access to.”

Part two of this article looks at how to reach the 5-second latency benchmark while maintaining quality and starts to lay the foundations for surpassing “broadcast-grade” to create a new “streaming-grade” standard of the future.
