As parts 1 and 2 of this article highlighted, there are very practical steps available today for Streamers to reach broadcast-grade latency. Low latency protocols from HLS and DASH that implement CMAF Low Latency Chunking are the main proponents of 5-8 seconds of latency from encoder input to player output. But the pull system of streaming creates an opportunity to reach a new 1-second standard, perhaps that we can call “streaming-grade”.
Other articles in this series:
The Third Low Latency Zone – The 1-second Range
HESP ( High Efficiency Streaming Protocol) was founded by THEO Technologies and Synamedia, and an alliance of other streaming technology providers have added their weight to the initiative, including MainStreaming, G-Core Labs, Noisypeak, Native Waves, and EZ DRM.
HESP created a new way to look at decision-making within ABR constructs to smoothly handle ultra-low latency. The traditional thinking has always been that quality must come before low latency, which is natural given the overriding interests of consumers and content providers for good quality viewing experiences. So far, we have therefore lived with the fact that we need to switch off alerts from social media and other apps (and try not to listen to our neighbours) in order not to spoil our live event viewing experience. HESP aims to solve this problem by achieving broadcast-grade latency with the safety valve of ABR to avoid rebuffering and poor viewing experiences.
A core design principle in HESP is that the overall delivery chain must be set-up so latency can be pushed to its absolute limits. In other words, we should be able to stream at the fastest possible speed allowed by the inherent latency of the whole delivery chain, without worrying about this causing poor video quality. This means that HESP focuses on three things – 1) using long contribution segments to the player for maximum stream input stability, 2) using standardized low-latency chunk sizes (e.g., 200ms), and 3) using ultra-fast but efficient stream switching abilities.
The use of long segment sizes creates the opportunity for encoding efficiency with high quality. As a rule, shorter segment sizes create a heavier workload on the encoder and create trade-offs for quality. This first technical design point allows HESP to achieve best quality output from the encoder, by allowing time to calculate the optimal motion vectors and use optimal references. A 2-second segment is sufficient for a broadcast-grade encoder output.
The second design point to use standardised low latency chunk sizes around 200ms is used to comply with best practices as researched and advocated for by the SVA and other leading vendors. The overall efficiency of the delivery chain does typically not warrant using smaller chunk sizes, even if 1-frame chunks of 16ms could be theoretically achieved for video at 60 frames per second (or 33ms for 30 FPS video).
The third design point of having ultra-fast switching capabilities is how HESP continues to use ABR within a very low latency environment (note that this also brings fast channel-change capabilities, but this is a separate story). The key to this is to first introduce what HESP call Initialization Frames, a unique feature in this protocol. The Initialisation Frames are generated at the encoder and are output as a second stream. There are choices about how many initialisation frames are necessary – for example, super high-performance streaming can have initialisation frames for every frame, but it is also possible to use a sparse set-up where the initialization frames are inserted every 10-20 frames. These frames are the switching point that the player can use.
The second stream concept is inevitably going to create additional egress from the origin, but it doesn’t add any extra overhead to the CDN, because the Player will only request chunks from one of the two available streams, rather than from both. For high-performance streaming needs, perhaps only applied during live events that use high resolution formats, this level of investment could easily make very good economic sense for the excellent viewer experience it should deliver.
In the end, the buffer size that HESP can work with is the length of time it takes a player to receive the second stream from the CDN, which can easily be in the 300-500ms range if the content is already in the Edge Cache. This allows the player to receive the right content very quickly which solves one of the most important issues to reduce latency to its absolute lowest levels in an ABR streaming environment.
Figure 1: The HESP construct of the Initialization stream that supports the main “Continuation stream”.
Because of these capabilities, in an ultra-high-performance single-frame chunk size set-up with HESP, it is possible to consider the following latency table. The length of the single frame is assumed to be 33ms from a 30 frames per second video (rounded up).
As Pieter-Jan Speelmans at THEO concludes: “HESP can quite easily achieve sub-second latency from encoder input to player output given that we have implemented a real-time back-up plan with initialization frame architecture. If the delivery chain conditions are excellent, we have observed as low as 400ms end to end latency. Of course, if we want to reduce risk, we can increase latency to 3-4 seconds to achieve a completely buffering-free experience and still deliver better than broadcast-grade latency. We have spent over 5 years working on this approach and believe it can truly bring D2C Streamers the key low latency performance that will move streaming performance to the next level.”
The engineering tools to achieve near-broadcast grade latency of 5-8 seconds are now embedded in the HLS and DASH specifications. Implementations are still in the early stages, but clearly D2C Streamers with the need for low latency performance now have the ability to reengineer for lower latency.
The restrictions of the CMAF LLC format have encouraged the introduction of HESP, to find ways to overcome the inherent trade-off between quality and latency and beat what we currently call “broadcast-grade streaming”.
In streaming we need to consider the quality of the network and the available capacity to pass bits from encoder to player. ABR is a very important tool for us to protect the viewer experience, but low latency really stretches ABR to its limits. The set-up of the encoder, packager, and player are fundamental to achieve low latency, and an excellent CDN is essential to be able to quickly pass through the lower chunk sizes and sustain the overall streaming performance, particularly when large live audiences are involved and the whole subject of “latency sensitivity” has been raised to its highest alert level.
The engineering innovation of HESP creates incredible possibilities for “streaming-grade” to be considered the gold standard of content delivery. Remarkably, it could be even better than broadcast-grade. No pun intended, but things move fast in the world of streaming.
You might also like...
Training neural networks is one of the most import aspects of ML, but what exactly do we mean by this?
The more digital TV technology advances, the more the fundamental elements of TV remain the same.
One cannot get very far with electricity without the topic of batteries arising. Broadcasters in particular have become heavily dependent on batteries to power portable equipment such as cameras and lights.
The Sponsors Perspective: Proactively Monitor IP Video Networks & Essences With Inspect 2110 & PRISM
For over two decades Telestream has streamlined the ingest, production, and distribution of digital video and audio. Today, compared to its SDI/AES-based predecessors, IP video adds exciting new challenges to these workflows.
IP connectivity delivers flexibility and scalability but making the theory work often requires integrated solutions that are adaptable, open, and promote interconnectivity.