The Technology Of The Internet: Part 3 - Tackling Video Streaming Latency

Latency has risen up the agenda as video streaming has progressed, with measures taken to tackle it offset by the proliferation of content, which strains CDNs and delivery infrastructures. The key to controlling latency successfully lies in separating out its different components, identifying the specific issues, and addressing those within the overall end-to-end delivery chain.

Controlling latency is at least as great a challenge now as it was in the early days of video streaming up to two decades ago, as improved technologies are offset by increasing traffic and rising expectations. It is now accepted that most video transmissions, including traditional linear broadcast content, will in future be accessed predominantly via the internet over either fixed or mobile networks. This requires consistent latency comparable to cable, satellite, digital terrestrial, or managed IPTV.

That is close to being achieved for on demand content transmitted by leading Subscription VoD providers, but a long way off for popular live services. The extent of the problem for live can be seen by considering latency data for leading streaming services transmitting the annual American National Football League (NFL) Super Bowl, which is one of the world’s most popular single sporting events.

Data collected at recent Super Bowls by Chicago-based video analytics firm Phenix reveals that average streaming latency at the event has deteriorated in recent years and was worse than ever in 2023. Phenix compiled lag behind real time for each of six popular platforms, namely NFL+, fuboTV, YouTube TV, Hulu, DirecTV and Fox, reporting an average of 56.9 seconds in 2023, up from 54.3 seconds in 2022. The one consolation is that the rate of increase is slowing: the average had jumped from 37.7 seconds in 2019 to 54.3 seconds in 2022, a far bigger rise than the latest one. The level therefore appears to be stabilizing, and it looks likely to start coming down in 2024.

Yet another telling statistic is the variation in latency experienced by users of the same platform, which was as much as 70 seconds from best to worst in the case of DirecTV, and substantial for all the streamers measured. This indicates there is work to be done ensuring consistent latency, which matters because people close together, in a sports bar say, are often watching simultaneously on different devices. The experience is spoilt if some users witness significant events on the field before others.

It is true that Phenix compiled just a snapshot, and that the Super Bowl is hardly a typical event. But the data highlights how latency remains a challenge for major events, whether in sport, music festivals, or major breaking news, when large numbers of people stream live simultaneously. The key factors in those cases are capacity and traffic management at the network level, rather than the choice of streaming protocols or codecs for individual users.

The causes of such latency are structural and can only be addressed through coordinated action between event organizers and streaming companies. That is perhaps why the latency problems have been so persistent, and have even increased, at least temporarily, in the face of streaming's proliferation. It also highlights how round-trip latency is made up of many components along the end-to-end delivery chain and is determined largely by those that make the biggest contribution. These are the latency bottlenecks, and they vary with the demands of the particular service or content type, as well as with the distance between content source and end user.

While less critical in an absolute sense, latency also matters for on-demand content. There is no synchronization issue to contend with, since it does not matter that users are viewing at different times, but latency still shapes the experience during start-up and subsequent viewing.

The SVoD players discovered this early on, especially Netflix as the pioneer of large-scale on-demand streaming, finding that even a few extra seconds of start-up delay could cause customer churn. Buffering during viewing when network bandwidth temporarily declined, more common in the earlier days of streaming, also contributed to user dissatisfaction.

That was a subsidiary reason for Netflix building its own CDN, called Open Connect, a move followed to varying degrees by other major streamers. By distributing content in caches close to users at the edge of CDNs, or in Netflix's case actually within the facilities of the ISPs providing the local loop, start-up latency was substantially reduced, while storage capacity was traded for long-haul bandwidth within the CDN.
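
As a rough illustration of that storage-for-bandwidth trade, here is a minimal sketch of an edge cache, with all names hypothetical: a segment found locally is served with no long-haul round trip, while a miss is fetched once from the origin and then kept for subsequent viewers.

```python
# Minimal sketch of an edge cache trading storage for long-haul bandwidth.
# fetch_from_origin is a hypothetical placeholder for the expensive
# long-haul request; anything served from `store` avoids that trip.

class EdgeCache:
    def __init__(self, capacity_segments):
        self.capacity = capacity_segments
        self.store = {}   # segment_id -> bytes
        self.order = []   # LRU order, oldest first

    def get(self, segment_id, fetch_from_origin):
        if segment_id in self.store:            # cache hit: no origin round trip
            self.order.remove(segment_id)
            self.order.append(segment_id)
            return self.store[segment_id]
        data = fetch_from_origin(segment_id)    # cache miss: one origin fetch
        if len(self.store) >= self.capacity:    # evict least recently used
            del self.store[self.order.pop(0)]
        self.store[segment_id] = data
        self.order.append(segment_id)
        return data
```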

Such edge caching does not work in the same way for live content, because there is no time to populate the caches ahead of final delivery by the ISP. Yet CDNs can still contribute, because the switching and routing components along the IP delivery path also add significantly to end-to-end latency. A well-configured CDN that keeps the number of hops to a minimum can therefore reduce live stream latency.

Given that the latency of broadcast content is typically around 5 to 7 seconds, that is the immediate target for streaming services as they become mainstream platforms. For some interactive TV services, as well as gaming, eSports and conferencing or collaboration, it must be brought much lower still, down to around 2.5% of that level, or 150 milliseconds. Even a delay of around 250 ms is disruptive for two-way video, and just the same for audio on its own.

Such low levels can only be achieved with the help of highly efficient streaming protocols. For one-way video aiming to get latency below 5 seconds, various protocols have evolved, such as SRT (Secure Reliable Transport) and RIST (Reliable Internet Stream Transport). These and some other protocols improve the efficiency of error correction in IP transport, aiming to strike a balance between image quality and low latency.

TCP, the original transport protocol of the internet, retransmits IP packets that were corrupted or lost on the first attempt. This ensures quality but can accumulate latency, especially under poor network conditions when several retransmission attempts have to be made for some packets.
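
A back-of-envelope sketch shows how quickly this accumulates; the round-trip and timeout values below are purely illustrative, not those of any particular TCP stack.

```python
# Rough arithmetic for how retransmission accumulates delay: each lost
# attempt costs at least a retransmission timeout before the resend.
# RTT and RTO values are illustrative only.

RTT = 0.05   # seconds, round trip
RTO = 0.2    # seconds, retransmission timeout

def delivery_delay(attempts_needed):
    # One one-way trip for the successful copy, plus a timeout for
    # every failed attempt before it.
    return (attempts_needed - 1) * RTO + RTT / 2

for n in (1, 2, 3):
    print(n, f"{delivery_delay(n) * 1000:.0f} ms")   # 25, 225, 425 ms
```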

TCP works well enough for applications such as email where some delay can be tolerated but corruption of the content cannot be. Indeed, TCP was adopted for early video streaming services but was partly responsible for the high latencies experienced.

SRT and RIST improved on this by adopting a mechanism called Automatic Repeat reQuest (ARQ), designed to work with the internet's alternative transport, UDP (User Datagram Protocol). Under plain UDP, packets are sent out without any acknowledgment of receipt, and therefore with no way of resending them if lost. This reduces latency but at the expense of robustness, which is unacceptable for video streaming over an unreliable, unmanaged IP network, which is effectively what the public internet is.

ARQ reintroduces acknowledgment and retransmission, but only packets recorded as missed are resent. And if it is clear that a resent packet would arrive too late to fit within a given latency budget, it is not resent at all. This cuts latency considerably while maintaining video quality at playback. However, it still requires some retransmission to cater for errors or lost packets, and so can only go so far in cutting latency, not far enough for interactive video.
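
A minimal sketch of that decision logic, with names and numbers invented for illustration rather than drawn from any SRT or RIST implementation: when the receiver reports a packet missing, the sender resends it only if it can still arrive within the latency budget.

```python
import time

# Sketch of the latency-budgeted ARQ decision described above.
# Budget and delay values are illustrative.

LATENCY_BUDGET = 0.8   # seconds: receiver plays out this far behind the sender
ONE_WAY_DELAY = 0.05   # seconds: current estimate of network transit time

sent_packets = {}       # seq -> (payload, send_time)

def on_send(seq, payload):
    sent_packets[seq] = (payload, time.monotonic())

def on_nack(seq, resend):
    """Receiver reported packet `seq` missing; resend only if it can
    still arrive before its playout deadline."""
    payload, send_time = sent_packets[seq]
    deadline = send_time + LATENCY_BUDGET           # when it must play out
    arrival_estimate = time.monotonic() + ONE_WAY_DELAY
    if arrival_estimate < deadline:
        resend(seq, payload)                        # still useful: retransmit
    # else: drop silently; a late packet adds latency without adding quality
```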

So for the most demanding cases, such as web conferencing, other protocols have been developed, including Low-Latency HLS, low-latency CMAF for DASH, and most notably Web Real-Time Communications (WebRTC). Many CDNs support some of these protocols to minimize latency across their domains. WebRTC has also been adopted by all major web browsers, as well as by popular video calling services such as Zoom, Google Meet, and Microsoft Teams.

It is also being adopted by video streaming services because it strips the latency associated with error correction to the bone. It increasingly enables latency to be optimized dynamically in real time as traffic levels and network conditions change, which has the effect of also maximizing quality. Adaptive bitrate protocols such as MPEG's DASH and Apple's HTTP Live Streaming (HLS) already adapt quality to the bit rate currently available, and WebRTC can enhance that by ensuring latency is kept just within budget, but not reduced further unnecessarily at the expense of quality.
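
As an illustration of keeping latency just within budget rather than minimal, here is a sketch of a rendition picker that selects the highest bitrate the remaining latency headroom can absorb; the bitrate ladder and numbers are invented.

```python
# Sketch of trading latency headroom for quality: pick the highest
# rendition whose segments still download within the remaining budget.
# Ladder and numbers are illustrative.

LADDER_KBPS = [800, 1800, 3500, 6000]   # available renditions, low to high
SEGMENT_SECONDS = 2.0                   # media duration per segment

def pick_rendition(throughput_kbps, latency_budget_s, current_latency_s):
    headroom_s = latency_budget_s - current_latency_s
    best = LADDER_KBPS[0]
    for rate in LADDER_KBPS:
        download_s = rate * SEGMENT_SECONDS / throughput_kbps
        # A segment must download close to real time, with any shortfall
        # absorbed by the remaining latency headroom.
        if download_s <= SEGMENT_SECONDS + headroom_s and rate <= throughput_kbps:
            best = rate
    return best

print(pick_rendition(throughput_kbps=4000, latency_budget_s=3.0,
                     current_latency_s=2.5))   # -> 3500
```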

The key point is that WebRTC potentially avoids retransmission delay altogether by incorporating some form of FEC (Forward Error Correction). FEC works by including enough redundant information with a stream to enable lost or corrupted data to be recovered at the receiving end, without requiring retransmission and the delay it incurs. This comes, though, at the penalty of increased bandwidth, which can add to congestion and itself cause latency if bandwidth becomes constrained.
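
The simplest form of FEC illustrates the principle: send one XOR parity packet per block of data packets, allowing any single lost packet in the block to be rebuilt at the receiver. A toy sketch, with block contents invented:

```python
from functools import reduce

# Toy XOR-parity FEC: one redundant packet per block lets the receiver
# rebuild any single lost packet in that block without retransmission.

def make_parity(block):
    """XOR all packets in the block (equal lengths) into one parity packet."""
    return bytes(reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), block))

def recover(received, parity):
    """Rebuild the one missing packet: XOR of parity and all received packets."""
    return make_parity(list(received) + [parity])

block = [b"pkt1", b"pkt2", b"pkt3"]
parity = make_parity(block)
rebuilt = recover([block[0], block[2]], parity)   # pkt2 was lost in transit
assert rebuilt == block[1]   # recovered with no retransmission round trip
```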

For this reason, more sophisticated versions of FEC have been developed, such as Flexible FEC, under which redundant data is only added to a stream when network conditions make it necessary. When it turns out that not enough redundancy has been added to recover a stream sufficiently accurately at the receiving end, some retransmission occurs. Flexible FEC thus combines standard FEC with some retransmission, making the tradeoff between quality and latency more dynamic.
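
A sketch of that dynamic tradeoff, with thresholds invented rather than taken from the Flexible FEC specification: redundancy per block scales with recently observed loss, and retransmission is the fallback when a block still cannot be recovered.

```python
# Sketch of a flexible-FEC-style policy: scale redundancy with observed
# loss, fall back to retransmission when recovery still fails.
# Thresholds are illustrative, not from the Flexible FEC spec.

def parity_packets_per_block(loss_rate, block_size=10):
    if loss_rate < 0.005:
        return 0                 # clean network: no FEC overhead at all
    if loss_rate < 0.02:
        return 1                 # light loss: one parity packet per block
    return max(2, int(loss_rate * block_size * 2))   # heavier loss: more parity

def on_block_received(lost, parity_available, request_retransmit):
    """If more packets were lost than parity can repair, fall back to ARQ."""
    unrecoverable = lost[parity_available:]   # parity repairs the first few
    for seq in unrecoverable:
        request_retransmit(seq)
```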

In practice, a streaming service should constantly monitor packet loss as it changes over time, perhaps using RTP Control Protocol (RTCP) receiver reports. The service can then determine how much of a quality penalty is worth paying to minimize latency. This is a matter of tuning, and a work in progress for many service providers, with scope for automation with the help of machine learning.
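
RTCP receiver reports carry loss as an 8-bit "fraction lost" field, the proportion of packets lost in the reporting interval scaled by 256. A sketch of reading it and feeding a tuning loop, with the loss-to-overhead mapping invented:

```python
# RTCP receiver reports express loss as an 8-bit fixed-point "fraction
# lost" field: packets lost / packets expected, scaled by 256.
# The mapping from loss to FEC overhead below is an invented example.

def loss_from_receiver_report(fraction_lost_byte):
    return fraction_lost_byte / 256.0      # e.g. 13 -> ~5% loss

def tune(fraction_lost_byte, set_fec_overhead):
    loss = loss_from_receiver_report(fraction_lost_byte)
    # Pay a bandwidth (hence quality) penalty proportional to observed loss:
    set_fec_overhead(min(0.25, loss * 2))  # cap redundancy at 25% of stream

tune(13, lambda overhead: print(f"FEC overhead: {overhead:.1%}"))  # ~10.2%
```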

Such intricate mechanisms should be seen in the wider context of overall bandwidth management under varying traffic levels, with their often pronounced peaks and troughs. At the network level, there has to be some provision for rapidly acquiring additional bandwidth to cater for peaks, to avoid loss of quality, escalating latency, or both. Without such reserved headroom, sudden latency with a risk of buffering would result; under normal traffic conditions the reserved bandwidth can still serve less time-critical applications such as casual internet browsing.

The other important point concerns latency over mobile networks. IP multicast has an important role to play here, as it does over fixed networks, by reducing the congestion incurred by popular live streaming services. Such congestion contributed significantly to the high latencies at recent Super Bowls, and can be particularly decisive over mobile networks.
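
The congestion saving comes from sending each stream once per network segment rather than once per viewer. A minimal receiver-side sketch using Python's standard socket module, with the group address and port chosen arbitrarily:

```python
import socket
import struct

# Minimal IP multicast receiver: any number of viewers on the segment can
# join this group, yet each packet is transmitted only once.
# Group address and port are arbitrary examples.

GROUP, PORT = "239.0.0.1", 5004

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))

# Ask the kernel to join the multicast group on the default interface.
mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

while True:
    data, sender = sock.recvfrom(2048)   # one transmitted packet, many receivers
    print(f"{len(data)} bytes from {sender}")
```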

A further mobile-specific aspect concerns handover, when users' devices move between adjacent cells. This may be unnoticeable for less time-critical applications, but can impose glitches and buffering when viewing live streams, especially at high speed on trains or in cars travelling on major roads. Latency can also be incurred when signal quality deteriorates, as happens most commonly towards the edge of a cell, furthest from the base station transmitter.

Multi-cell connectivity is emerging as a remedy in conjunction with multicast delivery, insulating users against variable signal quality in individual cells. This has been described in a Cornell University paper, the essential idea being to let users access streams simultaneously from several cells rather than one. Because multicast strips streams down to a single transmission inside each cell, this redundancy is much less costly in terms of radio bandwidth.
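
At the receiver the essential mechanism is simple: accept the same stream over more than one cell and de-duplicate by sequence number, so a packet lost on one path can still arrive via another. A toy sketch, with all names invented:

```python
# Toy receiver-side view of multi-connectivity: the same multicast stream
# arrives via two cells; keep the first copy of each sequence number and
# drop duplicates, so a packet lost on one path survives on the other.

def merge_streams(packets):
    """`packets` yields (cell_id, seq, payload) from all connected cells."""
    seen = set()
    for cell_id, seq, payload in packets:
        if seq in seen:
            continue                 # duplicate from another cell: drop it
        seen.add(seq)
        yield seq, payload

# Cell A lost packet 2; cell B delivered it, so playback still sees 1, 2, 3.
arrivals = [("A", 1, b"i"), ("B", 1, b"i"), ("B", 2, b"p"),
            ("A", 3, b"p"), ("B", 3, b"p")]
assert [seq for seq, _ in merge_streams(arrivals)] == [1, 2, 3]
```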

Tests have shown that multi-connectivity improves the performance of wireless multicast services significantly, especially during handover and at the edge of cells where channel conditions are often poor. This has an indirect bearing on latency because it reduces the need for retransmission and buffering close to the edge of cells and during handover.

Latency is therefore a complex issue for streaming, far more so than it ever was for traditional direct forms of content transmission, but the ingredients for addressing it at all levels are now on the table. Doing so will require coordination across the infrastructure, and contributions from a variety of distinct technologies.
