The Technology Of The Internet: Part 4 - Scalability & Availability In Streaming

The rise and rise of video streaming has increased pressure on both fixed and mobile networks, driving developments and innovations in technologies for scalability. These include multicast delivery for live streaming, and more sophisticated load balancing for all forms of online content.

Scaling to large numbers of users and traffic levels has proved challenging for many streaming services as they have grown in popularity. Failure to get on top of the problem has resulted in the performance of some streaming services flatlining or even deteriorating over time. The Broadcast Bridge has already noted how average streaming latency at the annual American National Football League (NFL) Super Bowl, which is one of the world’s most popular single sporting events, has deteriorated in recent years and was worse than ever in 2023.

This is ultimately a failure of scaling, as networks buckle under increasing traffic and deliver worse quality of service, often with reduced resolution as well as increased latency. Failure to scale properly also affects availability, which can be defined as the percentage of total time a service can be accessed. That figure can then be aggregated across a user base to determine an overall network availability score.
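As a simple illustration of that arithmetic (the figures and function names here are hypothetical, not drawn from any real service), per-user availability can be computed from downtime over a measurement period and then averaged across the user base:

```python
def availability(downtime_minutes: float, period_minutes: float) -> float:
    """Availability as the percentage of a period a service could be accessed."""
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes

# Hypothetical 30-day month for three users with differing outage times.
period = 30 * 24 * 60                   # minutes in the measurement period
per_user_downtime = [12.0, 0.0, 95.0]   # minutes of inaccessibility per user

per_user = [availability(d, period) for d in per_user_downtime]
network_score = sum(per_user) / len(per_user)   # aggregate across the user base

print([f"{a:.3f}%" for a in per_user], f"overall: {network_score:.3f}%")
```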

It has tended to be one-off major events, especially sporting ones, that have shone the spotlight on scalability and often illuminated areas of concern. By the same token, it has been the major streaming providers, both live and on demand, that have been forced into addressing scaling problems first and developing innovative mechanisms. This has been seen with the likes of Netflix and Disney+ on the VoD side, and by Facebook Live and YouTube among others for large scale live streaming.

The fundamental problem lies in reliance on the internet as a large and unmanaged shared public network, susceptible to congestion and unpredictable performance. The solution to some extent lies in bolting on dedicated connectivity at various stages of transmission to instill some predictability and consistency, in a sense restoring the integrity of traditional linear broadcast infrastructures, that is digital terrestrial, satellite and dedicated coaxial cable, or managed IP networks.

It also involves converting unicast distribution to effective multicast or broadcast, so that single streams serve as many viewers as possible, moving away from redundant one-to-one delivery. The latter becomes very costly in bandwidth for popular live services, and also runs into scaling problems if all available trunk or core bandwidth is used up.
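The bandwidth argument is easy to quantify. A rough sketch with hypothetical figures: a single live stream delivered as unicast consumes aggregate bandwidth proportional to the audience, whereas multicast carries one copy per shared network link regardless of how many viewers sit behind it:

```python
stream_bitrate_mbps = 5      # hypothetical HD live stream
viewers = 1_000_000          # hypothetical concurrent audience
shared_links = 500           # hypothetical links in the multicast distribution tree

unicast_total = stream_bitrate_mbps * viewers          # one copy per viewer
multicast_total = stream_bitrate_mbps * shared_links   # one copy per shared link

print(f"Unicast:   {unicast_total / 1e6:.1f} Tbps")    # 5.0 Tbps across the core
print(f"Multicast: {multicast_total / 1e3:.1f} Gbps")  # 2.5 Gbps across the core
```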

CDNs evolved to address this issue initially for on demand services, by distributing the more popular content to caches closer to users, cutting out the trunk between ingest and the point of CDN egress. Some larger SVoD providers, such as Netflix, then went further by placing their own CDN edge servers inside the premises of the ISPs providing the final delivery link.

As Netflix put it, “hosting our edge servers at ISP premises allows us to deliver content directly to users without traversing through the broader internet, minimizing latency and reducing costs. It also ensures better scalability during peak traffic periods.”
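The caching principle itself is simple in outline. A minimal sketch (the class and function names are illustrative, not any CDN vendor’s API): an edge server serves popular segments from its local store and only goes back across the long-haul network to the origin on a miss:

```python
class EdgeCache:
    """Illustrative edge cache: serve locally when possible, fetch from origin otherwise."""

    def __init__(self, fetch_from_origin):
        self.store = {}                       # local copies of popular content
        self.fetch_from_origin = fetch_from_origin

    def get(self, url: str) -> bytes:
        if url in self.store:                 # cache hit: no long-haul traffic
            return self.store[url]
        data = self.fetch_from_origin(url)    # cache miss: one trip back to the origin
        self.store[url] = data                # subsequent viewers are served locally
        return data

# Hypothetical origin fetch standing in for a real HTTP request.
edge = EdgeCache(lambda url: b"...segment bytes...")
edge.get("/vod/title123/segment42.m4s")       # miss: fetched from origin
edge.get("/vod/title123/segment42.m4s")       # hit: served from the edge
```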

Similar principles hold for live streaming, the one key difference being that end to end latency is much more critical, so CDNs are only part of the answer. But caching to multiple edge servers still reduces long haul bandwidth costs and avoids reliance on the shared public internet for as much of the delivery path as possible.

Facebook Live is a major example of a live streaming platform that has connected directly to ISPs. As the second most popular live streaming platform after YouTube, certainly outside China, Facebook Live recognized the need to bypass the internet as far as possible several years ago. By 2020 it was already keeping most of its live video traffic within the ISP domains of each viewer, inside the destination countries and away from congested international trunks.

Problems still arise, and partly because of its scaling issues Facebook Live does not score as highly for quality of experience as it does for sheer usage. But it is able to track increases in demand globally and scale accordingly, as it does when connections to individual ISPs become overloaded. There is still a sense of reacting to major events as they occur and then upgrading such links to higher bit rates, which perhaps explains significant variations in user perceptions of Facebook Live around the world.

These major social media platforms used to differ from traditional broadcasters on two counts. First, they had to cope with greater scale earlier on; second, they had to reconcile the conflicting demands of high quality linear content and the mixed bag of uploaded User Generated Content (UGC).

But broadcasters are migrating linear content progressively to streaming platforms and also ingesting an increasing amount of UGC material from the field, for example in remote news gathering. Some of that content is semi-professional, coming from reporters in remote places relying on mobile phones to capture it, but it does mean that broadcasters have been handling mixes of content similar to those of the big social media platforms, albeit usually on a smaller scale and confined more to their own regions.

Viewers’ expectations of professional content emanating from major live events such as the UEFA Champions League, still more often consumed on big screens, are as high as ever, while there is more tolerance of UGC, which unlike professional material usually has to come in over the public internet. The big social media platforms have already succeeded to a large extent in blending the two, capturing UGC at varying lower quality levels and in some cases applying AI-related techniques to make the content more presentable.

An underlying point is that scalability, like latency and QoS generally, is only as good as the weakest link in the video delivery chain. Therefore scaling up your video experience successfully without any sacrifice in quality requires investment across the entire network, from ingest at least to the ISP, relying on the latter for the final hop. In practice, especially for smaller streamers, this involves relying on others such as CDN providers for at least part of the delivery path.

For those responsible for parts of the chain, advancing technology needs to be taken into account, creating new possibilities and changing some priorities. This is the case with load balancing, an alternative to a CDN, in which network traffic is automatically distributed across a cluster of servers, often all located in a single data center. A CDN accomplishes a similar task, although large content distributors also have the option of taking this one step up and balancing loads across multiple CDNs.

Load balancers arbitrate between multiple servers in a cluster, either simply having them take turns on a round robin basis, or using some more sophisticated measure that takes account of changing demands and workloads. Load balancing used to be done almost exclusively at the data transport level, known as layer 4, which is relatively cheap computationally but at some sacrifice of overall efficiency across a network. Under the layer 4 method the load balancer’s IP address is advertised to clients for a web site or service, so that is where requests for content end up.

Layer 4 load balancers therefore make routing decisions based purely on address information extracted from the first few packets in a TCP stream, without inspecting the content itself. They were favored largely because hardware was not powerful enough, or was too expensive, to move higher up the stack and make more intelligent decisions at the application layer, layer 7.
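A minimal sketch of the layer 4 approach (the names here are hypothetical, not any real load balancer’s API): because only addresses and ports are visible, the balancer can do little more than rotate requests across the pool, or hash the client address so a given client keeps landing on the same server:

```python
import itertools
import zlib

class Layer4Balancer:
    """Spread connections over a server pool using only connection-level information."""

    def __init__(self, servers):
        self.servers = servers
        self._cycle = itertools.cycle(servers)

    def pick_round_robin(self) -> str:
        # Simple round robin: each new connection goes to the next server in turn.
        return next(self._cycle)

    def pick_by_client(self, client_ip: str) -> str:
        # Only the address tuple is visible at layer 4; hash it for stickiness.
        return self.servers[zlib.crc32(client_ip.encode()) % len(self.servers)]

lb = Layer4Balancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(lb.pick_round_robin())             # 10.0.0.1, then .2, then .3, ...
print(lb.pick_by_client("203.0.113.7"))  # same server for this client every time
```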

More recently, increasing hardware power and falling cost per unit of processing have made it affordable to implement layer 7 load balancing, where routing decisions are based on analysis of the HTTP header, which effectively sits in the application layer. This allows the contents of the message, such as the URL and data type, to be inspected so that routing can be determined accordingly, enabling certain video traffic to be prioritized.

Instead of managing data on a packet-by-packet basis as Layer 4 load balancers do, Layer 7 load balancers can manipulate traffic on the basis of a transaction between the client and the application server. This in turn is helping maintain performance under increasing load.
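A comparable sketch at layer 7 (again with hypothetical pool names and paths): because the request line and headers are visible, the balancer can send manifest requests, video segments and everything else to different, appropriately provisioned server pools:

```python
# Hypothetical server pools; a real deployment would draw these from configuration.
POOLS = {
    "manifests": ["10.0.1.1", "10.0.1.2"],   # playlist/manifest requests
    "segments":  ["10.0.2.1", "10.0.2.2"],   # video segment requests, prioritized
    "default":   ["10.0.3.1"],
}

def route(path: str, headers: dict) -> str:
    """Pick a pool by inspecting the URL and content type, visible only at layer 7."""
    if path.endswith((".m3u8", ".mpd")):
        pool = POOLS["manifests"]
    elif path.endswith((".ts", ".m4s")) or headers.get("Accept", "").startswith("video/"):
        pool = POOLS["segments"]
    else:
        pool = POOLS["default"]
    return pool[hash(path) % len(pool)]       # simple spread within the chosen pool

print(route("/live/channel1/manifest.mpd", {}))
print(route("/live/channel1/chunk_001.m4s", {"Accept": "video/mp4"}))
```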

The other ingredient of rising importance is IP multicasting, especially over mobile networks. This is a complex and evolving field still undergoing standardization for 5G networks by the 3GPP, but the principle is well established: pruning delivery down to single streams or instances of the content between origin and end users. Mobile networks comprise wired cores feeding cells with wireless base stations over backhaul links, then radiating out to users. Multicast can be implemented as an overlay comprising a smaller number of larger radio cells under the High Tower High Power (HTHP) model. This is almost a hybrid between legacy digital terrestrial and cellular, with a network of transmitters taller than typical cell towers serving users over a radius of up to 60 km, compared with a maximum of around 7 km for a standard 5G cell over relatively flat rural terrain, depending on operating frequency.
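Taking those radii at face value, the coverage arithmetic shows why far fewer HTHP sites are needed (a rough, idealized calculation assuming flat terrain and circular cells):

```python
import math

hthp_radius_km = 60   # High Tower High Power transmitter, per the figure above
cell_radius_km = 7    # standard 5G cell over flat rural terrain, per the figure above

area_ratio = (math.pi * hthp_radius_km**2) / (math.pi * cell_radius_km**2)
print(f"One HTHP site covers roughly {area_ratio:.0f}x the area of one 5G cell")
# ~73x, so dozens of conventional cells could be covered by a single tall transmitter
```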

Under multicast delivery, only some users will normally be accessing a given service at any one time, with constant joining and leaving. Therefore, interactivity is needed for devices to signal when they want to join or leave a service, which raises a subtle but important scaling point just being addressed by the 3GPP under the forthcoming Release 18, due for launch in 2024.

Until now there has been a tradeoff between 5G Broadcast, where all devices access a service, and 5G Multicast, where only some do (the third case being unicast, where devices each have their own streams, consuming a lot more bandwidth for popular content). Under 5G Broadcast, reliability and efficiency are lower than with multicast because the UE (User Equipment) does not send feedback to the 5G network. Therefore the service does not know when a user is having problems, and unnecessary bandwidth may be consumed by sending content to all users, whether or not they are interested.

The upside for broadcast is scalability because it avoids the need to maintain uplink connections for all multicast users. Under the current 3GPP Release 17, multicast reception requires the UE to be connected and so if the network scales up too far, it may hit uplink congestion, leaving the only option of switching to full broadcast mode and losing the efficiency benefits.

Under the forthcoming 3GPP Release 18, multicast reception is allowed in an inactive UE mode, in which the device can receive information, such as video content guides, but cannot transmit, thereby saving battery. The device can then activate itself when the user wants to start viewing multicast content, joining the session by issuing a resume request. This improves scalability while maintaining the efficiency and other benefits of multicast.
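A highly simplified sketch of the behavioral difference (illustrative logic only, with hypothetical state and method names, not the actual 3GPP signaling): under Release 17 a UE must hold a connected state to receive multicast, whereas under Release 18 it can listen while inactive and only issue a resume request when the viewer actually joins a session:

```python
class UE:
    """Illustrative user equipment model; states and methods are hypothetical."""

    def __init__(self, release: int):
        self.release = release
        self.state = "inactive"   # can receive e.g. content guides, cannot transmit

    def can_receive_multicast(self) -> bool:
        if self.release >= 18:
            return True                       # Rel-18: reception allowed while inactive
        return self.state == "connected"      # Rel-17: requires an uplink connection

    def start_viewing(self):
        if self.state == "inactive":
            self.state = "connected"          # resume request issued only on demand
        # ...join the multicast session with feedback enabled...

ue17, ue18 = UE(17), UE(18)
print(ue17.can_receive_multicast(), ue18.can_receive_multicast())  # False True
ue18.start_viewing()   # uplink resumed only when viewing actually begins
```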

This, then, is an impending 5G development that could improve scalability, although these are still early days for the roll out of 5G multicast generally. It highlights how streaming scalability remains a work in progress right across the delivery chain, from ingress to the last mile.
