How CDNs, Adaptive Bit Rate streaming protocols and codecs have evolved together to take video streaming forward from the earlier inconsistent and low-quality experience, to deliver broadcast quality today in many cases.
The main building blocks of video streaming, namely CDNs, codecs, and transport protocols, are now mature enough for primary services, to the point where broadcasters can trust them for lean back viewing in many cases. This leaves the digital divide as possibly the biggest remaining challenge for streaming: people without access to good enough broadband are left behind with poor quality internet, still unable to enjoy even consistent standard definition online video.
The other major challenge is scalable live streaming, especially of popular global events, for which IP multicast has emerged as the most viable technology for delivering high quality video to large audiences. Other work centers on algorithms, some incorporating machine learning, that optimize codec performance and also the balance between quality and the risk of buffering during playback on end devices through adaptive bit rate streaming (ABRS).
Latency is also critical for live streaming, but we will delve deeper into that in a future article in this series. Here we chart the progression of streaming video quality to identify the challenges and issues that remain today, both for on demand and live. Progress has been driven along three axes, CDNs, codecs, and ABRS, which between them enable reliable streaming at ultra HD quality over the broadband networks widely available in more developed countries and, increasingly, in urban areas of developing ones.
For many rural users where quality has been poor, even in developed countries, services based on Fixed Wireless Access over mobile networks, LEO (Low Earth Orbit) satellites and other non-terrestrial networks based on stratospheric balloons or drones are starting to fill these coverage gaps and heal digital divides to some extent.
Many authorities cite ESPN SportsZone’s stream of a baseball game between the Seattle Mariners and the New York Yankees on 5 September 1995 as the first to be transmitted simultaneously in real time to a substantial audience. But this had to contend with bit rates of just 64 kbps for those lucky enough to have a digital ISDN (Integrated Services Digital Network) connection, or less for the many with dial up analog modems. Low resolution, long latency, frequent buffering and freezing, and poor audio/video synchronization meant that very little further video was streamed over the World Wide Web (WWW) until 2002, by which time VHS quality video with more reliable lip sync had just become possible.
Even then, prevailing bit rates of around 256 kbps over emerging telco DSL (Digital Subscriber Line) services meant resolutions were poor. A few years later, by the mid noughties, Macromedia, later acquired by Adobe, had come to dominate streaming with its Flash Player, which for the first time unified streaming media with interactive operation.
But the problems of bandwidth, scalability and latency remained largely unsolved, merely pushed back a little. Yet continual progress was being made along all three dimensions of codec, CDN and transport.
The key codec development for streaming was the emergence of H.264 in 2003 as the successor to MPEG-2, which broadcasters and pay TV providers had adopted from the mid-1990s for their linear services over satellite, cable TV or terrestrial networks. Otherwise known as Advanced Video Coding (AVC) or MPEG-4 Part 10, H.264 roughly doubled the compression efficiency of MPEG-2 and was also more flexible, supporting a range of bit rates and resolutions as well as several different encoding techniques.
This added flexibility suited streaming, and over time AVC came to dominate that field. A key milestone was passed in 2007 with Apple’s adoption, followed in 2008 by Adobe’s adoption for Flash, even though Flash itself was subsequently usurped by HTML based ABRS.
An even more significant milestone for streaming was passed in 2010 when Google started using H.264 for YouTube, even though it subsequently moved towards codecs it had helped develop. This led to H.264 becoming the de facto standard codec for streaming video over the following few years.
H.264 retains its dominance today even if it has been slipping recently. In digital video technology firm Bitmovin’s 2022 Video Developer report, over 91% of respondents reported still using H.264.
Meanwhile, back in the early noughties, video streaming had coalesced around CDNs, at that stage mostly for on demand content, because serving customers from local caches in their area saved bandwidth. It also reduced the time to start playback and improved quality, because it avoided reliance on end to end connections from the origin server.
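To illustrate the caching principle, here is a minimal sketch of an edge cache, assuming a simple least recently used (LRU) eviction policy. Real CDNs use more elaborate caching policies, and the names `EdgeCache` and `fetch_from_origin` are hypothetical. The first request for a segment goes back to the origin, while repeat requests from nearby viewers are served from the local copy.

```python
from collections import OrderedDict
from typing import Callable


class EdgeCache:
    """Hypothetical LRU edge cache sitting between viewers and an origin server."""

    def __init__(self, capacity: int, fetch_from_origin: Callable[[str], bytes]):
        self.capacity = capacity                      # max number of segments held locally
        self.fetch_from_origin = fetch_from_origin
        self.store: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, segment_id: str) -> bytes:
        if segment_id in self.store:
            # Cache hit: serve locally and mark as most recently used
            self.store.move_to_end(segment_id)
            self.hits += 1
            return self.store[segment_id]
        # Cache miss: one long haul fetch from the origin, then keep a local copy
        self.misses += 1
        data = self.fetch_from_origin(segment_id)
        self.store[segment_id] = data
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)            # evict the least recently used segment
        return data


if __name__ == "__main__":
    origin_requests = []

    def origin(segment_id: str) -> bytes:
        origin_requests.append(segment_id)
        return b"video-bytes-for-" + segment_id.encode()

    cache = EdgeCache(capacity=2, fetch_from_origin=origin)
    for _ in range(3):                                # three nearby viewers, same segment
        cache.get("seg_00001.ts")
    print(cache.hits, cache.misses, len(origin_requests))   # prints: 2 1 1
```

Only one of the three requests travels back to the origin, which is the bandwidth and start up saving described above.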
It enabled greater scalability as long distance video transport was given over to larger specialist providers with far greater capacity. Later, major streamers such as Netflix and Amazon built their own CDNs in recognition of how critical this component was for ensuring QoS, reducing bandwidth costs, and scaling to ever larger volumes of content and numbers of users.
CDNs did not by themselves address the unmanaged and unpredictable nature of the access network, over which the available bit rate could swing wildly as traffic, and therefore congestion, varied. Whereas emerging IPTV networks replicated the predictable dedicated paths of linear TV over managed IP networks, streaming still relied on the public internet for delivery, including the CDN portion in the case of live content.
ABRS protocols were developed to mitigate the impact of fluctuating bit rates by encoding the same content at several bit rates, resolutions, and potentially frame rates, often with different codecs. This allowed a provider to cater not just for the unpredictable nature of the internet itself but also for receiving devices with their various screen sizes and display capabilities.
This gave rise to the idea of a bit rate ladder, where content is encoded at several bit rates to suit different resolutions. This might be 5 Mbps for 1080p resolution, 4 Mbps for 720p, 3.2 Mbps for 640p, 2.0 Mbps for 480p and 1 Mbps for 270p.
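Because every rung is a separate encode of the same content, the ladder multiplies what a provider has to store. A quick worked example using the figures quoted above (the dictionary simply restates them, and the helper name is hypothetical) gives the aggregate bit rate and the video storage per hour of content across all rungs, ignoring audio and packaging overhead.

```python
from typing import Mapping

# Bit rate ladder from the example figures above (Mbps per rung)
LADDER_MBPS = {"1080p": 5.0, "720p": 4.0, "640p": 3.2, "480p": 2.0, "270p": 1.0}


def storage_gb_per_hour(ladder_mbps: Mapping[str, float]) -> float:
    """Storage in decimal gigabytes for one hour of video at every rung."""
    total_bits_per_second = sum(ladder_mbps.values()) * 1e6
    total_bytes = total_bits_per_second * 3600 / 8    # 3600 seconds, 8 bits per byte
    return total_bytes / 1e9


print(f"{sum(LADDER_MBPS.values()):.1f} Mbps aggregate")      # 15.2 Mbps aggregate
print(f"{storage_gb_per_hour(LADDER_MBPS):.2f} GB per hour")   # 6.84 GB per hour
```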
The encoded renditions are then broken into small segments or chunks, in the process known as packaging. Each chunk, perhaps 6 seconds long, is requested individually and delivered to the player for rendering, using a manifest or playlist via a streaming protocol such as DASH, HLS, or HDS.
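As an illustration of the playlist side of packaging, the sketch below writes a minimal HLS media playlist for one rung of the ladder, assuming 6 second segments and a hypothetical segment naming scheme (`segment_00000.ts` and so on). The tags used (`#EXTM3U`, `#EXT-X-VERSION`, `#EXT-X-TARGETDURATION`, `#EXTINF`, `#EXT-X-ENDLIST`) come from the HLS specification; a real packager would also handle encryption, discontinuities and live playlists that slide forward.

```python
import math


def media_playlist(duration_s: float, segment_s: float = 6.0,
                   uri_pattern: str = "segment_{:05d}.ts") -> str:
    """Build a simple on demand HLS media playlist listing fixed length segments."""
    n_segments = math.ceil(duration_s / segment_s)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",                               # version 3 allows decimal EXTINF values
        f"#EXT-X-TARGETDURATION:{math.ceil(segment_s)}",  # upper bound on segment length
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for i in range(n_segments):
        # The last segment may be shorter than the nominal segment length
        length = min(segment_s, duration_s - i * segment_s)
        lines.append(f"#EXTINF:{length:.3f},")
        lines.append(uri_pattern.format(i))
    lines.append("#EXT-X-ENDLIST")                        # marks on demand (not live) content
    return "\n".join(lines) + "\n"


print(media_playlist(20))   # three 6.000 s segments followed by one 2.000 s segment
```

The player fetches this playlist, then requests each listed segment in turn; a separate multivariant playlist would point to one such media playlist per rung.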
This means the quality can be adjusted to the maximum possible, with a swap possible every 6 seconds or so. The selection is made by the client device on the basis of available network bit rate, provided the chosen rendition does not exceed the device’s own playback capability. It is worth noting that swapping quality profiles too often may not optimize the viewing experience, since viewers are then subject to frequent changes in resolution.
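The client side choice described above can be sketched as a simple throughput rule: take the highest rung whose bit rate fits within the measured network throughput, among the rungs the device can display. This is a deliberately naive illustration using the example ladder; the function name and the fall back to the lowest rung are assumptions, and production players add smoothing and buffer based logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rung:
    height: int    # vertical resolution, e.g. 1080 for 1080p
    mbps: float    # encoded bit rate in Mbps


# Example ladder using the figures quoted earlier in this article
LADDER = [Rung(1080, 5.0), Rung(720, 4.0), Rung(640, 3.2), Rung(480, 2.0), Rung(270, 1.0)]


def select_rung(throughput_mbps: float, max_height: int, ladder=LADDER) -> Rung:
    """Pick the highest bit rate rung that fits both the network and the device."""
    # Rungs the device can display; fall back to the smallest rung if none qualify
    playable = [r for r in ladder if r.height <= max_height] or [min(ladder, key=lambda r: r.mbps)]
    # Rungs the measured throughput can sustain
    affordable = [r for r in playable if r.mbps <= throughput_mbps]
    if affordable:
        return max(affordable, key=lambda r: r.mbps)
    # Nothing fits: take the lowest rung and accept some risk of rebuffering
    return min(playable, key=lambda r: r.mbps)


print(select_rung(4.5, max_height=1080))   # Rung(height=720, mbps=4.0)
print(select_rung(10.0, max_height=720))   # Rung(height=720, mbps=4.0)
```

Because the rule reacts to every new throughput sample, it can flip between rungs often, which is exactly the frequent resolution change the paragraph above warns against.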
The relationship between bit rate and quality in the ladder is not set in stone but depends on the codec’s compression power. Swapping H.264 for a more advanced codec allows a ladder to associate a higher resolution with a given bit rate. For example, 1080p might then be delivered at 3 Mbps instead of 5 Mbps. That is the direction of travel as codecs advance.
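The 5 Mbps to 3 Mbps example implies a 40% bit rate saving at 1080p. Purely as an illustration, assuming that saving applied uniformly to every rung (in practice codec gains vary with resolution and content), the ladder quoted earlier would shift as follows; the helper name is hypothetical.

```python
LADDER_MBPS = {"1080p": 5.0, "720p": 4.0, "640p": 3.2, "480p": 2.0, "270p": 1.0}


def rescale_ladder(ladder_mbps, old_ref_mbps=5.0, new_ref_mbps=3.0):
    """Scale every rung by the same factor as the reference rung (a simplifying assumption)."""
    factor = new_ref_mbps / old_ref_mbps    # 3 / 5 = 0.6, i.e. a 40% saving
    return {name: round(mbps * factor, 2) for name, mbps in ladder_mbps.items()}


print(rescale_ladder(LADDER_MBPS))
# {'1080p': 3.0, '720p': 2.4, '640p': 1.92, '480p': 1.2, '270p': 0.6}
```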
Indeed, the streaming world has been moving towards higher performing codecs, partly under pressure from the leading technology firms and partly because H.264 does not work so well with HDR or even 4K. Google released VP9 in 2013 as an open source, royalty free alternative to H.264, and it has since gained traction through adoption by leading platforms, including not just YouTube, the Chrome browser and Android but also Apple iOS, as well as Netflix.
The situation now is that over 90% of video transmitted via Chrome using the WebRTC protocol (which we will discuss in the follow up on latency) is encoded either with VP9 or its predecessor VP8. As a result, VP9 now ranks second behind H.264 on streaming platforms and is gaining ground.
Meanwhile, the ISO/IEC Moving Picture Experts Group developed H.265 as its anointed successor to H.264. Also called High Efficiency Video Coding (HEVC), it has gained some ground among broadcasters but has been hamstrung in the streaming world by complications over royalties. This lingering confusion is also helping drive AV1, promoted by the Alliance for Open Media, whose members include Amazon, Netflix, Cisco, Microsoft, Google, and Mozilla, as the streaming successor to VP9.
AV1 is more efficient even than H.265 and works well with ultra HD HDR content. Its handicap is that it is computationally intensive, but that will become less critical as hardware advances further. It looks likely that AV1 will supplant H.264 as the dominant codec in the streaming world, and therefore for broadcasting generally as linear services and channels continue their migration to the internet.
The other critical area of progress centers on IP multicast for scalable transmission of live video. This was discussed in more detail in our first article on the internet itself, the main point being that unicast distribution of live video over long distances across the internet, whether via a CDN or not, is highly inefficient and becomes very expensive at large scale, with the risk of additional latency under congestion. IP multicast cuts transmission back to a single instance of each stream across each link, saving bandwidth and ensuring optimum performance, provided of course that it is properly implemented. This is still work in progress for many broadcasters and live streaming providers.
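The bandwidth argument can be made concrete with a toy distribution tree. The sketch below assumes a single live stream at a fixed bit rate and a hypothetical topology: under unicast each link carries one copy per downstream viewer, while under multicast it carries a single copy as long as anyone downstream is watching. It models only the links up to the edge nodes where viewers attach; the final hop into each home still carries one stream per viewer in either case.

```python
from dataclasses import dataclass, field
from typing import Iterator, Tuple


@dataclass
class Node:
    name: str
    viewers: int = 0                                     # viewers attached directly here
    children: "list[Node]" = field(default_factory=list)


def downstream_viewers(node: Node) -> int:
    """Viewers at this node plus everyone below it in the tree."""
    return node.viewers + sum(downstream_viewers(c) for c in node.children)


def link_loads(node: Node, stream_mbps: float) -> Iterator[Tuple[str, float, float]]:
    """Yield (link, unicast_mbps, multicast_mbps) for every link below `node`."""
    for child in node.children:
        n = downstream_viewers(child)
        unicast = n * stream_mbps                        # one copy per downstream viewer
        multicast = stream_mbps if n > 0 else 0.0        # one copy per link if anyone watches
        yield f"{node.name}->{child.name}", unicast, multicast
        yield from link_loads(child, stream_mbps)


# Hypothetical topology: origin -> core router -> two edge sites
origin = Node("origin", children=[
    Node("core", children=[Node("edgeA", viewers=1000), Node("edgeB", viewers=500)]),
])

for link, uni, multi in link_loads(origin, stream_mbps=5.0):
    print(f"{link:14s} unicast {uni:8.0f} Mbps   multicast {multi:4.0f} Mbps")
```

With 1,500 viewers of a 5 Mbps stream, the origin to core link would carry 7,500 Mbps under unicast but only 5 Mbps under multicast.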
The other area of note is further optimization of the ABRS process itself, particularly at the player end. There is a tension here between the desire for the best possible resolution, which would mean selecting a rendition close to the maximum the current bit rate can support, and minimizing the risk of buffering. The latter is best served by leaving more headroom for a subsequent drop in bit rate, which also helps maintain more consistent quality.
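The trade-off can be seen in a small change to a throughput rule: multiply the measured throughput by a safety factor before choosing a rung. The ladder restates the example figures, and the safety factors are arbitrary illustrations rather than recommended values; a lower factor leaves more headroom for a sudden drop in bit rate at the cost of picture quality.

```python
BITRATES_MBPS = [1.0, 2.0, 3.2, 4.0, 5.0]   # example ladder, lowest to highest


def pick_with_headroom(throughput_mbps: float, safety: float = 0.8) -> float:
    """Highest bit rate within a fraction `safety` of the measured throughput."""
    budget = throughput_mbps * safety
    fitting = [b for b in BITRATES_MBPS if b <= budget]
    return max(fitting) if fitting else BITRATES_MBPS[0]


for safety in (1.0, 0.8, 0.6):
    print(safety, pick_with_headroom(4.5, safety))
# 1.0 4.0   (0.5 Mbps spare)
# 0.8 3.2   (1.3 Mbps spare)
# 0.6 2.0   (2.5 Mbps spare)
```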
Various algorithms have been developed to optimize this balancing act. One called BOLA is already used in production by the BBC and CBS, as well as French telco Orange and CDN company Akamai. It continuously calculates the tradeoff between the probability of video freezing (rebuffering) and video quality. The main advance over preceding algorithms is that it does not require a prediction of available network bandwidth several seconds ahead, which is not always accurate. Instead it maintains adequate headroom on the basis of the current playback buffer level, which reflects recent fluctuations in bandwidth, according to a specified risk profile that can be more or less risk averse.
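For readers curious about the mechanics, the core rule, as we understand it from the original BOLA research paper, is buffer based: for each segment the player picks the rung m that maximizes (V·(v_m + γp) − Q)/S_m, where Q is the current buffer level in segments, S_m the segment size at rung m, v_m = ln(S_m/S_1) a logarithmic utility, p the segment duration, and V and γ tuning constants that set how risk averse the player is. If no score is positive, the buffer is full enough that the player waits rather than downloading. The sketch below follows that form only; the constant values are illustrative assumptions, and production implementations such as the BOLA rule in dash.js add refinements not shown here.

```python
import math
from typing import Optional

# Example ladder (Mbps) and segment duration; all values are illustrative
BITRATES_MBPS = [1.0, 2.0, 3.2, 4.0, 5.0]
SEGMENT_S = 6.0                                      # p, segment duration in seconds
SIZES = [b * SEGMENT_S for b in BITRATES_MBPS]       # S_m, segment size in megabits
UTILITY = [math.log(s / SIZES[0]) for s in SIZES]    # v_m = ln(S_m / S_1)

V = 2.0          # assumed weight trading quality against buffer level
GAMMA_P = 5.0    # assumed gamma * p, weight on avoiding rebuffering


def bola_choice(buffer_segments: float) -> Optional[int]:
    """Return the rung index with the highest positive score, or None to wait."""
    best, best_score = None, 0.0
    for m, (size, v) in enumerate(zip(SIZES, UTILITY)):
        score = (V * (v + GAMMA_P) - buffer_segments) / size
        if score > best_score:
            best, best_score = m, score
    return best


for q in (0, 9, 12, 14):
    m = bola_choice(q)
    print(q, "wait" if m is None else BITRATES_MBPS[m])
# 0 1.0     empty buffer: safest rung
# 9 2.0
# 12 5.0    healthy buffer: top rung
# 14 wait   buffer full enough to pause downloads
```

No throughput forecast appears anywhere in the rule; raising V or GAMMA_P makes the player hold out for a fuller buffer before climbing the ladder.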
Advances, then, are still being made along all three streaming axes, with some signs of convergence behind AV1 for codecs, even though H.264 is still number one at present. There is also growing consensus behind IP multicast for live distribution.