The recent launch of Apple’s TV Plus service, bulked up with original TV shows costing $6 billion to produce, has disrupted global attempts to unify streaming behind a common set of protocols for encoding, packaging, storing and playing back video delivered over the internet.
Apple had earlier come in from the cold and agreed to support CMAF (Common Media Application Format) as the container format for its HLS (HTTP Live Streaming) protocol. This was a huge step forward because HLS is used not just for streaming to many of the 1.4 billion Apple devices in the world, but also to devices from a large number of other brands. This is because HLS evolved for streaming before the open alternative, Dynamic Adaptive Streaming over HTTP (DASH), was developed under MPEG in work starting in 2010.
Inevitably, a few acronyms have to be unpacked to analyze the significance of recent developments around CMAF. The starting point was the consolidation down from four to two alternative streaming platforms from about 2014, DASH and HLS. Both are very similar in principle, designed to optimize video delivery over unpredictable connectionless networks by breaking content into sequences of small HTTP-based file segments, each one comprising short intervals of playback time lasting a few seconds even though the whole may be a movie or sports broadcast lasting an hour or two. The crucial point is that the content is encoded in parallel streams of these segments at different bit rates catering for varying network conditions and associated bandwidth availability.
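The idea of parallel streams at different bit rates can be illustrated with a minimal sketch of how a player might choose between them. The bitrate ladder, the 0.8 safety margin and the function name are all assumptions made for illustration, not values taken from HLS, DASH or any particular player.

```python
# Hypothetical bitrate ladder in bits per second; real services choose their own.
LADDER_BPS = [400_000, 1_200_000, 3_000_000, 6_000_000]


def pick_rendition(measured_bps, ladder=LADDER_BPS, safety=0.8):
    """Return the highest bitrate that fits within a margin of measured throughput."""
    budget = measured_bps * safety
    fitting = [bps for bps in ladder if bps <= budget]
    # Fall back to the lowest rendition when nothing fits the budget.
    return max(fitting) if fitting else min(ladder)


if __name__ == "__main__":
    for throughput in (100_000, 4_000_000, 10_000_000):
        print(throughput, "->", pick_rendition(throughput))
```

Because every rendition is cut into segments at the same points in time, a player following this kind of rule can change its choice at each segment boundary as conditions vary.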
These segments in turn can be sub-divided into chunks of known length, which is the basis for MPEG work associated with DASH on low latency streaming, as we will come to. The other key term here is container, which is sometimes confused with the codec and also slightly misleadingly referred to as a format.
The codec is simply the method or product used for compressing video data. The container is then the extension or wrapper around the compressed or encoded content, describing how it is packaged and should be handled, with potential support for multiple audio and video streams, subtitles, chapter information, and synchronization information necessary to play back the various streams simultaneously. It specifically does not include information needed to decode the content, which has to be performed downstream in the client typically.
The main difference between HLS and DASH was over their containers, with the former using the .ts format to describe the contents and the latter .mp4 containers which had been proposed as an industry standard. This dichotomy meant that content distributors had to encode and store the same audio and video twice to reach the full constellation of devices, including Android, Apple iOS and Microsoft, in order to accommodate both container formats. Apple was under pressure to compromise but history suggested that would not happen, so the streaming industry was pleasantly surprised when agreement was reached in February 2016. In that month, Apple and Microsoft proposed CMAF to MPEG as a joint standard under the MP4 format supporting a fragmented version of the .mp4 containers that could now be referenced by HLS as well as DASH, so that in principle only one version of each piece of content needed to be prepared for a given bitrate. Specifications followed over a year later in July 2017 and then the CMAF standard was officially published in January 2018.
CMAF’s goals were political and commercial as well as technical and financial. They were to cut costs by reducing to a single version, reduce workflow complexity accordingly, cut latency and achieve industry unification. Arguably the greatest achievement was finally to end the divide between what had originally been four alternative streaming protocols, Microsoft’s Smooth and Adobe’s HDS on top of DASH and HLS. By then, though, HDS had been withering on the vine and Microsoft had already aligned Smooth with DASH.
However, celebration proved premature because there was still work to be done on the low latency aspect, which is critical for live content as anybody watching a major sporting event at public locations like airports will know well. All too often somebody viewing a stream on their laptop will celebrate an event such as a goal before it appears on, say, a big public TV screen.
Reducing streaming latency has proved elusive because the delay has several components. There is the end-to-end transit delay, the time taken for the signal to traverse the network, which is bounded by the laws of physics. Coupled with that is switching delay, where there is some scope for improvement, for example by increasing chip density so that the distance travelled by signals within the chip is reduced.
There is also delay resulting from error correction mechanisms, whether this is by inserting redundancy so that a given video sequence takes longer to transmit as in Forward Error Correction (FEC), or by retransmitting packets. The focus of streaming protocols such as SRT (Secure Reliable Transport) and RIST (Reliable Internet Stream Transport) is efficient packet retransmission by various pre-emptive mechanisms.
But adaptive bit rate streaming itself imposes latency because segments have to be constructed before they are transmitted. This adds the duration of each segment to the latency budget, which is why there is a tradeoff between the latency resulting from segment length and transmission efficiency, although this depends on whether connections during sessions are persistent. When connections are persistent, it is more efficient to send shorter segments because they arrive in sequence and do not need re-ordering, while with intermittent connections segments can arrive out of order since they are subject to varying delays.
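The tradeoff can be put into rough numbers. The sketch below is illustrative arithmetic only: it assumes one HTTP request per segment and a one-hour stream, and it ignores encoding, network and player buffering delays, which in practice add to the total.

```python
import math


def segmentation_tradeoff(segment_seconds, stream_seconds=3600):
    """Return (latency floor in seconds, number of segment requests)."""
    # A segment must be complete before it is sent, so its duration
    # is the minimum delay that segmentation alone contributes.
    latency_floor = segment_seconds
    # Assumption: one HTTP request per segment over the whole stream.
    requests = math.ceil(stream_seconds / segment_seconds)
    return latency_floor, requests


if __name__ == "__main__":
    for seconds in (10, 6, 2):
        print(seconds, segmentation_tradeoff(seconds))
```

Going from 6 second to 2 second segments cuts this floor by two thirds but triples the number of requests, which is the efficiency cost referred to above.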
MPEG with CMAF had adopted a mechanism called chunked transfer encoding, as specified in the HTTP/1.1 standard, to minimize the impact on latency of segmentation within adaptive bit rate streaming. The problem to be solved was the delay caused by having to wait for a whole segment to be created before it could be sent. Chunked encoding breaks content down further into chunks, each announcing its own length, that can be streamed as they are created without having to wait for a header specifying the total length of the content. Instead, the end of a content sequence is signalled simply by sending an empty or null chunk. Without chunked encoding, the sender has to buffer the content until it is all ready to send.
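The framing itself is simple enough to sketch in a few lines. The example below shows HTTP/1.1 chunked framing in general, not any vendor's packager: each chunk is preceded by its size in hexadecimal and the body ends with a zero-length chunk. The function name and the sample payloads are made up for illustration, and optional trailers are left out.

```python
def encode_chunked(chunks):
    """Yield HTTP/1.1 chunked-transfer frames for an iterable of byte strings."""
    for data in chunks:
        if not data:
            continue  # a zero-length chunk would end the body prematurely
        # Each frame is: size in hex, CRLF, the data itself, CRLF.
        yield f"{len(data):X}\r\n".encode("ascii") + data + b"\r\n"
    # The terminating zero-length chunk marks the end of the content.
    yield b"0\r\n\r\n"


if __name__ == "__main__":
    # Placeholder payloads standing in for pieces of a media segment.
    for frame in encode_chunked([b"moof", b"mdat-bytes"]):
        print(frame)
```

Because each frame can be emitted as soon as its data exists, the sender never needs to know the total length in advance, which is what removes the wait for a complete segment.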
Although Apple had not confirmed it would follow up its agreement to back CMAF by supporting chunked encoding as well, there was a widespread assumption it would do so. After all, the company already appeared to have agreed it was futile and a waste of resources to pursue a parallel track along a route where the scenery was almost identical.
But there were commercial factors to consider. In the past Apple has gone it alone over diverse aspects of its ecosystem, from charges for smartphones, tablets and computers to handling cloud backups, as well as the streaming protocols. The objectives were to continue competing over the UI, maintain its reputation for innovation and lock consumers into its devices as well as its ecosystem. In the case of streaming, the competitive motivations are switching from lock-in and the UI to differentiating the service, and that is where Apple TV Plus comes in. At the time of announcing its surprise decision not to back the work on chunked transfer encoding, Apple TV Plus was in the final stages of development with all that content investment and the company saw an opportunity to establish a point of differentiation over low latency, even if that meant duplicating effort. Whether this actually succeeds is another matter and yet to be clearly determined, but at any rate Apple did at least have some other work to go on, the HTTP/2 PUSH standard. This provides the bones of Apple’s alternative to chunked transfer encoding, called Apple Low-latency HLS (ALHLS), which has been developed with the help of some technology vendors also with feet in the DASH camp.
On top of HTTP/2 PUSH, as opposed to the HTTP/1.1 chunked transfer encoding of DASH, Apple has developed a number of complementary techniques it claims cut delay to the bone. In fact Apple developed an enhancement called Partial Segments to exploit the benefits of HTTP/2 PUSH, by enabling creation of smaller chunks that still conform to the underlying CMAF container format. These chunks, called HLS partial segments, can be as short as 250 ms and so comprise just a few frames. The idea is to allow greater flexibility for adjusting to the prevailing bandwidth and also lower latency in loading, but at the expense of creating very long playlists specifying the chunks’ locations.
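A little arithmetic shows why playlists grow. The sketch below counts how many entries a playlist would need if every partial segment were listed individually. The 6 second segment length, the two-hour event and the function names are assumptions for illustration; only the 250 ms part duration comes from the figures above.

```python
def parts_per_segment(segment_seconds, part_seconds=0.25):
    """Number of partial segments that make up one full segment."""
    return round(segment_seconds / part_seconds)


def playlist_entries(event_seconds, segment_seconds, part_seconds=0.25):
    """Return (segment entries, part entries) if every part were listed."""
    segments = round(event_seconds / segment_seconds)
    parts = segments * parts_per_segment(segment_seconds, part_seconds)
    return segments, parts


if __name__ == "__main__":
    print(parts_per_segment(6))
    print(playlist_entries(2 * 3600, 6))
```

Under these assumptions, listing parts rather than whole segments multiplies the number of playlist entries by the number of parts in each segment.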
Then HTTP/2 PUSH kicks in by bypassing the traditional process for accessing streamed content that involved first polling the playlist file to check for new available segments and then retrieving the media segment via a second HTTP request. When low latency delivery is required, the overhead of these traditional HTTP requests becomes a significant additional source of delay that Apple wanted to address. Apple ALHLS uses HTTP/2 PUSH to push out the shorter media “parts” in response to a playlist request, bypassing those HTTP requests.
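The saving in round trips can be counted with a toy model. It assumes that the traditional flow costs exactly one playlist request plus one media request for each new part, and that with push the part arrives alongside the playlist response. The function name is hypothetical and the model ignores retries, caching and connection setup.

```python
def requests_needed(new_parts, use_push):
    """Count client HTTP requests needed to fetch a number of new media parts."""
    # Traditional flow: poll the playlist, then request the media it lists.
    # Push flow: the media part is pushed in response to the playlist request.
    per_part = 1 if use_push else 2
    return new_parts * per_part


if __name__ == "__main__":
    # Four parts a second corresponds to 250 ms partial segments.
    print("polling:", requests_needed(4, use_push=False))
    print("push:", requests_needed(4, use_push=True))
```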
Send Once Delta Update
The downside is that the playlist has to be fetched very frequently, as these pushes can occur four or more times a second. This was already a problem for long running events and Apple’s creation of even shorter segments or chunks only exacerbated that. Apple’s remedy is a feature called playlist delta update, which simply involves sending the whole playlist just once to a client device and then transmitting only updates comprising the last few segments, along with low latency parts, as required.
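The principle of a delta update can be sketched as follows. This is a simplified illustration rather than Apple's actual playlist syntax: the playlist is modelled as a plain list of entry names, and the server is assumed to know the index of the newest entry a client already holds. The function and parameter names are hypothetical.

```python
def playlist_response(entries, last_seen_index=None):
    """Return the playlist entries to send to one client.

    last_seen_index is the position of the newest entry the client
    already holds, or None if this is the client's first request.
    """
    if last_seen_index is None:
        # First request: send the whole playlist once.
        return list(entries)
    # Later requests: send only what was added since then.
    return list(entries[last_seen_index + 1:])


if __name__ == "__main__":
    playlist = ["seg0", "seg1", "seg2"]
    print(playlist_response(playlist))
    playlist.append("seg3")
    print(playlist_response(playlist, last_seen_index=2))
```

For a long event the first response grows with the length of the playlist, but each later response stays about the same size however long the event has been running.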
There are some other features such as faster bitrate switching, where playlist responses can contain information about the most recent chunks and segments available in another rendition or bit rate. This makes it possible to switch straight to a new content representation without first having to issue a full playlist request to start the switch.
Early evidence is that these additional measures can shave a bit more off the latency, but with the important caveat that real savings will only be achieved if they are supported across the ecosystem, including the CDNs (Content Delivery Networks). Currently only one or two of the major CDNs support the underlying HTTP/2 PUSH at all and Apple has been widely criticized for introducing more complexity and uncertainty into the ecosystem for what might be only a minor gain for those services where low latency is critical.
A point to note is that HTTP/2 PUSH is a feature of the broader HTTP/2 protocol, which was designed as a major upgrade to the original HTTP for the World Wide Web. Apple pulled the wool over some eyes by citing the widespread support for HTTP/2 by CDN vendors without pointing out that very few had added PUSH. HTTP/2 PUSH operates by allowing a node in a CDN, essentially a server, to push an object to the client unsolicited, to circumvent the delay involved in the requesting process. The main problem is the lack of support from CDN vendors, which if sustained would undermine the multi-CDN strategy that big content streamers more or less have to adopt for resiliency, cost reduction and performance optimization.
CDN Support Is Critical
So the fate of ALHLS is in the hands of the streaming industry and especially CDN vendors, because without widespread support it is dead in the water. After all, low latency CMAF already enjoys broad industry support, from both CDN and player vendors.
There is, though, some industry support for Apple’s position, with CDN vendors coming under pressure to implement push as a low latency enhancement. Apple is one of the few industry players with the clout to browbeat the player and CDN industry into supporting HTTP/2 PUSH.
For now, it is almost back to square one with a new streaming war. But not quite, because CMAF has been accepted as the universal underlying container format. This means that even if low latency DASH and low latency HLS are both deployed, both will use CMAF for media segments, while still supporting legacy players, albeit at higher latency. Both can use the same DRM, although they do not have to, and both will achieve latencies in the broadcast range of 4 to 10 seconds.
As for coexistence, operators are able to hold just one mezzanine file for both formats to be distributed at the origin, because of the CMAF support, but will have to store two different copies of the same asset at the network edge to cater for the low latency approaches.
All in all it seems a pity Apple could not work within the MPEG community towards a common approach, but then that would be too good to be true on past precedent. Apple’s decision has pleased one group though, third party streaming vendors that can now offer to take care of the increasing complexity on behalf of customers. One such vendor is Wowza, which is working to support both and believes, or certainly hopes, that both Low-Latency HLS and low-latency CMAF for DASH will quickly become widely deployed for OTT, live sports, e-gaming and interactive streaming.