OTT (Or Is It ABR?) - Part 3 - DASH-CMAF With HTTP/1.1 Chunked Transfer For Low Latency

ABR segments are conventionally transferred using HTTP/1.0-style transfers: the client requests the whole segment and it is delivered store-and-forward, with all the data belonging to the transfer buffered before sending. To start the transfer, the size of the segment needs to be known.

This article was first published as part of Essential Guide: OTT (or is it ABR?)

With HTTP/1.1, however, it is possible to use chunked transfer instead. With chunked transfer, a single HTTP request is still made, but rather than one transfer, the data is divided into a number of chunks. Each chunk can be transferred as soon as it is available (assuming the server providing the data is chunk-aware). As a result, only a chunk’s worth of buffering, rather than a segment’s worth, takes place. Since the aim is to be reasonably consistent in terms of end-to-end latency, the approach is to divide a segment into the same number of chunks, regardless of its bit rate. In Figure 6, the dashed lines show where chunks are delimited (although, as before, this diagram has been drawn to assist understanding, rather than to show a realistic number of frames per segment).

Figure 6: CMAF chunked encoding and HTTP/1.1 chunked transfer.
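The wire framing behind HTTP/1.1 chunked transfer can be sketched in a few lines. The following is a minimal illustration, not a production implementation: it assumes a complete body held in memory, ignores trailers and chunk extensions, and the `cmaf_chunks` payloads are hypothetical placeholders.

```python
from typing import Iterable, Iterator, List


def encode_chunked(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Frame each piece of data as an HTTP/1.1 chunk: hex size, CRLF, data, CRLF."""
    for data in chunks:
        if data:  # a zero-length chunk would terminate the body early
            yield f"{len(data):X}".encode("ascii") + b"\r\n" + data + b"\r\n"
    yield b"0\r\n\r\n"  # last-chunk marker followed by an empty trailer section


def decode_chunked(body: bytes) -> List[bytes]:
    """Split a complete chunked body back into its chunks (no trailers or extensions)."""
    chunks, pos = [], 0
    while True:
        eol = body.index(b"\r\n", pos)
        size = int(body[pos:eol], 16)
        if size == 0:
            return chunks
        start = eol + 2
        chunks.append(body[start:start + size])
        pos = start + size + 2  # skip the CRLF that follows the chunk data


# Hypothetical CMAF chunks of one segment, produced one at a time by a packager.
cmaf_chunks = [b"chunk-1-data", b"chunk-2-data!", b"chunk-3"]
wire = b"".join(encode_chunked(cmaf_chunks))
```

Because `encode_chunked` is a generator, each framed chunk can be written to the socket as soon as the packager produces it, without the total segment size being known in advance.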

The underlying data being transferred must be formatted to make this chunked transfer possible. When DASH is used with the CMAF (ISO-BMFF) format, the overall segment is divided into a number of fragments, each with its own moof+mdat box pair, and each pair is transferred in sequence in its own HTTP/1.1 chunked transfer. Unlike HTTP/1.0, HTTP/1.1 uses persistent connections that allow multiple requests to use the same connection. In principle, exactly the same chunked transfer mechanism could equally be applied to a transport-stream-based format (such as HLS with TS payloads).
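As an illustration of how a segment breaks down into moof+mdat pairs, here is a minimal sketch of a top-level ISO-BMFF box walker. It assumes 32-bit box sizes only, and the helper names (`make_box`, `iter_boxes`, `split_into_chunks`) and dummy payloads are hypothetical; real moof boxes carry structured metadata rather than placeholder bytes.

```python
import struct
from typing import Iterator, List, Tuple


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a basic ISO-BMFF box: 32-bit big-endian size, 4-byte type, payload."""
    return struct.pack(">I", 8 + len(payload)) + box_type + payload


def iter_boxes(data: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (type, whole box bytes) for each top-level box with a 32-bit size."""
    pos = 0
    while pos + 8 <= len(data):
        (size,) = struct.unpack(">I", data[pos:pos + 4])
        if size < 8:
            raise ValueError("64-bit and to-end-of-file box sizes are not handled here")
        yield data[pos + 4:pos + 8], data[pos:pos + size]
        pos += size


def split_into_chunks(segment: bytes) -> List[bytes]:
    """Group each moof with the mdat that follows it into one transferable chunk."""
    chunks, pending = [], b""
    for box_type, box in iter_boxes(segment):
        pending += box
        if box_type == b"mdat":  # an mdat closes the current moof+mdat pair
            chunks.append(pending)
            pending = b""
    if pending:  # keep any trailing boxes rather than dropping them
        chunks.append(pending)
    return chunks


# A toy segment: a leading styp box followed by three moof+mdat pairs.
segment = make_box(b"styp", b"cmfs")
for n in range(3):
    segment += make_box(b"moof", b"meta%d" % n) + make_box(b"mdat", b"frames%d" % n)
chunks = split_into_chunks(segment)
```

In this sketch, any box that precedes the first moof (here the styp) simply rides along with the first pair, so concatenating the chunks reproduces the original segment byte for byte.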

The system does still work in the absence of support for chunked transfers, but the latency will revert to the conventional latency.

Notice that while the chunks are shorter, the segments are unchanged. This is because segment duration is tied to the necessary part of the latency: the part required for the client adaptation mechanism.

Figure 7: Comparison of HTTP/1.0 transfer and HTTP/1.1 chunked transfer encoding. This method can eliminate much of the unnecessary latency in the system, as long as the full chain is HTTP/1.1 chunked transfer encoding capable.

Effect of HTTP/1.1 Chunked Transfer On Data Traffic Patterns

With chunked transfer, since each transfer occurs as soon as the chunk is available (assuming the client has made the request for the whole segment), there is a burst of traffic to transfer each chunk, followed by an idle period.

This means that if there is sufficient bandwidth, the overall transfer will always take close to a segment’s duration to complete. Simple rate measurement as the segment arrives will therefore either indicate the same download bit rate regardless of how much extra rate is available, or be volatile if it is measured as each chunk transfer takes place. The smaller the chunks, the harder it is to make meaningful measurements. This is compounded when there is competing traffic, particularly if it too has a periodic burst-then-idle pattern (as the contention will vary significantly). Chunked transfer encoding makes adapting bitrates upwards more difficult (although it should be noted that adapting upwards does not need to happen as urgently as adapting downwards).
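The measurement problem can be illustrated with a small idealised model. This is a sketch under stated assumptions, not a description of any particular player: the client requests the segment as encoding starts, equal-sized chunks become available at regular intervals, each chunk is sent at full link capacity, and all numbers are hypothetical.

```python
def segment_level_rate(encoded_bps: float, capacity_bps: float,
                       segment_s: float = 4.0, chunks: int = 8) -> float:
    """Rate a client would infer by timing the whole segment download."""
    chunk_bits = encoded_bps * segment_s / chunks
    clock = 0.0  # seconds since the client issued the request at segment start
    for i in range(chunks):
        available_at = (i + 1) * segment_s / chunks  # encoder finishes chunk i
        start = max(clock, available_at)             # link is idle until the chunk exists
        clock = start + chunk_bits / capacity_bps    # burst at full link capacity
    return encoded_bps * segment_s / clock


# Hypothetical 3 Mbit/s rendition measured over links of differing capacity.
for capacity in (2e6, 10e6, 100e6):
    rate = segment_level_rate(3e6, capacity)
    print(f"capacity {capacity / 1e6:5.0f} Mbit/s -> inferred {rate / 1e6:.2f} Mbit/s")
```

In this model the inferred rate stays just under the encoded 3 Mbit/s whether the link offers 10 or 100 Mbit/s, so the measurement says nothing about the available headroom. When capacity falls below the encoded rate, the inferred rate drops with it, which is why downward adaptation remains detectable.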

Chunk Sizes

For some ultra-low latency demonstrations, a configuration has been used that places each encoded picture into its own fragment.

Since content varies in spatial and temporal complexity, and I, P and B type pictures have different sizes (and vary differently according to spatial and temporal complexity), it is clear that mapping individual pictures to chunks will result in the chunks varying significantly in size. For content with high spatial complexity, but little motion, the intra-coded pictures will be huge, and the temporally predicted pictures will be tiny. This poses a further complication for the rate measurement approach, so it is likely that a more stable operation can be achieved if the chunks contain a number of pictures – for example an 8-frame sub-GOP. This will result in better consistency and more stable operation, at the expense of a small additional latency.
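A rough numerical sketch shows how grouping pictures evens out chunk sizes. The frame sizes and the simple I/P/B layout below are hypothetical values chosen only to illustrate the effect; real encoders produce far more varied output.

```python
from typing import List


def frame_sizes(gop_len: int = 32) -> List[int]:
    """Hypothetical picture sizes in bytes: one I frame, a P every 4th frame, B otherwise."""
    sizes = []
    for i in range(gop_len):
        if i == 0:
            sizes.append(60_000)  # intra-coded picture
        elif i % 4 == 0:
            sizes.append(8_000)   # forward-predicted picture
        else:
            sizes.append(2_000)   # bi-directionally predicted picture
    return sizes


def chunk_sizes(sizes: List[int], frames_per_chunk: int) -> List[int]:
    """Total bytes per chunk when consecutive pictures are grouped together."""
    return [sum(sizes[i:i + frames_per_chunk])
            for i in range(0, len(sizes), frames_per_chunk)]


def spread(chunks: List[int]) -> float:
    """Ratio of the largest chunk to the smallest."""
    return max(chunks) / min(chunks)


sizes = frame_sizes()
per_picture = spread(chunk_sizes(sizes, 1))  # one picture per chunk
sub_gop = spread(chunk_sizes(sizes, 8))      # 8-frame sub-GOP per chunk
```

With these assumed sizes, the largest single-picture chunk is 30 times the smallest, whereas with 8-frame chunks the ratio falls below 3.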

What Latency Can Be Achieved?

With a fully chunked-transfer-capable system, a latency of around 2x to 2.5x the segment duration should be feasible, possibly lower. Demos with ultra-low latency have used artificially benign environments, and more research is needed into chunked transfer with finite capacity, contending traffic and adaptation.

Lab Demos vs Real World Conditions

A number of public demonstrations have taken place to show DASH CMAF with chunked transfers operating with extremely low latency. While interesting, they typically utilize an environment that is not representative of the real target application space. Specifically, the conditions are usually:

Very high available bandwidth.
No contending traffic.
No additional CDN stages.
Single bitrate only to avoid the need to have an adaptation mechanism.
Negligible round-trip times.

ABR exists expressly to adapt to network congestion. Real-world conditions mean there is contention, and if that includes the likely situation of other ABR clients, the contention can cause antagonistic traffic patterns. It is therefore prudent to view these lab results with some caution. However, it is genuinely true that the formats and chunked transfers can and will make a meaningful improvement to the end-to-end latency, compared to conventional ABR.

CMAF - CENC & CBCS

CMAF stands for Common Media Application Format, and while that may imply a format that can be used everywhere, the current reality is a little more complex, but improving.

The first issue arose because valuable content is often encrypted: unless the encryption wrapper is common, different data needs to be supplied to different devices. Enter Common Encryption (CENC). CENC provided a standard container for encryption using AES; however, different client devices used different modes, either Cipher Block Chaining (CBC) or Counter (CTR), and the encryption could be applied to the data in different ways. As before, this meant different data for different devices.

Finally, agreement was reached in the market that the format should be Cipher Block Chaining Sample (CBCS) mode, so the trend is towards a single dominant format: CMAF + CENC + CBCS. Whilst this isn’t yet widely adopted, it does signal a consistent future position that we can expect to gain adoption over the next few years, starting with UHD HEVC content.

QUIC & HTTP/2.0

Updates to the internet protocols have been under way for some time; HTTP/2.0 and QUIC are probably the most familiar. So how do these affect media delivery?

HTTP/1.1 forces multiple TCP transactions to load typical web pages: either requests are made one after another, each incurring its own round-trip-time latency, or many connections are opened in parallel, each requiring its own TLS handshake. HTTP/2.0 allows the use of multiplexed connections, reducing the need to set up parallel connections. Thus, HTTP/2.0 can significantly help when loading complex web pages with multiple resources. For media manifest or segment transactions, though, this optimization does not really help.

HTTP/2.0 payloads are binary and, in practice, always encrypted, including the headers, since browsers only support HTTP/2.0 over TLS. Even if the (media) payload has already been secured, this means that transport-level encryption must be applied on top of that.

HTTP/2.0 also has a server push capability, allowing a server to proactively push resources likely to be next requested. Again, this doesn’t help the media segment delivery, because each segment, even if it were chunked, is only a single request. The next segment is a separate HTTP request, so doesn’t benefit from the push.

QUIC (Quick UDP Internet Connections), on the other hand, is effectively a replacement for both TCP and TLS functionality. The use of UDP is a conscious choice to move away from the congestion control mechanism that is part of TCP. TCP’s congestion control mechanism is responsible for the slow startup of links, and for causing low throughput on connections with long round-trip times. One issue with TCP is that it is hard to update, since every switch and router implements the protocol, including the congestion control. Upgrading TCP is therefore a practical problem. QUIC uses the application layer to create reliability (an inherent part of TCP), which in turn requires window management in order to avoid QUIC starving TCP transfers.

A side effect of using UDP is that the traffic is prioritized over TCP by switches, the buffers are not flushed as they are for TCP, and the head-of-line blocking in switches that occurs with TCP connections is avoided.

There are pros and cons to this. It means that QUIC traffic is prioritized, but as a consequence it moves the congestion control from the switches to the application layer of the endpoints. This may mean that general-purpose internet traffic using QUIC causes congestion that is problematic to resolve, but when competing with TCP traffic, QUIC is likely to obtain a greater share of bandwidth.

HTTP/2.0 and QUIC may provide some benefits but appear unlikely to inherently change the experience of OTT-style delivery.

Apple HLS Low Latency Extensions

Apple recently announced Low Latency Extensions for HLS. Although the principle of dividing segments into smaller chunks (“parts” in Apple terminology) is used, the mechanics are fairly different from DASH CMAF-LLC. Firstly, the segments are divided into smaller-duration files, which are transferred separately (rather than the chunked transfer encoding of one file in CMAF). Each part has an additional “.N” before its extension in its file name. In addition, HLS Low Latency Extensions use HTTP/2.0 and long-poll requests. With long-polling, the next “part” of the file that will become available is signalled in the manifest, and the client then requests that part of the file. If it’s not yet available at the server, then the server must hold on to the request and send the part file when it becomes available.
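The part naming and the hold-until-available behaviour can be sketched as a toy in-process model. This is only an illustration under assumptions: `part_name` follows the “.N” naming described above, `PartStore` is a hypothetical stand-in for an origin server, and no real HTTP/2.0 transport or playlist handling is involved.

```python
import os
import threading
from typing import Dict, Optional


def part_name(segment_name: str, part_index: int) -> str:
    """Insert '.N' before the extension, e.g. 'seg42.mp4' -> 'seg42.3.mp4'."""
    stem, ext = os.path.splitext(segment_name)
    return f"{stem}.{part_index}{ext}"


class PartStore:
    """Toy origin: a request for a part that does not exist yet waits until it is published."""

    def __init__(self) -> None:
        self._parts: Dict[str, bytes] = {}
        self._cond = threading.Condition()

    def publish(self, name: str, data: bytes) -> None:
        """Called by the packager when a part becomes available."""
        with self._cond:
            self._parts[name] = data
            self._cond.notify_all()  # wake any requests being held for this part

    def get(self, name: str, timeout: float = 5.0) -> Optional[bytes]:
        """Hold the request until the part exists; return None if the timeout expires."""
        with self._cond:
            self._cond.wait_for(lambda: name in self._parts, timeout=timeout)
            return self._parts.get(name)


store = PartStore()
name = part_name("seg42.mp4", 3)  # hypothetical segment file name
# Simulate the packager finishing the part 0.2 s after the client asks for it.
threading.Timer(0.2, store.publish, args=(name, b"part-bytes")).start()
data = store.get(name)  # the request is held, then answered once the part is published
```

The client here asks for a part that has only been announced, and the store answers as soon as the part is published, which mirrors the requirement that the server hold on to the request rather than reject it.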

There are therefore two distinct low latency mechanisms that use different protocols and have significant differences. CDNs, origins and operator networks will therefore need to support both mechanisms.

The achievable latency is virtually the same as with DASH CMAF, so neither has a latency performance advantage.
