Encoding Update - New Developments In Codecs, Cloudification And ML Powered Processing

Amid continuing interest in evolution to more advanced codecs such as VVC (Versatile Video Coding), the encoding field is dominated by migration to cloud transcoding for increased efficiency and flexibility, as well as by growing use of ML based processing techniques. These trends are also affecting the eternal debate between hardware and software encoding.

Encoding and transcoding are being transformed by the same industry developments as other aspects of the AV (Audio Video) workflow, notably cloudification, demand for greater energy efficiency, and increasing use of Machine Learning (ML). These trends are reshaping the eternal debate over software versus hardware encoding, and also the progression towards newer, more efficient video codecs.

The rapid rise of live video streaming to fixed and mobile devices over the internet is also playing into these developments by demanding encoding in real time, as opposed to VOD (Video On Demand), where quality, energy consumption and cost matter relatively more than fast execution.

Terminology needs to be clearly defined. Encoding is the process of compressing video so that it can ultimately be delivered at the required quality over the available bandwidth, and it is applied at different stages of the pipeline. It first enters for video contribution, often at a fixed resolution; then, for distribution over the internet, the video is usually re-encoded in the process known as transcoding. That involves first decoding uploaded content, increasingly in the cloud, to recover raw video, then encoding it again at different bit rates and resolutions to cater for varying network conditions and device playback capabilities.
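The decode-and-re-encode cycle described above is typically expressed as a "ladder" of renditions produced from one source. The sketch below builds ffmpeg-style command lines for such a ladder; the rendition list, bit rates and flags are illustrative assumptions, not a production configuration:

```python
# Sketch of a transcoding "ladder": one source, several re-encodes at
# different resolutions and bit rates. The rendition list and the ffmpeg
# flags shown are illustrative assumptions only.

LADDER = [
    # (label, width, height, video bit rate in kbps)
    ("1080p", 1920, 1080, 5000),
    ("720p",  1280,  720, 3000),
    ("480p",   854,  480, 1200),
]

def transcode_commands(source: str, codec: str = "libx264") -> list[str]:
    """Return one ffmpeg command per rendition of the ladder."""
    cmds = []
    for label, w, h, kbps in LADDER:
        cmds.append(
            f"ffmpeg -i {source} -c:v {codec} "
            f"-vf scale={w}:{h} -b:v {kbps}k "
            f"-c:a aac -b:a 128k out_{label}.mp4"
        )
    return cmds

for cmd in transcode_commands("contribution_feed.mp4"):
    print(cmd)
```

In practice the decode happens once and feeds all renditions in parallel, which is exactly the economy of scale that cloud transcoding exploits.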

Finally, the encoded signal is decoded on the end device, or in the cloud, for playback. Codec is then simply the name for the system that performs the encoding and decoding, which can be software running on a generic CPU or GPU (Graphics Processing Unit), or a dedicated hardware device.

It is true that over the last few years progression towards later generation codecs has generally proceeded more slowly than had been anticipated, with the Covid-19 pandemic perhaps partly but certainly not wholly responsible. Broadcast Bridge reported how release of the AV1 codec from the Alliance for Open Media (AOM) in March 2018 was supposed to herald a battle with the H.265 (HEVC) codec for supremacy in the converging world of video entertainment.

The expectation then in 2018 was that HEVC would succeed MPEG-4 in the broadcast world, while AV1 would prevail for streaming, with both eventually converging on the same space. Such convergence would be driven by developments such as the DVB-I standard enabling delivery of linear content at full broadcast quality over the internet.

In the event, H.264 remains the dominant codec today, through inertia combined with the fact that a large majority of playback devices support it, which is still not true of the others. Encoding.com’s 2019 Global Media Formats report showed 82% device support for H.264, compared with only 12% for H.265 (HEVC) in second place. Since then, support for HEVC has risen, but that for H.264 has barely declined.

There are good reasons though to expect that the pace of migration towards these codecs at least will accelerate over the rest of 2023. One major factor is Google’s decision to support HEVC in its Chrome browser, in addition to AV1, which it has been promoting. This could encourage a strong swing towards HEVC this year, since it offers around 50% greater compression efficiency than its predecessor H.264 and is favored by many broadcasters. At the same time, momentum is also building behind AV1 in the streaming world, given its support from big tech companies such as Apple, Amazon and Microsoft.

However, while AV1 support is still lacking on many mobile devices, it is supported on a growing number of smart TVs, as well as by video service providers such as Netflix, and so will gain traction in the living room. Impending support from Apple and wireless chip market leader Qualcomm is also adding impetus to AV1.

This in turn raises questions over the generation to follow, the codecs to succeed AV1 and HEVC. These are respectively AV2 and Versatile Video Coding (VVC), also known as H.266, ISO/IEC 23090-3, or MPEG-I Part 3. The latter is further ahead in development, and likely to witness more deployments over the next year.

Yet impetus towards these next generation codecs is weaker than it was for some of their predecessors, which were needed to enable critical impending developments in video delivery. H.264, for example, was carried along on the first mobile video wave, since its predecessor MPEG-2 was unsuited to that task. HEVC momentum has more recently been driven by its ability to serve smart TVs with 4K SDR and HDR content. There is no such smoking gun use case at present for VVC, especially as enthusiasm for 8K TVs has been tempered by their heavy electricity consumption at a time of environmental concern and rampant energy price inflation.

Interest in codec evolution has shifted in recent years towards complementary technologies that enhance the underlying base method. This happened with Low Complexity Enhancement Video Coding (LCEVC), an ISO/IEC video coding standard developed by MPEG as MPEG-5 Part 2. This is an enhancement layer combined with base video encoded with any of the popular codecs, such as AVC, HEVC or AV1, to produce a video stream with added detail and sharpness for a given level of compression. It works by correcting artefacts introduced by the base codec, and has proved an effective way of extending compression capability while reducing both encoding and decoding complexity. Launched at NAB 2022, MPEG-5 Part 2 LCEVC is now being positioned as a platform for future enhancements without necessarily progressing to a next generation base codec.

The idea of enhancing a base codec is now being taken further through ML, aiming to home in on features or objects, such as faces, within video frames and enhance their quality selectively. This emerging application of ML is known as Region of Interest (ROI) filtering, since it first entails identifying objects for special treatment. It exploits work already done in other sectors, such as video surveillance, to distinguish faces from the background. It applies Deep Learning, the data intensive version of ML in which the algorithms converge on the desired result themselves after being shown large numbers of examples, rather than being guided towards it by humans.

This requires more computational power and storage, but yields more robust applications. In this example, the ML engine identifies a face, extracts it from the frame along with its coordinates, and passes that to the encoder with instructions to perform extra processing to optimize its quality. In effect, these objects are enhanced at the expense of the background. Tests in focus groups show that viewer perception of quality is higher when ROI filtering is applied than when it is not, even though the rest of each frame is at the same quality or slightly lower.
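In outline, the ROI step hands the encoder a bounding box plus an instruction to spend more bits there and fewer elsewhere. One common way encoders expose this is through per-block quantization offsets; the sketch below builds such a map, with the block size, offset values and detected bounding box all hypothetical for illustration:

```python
# Sketch: turn a detected face bounding box into a per-block QP
# (quantization parameter) offset map. Lower QP means higher quality.
# Block size, offsets and the bounding box are illustrative assumptions.

BLOCK = 16  # encode in 16x16 pixel blocks

def qp_offset_map(frame_w, frame_h, roi, roi_offset=-6, bg_offset=2):
    """roi = (x, y, w, h) in pixels; returns a 2D grid of QP offsets."""
    cols, rows = frame_w // BLOCK, frame_h // BLOCK
    x, y, w, h = roi
    grid = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # centre of this block, in pixels
            cx = c * BLOCK + BLOCK // 2
            cy = r * BLOCK + BLOCK // 2
            inside = x <= cx < x + w and y <= cy < y + h
            # boost quality inside the ROI, claw bits back elsewhere
            row.append(roi_offset if inside else bg_offset)
        grid.append(row)
    return grid

# A hypothetical face detected at (320, 160), 160x160 pixels, in a 640x360 frame
qp = qp_offset_map(640, 360, (320, 160, 160, 160))
```

Real encoders take this information in different forms (per-macroblock QP maps, zones, or importance maps), but the principle is the same: the bit budget is redistributed towards the region the ML stage has flagged.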

ML is also being applied to improve the efficiency of cloud-based transcoding. One of the world’s biggest social media companies has invested heavily in ML for cloud transcoding for live video within its platform, using purpose built ASICs (Application Specific Integrated Circuits) co-designed with a major maker of wireless silicon.

As this suggests, the use of resource-intensive deep learning is boosting hardware based encoding after a period when the pendulum was swinging towards software. For now, it makes sense to perform these intensive operations in dedicated hardware alongside the general video processing, especially for live broadcasting at Ultra HD with HDR. There will be growing use of such dedicated neural processing units for encoding and especially cloud-based transcoding over the next few years.

This will be driven by a boom in cloud transcoding itself, which apart from efficiency improvements will also help optimize quality at the point of consumption. It enables adaptive playback, where at any one time the user obtains the best quality possible at the prevailing bandwidth. The cloud can make versions transcoded at multiple bit rates available and switchable quickly on the fly, so that devices are not locked into lower qualities at times when the network can deliver better.
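The switching logic amounts to choosing, each time a segment is fetched, the highest bit rate rendition the measured throughput can sustain. A minimal sketch, with the rendition ladder and the safety margin as assumptions:

```python
# Sketch of adaptive bit rate (ABR) rendition selection: pick the highest
# transcoded rendition that fits under measured throughput, with a safety
# margin. The ladder and the 0.8 margin are illustrative assumptions.

LADDER_KBPS = [1200, 3000, 5000, 8000]  # available transcoded renditions

def pick_rendition(measured_kbps: float, margin: float = 0.8) -> int:
    """Return the bit rate (kbps) of the rendition to fetch next."""
    budget = measured_kbps * margin
    # fall back to the lowest rendition if even that exceeds the budget
    chosen = LADDER_KBPS[0]
    for rate in LADDER_KBPS:
        if rate <= budget:
            chosen = rate
    return chosen

print(pick_rendition(4500))  # 3000: budget is 4500 * 0.8 = 3600 kbps
print(pick_rendition(900))   # 1200: lowest rung, even though over budget
```

Production players layer buffer-occupancy heuristics on top of this throughput rule, but the core idea is the same: the more rungs the cloud transcoder makes available, the closer playback can track the network.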

The other aspect of encoding attracting growing concern and interest is energy efficiency, under the banner of GoS (Greening of Streaming). This is another acronym we will be hearing more of over the next few years, and in the context of encoding it raises some contentious issues. Until now the sole focus of encoding research has been on bit rate reduction and maximization of quality. Now the desire for reduced electricity consumption introduces a third balancing point into the mix. This can be illustrated by reference to the next generation VVC codec, which really comes into its own for high quality video, reducing the bit rate required for 8K video by 40% compared with HEVC.

Reduced bandwidth should save energy in the network, but VVC requires substantially more computational power to achieve this. Therefore, VVC will increase energy consumed at the encoding level, highlighting that GoS calculations will not always be clearcut. What is clear is that 8K increasingly looks like an expensive and unnecessary luxury given current concerns over energy consumption.
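The trade-off can be made concrete with back-of-envelope arithmetic. Only the 40% bit rate saving comes from the comparison above; the baseline bit rate and the encode-complexity multiplier below are assumptions for illustration:

```python
# Back-of-envelope sketch of the GoS trade-off for VVC vs HEVC at 8K.
# Only the 40% bit rate saving is from the article; the 80 Mbps HEVC
# baseline and the 10x encode-complexity multiplier are assumptions.

hevc_8k_mbps = 80.0           # assumed HEVC 8K delivery bit rate
vvc_saving = 0.40             # VVC reduces 8K bit rate by ~40% vs HEVC
vvc_complexity_factor = 10.0  # assumed extra compute for VVC encoding

vvc_8k_mbps = hevc_8k_mbps * (1 - vvc_saving)
print(f"VVC 8K bit rate: {vvc_8k_mbps:.0f} Mbps")  # 48 Mbps

# Whether the switch saves energy overall depends on whether the network
# energy saved per delivered bit outweighs the extra encode energy, which
# in turn depends on audience size: encode once, deliver many times.
```

For a large live audience the per-viewer delivery saving can dominate the one-off encode cost, while for long-tail VOD with few views it may not, which is why GoS calculations resist a single answer.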
