Content Operations In The Multi-Screen Age: Adapting To Fit: The Transcoding Challenge

Effective transcoding has become increasingly critical in the streaming era, with mobile ratcheting up the complexity and range of target platforms even further. The range of deployment options is greater than ever, with cloud, edge, and on-premises offerings, and multiple combinations of all three.

Like other foundational elements of the AV (audio video) workflow, transcoding has become even more critical in the streaming era. Delivery to mobile devices and cloudification are both already adding further complexity, while AI, with its scope for automating the process and enhancing its efficiency at various points, adds even more.

Transcoding is related to, and often confused with, other aspects of video processing over its lifecycle, especially transrating, resizing, and transmuxing. All fall under the same overall heading of video transformation across the workflow, but are distinct applications addressed by separate software components, even if they are virtualized over common hardware infrastructure.

Transrating is just altering video bitrate, which has become more important in the streaming era with the demand to create multiple versions to cater for varying network conditions and different user device capabilities. Resizing, sometimes called trans-sizing, is a change of resolution, related to transrating because it can be required to cater for different target devices. It can also involve changing the aspect ratio for optimal display on given screens.
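As a rough illustration of how resizing and transrating combine, the sketch below builds a small ladder of renditions from a single source. The function names, the rung heights and the bit rates are all hypothetical values chosen for the example, not recommendations.

```python
from fractions import Fraction

def scaled_width(src_width, src_height, target_height):
    """Width that keeps the source aspect ratio at target_height.

    The result is rounded to an integer and bumped up to an even number,
    on the assumption that the downstream encoder wants even dimensions.
    """
    aspect = Fraction(src_width, src_height)
    width = round(aspect * target_height)
    return width + (width % 2)

def build_ladder(src_width, src_height, rungs):
    """rungs is a list of (target_height, video_kbps) pairs (hypothetical)."""
    return [
        {"width": scaled_width(src_width, src_height, h), "height": h, "kbps": kbps}
        for h, kbps in rungs
        if h <= src_height  # skip rungs that would upscale the source
    ]

ladder = build_ladder(1920, 1080, [(1080, 5000), (720, 3000), (480, 1500), (360, 800)])
for rung in ladder:
    print(rung)
```

Each entry pairs a resolution (the resizing step) with a bit rate (the transrating step), which is why the two are so often handled together.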

Transmuxing operates at the transport or container level and does not touch the video content at all, and is sometimes likened to repotting a plant. It can be equally critical for serving multiple target devices, sometimes complementing transcoding. It is focused on playback, catering for the various video containers supported by different devices, including MP4 (part of MPEG-4), Adobe’s Flash Video (FLV), and Google’s WebM.

Transmuxing ensures the video file is in the right container format for either the target device or browser. This avoids compatibility issues at the container level interfering with playback. Efficient transmuxing is particularly important for live streaming where it may be necessary to adapt to different target containers on the fly. It optimizes transport for cost efficiency, while also minimizing the contribution of video packaging to latency.
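One common way to perform this kind of repackaging is the stream copy mode of the FFmpeg command line tool, which moves the encoded audio and video into a new container without decoding them. The sketch below only builds the argument list; it assumes FFmpeg is installed, and the file names are hypothetical.

```python
def remux_args(src, dst):
    """Argument list for an ffmpeg stream copy into a new container.

    "-c copy" tells ffmpeg to copy every stream as-is, so no decoding or
    re-encoding takes place. The output container is inferred from the
    extension of dst.
    """
    return ["ffmpeg", "-i", src, "-c", "copy", dst]

# Hypothetical file names; pass the list to subprocess.run() to execute it.
print(" ".join(remux_args("input.flv", "output.mp4")))
```

A stream copy only succeeds when the target container can carry the codecs already in the file. Where it cannot, transmuxing alone is not enough and transcoding is needed as well.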

Then there is video encoding itself, which overlaps with, complements, and is sometimes confused with, transcoding – often all at the same time. Workflow often begins with video capture, which is followed by encoding into a master digital form ready for the distribution chain.

Minimizing Impact

Transcoding comes next, to create multiple versions of the content for different distribution platforms and target viewing devices. This can change not just the codec through which the digital video is compressed and decompressed, but also resolution, frame rate and bit rate.
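To make those four parameters concrete, here is a minimal sketch that assembles an FFmpeg command changing codec, resolution, frame rate and bit rate in one pass. It assumes an FFmpeg build that includes the libx264 encoder; the function name, file names and target values are hypothetical.

```python
def transcode_args(src, dst, width, height, video_kbps, fps):
    """Argument list for a single-rendition ffmpeg transcode."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264",                 # target video codec (H.264 via libx264)
        "-b:v", f"{video_kbps}k",          # target video bit rate
        "-vf", f"scale={width}:{height}",  # target resolution
        "-r", str(fps),                    # target frame rate
        "-c:a", "aac",                     # re-encode audio to AAC
        dst,
    ]

print(" ".join(transcode_args("master.mov", "out_720p.mp4", 1280, 720, 3000, 30)))
```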

As that indicates, transcoding overlaps with and can include transrating. The key point is that while the underlying content remains the same, the format is changed to the extent that the viewing experience can be significantly altered. The objective is to ensure that the impact on viewing experience is as small as possible, by relating level of compression to the nature of the viewing device. For example, on a small screen reducing resolution might not degrade the quality to any noticeable degree.

Indeed, by generating versions of the same video content in multiple formats and quality levels, transcoding is designed to ensure all users obtain the best possible viewing experience given their device and internet connection. In the case of video streaming, the role of transcoding is much more integral to the experience, because it is inseparable from the underlying adaptive bit rate streaming (ABRS) that has become a de facto standard for almost all video distribution over the internet. This is irrespective of whether the content is being delivered live or on demand.
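The link between transcoding and adaptive streaming can be sketched from the player's side: the transcoder produces a set of renditions, and the client picks one according to the throughput it measures. The selection rule and the 0.8 safety margin below are simplifying assumptions for illustration; real players use more elaborate heuristics.

```python
def pick_rendition(ladder_kbps, throughput_kbps, safety=0.8):
    """Pick the highest bit rate that fits within a fraction of throughput.

    Falls back to the lowest rendition when nothing fits, so playback
    can continue at reduced quality rather than stall.
    """
    budget = throughput_kbps * safety
    affordable = [rate for rate in ladder_kbps if rate <= budget]
    return max(affordable) if affordable else min(ladder_kbps)

ladder_kbps = [800, 1500, 3000, 5000]  # hypothetical renditions from the transcoder
for measured in (500, 4000, 10000):
    print(measured, "->", pick_rendition(ladder_kbps, measured))
```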

A Two Step Process

Transcoding is usually executed in two stages, with the first step being to decode the master data as originally encoded into an intermediate uncompressed format. The latter could be Pulse Code Modulation (PCM) for the audio, well suited for representing analog sound signals digitally. For video, YUV is one option for representing the color space digitally in uncompressed formats; another is RGB, which breaks the video down into primary red, green, and blue channels.
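Some quick arithmetic shows why this intermediate form is only ever held briefly. The sketch below assumes 8-bit samples, 4:2:0 chroma subsampling for the YUV case (one common layout, not the only one), and 16-bit stereo PCM at 48 kHz; none of these figures comes from a specific workflow.

```python
def yuv420_frame_bytes(width, height, bytes_per_sample=1):
    """One full-resolution luma plane plus two quarter-resolution chroma planes."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return (luma + chroma) * bytes_per_sample

def rgb_frame_bytes(width, height, bytes_per_sample=1):
    """Three full-resolution planes: red, green and blue."""
    return width * height * 3 * bytes_per_sample

def pcm_bytes_per_second(sample_rate, channels, bytes_per_sample=2):
    """Uncompressed audio data rate."""
    return sample_rate * channels * bytes_per_sample

yuv = yuv420_frame_bytes(1920, 1080)
rgb = rgb_frame_bytes(1920, 1080)
print("YUV 4:2:0 frame:", yuv, "bytes")
print("RGB frame:      ", rgb, "bytes")
print("YUV video at 30 fps:", yuv * 30 * 8, "bit/s")
print("PCM audio:", pcm_bytes_per_second(48000, 2), "bytes/s")
```

Under these assumptions a single 1080p stream runs to roughly 746 Mbit/s once decoded, which is why the decode and re-encode steps are normally chained in memory rather than written out in full.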

The second step is then to re-encode the video into a target format. In this guise, transcoding works alongside initial encoding. However, it can also be used to re-encode the original video without changing the essential format or the viewing quality at all.

This is where transcoding does overlap with encoding. Such re-encoding can be done for several reasons, an obvious one being to exploit a new, more efficient codec that has just become available or affordable.

Re-encoding is also required for light editing of the video, even if the substance is not changed, for example to remove undesirable visual artefacts in archive footage. In this case the compressed data will normally be decoded out of the compressed format and then re-encoded after the editing has been done.

Compression Categories

This leads on to the different combinations of lossy and lossless compression involved in the transcoding process, where the advent of machine learning has brought a new dimension. Until recently transcoding has been designed at best to maintain current quality levels at a given rate of compression. In many cases the compression level is increased to cater for bandwidth or target playback constraints. The outcome also depends on losses that might be incurred as a result of the compression process.

Transcoding has traditionally been categorized three ways on this basis: lossless-to-lossless, lossless-to-lossy, and lossy-to-lossy. Lossless compression, which the JPEG 2000 codec widely used at the contribution stage supports as one of its modes, reduces file size without discarding any of the information needed to reconstruct the original exactly. It works by exploiting statistical redundancy in the data, which is considerable in the case of uncompressed video, paring down as close as possible to the theoretical minimum required for faithful reproduction of the original.
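The principle can be demonstrated with any lossless compressor. The sketch below uses Python's general-purpose zlib module as a stand-in (it is not a video codec) on a synthetic run of identical sample values, the kind of redundancy found in a flat area of a picture.

```python
import zlib

# 10,000 identical 8-bit samples, standing in for a flat image region.
flat_region = bytes([128]) * 10_000

packed = zlib.compress(flat_region)
restored = zlib.decompress(packed)

print("original bytes:  ", len(flat_region))
print("compressed bytes:", len(packed))
print("bit-exact round trip:", restored == flat_region)
```

The compressed form is far smaller, yet decompression returns exactly the bytes that went in. That bit-exact round trip is what defines lossless compression.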

By contrast, lossy compression discards not just redundant data, but also some that does contribute to quality in the AV case. As the name suggests, there is an irreversible loss, and furthermore, this is cumulative. This leads to the phenomenon of generational loss where the quality deteriorates progressively with successive compression and decompression cycles. The upshot is that lossy compression should ideally only be exercised once on the master source in the case of transcoding.
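Generational loss can be illustrated with a toy model in which each lossy generation is reduced to a single rounding step. Real codecs are far more elaborate, so this is only a sketch of the principle; the sample values and step sizes are made up for the example.

```python
def quantize(samples, step):
    """Round each non-negative sample to the nearest multiple of step."""
    return [((s + step // 2) // step) * step for s in samples]

def total_error(original, processed):
    """Sum of absolute differences from the original samples."""
    return sum(abs(a - b) for a, b in zip(original, processed))

def generation_errors(original, steps):
    """Apply one quantization per generation, recording error after each."""
    errors, current = [], original
    for step in steps:
        current = quantize(current, step)
        errors.append(total_error(original, current))
    return errors

original = [14, 23, 34, 52, 68, 81, 94]
chain_errors = generation_errors(original, [4, 6, 10])       # three lossy generations
direct_error = total_error(original, quantize(original, 10))  # one generation from the master

print("error after each generation:", chain_errors)
print("error going straight to the final step:", direct_error)
```

In this example the three-generation chain ends with a total error of 26, against 20 when the same final step size is applied once to the original. Each intermediate generation rounds values that an earlier generation has already shifted, so errors compound rather than cancel, which is the reasoning behind applying lossy compression only once to the master.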

The ideal scenario is lossless-to-lossless transcoding, which can be performed repeatedly without affecting video quality. In practice this is not always possible, such as when targeting less capable devices, in which case lossless-to-lossy transcoding might be required despite inevitably sacrificing some of the quality.

Conversely, lossy-to-lossy transcoding operates on media that has already been compressed with some quality loss. The transcoding process will then at best sustain the quality and more often result in some further deterioration.

Breaking All The Rules

Until recently that was it, but the advent of machine learning based AI has ushered in a fourth category – lossy-to-lossless transcoding. That appears to contravene the basic law of compression that quality can never be improved, but it is made possible through the ability of AI/ML to reconstruct video and increase the resolution, frame rate, dynamic range, color gamut, or other aspects.

This application of AI/ML is relatively new and subject to ongoing research, with roots in restoration of archived content, as well as manipulation of still images in applications like Adobe’s Photoshop.

AI/ML is improving the efficiency and quality of transcoding generally by streamlining its application and avoiding the need to settle on specific file sizes and resolutions in advance. There is growing ability to analyze every piece of content individually and determine unique optimized transcoder settings according to the nature of the audio and video. Furthermore, it is becoming possible to adjust these settings on the fly as video is captured in the case of live services.
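The idea of deriving settings from the content itself can be sketched without any machine learning at all. The example below uses a crude hand-written measure of motion, the average change between consecutive frames, and maps it to a bit rate. The measure, the function names, and every threshold are hypothetical stand-ins for what a trained model would do in practice.

```python
def temporal_complexity(frames):
    """Mean absolute difference between consecutive frames.

    Each frame is a flat list of luma samples; at least two frames
    of equal length are required.
    """
    diffs = [
        sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        for prev, cur in zip(frames, frames[1:])
    ]
    return sum(diffs) / len(diffs)

def suggest_kbps(complexity, floor_kbps=1500, ceiling_kbps=6000, max_complexity=50.0):
    """Scale linearly between a floor and a ceiling bit rate (hypothetical values)."""
    ratio = min(complexity / max_complexity, 1.0)
    return round(floor_kbps + ratio * (ceiling_kbps - floor_kbps))

static_scene = [[100] * 4, [100] * 4, [100] * 4]  # nothing moves
busy_scene = [[0] * 4, [100] * 4, [0] * 4]        # every sample changes

print("static:", suggest_kbps(temporal_complexity(static_scene)))
print("busy:  ", suggest_kbps(temporal_complexity(busy_scene)))
```

A static scene lands on the floor bit rate and a busy one on the ceiling, which is the per-title behavior described above in miniature. For live services the same calculation could be repeated over a sliding window of recent frames.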

AI/ML can also improve the compression level of a given codec without sacrificing quality, for example by identifying key objects such as faces, as well as areas such as sky and water where there is greater redundancy. Data representing those latter areas can be compressed further under the heading of Content Aware Encoding (CAE), a major area of R&D in its own right. This can be executed through re-encoding, reducing the bit rate or data generated for a given level of quality.

Transcoding To Scale

Another development impinging significantly on transcoding is cloud and edge computing, adding scalability and elasticity to the workflow, at least in principle. This again plays into the time-varying demands on transcoding where AI/ML plays a part, with growing scope for scheduling bandwidth and storage resources on demand.

Public or private cloud CDNs contribute not just by allocating those resources dynamically but also by changing the locations within the network where they are employed for a given stream. Video transcoding is starting to be deployed at the edge of CDN clouds to cache content more locally and so reduce latency as far as possible within the laws of physics. This can involve re-allocating caches around CDNs to the various services running on them according to algorithms interacting with the transcoding process.

The upshot is that transcoding is increasingly entwined with other aspects of the workflow, with AI/ML both adding to complexity and offering relief through automation of these processes.


All 6 articles in this series are now available in our free eBook ‘Content Operations In The Multi-Screen Age’.

