Video Quality: Part 3 - Ensuring Video Quality For New Experiences

We continue our mini-series about Video Quality, with a discussion of how increasing diversity and fragmentation of content origination has opened a new front for Quality Control (QC) and Quality Assurance (QA). Automation, with the help of machine learning, is becoming essential and also extending to content correction through frame interpolation and other techniques.

Internet distribution over unmanaged networks has created new challenges for broadcasters and video service providers that have been addressed by various techniques building on foundational adaptive bit rate streaming. Latency has been brought under better control through mechanisms such as Secure Reliable Transport that reduce delays associated with packet retransmissions in the event of error or drop.

But additional challenges have arisen that are associated with the content itself rather than its distribution, driven by the diversification of sources and increasing reliance on third parties. The latter includes a growing amount of professional file-based content, especially entertainment and documentaries, which is putting increasing strain on existing QC (Quality Control) and QA (Quality Assurance) pipelines.

There is a growing amount of user generated content (UGC) of varying quality entering broadcast channels, requiring adaptation of existing procedures. There is also content of various forms either spun out of existing programming, such as clips and highlights, or generated specially for redistribution over social media or other channels.

All this content requires application of QC to meet a broadcaster’s standards, even when there is little or no scope for the ongoing QA that is increasingly applied in direct production. The standards applied may not be universal, given that for some breaking news events the only immediate footage may come from the mobile phone of a passerby. In such cases quality is less of a concern, on the basis that any content is better than none.

But for the bulk of programming, viewer expectations will not waver much. This is a particular challenge for some aggregators that may produce little original content of their own but rely on culling from public repositories such as YouTube, publishing material in return for attribution through use of embed codes.

This growing volume and range of content is inevitably driving automation of QC and QA procedures, with the help of neural network-based machine learning. There has been substantial research by broadcasters and academia into automated Video Quality Assessment (VQA) algorithms to evaluate emerging and archived content on the basis of so-called natural scene statistics (NSS), designed to emulate human experience more accurately than past methods based on MOS (Mean Opinion Scores).
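To give a flavor of what natural scene statistics involve, the sketch below computes mean-subtracted, contrast-normalized (MSCN) coefficients, a feature used in the research literature by NSS-based metrics such as BRISQUE. It is a simplified illustration and not the method of any broadcaster mentioned here: it assumes a uniform square window rather than the Gaussian weighting normally used, and the function name `mscn`, the `radius` parameter, and the stabilizing constant `c` are choices made for this example only.

```python
import math
from typing import List

Image = List[List[float]]  # grayscale image as rows of luminance values


def mscn(image: Image, radius: int = 1, c: float = 1.0) -> Image:
    """Return MSCN coefficients using a uniform local window (assumption).

    Each pixel has the local mean subtracted and is then divided by the
    local standard deviation plus a small constant c to avoid division
    by zero in flat regions.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Collect the neighbourhood, clipped at the image borders.
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            mu = sum(window) / len(window)
            sigma = math.sqrt(sum((p - mu) ** 2 for p in window) / len(window))
            out[y][x] = (image[y][x] - mu) / (sigma + c)
    return out
```

For pristine natural images these coefficients tend to follow a regular, roughly Gaussian distribution, and distortions such as blur or compression artefacts disturb that regularity. A quality model can then be trained on statistics of the coefficients rather than on raw pixels.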

The idea is to capture a broader range of human viewpoints in the real world and not just in video, seeking to imitate the way people respond to color and movement in the field. The eventual hope is that this will also feed into Generative AI models that create content, or upgrade archived material to contemporary standards, in terms of resolution, frame rate and dynamic range.

Indeed, there is growing synergy between QC and video generation. There has been a trend for some time towards application of QA across the whole content lifecycle from origination through filming to postproduction. This reduces the amount of wasted material, as well as the cost of producing content that meets quality guidelines, by ensuring that checkpoints are met during the workflow, rather than making more extensive changes at the end of the process.

Now with growing automation around AI there is scope for incorporating QA with even less human intervention. There are various proprietary commercial tools capable of generating plausible videos from a range of textual and image input, but more recently some open source options have become available for broadcasters to try.

Two such models were described in an October 2023 paper from Cornell University in the USA with an emphasis on QC, including a text to video system designed to generate videos at 1024 x 576 resolution, better than other open source options so far. Such models are already capable of adhering to structure, style and quality of supplied reference images, and this is a field where rapid advances are anticipated.

Related to this is another fast-growing area, frame interpolation, which can be applied both in generation of original content and enhancement of existing assets, including archives. Broadcasters are interested in the application at both ends of the scale, for bringing old films up to contemporary quality, and also for enhancing more recent material that may have been shot at lower frame rates that impair the viewing experience, especially on big screens.

Frame interpolation generates new frames from the existing input frames, so it can potentially enhance past footage of fast-moving sports action, for example. The generated frames are inserted back into the original video to produce a sequence that can be played at a higher frame rate such as 100 fps, or even higher. It can also be used for production of high-quality slow-motion output. As an example, if a video was originally shot at 60 fps, that would become 240 fps after interpolating by a factor of 4. If then replayed at the original 60 fps, it would be slowed down four times, while retaining fine temporal resolution.
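The arithmetic above can be sketched in a few lines of Python. This is a deliberately naive illustration that blends neighbouring frames linearly. It is not the motion-compensated or learned interpolation used in production tools, and the names `blend`, `interpolate` and `slow_motion_factor` are hypothetical. Frames are represented as flat lists of pixel intensities purely to keep the example self-contained.

```python
from typing import List

Frame = List[float]  # a frame flattened to pixel intensities (illustrative)


def blend(a: Frame, b: Frame, t: float) -> Frame:
    """Linearly blend two frames; t=0 gives a, t=1 gives b."""
    return [(1.0 - t) * pa + t * pb for pa, pb in zip(a, b)]


def interpolate(frames: List[Frame], factor: int) -> List[Frame]:
    """Insert factor-1 blended frames between each pair of neighbours."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out: List[Frame] = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        for k in range(1, factor):
            out.append(blend(a, b, k / factor))
    if frames:
        out.append(frames[-1])  # keep the final original frame
    return out


def slow_motion_factor(source_fps: float, factor: int, playback_fps: float) -> float:
    """How many times slower the action appears at playback_fps."""
    return (source_fps * factor) / playback_fps


# Two one-pixel frames interpolated by a factor of 4.
print(interpolate([[0.0], [4.0]], 4))
# 60 fps material interpolated x4 and replayed at 60 fps.
print(slow_motion_factor(60, 4, 60))
```

Simple blending of this kind produces ghosting on fast motion, which is why real systems estimate motion between frames before synthesizing the intermediate ones.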

There is also interest in applying interpolation to replace damaged frames in archived footage. But one issue has been that interpolation methods applied so far have lacked sufficient precision to produce smooth content and so motion blur is sometimes introduced. This arises when the computed frame is too different from what the real frame would have been had the camera supported that higher frame rate.

This is one area where contemporary AI methods are making a big difference, by being able to generalize better than traditional rule-based approaches. But this has run into another problem, that of the computational complexity incurred by these AI models when analyzing large volumes of high-resolution video.

The BBC has been at the vanguard of research on this front, and in January 2024 published details of an approach that reduced the number of parameters involved in the AI models of frame interpolation. At first sight this would seem to jettison the level of detail required for generating interpolated frames of sufficient quality, but the BBC found that by using multiple encoders, each one could focus on a specific feature within the frame and still elicit the same high quality.

The BBC has also been engaged in studying the tradeoffs broadcasters and content producers have to make increasingly as their output serves different genres or use cases. In gaming, especially multiplayer, ultra-low latency is more important than resolution, frame rate, or other aspects of video image quality. The same applies to videos of sporting events shown on betting sites, where latency equals money for the punters.

For fast moving sports, resolution and low latency are important, but high frame rate is of paramount importance to ensure smooth rendering on big screens. For some content, gardening programs for example, resolution and dynamic range reign supreme. For nature films involving moving animals, a balance is required, with reasonable frame rate and image quality being important, latency rather less so.

There is a growing amount of content featuring some form of Extended Reality, adding new wrinkles to the quality challenge. This is especially the case with Augmented Reality (AR), where artificially generated video is superimposed on footage captured by camera. Here the rendering process itself is sensitive to latency, insofar as the experience will be poor if the superimposed artificial images are even slightly out of sync with the real footage of an event. The rendered artificial objects might lag, or float aimlessly around the field of vision.

AR is often employed in computer games, where, as noted, viewing latency can be critical. This raises another point, the location of image rendering. This can be done on the user’s computer for minimal latency, but with the tradeoff that image quality may be compromised. If computation occurs far from the home in a data center, great image quality can be obtained, but latency may be too high because there is a minimum delay proportional to the data round trip distance, ordained ultimately by the speed of light.
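The floor that distance imposes on latency is easy to estimate. The sketch below is a back-of-envelope calculation only: it assumes signals in optical fibre travel at roughly two-thirds of the speed of light in a vacuum, and it ignores routing, queuing, encoding and rendering time, all of which add to the figure in practice. The function name `min_round_trip_ms` and the example distances are illustrative.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
FIBRE_FACTOR = 2 / 3  # assumption: propagation speed in fibre is about 2/3 of c


def min_round_trip_ms(distance_km: float, velocity_factor: float = FIBRE_FACTOR) -> float:
    """Lower bound on round trip delay in milliseconds for a one-way distance."""
    speed_km_s = SPEED_OF_LIGHT_KM_S * velocity_factor
    return 2 * distance_km / speed_km_s * 1000


# Illustrative one-way distances: an edge site, a regional and a remote data center.
for km in (50, 1000, 5000):
    print(f"{km} km: at least {min_round_trip_ms(km):.2f} ms round trip")
```

Under these assumptions a data center 1,000 km away adds roughly 10 ms before any processing has taken place, whereas an edge site 50 km away adds about half a millisecond.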

Edge compute close to the user can be the ideal compromise, but this may only be economically viable amid denser populations. Economies of scale will inevitably need to be found and are being explored by some major broadcasters, as well as multiplayer gaming service and technology providers.

Quality concerns are also having an impact on the evolution of broadcast distribution, sustaining momentum behind 8K production, even if that might seem overkill for the direct viewing experience. 4K coupled with HDR and, where relevant, high frame rate and ultra-low latency should together enable good enough immersive experiences.

This reckons without various special effects and generation of ancillary content, however. When content is shot in 8K, there is greater scope for subsequent manipulation and cropping while still yielding high quality footage that might then be played out as 4K. Shooting in 8K now also ensures a degree of future proofing, with provision for emerging technologies that might benefit from the higher pixel density.

Already, a lot of Hollywood content incorporates visual effects that would benefit from a higher resolution. 8K also enables some special effects directly, such as zooming in on sections of frames. Zooming reduces the effective resolution when the crop is expanded back to full screen, but if the original is shot in 8K, the cropped image might still be rendered in 4K, or at any rate 2K HD.
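The headroom that 8K gives for zooming can be worked out from pixel counts alone. The sketch below assumes the common UHD raster sizes (7680 x 4320 for 8K, 3840 x 2160 for 4K) and treats "2K HD" as 1920 x 1080. The `FORMATS` table and the function names are illustrative rather than taken from any standard workflow.

```python
from typing import Optional, Tuple

# Assumed raster sizes in pixels (width, height).
FORMATS = {"8K": (7680, 4320), "4K": (3840, 2160), "HD": (1920, 1080)}


def crop_size(source: str, zoom: float) -> Tuple[int, int]:
    """Pixel dimensions of the region left after zooming in by `zoom`."""
    if zoom < 1:
        raise ValueError("zoom must be >= 1")
    w, h = FORMATS[source]
    return int(w / zoom), int(h / zoom)


def best_output(source: str, zoom: float) -> Optional[str]:
    """Largest format the crop can fill without upscaling, or None."""
    cw, ch = crop_size(source, zoom)
    by_size = sorted(FORMATS.items(), key=lambda kv: kv[1][0], reverse=True)
    for name, (w, h) in by_size:
        if w <= cw and h <= ch:
            return name
    return None


print(best_output("8K", 2))  # a 2x zoom on 8K still fills a 4K raster
print(best_output("8K", 4))  # a 4x zoom still fills an HD raster
print(best_output("4K", 4))  # the same zoom on 4K needs upscaling
```

In this simplified model a 2x punch-in on 8K footage still delivers native 4K, while the same punch-in on 4K footage only delivers HD.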

Another benefit of 8K is in supersampling for 4K or even 2K services, yielding finer grain and smoother edges.
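The reason a downsampled 8K image has smoother edges is that each output pixel averages several captured pixels. A minimal sketch follows, assuming the simplest possible filter, a 2 x 2 box average. Production scalers use more sophisticated kernels, and the function name `downsample_2x` is made up for this example.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def downsample_2x(image: Image) -> Image:
    """Halve width and height by averaging each 2x2 block (box filter)."""
    h, w = len(image), len(image[0])
    if h % 2 or w % 2:
        raise ValueError("image dimensions must be even")
    return [
        [
            (image[y][x] + image[y][x + 1] + image[y + 1][x] + image[y + 1][x + 1]) / 4
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


# A hard diagonal edge between black (0) and white (255).
edge = [
    [0, 0, 255, 255],
    [0, 255, 255, 255],
]
print(downsample_2x(edge))
```

The block straddling the edge comes out as an intermediate grey rather than pure black or white, which is the anti-aliasing effect that makes edges look smoother at the lower resolution.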

8K can also enable virtual lensing, that is, the simulation of different camera angles from the original footage. This can then be applied in post-production to produce alternative views even when there are no actual cameras in position to capture them in the field. This has obvious appeal for, say, second-tier sporting events where the cost of having multiple cameras could not be justified.

There is increasing use of special effects in audio as well, and that also has quality implications. It makes it harder during QA to distinguish between audio artefacts such as background noise that should be removed during production, and special effects that should be retained. This again is an application well suited to machine learning, through development of algorithms capable of separating the wheat from the chaff.

Finally it is worth noting that there are relevant initiatives from representative industry bodies, but these tend to be confined to conventionally produced content. A notable one has come from the European Broadcasting Union (EBU) with its Strategic Program for Quality Control, currently defining QC criteria as well as guidance on implementation.

These are being adopted by relevant national bodies across the EBU area, such as the UK’s DPP (Digital Production Partnership), which has released a set of standardized UK QC requirements to help producers carry out essential quality checks over broadcast files. These include checks for loudness levels and freeze frames, as well as audio sync, buzzing, and unclear sound.
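An automated check of this kind can be quite simple in principle. The sketch below flags freeze frames by looking for runs of consecutive frames that barely differ. It is an illustration only, not the DPP or EBU test definition: the `threshold` and `min_frames` values are arbitrary assumptions, and frames are represented as flat lists of pixel values to keep the example self-contained.

```python
from typing import List, Tuple

Frame = List[float]  # a frame flattened to pixel intensities (illustrative)


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute pixel difference between two frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def find_freezes(
    frames: List[Frame], threshold: float = 0.5, min_frames: int = 3
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices, inclusive, of suspected freezes.

    A freeze is a run of at least min_frames frames in which each frame
    differs from its predecessor by no more than threshold.
    """
    freezes: List[Tuple[int, int]] = []
    run_start = 0
    for i in range(1, len(frames) + 1):
        still = i < len(frames) and mean_abs_diff(frames[i - 1], frames[i]) <= threshold
        if not still:
            # The run of similar frames ends at index i - 1.
            if i - run_start >= min_frames:
                freezes.append((run_start, i - 1))
            run_start = i
    return freezes


# One-pixel frames: the value 10 is held for four frames.
clip = [[0.0], [10.0], [10.0], [10.0], [10.0], [20.0]]
print(find_freezes(clip))
```

A real QC tool would also need to distinguish intentional stills, such as title cards, from genuine faults, which is one reason a human review step is usually retained for flagged items.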
