Live Streaming Video Quality Measurement Rides High At IBC 2019

Both live streaming and video quality measurement were dominant themes at IBC 2018, and at IBC 2019 the two converged in a range of announcements and demonstrations.

Live monitoring of perceptual video quality poses the additional challenge of deriving the measurements in real time, although it is admittedly not as computationally demanding as live video content analytics, which also featured at IBC 2019.

Essentially, live quality monitoring uses the same metrics as on-demand video, although the tools that derive them may differ or at least be substantially modified. Not surprisingly, VMAF (Video Multimethod Assessment Fusion), developed by Netflix, is optimized more for on-demand content, but it still provides a basis for live quality monitoring.

In all cases, MOS (Mean Opinion Score), based on human assessment on a scale of 1 to 5, provided the starting point for developing computational surrogates with varying properties, pros and cons. Naturally MOS provides the best prediction of how humans would perceive the quality of a given video sequence, but it is totally impractical because it does not scale: every score requires a panel of human viewers. The computational models that automate these assessments are therefore judged first by how closely they approximate MOS.
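The usual way to judge that closeness is by correlating a metric's predictions against panel MOS scores. A minimal sketch, using entirely made-up clip scores and two hypothetical metrics, might look like this:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical MOS panel scores (1-5) for six clips, alongside the
# predictions of two invented objective metrics.
mos      = [4.6, 3.9, 3.1, 2.4, 1.8, 1.2]
metric_a = [4.5, 4.0, 3.0, 2.6, 1.9, 1.1]   # tracks MOS closely
metric_b = [3.0, 4.2, 1.5, 3.8, 2.2, 2.9]   # barely correlated

print(pearson(mos, metric_a))  # close to 1.0
print(pearson(mos, metric_b))  # much lower
```

A metric whose correlation with MOS approaches 1.0 is a usable surrogate; one near zero is not, however cheap it is to compute.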

The first metric to be widely employed in early video quality measurement systems was PSNR (Peak Signal to Noise Ratio), which relates the maximum power of a signal to the noise that limits its faithful reproduction at the receiving end. While PSNR captures an important factor in quality reproduction, it is a crude measure and a fairly poor predictor of human perception, because a given level of pixel error says little about how visible the resulting degradation actually is.
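PSNR itself is simple to compute: 10·log10(MAX²/MSE), where MSE is the mean squared error between reference and distorted pixels. A minimal sketch on flattened 8-bit grayscale "frames" (synthetic values, for illustration only):

```python
import math

def psnr(reference, distorted, max_value=255):
    """Peak Signal-to-Noise Ratio between two equally sized pixel lists.

    PSNR = 10 * log10(MAX^2 / MSE). Higher is better; identical
    signals have zero error and so an infinite PSNR.
    """
    if len(reference) != len(distorted):
        raise ValueError("frames must have the same number of pixels")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return float("inf")
    return 10 * math.log10(max_value ** 2 / mse)

# A lightly distorted frame scores higher than a heavily distorted one.
ref   = [52, 55, 61, 66, 70, 61, 64, 73]
mild  = [53, 55, 60, 66, 71, 61, 63, 73]
harsh = [80, 20, 90, 30, 100, 20, 90, 40]
print(psnr(ref, mild) > psnr(ref, harsh))  # True
```

Note what the number does not tell you: the same MSE spread evenly across a frame and concentrated in one visible blotch yield identical PSNR, which is exactly why it predicts perception so poorly.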

This left the field looking for better alternatives, and one of the first to make a significant impact was SSIM (Structural Similarity Index Method), whose origins actually date back almost as far as PSNR. Its ancestor was the Universal Quality Index (UQI), or Wang–Bovik Index, developed in 2001, which evolved into the current version of SSIM, published in the IEEE Transactions on Image Processing in April 2004. The underlying idea is that neighboring pixels, both in space (within a frame) and time (between adjacent frames), are related, and that this provides a framework for assessing the changes in structure that make the greatest impact on the human eye. The SSIM index is then calculated over various windows of each frame taken as a whole, rather than over isolated pixels, using a formula engineered to yield fractional scores in the range 0-1 representing the degree of degradation from the original source. Although promising, SSIM was very much a work in progress and initially yielded predictions only slightly better than PSNR.
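The window-based idea can be sketched for a single window. This simplified version compares luminance (means), contrast (variances) and structure (covariance) of two pixel windows using the standard SSIM stabilizing constants; the full method slides such windows across the frame and averages the results:

```python
def ssim(x, y, max_value=255):
    """Simplified single-window SSIM comparing the luminance, contrast
    and structure of two pixel windows rather than pixel-by-pixel error.
    Identical windows score 1.0; degradation pulls the score down."""
    c1 = (0.01 * max_value) ** 2   # stabilizes the luminance term
    c2 = (0.03 * max_value) ** 2   # stabilizes the contrast/structure term
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((p - mx) ** 2 for p in x) / (n - 1)
    vy = sum((p - my) ** 2 for p in y) / (n - 1)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y)) / (n - 1)
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

window = [52, 55, 61, 66, 70, 61, 64, 73]
print(ssim(window, window))                    # 1.0 for identical windows
print(ssim(window, [p + 30 for p in window]))  # < 1.0 after a brightness shift
```

Unlike PSNR, a uniform brightness shift and a structural scramble with the same pixel error produce very different SSIM scores, which is the whole point of the structural approach.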

This left others working on improvements, and Netflix was probably the heaviest investor, given the importance of streaming quality in competing with traditional pay TV. Working with the University of Southern California, Netflix developed VMAF, which draws on several elementary metrics: Visual Information Fidelity (VIF), which determines information fidelity loss at four different spatial scales; the Detail Loss Metric (DLM), which measures loss of detail as well as impairments that distract the viewer; and Mean Co-Located Pixel Difference (MCPD), which measures the temporal difference in luminance between frames.

But a crucial improvement over SSIM is that VMAF incorporates machine learning, so it can be tuned to different content types and improve over time. It can therefore reach much higher predictive value, getting closer to ideal MOS scores, especially on content-specific datasets such as animated videos for a cartoon-based channel. Its temporal component also suits sports video, but the training takes time, which Netflix, as an SVoD service provider, has plenty of. VMAF is therefore better suited to sports highlights than to the live event itself.
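The "fusion" in VMAF's name refers to combining those elementary features into one score with a trained model. VMAF itself trains a support vector regressor on subjective scores; as a toy analogue only, a linear fusion fitted by least squares on invented per-clip features (stand-ins for VIF, DLM and MCPD) shows the shape of the idea:

```python
import numpy as np

# Hypothetical per-clip elementary features (stand-ins for VIF, DLM and
# MCPD values) and invented MOS-style panel scores for the same clips.
features = np.array([
    [0.95, 0.92, 1.5],
    [0.80, 0.85, 3.0],
    [0.60, 0.70, 6.0],
    [0.40, 0.55, 9.0],
    [0.20, 0.35, 14.0],
])
mos = np.array([4.7, 4.1, 3.2, 2.3, 1.4])

# Learn a linear fusion of the features by least squares. Real VMAF
# uses an SVR trained on large subjective datasets; this is a sketch.
X = np.column_stack([features, np.ones(len(features))])  # add an intercept
weights, *_ = np.linalg.lstsq(X, mos, rcond=None)

# Score an unseen clip with mid-range features.
new_clip = np.array([0.70, 0.78, 4.5, 1.0])
print(float(new_clip @ weights))  # a fused quality prediction
```

Because the fusion weights are learned from subjective data, retraining on a content-specific dataset (say, animation) is what lets the metric specialize, exactly the tunability the article describes.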

Meanwhile SSIMWave, the vendor that took SSIM forward, developed a much-improved version called SSIMPlus, now also incorporating machine learning, while adopting a more expressive 0-100 scale mapped linearly to human subjective test scores. One innovation putting SSIMPlus ahead of VMAF is adaptation to the viewing device, with the ability to compare video quality as objectively as possible across different resolutions and formats. It can determine that a given video might look excellent, with a much higher rating, on say a smartphone, while looking much poorer on a large 4K (3840x2160) TV. VMAF offers just a standard quality rating, plus smartphone and 4K variants, while SSIMPlus supports numerous device types.

Over predictive value there have been numerous claims and counterclaims, but there appears to be little to choose between the two. However, SSIMPlus has won some significant endorsements, one from Akamai, the world's biggest CDN (Content Delivery Network) provider, which used it as the basis for its work towards an industry standard for measuring perceptual video streaming quality, which it considers an urgent requirement.

This led to publication of a paper called “What Does Good Look Like”, investigating issues around video quality. The paper was useful in crystallizing the four factors making the biggest impact on perceived quality. The first is content genre, since both spatial and temporal demands vary with the nature of the footage: its detail, color range and speed of motion.

Agama’s Director of Product Management Johan Görsjö.

The second is the player or device, which affects both the quality displayed to the viewer and the tolerance for artefacts and degradations. The third is the network, especially critical for OTT because it is usually unmanaged and subject to varying conditions that can affect the available bit rate and impose jitter. The fourth, related to the third, is the bit rate ladder, whereby content is delivered at different bit rate profiles to account for varying network conditions, with the requirement that profile changes in players be smooth and not too frequent. A key point, sometimes forgotten, is that perceptual quality is often better at a lower bit rate profile because there is then less chance of artefacts or buffering occurring. Yet, as Akamai pointed out, profile selection to date has often been based on “gut instinct”, which it rightly says is not good enough and should be replaced by “evidence-based” assessment.
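What "evidence-based" profile selection might mean in practice is worth making concrete. A minimal sketch, with an invented ladder annotated with measured perceptual scores (all names and numbers hypothetical): pick the highest-quality rung that fits within a safety margin of the measured throughput, since overshooting risks buffering, which hurts perceived quality more than a lower rung does.

```python
# A hypothetical bit rate ladder annotated with measured perceptual
# scores (e.g. from an SSIM- or VMAF-style metric), not gut instinct.
ladder = [
    {"bitrate_kbps": 1200, "quality": 72},
    {"bitrate_kbps": 2500, "quality": 84},
    {"bitrate_kbps": 4500, "quality": 91},
    {"bitrate_kbps": 8000, "quality": 94},
]

def pick_profile(available_kbps, ladder, headroom=0.8):
    """Choose the highest-quality rung whose bit rate fits within a
    safety margin of the measured throughput."""
    affordable = [r for r in ladder
                  if r["bitrate_kbps"] <= available_kbps * headroom]
    if not affordable:
        return ladder[0]  # fall back to the lowest rung
    return max(affordable, key=lambda r: r["quality"])

print(pick_profile(6000, ladder)["bitrate_kbps"])  # 4500
print(pick_profile(1000, ladder)["bitrate_kbps"])  # 1200
```

The per-rung quality numbers are the "evidence": measured once per title (or per genre), they let the selection logic trade bit rate against perceptual score instead of assuming more bits always look better.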

This sets the stage for live quality assessment, which requires revising those systems; SSIMWave, for example, has developed its SSIMPlus Live Monitor. Live quality assessment imposes new challenges, including the need for computational efficiency in real-time operation and the ability to quickly pinpoint the causes of degradation so that remedial action can be taken. As a result, SSIMPlus Live Monitor uses lightweight software probes distributed to five principal points in the ecosystem: Source; Encoding and Demux Output; Aggregator Output; Delivery Across the Demarcation Point; and Playout by End-User Devices. The company claims it is the only vendor supporting real-time independent monitoring of encoders, allowing operators to secure the weakest link in the current ecosystem.

Another vendor adapting existing products here is Telestream with its Inspector Live, which pays particular attention to bit rate shifting within ladders as an issue for live content. It points out that the bit rate versions must be properly “aligned” with each other so that the shifts between them are as smooth as possible. It also argues that since video fragments can be introduced at the network edge, monitoring is required there as well as in the head end/origin. So Inspector Live incorporates Encoder Boundary Point monitoring as well as IDR (Instantaneous Decoding Refresh) alignment verification to help ensure that bit rate switches occur smoothly during playback. An IDR frame is a special type of the I-frame familiar from H.264 video compression: it specifies that no frame after it can reference any frame before it, which facilitates segmentation for OTT distribution because one segment can be decoded independently of another.
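The alignment requirement is easy to state precisely: every rendition in the ladder should place its IDR frames at the same timestamps, so a player can switch rungs at any segment boundary without needing frames from the wrong stream. A minimal sketch of such a check, on hypothetical timestamp lists (in a real workflow these would be extracted from each stream with a tool such as ffprobe):

```python
def idr_aligned(renditions, tolerance=0.001):
    """Check that every rendition places its IDR frames at the same
    timestamps (in seconds), within a small tolerance.

    `renditions` maps a profile name to its list of IDR timestamps.
    """
    timestamp_lists = list(renditions.values())
    reference = timestamp_lists[0]
    for other in timestamp_lists[1:]:
        if len(other) != len(reference):
            return False
        if any(abs(a - b) > tolerance for a, b in zip(reference, other)):
            return False
    return True

aligned = {
    "1200k": [0.0, 2.0, 4.0, 6.0],
    "2500k": [0.0, 2.0, 4.0, 6.0],
}
drifted = {
    "1200k": [0.0, 2.0, 4.0, 6.0],
    "2500k": [0.0, 2.5, 4.0, 6.5],  # stray IDR placement breaks clean switching
}
print(idr_aligned(aligned))  # True
print(idr_aligned(drifted))  # False
```

An encoder that inserts an extra IDR on a scene cut in one rendition but not the others fails exactly this test, which is the kind of mismatch IDR alignment verification is there to catch.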

It is worth pointing out here that there has been some consolidation in the video monitoring field, and Telestream has been a major player, having first acquired IneoQuest, a specialist in that field, in 2017, and then Tektronix Video, also a significant force in video test, monitoring and quality assurance, in April 2019.

As a result, Telestream took IBC 2019 as an opportunity for some rationalization, with the launch of its OptiQ Monitor, aimed at operators that already have the infrastructure required to support live streaming channels but lack monitoring capability, especially outside the CDNs they use. OptiQ itself is a framework of live services merging the expanded Telestream skill set around live streaming, workflow, cloud, integrated monitoring and software containers.

Also active on the live quality monitoring front at IBC 2019 was Sweden’s Agama Technologies, with a focus on reaching right back to the raw uncompressed feeds, through processing and delivery, to consumption in the set top box, smart TV or mobile video app. The aim is to embrace linear and OTT in a single package, according to Johan Görsjö, the company’s Director of Product Management. This is all part of the software-based Agama Analyzer, which now supports monitoring and assurance of uncompressed SDI (Serial Digital Interface) content. The company claims this makes it possible to assure the quality of video streams earlier in the head-end production workflow, creating visibility into the pre-encoding stage.
