The Business Cost Of Poor Streaming Quality - Part 2

Part 1 focused on what poor streaming quality means and what it can cost a D2C Streamer across multiple financial dimensions. This article focuses on preventing and fixing problems.

The impact of poor QoE is significant enough to make it a high priority for resolution. It is why the industry talks about making streaming “broadcast-grade”. Good and consistent picture quality and low latency are the basic requirements to deliver. So how do we fix problems quickly and prevent them from happening in the first place?

This article was prepared with the benefit of expert insight from the team at Bitmovin.

Errors At The Player

Just as the Playout Automation environment is often the first place to look for Linear TV Playout problems, in Streaming the first place to look is the Player. Player vendors are generally involved in helping their D2C Streamer customers troubleshoot errors and identify solutions. But understanding viewer QoE issues and assuring excellent performance remains one of the biggest challenges, because holistic tools for developing and testing video streaming solutions are lacking.

Various error types are found at the Player. Typical examples are missing DRM initialization data, manifest misconfigurations, audio track misconfigurations, and ad playback problems.

During the pre-release development phase, playback misconfiguration is a common source of errors. As Developers work with different content types, operating systems, and devices, there is a long list of playback parameters to set correctly and optimize. One common challenge is that viewing devices behave differently. For instance, the content browsing functionality on a specific Smart TV can operate very differently from that on a specific mobile device, and errors can easily occur if the App and Player are not set up properly. For this reason, Players can include dedicated modules for specific devices.

D2C Streamers report that new App releases are a high risk to QoE and a source of many errors. App errors are “own goals” in a competitive market and should be avoided. A more sophisticated analysis and testing environment makes an important difference in reducing the number of errors in the days following a new release.

Leading Player suppliers are continuously improving their documentation to help their customers’ App Development teams understand configuration issues. They provide templates of known good configurations by device type, and they are making error codes more descriptive and more precise. For example, one leading Player vendor reports that it has invested heavily in improving error code descriptions and updating documentation for Streaming App developers. Instead of just a “2009_DRM_KEY_ERROR” code, the error now comes with helpful guidance in plain language: “you can ask your DRM vendor to try verifying the init data is valid and not malformed. Most commonly the PSSH boxes need to be confirmed”.
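One simple way a Player SDK might attach plain-language guidance to its error codes is a lookup table, sketched below in Python. The dictionary, the `describe_error` function and the overall structure are illustrative assumptions, not any vendor's real API; only the error code and the hint text come from the example above.

```python
# Hypothetical catalogue mapping Player error codes to plain-language guidance.
# Only the code and hint text come from the article; the structure is illustrative.
ERROR_HINTS = {
    "2009_DRM_KEY_ERROR": (
        "You can ask your DRM vendor to try verifying the init data is valid "
        "and not malformed. Most commonly the PSSH boxes need to be confirmed."
    ),
}


def describe_error(code: str) -> str:
    """Return the raw error code together with any plain-language guidance."""
    hint = ERROR_HINTS.get(code)
    if hint is None:
        # Unknown codes are still reported verbatim so developers can search for them.
        return f"{code}: no guidance documented yet."
    return f"{code}: {hint}"


print(describe_error("2009_DRM_KEY_ERROR"))
```

Keeping the raw code in every message matters: the descriptive text helps the App developer, while the code itself stays searchable in logs and vendor documentation.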

Some errors, such as audio lip-sync issues, have not always been flagged by analytics systems as QoE errors. But as viewer QoE is analyzed forensically to try to remove all types of technical error, analytics systems are categorizing and reporting errors more precisely. From a cost of poor quality perspective, the ability to understand the customer’s experience proactively from operational KPIs is critical. Customer experience professionals know that only a small percentage of dissatisfied customers will care enough to express their dissatisfaction to their service provider – most will just silently disengage. It is therefore business critical to know the customer’s experience without them needing to describe it.

Better understanding potential QoE issues, and fixing them during the development phase, requires holistic analytics solutions for developers. Headline performance must be easy to understand, coupled with drill-down into individual session-level detail. Customer journeys should be traceable, showing how viewers interacted with the App or website, discovered content, started streaming, stopped streaming, and exited the App. The Development team, who are responsible for perfecting this journey for all viewers, must be presented with this information so they can work efficiently and intelligently to eradicate technical errors and assure the success of their OTT service.
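To make "headline performance plus session-level drill-down" concrete, the Python sketch below models each viewing session as a list of timestamped journey events, then derives a headline play-start rate, where viewers left the App, and which sessions to drill into. The event names, the `Session` class and the `headline` function are hypothetical; real analytics products define their own schemas.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Session:
    session_id: str
    # (timestamp_s, event_name) pairs; event names here are hypothetical,
    # e.g. app_open, browse, play_start, play_stop, error, app_exit.
    events: list = field(default_factory=list)

    def exit_point(self) -> str:
        """Last thing that happened before the viewer left the App."""
        names = [name for _, name in sorted(self.events) if name != "app_exit"]
        return names[-1] if names else "none"

    def had_error(self) -> bool:
        return any(name == "error" for _, name in self.events)


def headline(sessions: list) -> dict:
    """Headline KPIs, plus the session IDs worth drilling into."""
    total = len(sessions)
    started = sum(1 for s in sessions if any(n == "play_start" for _, n in s.events))
    return {
        "sessions": total,
        "play_start_rate": started / total if total else 0.0,
        "exit_points": Counter(s.exit_point() for s in sessions),
        "drill_down": [s.session_id for s in sessions if s.had_error()],
    }
```

The exit-point count is the journey view (for example, how many viewers left straight after browsing), while the drill-down list hands developers the exact sessions to inspect.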

However, based on D2C Streamer feedback, there is no substitute for real-world testing using the D2C Streamer’s own content and service environment. To find Player-side errors, the full range of consumer platforms must be tested, which requires a lab environment containing at least every major platform type. Most major Streamers work with 12 different platforms.

A lab like this supports pre-launch testing, but it is also necessary for replicating customer-reported issues where the error is non-specific. Leading Player vendors highlight that if a customer QoE issue is reported, it is important to replicate the viewer’s setup as accurately as possible to create a performance baseline. This includes testing on matching device models and shaping the network to a similar bandwidth. Doing so ensures that troubleshooting targets a standard of delivery that is achievable in the viewer's environment, and it reduces the time needed to replicate the issue.
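In a Linux-based lab, one common way to shape a similar bandwidth is the `tc` netem queueing discipline. The Python sketch below only builds the command for a viewer's reported profile; the interface name and the profile values are assumptions, and applying the command requires root on the shaping host.

```python
import shlex


def netem_command(interface: str, rate_kbit: int, delay_ms: int,
                  loss_pct: float = 0.0) -> list:
    """Build a `tc netem` command approximating a viewer's home connection."""
    if rate_kbit <= 0 or delay_ms < 0 or not 0.0 <= loss_pct < 100.0:
        raise ValueError("invalid network profile")
    # `replace` installs the qdisc whether or not one is already attached.
    cmd = ["tc", "qdisc", "replace", "dev", interface, "root", "netem",
           "delay", f"{delay_ms}ms"]
    if loss_pct > 0:
        cmd += ["loss", f"{loss_pct}%"]
    cmd += ["rate", f"{rate_kbit}kbit"]
    return cmd


# Illustrative profile from a hypothetical support ticket.
cmd = netem_command("eth0", rate_kbit=4500, delay_ms=35, loss_pct=0.5)
print(shlex.join(cmd))  # apply on the shaping host, e.g. subprocess.run(cmd, check=True)
```

Building the command as a list rather than a string keeps it safe to pass straight to `subprocess.run` without shell quoting issues.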

Some errors can be fully resolved at the Player, while others can be worked around while a more permanent solution is implemented at the error source. Playback compatibility with a particular browser or platform is managed at the Player. Errors that originate upstream of the Player can, at best, be managed with a workaround at the Player. Some environments are easier to work with than others – for example, the Player can do manifest manipulation for Android devices, but not for Apple devices. But Player-side fixes for upstream errors are often non-standard, which introduces risk to the Player’s stability and long-term maintainability. Overall, video streaming Developers need to think about what they can change easily and quickly, especially if they are working to prevent future QoE issues as quickly as possible. In short, for an App Developer, websites and players are easier to change than hardware encoders, ad servers, and CDNs.
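Manifest manipulation at the Player generally means rewriting the manifest before the Player parses it. The Python sketch below shows one such workaround for an HLS multivariant playlist: removing renditions whose peak bitrate exceeds a ceiling, for instance to hide a rendition a given device cannot play. The function name and the scenario are hypothetical; on a real Player the same logic would sit in whatever manifest hook the platform exposes, which, per the text, is possible on Android but not on Apple devices.

```python
import re

# Match BANDWIDTH=, but not AVERAGE-BANDWIDTH=, inside an #EXT-X-STREAM-INF tag.
_BANDWIDTH = re.compile(r"(?:^|[:,])BANDWIDTH=(\d+)")


def cap_variants(playlist: str, max_bandwidth: int) -> str:
    """Drop HLS variants whose peak BANDWIDTH exceeds max_bandwidth (bits/s).

    Each variant is an #EXT-X-STREAM-INF tag followed by its URI line; both are removed.
    """
    out, drop_next_uri, kept = [], False, 0
    for line in playlist.splitlines():
        is_uri = bool(line) and not line.startswith("#")
        if drop_next_uri and is_uri:
            drop_next_uri = False  # URI belonging to a variant we removed
            continue
        if line.startswith("#EXT-X-STREAM-INF:"):
            match = _BANDWIDTH.search(line)
            if match and int(match.group(1)) > max_bandwidth:
                drop_next_uri = True
                continue
            kept += 1
        out.append(line)
    if kept == 0:
        # Never hand the Player an empty ladder; surface the problem instead.
        raise ValueError("cap would remove every variant")
    return "\n".join(out) + "\n"
```

This also illustrates the maintainability risk described above: the upstream manifest remains unchanged, and the Player now carries extra, non-standard logic that someone must maintain until the source is fixed.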

Errors Upstream Of The Player

As reported by D2C Streamers interviewed for this article, approximately 75% of streaming problems must be fixed upstream of the player. Upstream means the “Origination & Delivery” environment. Within this, estimates are that 80% of the errors relate to “CDN/ISP networks” and 20% are from “source/encoding” environments. Anecdotally, most QoE problems relate to buffering.

Content delivery issues in CDNs and ISPs are very challenging to manage. The problems witnessed by D2C Streamers generally point to a lack of capacity availability in the right location. This can be caused by a lack of overall deployed systems, or by unexpected maintenance windows on CDNs and ISPs that cause streaming QoE problems. At peak viewing times, any capacity problems are amplified.

D2C Streamers also report the knock-on effect when CDNs need to scale for live events or major VOD releases, which can include overloading mid-tier Caches and then Origins.

Tracing errors from the Player error code to an upstream error code is not simple. A 2001 error code on the Player might be caused by a 404 error at the CDN. But was the actual error at the CDN’s Edge Cache layer, or is it at the Origin, Packager, Encoder or even the Content Source? Multiple supplier and technical domains are traversed during the tracing process. The Common Media Client Data (CMCD) initiative led by the Consumer Technology Association (CTA) is working on metadata alignment and querying capabilities from the client to the CDN, which could ultimately support faster diagnosis and resolution of errors as well as prevent errors.
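To make the tracing problem concrete: CMCD, published as CTA-5004, lets the Player attach a small set of key/value pairs to each segment request, either as HTTP headers or as a query argument named CMCD, so that CDN logs can carry the same session ID the Player reports. The Python sketch below serialises a handful of keys on the Player side and recovers the session ID on the CDN side. It is a simplified illustration covering only a subset of the specification; the URL and session ID are made up.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Subset of CMCD keys used here:
#   bl = buffer length (ms), br = encoded bitrate (kbps), bs = buffer starvation,
#   ot = object type (token, e.g. v = video), sid = session ID, su = startup.
STRING_KEYS = {"cid", "sid"}  # string values are sent in double quotes


def cmcd_payload(data: dict) -> str:
    """Serialise CMCD pairs: keys in alphabetical order, true booleans as bare keys."""
    parts = []
    for key in sorted(data):
        value = data[key]
        if value is False or value is None:
            continue  # false booleans and absent values are simply omitted
        if value is True:
            parts.append(key)
        elif key in STRING_KEYS:
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            parts.append(f'{key}="{escaped}"')
        else:
            parts.append(f"{key}={value}")
    return ",".join(parts)


def with_cmcd(segment_url: str, data: dict) -> str:
    """Player side: append CMCD as a single URL-encoded query argument."""
    sep = "&" if "?" in segment_url else "?"
    return f"{segment_url}{sep}CMCD={quote(cmcd_payload(data), safe='')}"


def session_id_from_request(url: str):
    """CDN side: recover sid so an Edge 404 can be joined to the Player's error report.

    Simplification: splits on commas, so string values must not contain commas.
    """
    values = parse_qs(urlsplit(url).query).get("CMCD")
    if not values:
        return None
    for item in values[0].split(","):
        key, _, value = item.partition("=")
        if key == "sid":
            return value.strip('"')
    return None
```

With a shared session ID in both the Player's error report and the CDN's request log, a 2001 error on the Player can be matched to the specific 404 response that caused it, which is the first step in deciding which layer to investigate.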

Given that 60% of all QoE problems are attributed to CDN and ISP networks, CDN suppliers have a major role to play in accelerating error diagnosis and issue resolution. Errors in the CDN are generally reported as standard HTTP status codes (e.g., 4xx, 5xx). Understanding an error starts with investigating Edge and Origin performance, looking for issues in delivery speed and system availability. Slow delivery speeds at the Edge are either isolated, which points to problems at the Edge, or correlated with slow speeds at the Origin, which points to problems at the Origin, potentially including the packager and encoder. If the Edge is performing to spec, which normally means delivering all segments in the tens of milliseconds range, then there is likely to be an ISP problem, including customer premises equipment.
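The Edge/Origin/ISP reasoning above is essentially a small decision tree. The Python sketch below encodes it for one monitoring window. The use of 95th-percentile timings and both thresholds are assumptions: the text only says an in-spec Edge delivers segments in the tens of milliseconds.

```python
from dataclasses import dataclass

# Assumed thresholds; 100 ms and 500 ms are illustrative cut-offs only.
EDGE_SLOW_MS = 100.0
ORIGIN_SLOW_MS = 500.0


@dataclass
class DeliveryWindow:
    edge_p95_ms: float    # 95th-percentile segment delivery time at the Edge
    origin_p95_ms: float  # 95th-percentile Origin response time for Edge fetches


def likely_fault_domain(w: DeliveryWindow) -> str:
    """Apply the Edge/Origin/ISP triage to one time window."""
    edge_slow = w.edge_p95_ms > EDGE_SLOW_MS
    origin_slow = w.origin_p95_ms > ORIGIN_SLOW_MS
    if edge_slow and origin_slow:
        return "origin"  # correlated slowness: Origin, possibly packager or encoder
    if edge_slow:
        return "edge"    # isolated slowness at the Edge
    return "isp"         # Edge in spec: look at the ISP and customer premises
```

Note that a slow Origin with a healthy Edge still points to the ISP, because the Edge caches are shielding viewers from the Origin's slowness.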

Feedback from D2C Streamers shows that the ISP relationship is becoming an increasingly important focus in day-to-day operations. While this relationship was not a priority in the past, today the largest Streamers will focus on closer links with the leading ISPs in a market to improve the viewing experience for their shared consumers.

Philippe Tripodi, Chief Product Officer at MainStreaming, is responsible for working with customers on resolving CDN-side issues. “Troubleshooting and resolving QoE issues is so critical for our streaming customers, especially those who deliver important live sports events. We see that there are two ways for D2C Streamers to work with CDNs. First, as a ‘black-box’ which gives no insight into what happens to streams between Origin and Edge. Or second, in a transparent way (that is often related to a Private CDN model) that gives full evidence of performance to the D2C Streamer and introduces the opportunity for the D2C Streamer to control what happens to their streams. Working with CDNs in this transparent way also leads to a deeper understanding of ISP-level performance that can be addressed in partnership between Streamer, CDN and ISP. We believe that D2C Streamers need transparency, both in real-time during live streaming and afterwards in post-event reports, because this gives them control of their own destiny as they work hard to deliver content perfectly to their customers.”

Figure 1: MainStreaming Edge Response dashboard showing % of all Edge responses with Error Codes (4xx & 5xx) which is provided to customers to support their performance improvement efforts.

Figure 2: MainStreaming Origin Response dashboard, which is provided to customers, showing times where a slow Origin response could have created onward delivery challenges and potentially be related to QoE issues.

What Are The Thought Leaders Thinking?

App developers focused on video services need ever-better tools, so they can do their job well enough that App-related quality issues become minor. This is how the remaining 25% of errors, those not originating upstream of the Player, should be reduced.

There is an opportunity to bring Player and Analytics components closer together, and to use machine learning from the Analytics to make automatic changes in the Player. For example, if a viewer has poor QoE because their network profile is different to how the Player has been configured, then the Player’s ABR could be corrected automatically. If the internet isn’t as good as it should be, the buffer could be enlarged. Most configurations today are static, and do not even consider the different network conditions. Automating simple configuration changes could assist here. This can address some of the 75% of errors from the Origination & Delivery environment.
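The text envisages machine learning driving these changes. Purely to show the kind of adjustment meant, the Python sketch below swaps in two fixed rules: capping the ABR ladder below measured throughput and enlarging the buffer when stalls are frequent. `PlayerConfig`, the 20% headroom, the two-stalls-per-hour trigger and the 60-second ceiling are all assumptions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PlayerConfig:
    max_bitrate_kbps: int   # top of the ABR ladder the Player may select
    buffer_target_s: float  # how much media the Player tries to keep buffered


def adapt_config(cfg: PlayerConfig, measured_kbps: float,
                 rebuffers_per_hour: float) -> PlayerConfig:
    """Rule-based stand-in for analytics-driven tuning of a static Player config."""
    new = cfg
    # Keep the ABR ceiling about 20% below sustained throughput (assumed headroom).
    safe_cap = int(measured_kbps * 0.8)
    if safe_cap < cfg.max_bitrate_kbps:
        new = replace(new, max_bitrate_kbps=safe_cap)
    # Frequent stalls suggest a variable link: grow the buffer, up to an assumed 60 s cap.
    if rebuffers_per_hour > 2:
        new = replace(new, buffer_target_s=min(cfg.buffer_target_s * 1.5, 60.0))
    return new
```

The config is immutable and each rule returns a new copy, so the original static configuration stays available for comparison or rollback.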

Another opportunity is to improve information sharing between the player, analytics and CDN. The CMCD work mentioned earlier will assist this, and it will also address the 75% bucket of errors.

Player vendors note that today's pre-launch testing should better represent real-world load. Network bandwidth shaping tools that can represent a consumer’s real home environment are useful. Streamers also test with a subset of viewers in the real world. This real-world environment cannot yet be recreated in a test lab, but it should be possible to do so.

D2C Streamers highlight that they need to keep focused on improving the visibility into network performance to truly tackle the 75% bucket of errors. In the absence of deeper partnerships with CDNs and ISPs, the next-best approach is to pursue a multi-CDN strategy to spread the risk of capacity shortages.
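In its simplest client-side form, a multi-CDN strategy is a weighted choice between providers that skips any provider monitoring has marked as unhealthy. The Python sketch below assumes that model; the CDN names, weights and health flags are hypothetical, and many Streamers steer traffic in DNS or a dedicated steering service instead of in the Player.

```python
import random


def pick_cdn(weights: dict, unhealthy: set, rng: random.Random) -> str:
    """Weighted random choice across CDNs, skipping any currently flagged unhealthy."""
    candidates = {name: w for name, w in weights.items() if name not in unhealthy and w > 0}
    if not candidates:
        raise RuntimeError("no healthy CDN available")
    names = list(candidates)
    return rng.choices(names, weights=[candidates[n] for n in names], k=1)[0]


# Hypothetical traffic split; cdn_a has been flagged by monitoring.
rng = random.Random(7)
print(pick_cdn({"cdn_a": 70, "cdn_b": 30}, unhealthy={"cdn_a"}, rng=rng))  # always cdn_b
```

Passing in the random generator keeps the choice reproducible in tests, while the health set lets capacity problems at one CDN shift traffic to the others without reconfiguring weights.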

The Wrap-up

Streaming is aiming for broadcast-grade performance, but there is still a way to go to address the QoE issues seen at the Player. The network-side problems are potentially more serious because, anecdotally, they account for most problems and are harder for individual D2C Streamers to resolve. But there are initiatives that will help the industry address these issues, and some companies are working to give more visibility of performance and create deeper CDN-ISP partnerships so D2C Streamers can better control their own destiny with their customers.

Poor technical quality accounts for “only” 6% of customer churn, but it makes a significant impact. QoE issues can turn customers off completely, as shown by Vimeo’s analysis, or at least reduce their level of engagement with the OTT service. In a competitive media market, 6% could be the difference between winning and losing.

Creating a more joined-up streaming supply chain is the right strategy to address the 6% of churn caused by poor quality. D2C Streamers should be able to see all aspects of the end-to-end video delivery chain so they can decide where to invest to improve quality of delivery and, ultimately, reduce their cost of poor quality.
