The Business Cost Of Poor Streaming Quality

Poor quality streaming loses viewers at an alarming rate, especially when we consider the unintended consequences of poor error reporting in streaming players.

It is safe to say that probably every streaming service has delivered poor quality experiences to its viewers at some point.

At the same time, technical standards in streaming have been improving since the first D2C Streamer services started to emerge over 15 years ago. But as devices, networks, and video production continue to evolve, and particularly as audience sizes and expectations increase, poor viewing experiences persist, and some make headlines. So, what is the cost of poor quality? And what is being done about it?

What Do We Mean By Poor Quality?

Poor quality in this article means a poor quality of experience with video playback. In other words, anything the Player cannot play perfectly is a technical quality issue. This includes start-up failures, in-stream failures, rebuffering, and slow start-up times.

According to Vimeo research, technical issues cause 6% of churn for D2C Streamer services. For these services, the average ARPU is $15 per month, the average subscription length is 17 months, and the average lifetime value of a customer is about $250. Clearly this cause of churn should be avoided. The content itself and the price of the service already cause higher levels of churn. D2C Streamer services do not need to add technical challenges to the list.

The Cost Of Poor Quality

Three lines in the Profit & Loss accounts link directly back to poor quality – loss of subscription revenue, loss of advertising revenue, and the extra cost of fixing issues.

Recent news headlines from high profile events that are now primarily available on streaming services have highlighted refunds being paid out for poor quality of experience (QoE) caused by technical issues. Whether a stream would not start due to user authentication or digital rights management problems, or whether it suffered from consistent rebuffering, the bottom line is that viewers will only put up with a certain level of inconvenience.

Player and Analytics solution provider Bitmovin extended Vimeo’s initial research. Their data analytics research of “best in class SVOD services” showed that a typical SVOD customer attempts 150 plays per month and has an average of 1 technical error every 15 plays.  With an average subscription length of 17 months, the average user has 165 technical issues before churning. 6% of the total customer base churns for technical reasons. They concluded that for every 10% reduction in technical issues there would be a 1.1% increase in total lifetime value from the customer base. If 1 million customers pay $15 per month, that is worth $160,000 per year in additional revenue.
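The arithmetic behind these figures can be sketched in a few lines. This is only an illustration using the rounded inputs quoted above; the function and variable names are ours, not Vimeo's or Bitmovin's. With the rounded inputs the results come out at $255 and 170 errors, close to the $250 and 165 quoted, which suggests (as an assumption on our part) that the published figures were derived from unrounded inputs.

```python
# Rounded figures quoted in the article; names are illustrative only.
ARPU_PER_MONTH = 15.0        # dollars per subscriber per month
SUBSCRIPTION_MONTHS = 17     # average subscription length
PLAYS_PER_MONTH = 150        # play attempts per subscriber per month
PLAYS_PER_ERROR = 15         # one technical error every 15 plays


def lifetime_value(arpu: float, months: float) -> float:
    """Simple lifetime value: monthly revenue times months subscribed."""
    return arpu * months


def errors_before_churn(plays_per_month: float, months: float,
                        plays_per_error: float) -> float:
    """Expected number of technical errors over a whole subscription."""
    return plays_per_month * months / plays_per_error


if __name__ == "__main__":
    print(lifetime_value(ARPU_PER_MONTH, SUBSCRIPTION_MONTHS))
    print(errors_before_churn(PLAYS_PER_MONTH, SUBSCRIPTION_MONTHS,
                              PLAYS_PER_ERROR))
```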

Advertising-supported OTT services have a different dynamic, but still are affected by the impact of technical errors. When the D2C Streamer’s goal is to maximize the viewing time of a viewer, any technical errors work against this objective. Research published by ResearchGate points to a 2.32% reduction in viewership for people who experience technical errors.

D2C Streamers report extra pressure on QoE when the aim is to monetize the content through subscriptions. Paying subscribers expect excellent quality, especially those watching on larger screens where resolutions are higher, so tools like client analytics and network measurement devices become very important. Monetizing through ads can put extra pressure on the delivery infrastructure to perform well (e.g., with dynamic ad insertion workflows), which in some cases can cause unintended issues with content delivery and the overall customer experience.

D2C Streamers also report that consumers will stop watching when their technical QoE is poor. One leading D2C Streamer measured that viewers watched programmes for 5-10% longer when QoE metrics were better. This made a significant difference to their business for attracting advertisers and retaining customers.

Beyond the headline-grabbing hits to subscription and advertising revenues, there is the more hidden cost of finding and fixing the technical issues. This is not only an in-life customer service issue, where some technical issues drain hours and hours of time from technical teams. It is also a product-launch App development issue. Because OTT video operates in a dynamic environment with many different devices and networks, and because the standards and specifications in these environments are not defined specifically for perfect video delivery, there is a continuous process of improvement and adaptation in the App development environment, where “the rubber meets the road” for video playback.

The cost of poor quality for a D2C Streamer is very hard to calculate. But the reputational damage can be clear. OTT-only services with subscription-based models attract very visible criticism in the market if the standard is poor, with QoE and App stability as the top two technical drivers of complaints.

Understanding Technical Issues

Understanding technical issues can be very challenging. Error codes are the start of the problem. Bitmovin reported that only 15% of total technical errors are clear, which leaves 85% unclear. The 85% is broken down to 20% that are completely unclear and 65% that are ambiguous.

Ambiguous errors relate to general error codes like Android’s ERROR_CODE_DRM_UNSPECIFIED. This is an unspecified error related to DRM protection, and it needs more investigation to determine why it occurred. Additional information on top of the error code and short description is important to support fast diagnosis and resolution of the error. The unknown errors do not provide any information about the root cause, so it is critical to have good reference documentation.

A video player can observe multiple technical error codes per hour. Some can cause noticeable QoE issues, like DRM errors (e.g., failed license requests), unrecoverable network errors (e.g., timeouts), and source errors (e.g., empty segments or decoding errors), while others may not, such as advertising errors and incorrect configurations. An advertising error may not seriously affect the viewer’s QoE, although it has commercial implications for the D2C Streamer and Advertiser. For some errors the Player will retry 2-3 times to play the video segment. The viewer may not even notice these errors, and perhaps they only experience a longer start-up time, which is less impactful than a failure. In other cases, the Player may skip one of the video renditions (i.e., ABR bitrates) that cannot be played, which the viewer may not notice either. The key is to know which technical errors impacted a viewer’s experience, and by how much, in order to correctly prioritize the investment to fix them.
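The retry behavior described above, and the difference between an error the viewer never sees and one that ends playback, can be sketched as follows. This is a simplified illustration, not any vendor's Player logic: the function names, the use of OSError to stand in for a network error, and the three impact labels are all assumptions made for the example.

```python
from typing import Callable, Optional, Tuple


def fetch_with_retries(fetch: Callable[[], bytes],
                       max_attempts: int = 3) -> Tuple[Optional[bytes], int]:
    """Call `fetch` up to `max_attempts` times.

    Returns (segment data or None, number of failed attempts).
    """
    failures = 0
    for _ in range(max_attempts):
        try:
            return fetch(), failures
        except OSError:  # stands in for timeouts and other network errors
            failures += 1
    return None, failures


def viewer_impact(data: Optional[bytes], failures: int) -> str:
    """Rough, hypothetical classification of what the viewer experienced."""
    if data is None:
        return "failure"     # all retries exhausted: playback error
    if failures > 0:
        return "recovered"   # errors logged, viewer saw at most a delay
    return "none"            # clean request
```

Recording the failed attempts alongside the outcome is what lets an analytics system separate errors that were logged from errors that actually hurt the viewer, which is the prioritization the paragraph above calls for.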

Figure 1 – Root cause analysis workflow, starting from the observed error on the Player.

All error types need drill-down capability to pinpoint the exact technical issue that can lead to a resolution. There is a specific drill-down process to follow as shown in Figure 1, beginning at the Player level where the error is first observed.

From a Player perspective, it is possible to see all the errors. But only some errors can be fully understood at the Player. Some can be resolved, and others can be worked around. But it is not possible to understand and resolve all the errors. The whole OTT video supply chain must be analyzed to understand and resolve all types of technical errors.

The launch of a new version of an App or website is often directly linked to new revenue streams, QoE improvements, or cost reductions. Therefore, speed of launch is key to achieve these results quickly. But a fast “buggy” launch is not in anyone’s interest. Intigral, a leading D2C Streamer operating in the Middle East region, recently reported that it was able to achieve a 30% reduction in time to market for new App releases, based on their ability to better trust the quality assurance processes through high quality analytics and error reporting during development. To the list of the cost of poor quality, we can therefore add the opportunity cost of slow App releases.

Associating a real business cost, like lost revenue, with a technical error will support the prioritization of investments that protect D2C Streamer revenues and profits. But where the errors occur largely determines what sort of investment needs to be made.

Preventing And Fixing Problems

The impact of poor QoE is significant enough to make it a high priority for resolution. It is why the industry talks about making streaming “broadcast-grade”. Good and consistent picture quality and low latency are the basic requirements to deliver. So how do we fix problems quickly and prevent them from happening in the first place?

Errors At The Player

Much like a Playout Automation environment that is often the first place to start looking for Linear TV Playout problems, in Streaming the first place to look is the Player. Player vendors are generally involved in helping their D2C Streamer customers to troubleshoot errors and identify the solutions. But understanding viewer QoE issues and assuring excellent performance remains one of the biggest challenges because of the lack of holistic tools for developing and testing video streaming solutions.

Various error types are found at the Player. Typical examples are missing DRM initialization data, manifest misconfigurations, audio track misconfigurations, and ad playback problems.

During the pre-release development phase, it is common to see playback misconfiguration as an error. As Developers work with different content types, operating systems, and devices there is a long list of playback parameters to set correctly and optimize. One common challenge is that viewing devices behave differently. For instance, the content browsing functionality on a specific Smart TV versus a specific mobile device can operate very differently, and errors can easily occur if the App and Player are not set up properly. For this reason, Players can include dedicated modules for specific devices.

D2C Streamers report that new App releases are a source of high risk to QoE and lead to a number of errors. Unfortunately, App errors are “own goals” in a competitive market and should be avoided. A more sophisticated analysis and testing environment makes an important difference to reducing the number of errors in the days following a new release.

Leading Player suppliers are continuously improving documentation to help their customers’ App Development teams understand configuration issues. Templates of known good configurations are provided by device type. Error codes are being made more descriptive and more precise. For example, researchers report that they have invested heavily in improving error code descriptions and updating documentation for Streaming App developers. Instead of just a “2009_DRM_KEY_ERROR” code, the error now comes with helpful guidance in plain language – “you can ask your DRM vendor to try verifying the init data is valid and not malformed. Most commonly the PSSH boxes need to be confirmed”.

Some errors have not always been flagged by analytics systems as QoE errors, such as audio lip-sync issues. But as viewer QoE is analyzed forensically to try to remove all types of technical error, the analytics systems are categorizing and reporting errors more precisely. From a cost of poor quality perspective, the ability to proactively understand the customer’s experience from the operational KPIs is critical. Customer experience professionals know that only a small percentage of dissatisfied customers will care enough to express their dissatisfaction to their service provider – most will just silently disengage. It is therefore business critical to know the customer’s experience without them needing to describe it.

To better understand potential QoE issues and fix them during the development phase requires holistic analytics solutions for developers. Headline performance must be understandable, coupled with drill-down into individual session-level detail. Customer journeys should be traceable, showing how viewers interacted with the App or website, discovered content, started streaming, stopped streaming, and exited the App. The Development team, who are responsible for perfecting this journey for all viewers, must be presented with this information so they can work efficiently and intelligently to eradicate technical errors and assure success of their OTT service.

However, based on D2C Streamer feedback, there is no substitute for real-world testing using the D2C Streamer’s own content and service environment. To find Player-side errors, the full range of consumer platforms must be tested, which needs a lab environment at least with every major platform type. Most major Streamers work with 12 different platforms.

A lab like this supports pre-launch testing but is also necessary to replicate customer reported issues for many non-specific errors. Leading Player vendors highlight that if a QoE customer issue is reported then it is important to replicate the viewer’s setup as accurately as possible to create a performance baseline. This includes testing using matching device models and shaping a similar network bandwidth. This ensures the troubleshooting process results in a standard of delivery which is achievable given the viewer’s environment and reduces time to issue replication.

Some errors can be fully resolved at the Player, while others can be worked around while a more permanent solution is implemented at the error source. Playback compatibility with a particular browser or platform is managed at the Player. Errors that originate upstream of the Player can, at best, be managed with a workaround at the Player. Some environments are easier to work with than others – for example, the Player can do manifest manipulation for Android devices, but not for Apple devices. But Player-side fixes for upstream errors can often be non-standard solutions, which introduces risk to the Player’s stability and ongoing maintainability.  Overall, video streaming Developers need to think about what they can change easily and quickly, especially if they are working to prevent future QoE issues as quickly as possible. In short, websites and players are easier to change for an App Developer than hardware encoders, ad servers, and CDNs.

Errors Upstream Of The Player

As reported by D2C Streamers interviewed for this article, approximately 75% of streaming problems must be fixed upstream of the player. Upstream means the “Origination & Delivery” environment. Within this, estimates are that 80% of the errors relate to “CDN/ISP networks” and 20% are from “source/encoding” environments. Anecdotally, most QoE problems relate to buffering.

Content delivery issues in CDNs and ISPs are very challenging to manage. The problems witnessed by D2C Streamers generally point to a lack of capacity availability in the right location. This can be caused by a lack of overall deployed systems, or by unexpected maintenance windows on CDNs and ISPs that cause streaming QoE problems. At peak viewing times, any capacity problems are amplified.

D2C Streamers also report the knock-on effect when CDNs need to scale for live events or major VOD releases, which can include overloading mid-tier Caches and then Origins.

Tracing errors from the Player error code to an upstream error code is not simple. A 2001 error code on the Player might be caused by a 404 error at the CDN. But was the actual error at the CDN’s Edge Cache layer, or is it at the Origin, Packager, Encoder or even the Content Source? Multiple supplier and technical domains are traversed during the tracing process. The Common Media Client Data (CMCD) initiative led by the Consumer Technology Association (CTA) is working on metadata alignment and querying capabilities from the client to the CDN, which could ultimately support faster diagnosis and resolution of errors as well as prevent errors.
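To make the CMCD idea concrete, the sketch below builds the query argument a Player could attach to each segment request so that CDN logs carry the client's view of the session. It is a deliberately reduced illustration based on our reading of the published CTA-5004 specification: it handles only integer, string and true-boolean values, ignores token-typed keys and header transport, and the helper names and example URL are hypothetical.

```python
from urllib.parse import quote


def cmcd_payload(data: dict) -> str:
    """Serialize CMCD key/value pairs: keys sorted, comma separated."""
    parts = []
    for key in sorted(data):
        value = data[key]
        if value is True:
            parts.append(key)                  # true booleans: bare key
        elif value is False:
            continue                           # simplification: omit false
        elif isinstance(value, str):
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            parts.append(f'{key}="{escaped}"')  # strings are quoted
        else:
            parts.append(f"{key}={int(value)}")  # integers are unquoted
    return ",".join(parts)


def with_cmcd(url: str, data: dict) -> str:
    """Append the URL-encoded payload as a `CMCD` query argument."""
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}CMCD={quote(cmcd_payload(data), safe='')}"


if __name__ == "__main__":
    # br: encoded bitrate (kbps), bl: buffer length (ms),
    # bs: buffer starvation flag, sid: session id
    print(with_cmcd("https://cdn.example.com/video/seg1.m4s",
                    {"sid": "abc", "br": 3200, "bs": True, "bl": 2100}))
```

With a session ID and a buffer-starvation flag on every request, a 404 seen at the Edge can be matched to the Player session that experienced it, rather than inferred after the fact.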

Given that 60% of all QoE problems are attributed to CDN and ISP networks, CDN suppliers have a major role to play in accelerating error diagnosis and issue resolution. Error codes in the CDN generally relate to general HTTP codes (e.g., 4xx, 5xx). Understanding the error starts with investigating Edge and Origin performance, looking for issues in speed of delivery and system availability. Slow delivery speeds at the Edge are either isolated, which points to problems at the Edge, or they are correlated to slow speeds at the Origin, which points to problems at the Origin, potentially including packager and encoder. If the Edge is performing to spec, which normally means delivering all segments in the tens of milliseconds range, then there is likely to be an ISP problem, including customer premise equipment.
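The Edge-versus-Origin reasoning above amounts to a small decision rule, sketched here. The thresholds are placeholders chosen for illustration (the text only says the Edge should deliver segments in the tens of milliseconds), and a real investigation would look at distributions over time rather than a single pair of numbers.

```python
def localize_slow_delivery(edge_ms: float, origin_ms: float,
                           edge_spec_ms: float = 100.0,
                           origin_spec_ms: float = 500.0) -> str:
    """Suggest where to look first for a delivery problem.

    edge_ms / origin_ms: observed response times in milliseconds.
    edge_spec_ms / origin_spec_ms: hypothetical "within spec" limits.
    """
    edge_slow = edge_ms > edge_spec_ms
    origin_slow = origin_ms > origin_spec_ms

    if not edge_slow:
        # Edge is performing to spec, so suspect the ISP network
        # or customer premise equipment.
        return "isp_or_cpe"
    if origin_slow:
        # Slow Edge correlated with slow Origin: look at the Origin,
        # potentially including packager and encoder.
        return "origin"
    # Slow Edge in isolation points at the Edge itself.
    return "edge"
```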

Figure 2 – MainStreaming Edge Response dashboard showing % of all Edge responses with Error Codes (4xx & 5xx) which is provided to customers to support their performance improvement efforts.

Figure 3 – MainStreaming Origin Response dashboard, which is provided to customers, showing times where a slow Origin response could have created onward delivery challenges and potentially be related to QoE issues.

Feedback from D2C Streamers shows that the ISP relationship is becoming an increasingly important focus in day-to-day operations. While this relationship was not a priority in the past, today the largest Streamers will focus on closer links with the leading ISPs in a market to improve the viewing experience for their shared consumers.

Philippe Tripodi, Chief Product Officer at MainStreaming, is responsible for working with customers on resolving CDN-side issues. “Troubleshooting and resolving QoE issues is so critical for our streaming customers, especially those who deliver important live sports events. We see that there are two ways for D2C Streamers to work with CDNs. First, as a ‘black-box’ which gives no insight into what happens to streams between Origin and Edge. Or second, in a transparent way (that is often related to a Private CDN model) that gives full evidence of performance to the D2C Streamer and introduces the opportunity for the D2C Streamer to control what happens to their streams. Working with CDNs in this transparent way also leads to a deeper understanding of ISP-level performance that can be addressed in partnership between Streamer, CDN and ISP. We believe that D2C Streamers need transparency, both in real-time during live streaming and afterwards in post-event reports, because this gives them control of their own destiny as they work hard to deliver content perfectly to their customers.”

What Are The Thought Leaders Thinking?

App developers focused on video services need better and better tools. They need to do their job so well that App-related quality issues become minor. The roughly 25% of total errors that originate at the Player and App should shrink in this way.

There is an opportunity to bring Player and Analytics components closer together, and to use machine learning from the Analytics to make automatic changes in the Player. For example, if a viewer has poor QoE because their network profile is different to how the Player has been configured, then the Player’s ABR could be corrected automatically. If the internet isn’t as good as it should be, the buffer could be enlarged. Most configurations today are static, and do not even consider the different network conditions. Automating simple configuration changes could assist here. This can address some of the 75% of errors from the Origination & Delivery environment.
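As a rough illustration of what an automated configuration change could look like, the sketch below enlarges the Player's buffer target when analytics show frequent rebuffering. It is a fixed rule rather than machine learning, and every name, threshold and multiplier in it is a hypothetical choice for the example, not a recommendation or any vendor's behavior.

```python
def recommend_buffer_seconds(current_buffer_s: float,
                             rebuffer_events: int,
                             watch_minutes: float,
                             max_rebuffers_per_hour: float = 2.0,
                             growth_factor: float = 1.5,
                             max_buffer_s: float = 60.0) -> float:
    """Return a new buffer target based on observed rebuffering.

    If the viewer rebuffers more often than the (assumed) tolerance,
    grow the buffer by `growth_factor`, capped at `max_buffer_s`.
    Otherwise keep the current setting.
    """
    if watch_minutes <= 0:
        return current_buffer_s  # no viewing data, no change
    rebuffers_per_hour = rebuffer_events / (watch_minutes / 60.0)
    if rebuffers_per_hour > max_rebuffers_per_hour:
        return min(current_buffer_s * growth_factor, max_buffer_s)
    return current_buffer_s
```

A production system would presumably weigh this against start-up time and memory limits on the device, since a larger buffer is not free.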

Another opportunity is to improve information sharing between the player, analytics and CDN. The CMCD work mentioned earlier will assist this, and it will also address the 75% bucket of errors.

Player vendors note that the pre-launch testing performed today should better represent the real world in terms of load. Network bandwidth shaping tools that can represent a consumer’s real environment at home are useful. Streamers also run tests with a subset of viewers in the real world. Today this real-world environment cannot be fully recreated in a test lab, but the industry should aim to make that possible.

D2C Streamers highlight that they need to keep focused on improving the visibility into network performance to truly tackle the 75% bucket of errors. In the absence of deeper partnerships with CDNs and ISPs, the next-best approach is to pursue a multi-CDN strategy to spread the risk of capacity shortages.

The Wrap-up

Streaming is aiming for broadcast-grade performance. But there is still a way to go to address the QoE issues seen at the Player. The network side problems are potentially more serious because they anecdotally represent more of the causes of problems and are harder to resolve by individual D2C Streamers. But there are initiatives that will help the industry address these issues, and some companies are trying to give more visibility of performance and create deeper CDN-ISP partnerships so D2C Streamers can better control their own destiny with their customers.

Poor technical quality is “only” 6% of the reason for customer churn, but it makes a significant impact. QoE issues can turn customers off completely, as shown by Vimeo’s analysis, or at least reduce their level of engagement with the OTT service. In a competitive media market, 6% could be the difference between winning and losing.

Creating a more joined up streaming supply chain is the right strategy to address the 6% churn rate. D2C Streamers should be able to see all aspects of the end-to-end video delivery chain so they can make decisions about where to invest to improve quality of delivery, to ultimately reduce their cost of poor quality.
