Part 1 of this series described how network-side QoE (Quality of Experience) measurement is fundamental to proactively assuring the quality of OTT services. At its core, the network-side can be an early warning system for QoS, which in turn correlates to actual QoE performance. This article considers the two types of network monitoring available to us, relative priorities for the points of measurement, and how the video platforms contributing to OTT services are evolving to support OTT quality at scale.
The Measurement Method
Network-side measurement of content, delivery and QoE performance takes one of two forms: service monitoring, which focuses on an individual or a small subset of the audience, or platform monitoring, which focuses on all inputs and outputs of a particular technical component such as a CDN or encoder.
Service monitoring uses active testing, where probes simulate OTT clients. This is typically the most widely used measurement method, because it is most cost-effective for smaller, targeted sample testing and is generally able to identify most quality issues. Service monitoring can determine whether, for example, all ABR profiles can be streamed successfully to a particular device type, or if the latest VOD content can be streamed from each CDN, or if the queued-up ads are ready to be played during the break. Because active testing draws content to a client device, a side-benefit of the tests is that caches can be pre-warmed with the tested content, which is particularly helpful if multiple caches in the CDN can all be populated with the necessary content from the test activity.
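As a rough illustration of the active-testing idea above, the sketch below shows how a probe might check that every ABR profile in an HLS master playlist can be fetched. It is a minimal sketch under stated assumptions, not how any particular monitoring product works: the names `parse_variants`, `probe_variants` and `SAMPLE_MASTER` are hypothetical, and the `fetch` callable (which returns an HTTP status code for a URI) is supplied by the caller so the example runs without a network.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical sample playlist, used only for illustration.
SAMPLE_MASTER = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,AVERAGE-BANDWIDTH=700000,RESOLUTION=640x360
low/index.m3u8
#EXT-X-STREAM-INF:AVERAGE-BANDWIDTH=2500000,BANDWIDTH=2800000,RESOLUTION=1280x720
mid/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080
high/index.m3u8
"""


def parse_variants(master_playlist: str) -> List[Tuple[int, str]]:
    """Return (bandwidth_bps, uri) for each variant stream in a master playlist."""
    lines = [ln.strip() for ln in master_playlist.splitlines() if ln.strip()]
    variants = []
    for i, line in enumerate(lines):
        if not line.startswith("#EXT-X-STREAM-INF:"):
            continue
        # The lookbehind skips AVERAGE-BANDWIDTH and matches only BANDWIDTH.
        match = re.search(r"(?<![A-Z-])BANDWIDTH=(\d+)", line)
        has_uri = i + 1 < len(lines) and not lines[i + 1].startswith("#")
        if match and has_uri:
            variants.append((int(match.group(1)), lines[i + 1]))
    return variants


def probe_variants(
    master_playlist: str, fetch: Callable[[str], int]
) -> Dict[str, bool]:
    """Request every variant once and record whether it returned HTTP 200.

    `fetch` is an assumption of this sketch: any callable that takes a URI
    and returns a status code (for example, a thin wrapper around an HTTP client).
    """
    results = {}
    for _bandwidth, uri in parse_variants(master_playlist):
        try:
            results[uri] = fetch(uri) == 200
        except Exception:
            # A time-out or connection error counts as a failed profile.
            results[uri] = False
    return results
```

A real probe would go further and download media segments from each variant, which is also what produces the cache-warming side effect described above; this sketch stops at the playlist request.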
Service monitoring can be targeted or broad. It can focus on single live streams for the duration of an event, or it can focus on an entire VOD library. It can simulate 20 end customers, or 20,000. The deployment depends on the budget and value of the content, but the flexibility exists to cater to a wide range of requirements with today’s cloud-based SaaS solutions. Monitoring processes can be expanded or contracted according to the content and audience locations and paid for accordingly. Data analytics, required in real-time to be of most use to the OTT operator, can be accelerated by elastic computing.
Service monitoring for live events generally involves monitoring of all stream variants continuously, as any Playout MCR demonstrates. But in OTT it is not enough to monitor the streaming output of the live encoder or the origin to confirm all required bitrates and packages are streaming as expected. The output to the client devices is where the quality issues manifest once the streams have passed through the various network paths. So, in the absence of direct network control, real-time stream-level reporting from CDN suppliers and ISPs, or sufficiently scalable external monitoring tools, OTT operators naturally relied upon what they could control: client-side monitoring. But this leads to the conclusion, as mentioned before, that with client-side monitoring alone, troubleshooting root cause or proactively assuring quality is not possible.
“There is a clear shift underway from linear delivery to OTT delivery,” states Anupama Anantharaman, VP Product Management at Interra. “The need to provide great video quality and user experience on the complex OTT platform, across different geographies has caused new quality assurance strategies to emerge. Operators want intelligent error correlation and troubleshooting tools, and flexible monitoring, from deep, persistent monitoring for content quality at the encoder and origin server to lighter, delivery-specific checks at the CDN and edge points. Cloud-based monitoring solutions offer solid advantages and are evolving to meet these needs.”
As with most troubleshooting activities, a persistent lack of root cause diagnosis eventually makes it necessary to see 100% of the platform for extended periods of time to know what is truly happening. Platform monitoring, fulfilled through passive monitoring by tools external to the video delivery platforms, can meet this need.
This external platform monitoring method is intensive and relies on collaboration with the platform owners to insert line taps to see what is “on the wire”. This can also become expensive, but sometimes it is the only way to resolve a persistent issue.
External platform monitoring is made more complex by the distributed nature of OTT delivery networks. A single CDN could have tens or hundreds of edge cache servers all contributing to the delivery of streams. Or there could be 20 different ISPs contributing to the final delivery over their networks. Multiple multi-tenant platforms work together to deliver OTT video: CDN, IXP, ISP, Access Network, and home routers. Often, external platform monitoring has to be focused on the most critical network junctions, such as an Origin interfacing to multiple CDNs.
Internal platform monitoring is provided by the platforms themselves, like a CDN or an ISP. Because these platforms are often multi-tenant and based on total compute / storage / network performance, the internal monitoring activity is generally focused on availability – i.e., if the infrastructure is operating within tolerances, then it is healthy. But this can hide a plethora of issues with video quality and QoE.
One of the main quality challenges is to consistently sustain a stream’s bitrate. Bitrates can fluctuate due to congestion in the CDN, congestion in the ISP networks and peering networks, the impact of many devices simultaneously requesting the same content from the Origin, the nature of the access network technology (e.g., ADSL vs. FTTP), and more. ABR was invented to deal with the fact that sustaining a bitrate over the internet is an almost impossible task. But stepping up and down the bitrate ladder is not an ideal customer experience, and for high-performance streaming to paying customers ABR is not the final solution. There is a need to solve this issue as far as possible, given the myriad of potential hurdles a stream can face on its way to a device. This QoE issue can be understood in detail by network-side measurement tools.
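To make the bitrate-stability problem concrete, the sketch below summarises one probe session as a list of (duration, bitrate) segments and derives three simple indicators: the time-weighted average bitrate, the number of steps up or down the ladder, and the share of time spent on the top profile. The function name, the input shape and the choice of indicators are assumptions made for illustration; they are not metrics defined by this article or by any specific tool.

```python
from typing import Dict, List, Tuple


def bitrate_stability(
    segments: List[Tuple[float, int]], top_bitrate_kbps: int
) -> Dict[str, float]:
    """Summarise how well a session sustained its bitrate.

    segments: (duration_seconds, bitrate_kbps) for each segment played, in order.
    top_bitrate_kbps: the highest profile on the ladder (assumed known).
    """
    total_seconds = sum(duration for duration, _ in segments)
    if total_seconds <= 0:
        raise ValueError("session must contain at least one segment with duration > 0")

    # Weight each bitrate by how long it was actually played.
    weighted_avg = sum(d * kbps for d, kbps in segments) / total_seconds

    # Every change of bitrate between consecutive segments is one ladder step.
    switches = sum(
        1 for prev, cur in zip(segments, segments[1:]) if prev[1] != cur[1]
    )

    seconds_at_top = sum(d for d, kbps in segments if kbps == top_bitrate_kbps)

    return {
        "avg_bitrate_kbps": weighted_avg,
        "switch_count": switches,
        "share_at_top": seconds_at_top / total_seconds,
    }


# Example: four 6-second segments with one dip from 5000 to 3000 kbps.
session = [(6.0, 5000), (6.0, 5000), (6.0, 3000), (6.0, 5000)]
summary = bitrate_stability(session, top_bitrate_kbps=5000)
# summary["avg_bitrate_kbps"] is 4500.0, with 2 switches and 75% of time at the top profile.
```

Comparing these indicators at different measurement points (for example, Origin egress versus CDN egress) is one way to narrow down where a fluctuation is introduced.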
CDNs, which represent the last video-specialised environment in the delivery chain before the consumer device, are evolving to give this stream-level QoE data to their OTT operator customers. As a rule of thumb, if the quality of the stream meets QoE specifications at the egress point from the CDN Edge Cache it is most likely that the consumer will be happy. As the final point of delivery into the last-mile network, this works much the same way as the head-end for over-the-air broadcasting.
Considering the order of priority for service assurance monitoring today, the diagram below indicates priority 1 for active testing of encoder output (including file transcoders for VOD assets), origin output (which can be multiple points of output) and CDN egress. This already raises the bar on what most OTT operators see, because CDN egress data is not often available at the appropriate granularity. Note, however, that content quality monitoring done post-encryption requires decryption to visualize the asset. While eminently achievable, and most often needed in workflows with combined transcoder-packager functions that cannot be monitored post-transcoder, this requires an extra level of integration with the DRM supplier. Delivery and QoE monitoring, by contrast, can be done directly on the packaged and encrypted streams.
Priority 2 is the egress from the edge of each access network, which is a bigger task and currently requires sample-based service monitoring or a deep relationship with an ISP. Priority 2 also includes continuous passive monitoring of the Origin egress, which is often required when active testing of origin and/or CDN egress does not lead to a clear diagnosis. Congestion, request time-out, and load balancing configuration issues on the Origin are more complex problems to understand and often require passive monitoring.
Measuring is one thing. Performing real-time analysis to create actionable insight is another. Typically, monitoring is against agreed tolerances such as a minimum average bitrate across all streams from a single CDN. Thresholds are pre-defined based on the known consumer ecosystem served by the OTT operator. Alarm thresholds are normally set to be as proactive as possible, to report signs of degradation that need to be addressed, rather than waiting reactively for outages.
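The threshold approach described above can be sketched as follows: average the measured bitrate across all sampled streams from each CDN, then compare it with a warning threshold (early sign of degradation) and a critical threshold. This is a minimal sketch; the function name, the two-level scheme and the numeric thresholds are illustrative assumptions, and real tolerances would come from the operator's own consumer ecosystem.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def evaluate_cdns(
    samples: Iterable[Tuple[str, float]],
    warn_kbps: float,
    critical_kbps: float,
) -> Dict[str, Tuple[float, str]]:
    """Map each CDN to (average_bitrate_kbps, level).

    samples: (cdn_name, measured_bitrate_kbps) for each monitored stream.
    level is "ok", "warning" (below warn_kbps) or "critical" (below critical_kbps).
    """
    by_cdn = defaultdict(list)
    for cdn, kbps in samples:
        by_cdn[cdn].append(kbps)

    status = {}
    for cdn, values in by_cdn.items():
        avg = mean(values)
        if avg < critical_kbps:
            level = "critical"
        elif avg < warn_kbps:
            # Proactive alarm: degraded, but not yet an outage.
            level = "warning"
        else:
            level = "ok"
        status[cdn] = (avg, level)
    return status


# Hypothetical measurements from three CDNs, in kbps.
measurements = [
    ("cdn-a", 5000), ("cdn-a", 4600),
    ("cdn-b", 4200), ("cdn-b", 3800),
    ("cdn-c", 2500), ("cdn-c", 2900),
]
report = evaluate_cdns(measurements, warn_kbps=4500, critical_kbps=3000)
# cdn-a averages 4800 (ok), cdn-b 4000 (warning), cdn-c 2700 (critical).
```

Setting the warning threshold well above the critical one is what makes the alarm proactive: it fires while streams are degraded but still playing.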
Raising an alarm against a threshold is straightforward. Correlating network-side service degradation with customer dissatisfaction is more complex. But this is the standard that must be achieved for OTT operators to assure quality. Advanced solutions today can correlate leading (as opposed to trailing) performance indicators in real-time across various network domains, including data supplied by the client-side monitoring domains. The evasive action often relies on network re-routing to avoid the problematic routes for the future streams, which today means that client-side monitoring tools re-direct streams to another CDN. But is there a better way?
“Highly evolved OTT operators are often frustrated by a lack of visibility into their CDN services,” states Philippe Tripodi, COO-Product at MainStreaming. “CDNs should be transparent in order to support OTT operators’ quality assurance efforts. By adding CDN data to holistic service monitoring tools, the OTT operator can have a deeper understanding of the reasons for each customer’s experience. And if CDNs incorporate data from downstream network domains into their own stream management algorithms, they can make better decisions about how to manage the end customer’s QoE.”
The Future Of Network-side QoE Monitoring
Quality assurance is a never-ending task. New content formats, dynamic networks, changing devices and evolving customer expectations mean that OTT operators need quality assurance to be a core competence.
OTT operators need holistic service monitoring tools to understand the overall service they are providing to their audience. The good news is that SaaS solutions and cloud deployments now make this cost-effective and easily deployable. Live streams, VOD streams and VOD assets can all be routinely sampled, with measurement rules defined ever more precisely over time. OTT operators can really know their stream health and customer QoE.
The partnership between the OTT operator and their monitoring systems, plus their CDN and ISP platform partners, will provide the ability to understand QoE from a network perspective. Today, CDNs are the primary video-centric platform partner to rely on, as they look upstream to the Origin and downstream to the ISPs. Their final touch and view of the video before it is received by the consumer, and their ability to act and proactively move a stream to a new network path, are fundamental to excellent QoE. CDNs already work with ISPs, but this relationship will evolve to become much closer and more proactive, particularly for the largest streamers delivering to the largest audiences.
We are on a path towards OTT at scale with many millions of concurrent streams from single OTT operators to their audience. Some OTT services already reach this size on a regular basis, but in the coming years many more OTT services will. Quality control is already a very real issue for many OTT operators, and it will only become more important.