OTT Monitoring From The Network Side

At its core, network-side monitoring can be an early warning system for QoS, which in turn correlates with actual QoE performance.

This article considers the two types of network monitoring available to us, relative priorities for the points of measurement, and how the video platforms contributing to OTT services are evolving to support OTT quality at scale.

The Measurement Method

Network-side measurement of content, delivery and QoE performance takes one of two forms: service monitoring, which focuses on an individual or small subset of the audience, or platform monitoring, which focuses on all inputs and outputs from a particular technical component such as a CDN or encoder.

Service monitoring uses active testing, where probes simulate OTT clients. This is the most widely used measurement method, because it is the most cost-effective for smaller, targeted sample testing and is generally able to identify most quality issues. Service monitoring can determine whether, for example, all ABR profiles can be streamed successfully to a particular device type, or if the latest VOD content can be streamed from each CDN, or if the queued-up ads are ready to be played during the break. Because active testing draws content to a client device, a side-benefit of the tests is that caches can be pre-warmed with the tested content, which is particularly helpful if multiple caches in the CDN can all be populated with the necessary content from the test activity.
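As an illustration of the "all ABR profiles" check, here is a minimal sketch of what an active probe might do. It assumes HLS packaging, which the article does not specify, and the names parse_variants, check_profiles, fake_fetch and the sample playlist are hypothetical. The HTTP fetch is injected as a function so the sketch runs without a network; a real probe would request each variant playlist and its segments from the CDN under test.

```python
import re
from typing import Callable, Dict, List, Tuple

STREAM_INF = "#EXT-X-STREAM-INF:"


def parse_variants(master_playlist: str) -> List[Tuple[int, str]]:
    """Return (bandwidth_bps, uri) for each variant in an HLS master playlist."""
    variants: List[Tuple[int, str]] = []
    pending_bandwidth = None
    for raw in master_playlist.splitlines():
        line = raw.strip()
        if line.startswith(STREAM_INF):
            attrs = line[len(STREAM_INF):]
            # Anchor on start-of-string or a comma so that
            # AVERAGE-BANDWIDTH is not mistaken for BANDWIDTH.
            match = re.search(r"(?:^|,)BANDWIDTH=(\d+)", attrs)
            pending_bandwidth = int(match.group(1)) if match else None
        elif line and not line.startswith("#") and pending_bandwidth is not None:
            # The first URI line after a stream tag belongs to that variant.
            variants.append((pending_bandwidth, line))
            pending_bandwidth = None
    return variants


def check_profiles(
    variants: List[Tuple[int, str]],
    fetch_status: Callable[[str], int],
) -> Dict[str, bool]:
    """Map each variant URI to True if the probe received HTTP 200 for it."""
    return {uri: fetch_status(uri) == 200 for _, uri in variants}


# Hypothetical master playlist with a three-step bitrate ladder.
SAMPLE = """#EXTM3U
#EXT-X-STREAM-INF:AVERAGE-BANDWIDTH=700000,BANDWIDTH=800000,RESOLUTION=640x360
low/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
mid/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080
high/index.m3u8
"""


def fake_fetch(uri: str) -> int:
    """Stand-in for an HTTP GET: pretend the top profile is missing."""
    return 404 if uri.startswith("high/") else 200


results = check_profiles(parse_variants(SAMPLE), fake_fetch)
failed = [uri for uri, ok in results.items() if not ok]
print("Profiles failing the probe:", failed)
```

Injecting the fetch function keeps the parsing and pass/fail logic separate from the transport, so the same check could be pointed at each CDN or device profile in turn.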

Service monitoring can be targeted or broad. It can focus on single live streams for the duration of an event, or it can focus on an entire VOD library. It can simulate 20 end customers, or 20,000. The deployment depends on the budget and value of the content, but the flexibility exists to cater to a wide range of requirements with today’s cloud-based SaaS solutions. Monitoring processes can be expanded or contracted according to the content and audience locations and paid for accordingly. Data analytics, required in real-time to be of most use to the OTT operator, can be accelerated by elastic computing.

Service monitoring for live events generally involves monitoring all stream variants continuously, as any Playout MCR demonstrates. But in OTT it is not enough to monitor the streaming output of the live encoder or the origin to confirm that all required bitrates and packages are streaming as expected. Quality issues manifest at the output to the client devices, once the streams have passed through the various network paths. So, in the absence of direct network control, real-time stream-level reporting from CDN suppliers and ISPs, or sufficiently scalable external monitoring tools, OTT service providers naturally relied upon what they could control: client-side monitoring. But, as mentioned before, client-side monitoring on its own cannot troubleshoot root cause or proactively assure quality.

“In today’s dynamic media landscape, where viewers have high expectations, monitoring video streams has become indispensable,” states Anupama Anantharaman, VP Product Management at Interra. “New trends and advancements, such as the shift towards all-IP infrastructure including the ST 2110 standards, FAST channels, etc. have further emphasized the need for comprehensive video monitoring. By implementing end-to-end monitoring solutions, broadcasters and OTT operators can closely monitor the entire content delivery chain for audio-video quality, closed captions, and Ad insertions from encoding and transcoding to CDN distribution and playback.

Automation is playing a pivotal role in making video monitoring and root-cause analysis more efficient. E.g., automated video quality tracing and the detection of anomalies or deviations in key performance indicators (KPIs) can effectively prompt alerts or notifications, initiating further investigation to ensure a timely response to potential issues before they adversely affect the viewer’s quality of experience (QoE). Furthermore, the utilization of automation facilitates uninterrupted monitoring, enabling broadcasters and operators to oversee their services round-the-clock, devoid of manual intervention.”

As with most troubleshooting activities, a persistent lack of root-cause diagnosis eventually makes it necessary to see 100% of the platform for extended periods of time to know what is truly happening. Platform monitoring, fulfilled through passive monitoring by tools external to the video delivery platforms, can meet this need.

This external platform monitoring method is intensive and relies on collaboration with the platform owners to insert line taps to see what is “on the wire”. This can also become expensive, but sometimes it is the only way to resolve a persistent issue.

External platform monitoring is made more complex by the distributed nature of OTT delivery networks. A single CDN could have tens or hundreds of edge cache servers all contributing to the delivery of streams. Or there could be 20 different ISPs contributing to the final delivery over their networks. There are multiple multi-tenant platforms working together to deliver OTT video – CDN, IXP, ISP, Access Network, and home routers. Often, external platform monitoring has to be focused on the most critical network junctions, like an Origin interfacing to multiple CDNs.

Internal platform monitoring is provided by the platforms themselves, like a CDN or an ISP. Because these platforms are often multi-tenant and based on total compute / storage / network performance, the internal monitoring activity is generally focused on availability – i.e., if the infrastructure is operating within tolerances, then it is healthy. But this can hide a plethora of issues with video quality and QoE.

One of the main quality challenges is to consistently sustain a stream’s bitrate. Bitrates can fluctuate due to congestion in the CDN, congestion in the ISP networks and peering networks, the impact of many devices simultaneously requesting the same content from the Origin, the nature of the access network technology (e.g., ADSL vs. FTTP), and more. ABR was invented to deal with the fact that sustaining a bitrate over the internet is a difficult task. But stepping up and down the bitrate ladder is not an ideal customer experience, and for high-performance streaming to paying customers ABR is not the final solution. There is a need to solve this issue as far as possible, given the myriad of potential hurdles a stream can face on its way to a device. This QoE issue can be understood in detail by network-side measurement tools.
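To make "stepping up and down the bitrate ladder" measurable, a network-side tool could summarize each session from the sequence of bitrates it actually delivered. The following is a minimal sketch, assuming equal-duration segments and a known top profile. The function name, the report fields and the sample session are hypothetical illustrations, not a standard metric definition.

```python
from typing import List, NamedTuple


class StabilityReport(NamedTuple):
    up_switches: int
    down_switches: int
    average_bitrate_bps: float
    share_at_top: float  # fraction of segments delivered at the top profile


def bitrate_stability(
    segment_bitrates: List[int], top_bitrate_bps: int
) -> StabilityReport:
    """Summarize ladder switching for one session.

    Assumes every segment has the same duration, so a plain mean is
    also the time-weighted average bitrate.
    """
    if not segment_bitrates:
        raise ValueError("need at least one segment")
    ups = downs = 0
    # Compare each segment with the one before it.
    for prev, cur in zip(segment_bitrates, segment_bitrates[1:]):
        if cur > prev:
            ups += 1
        elif cur < prev:
            downs += 1
    total = len(segment_bitrates)
    return StabilityReport(
        up_switches=ups,
        down_switches=downs,
        average_bitrate_bps=sum(segment_bitrates) / total,
        share_at_top=segment_bitrates.count(top_bitrate_bps) / total,
    )


# Hypothetical session: a dip from 6 Mbps to 800 kbps and back.
session = [
    6_000_000, 6_000_000, 2_500_000, 800_000,
    2_500_000, 6_000_000, 6_000_000, 6_000_000,
]
report = bitrate_stability(session, top_bitrate_bps=6_000_000)
print(report)
```

Aggregating such reports by CDN, ISP or access technology is one way the causes listed above could be told apart.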

Figure 1 – Prioritized monitoring points for video stream quality.

CDNs, which represent the last video-specialised environment in the delivery chain before the consumer device, are evolving to give this stream-level QoE data to their OTT service provider customers. As a rule of thumb, if the quality of the stream meets QoE specifications at the egress point from the CDN Edge Cache it is most likely that the consumer will be happy. As the final point of delivery into the last-mile network, this works much the same way as the head-end for over-the-air broadcasting.

Considering the order of priority for service assurance monitoring today, Figure 1 indicates priority 1 for active testing of encoder output (including file transcoders for VOD assets), origin output (which can be multiple points of output) and CDN egress. Already this raises the bar on what most OTT service providers see, because CDN egress data is not often available at the appropriate granularity. Note, however, that content quality monitoring specifically, if done post-encryption, requires decryption to visualize the asset. While eminently achievable, and most often needed in workflows with combined transcoder-packager functions that cannot be monitored post-transcoder, this requires an extra level of integration with the DRM supplier, whereas delivery and QoE monitoring can be done simply on the packaged and encrypted streams.

Priority 2 is the egress from the edge of each access network, which is a bigger task and currently requires sample-based service monitoring or a deep relationship with an ISP. Priority 2 also includes continuous passive monitoring of the Origin egress, which is often required when active testing of origin and/or CDN egress does not lead to a clear diagnosis. Congestion, request time-out, and load balancing configuration issues on the Origin are more complex problems to understand and often require passive monitoring.

Evasive Action

Measuring is one thing. Performing real-time analysis to create actionable insight is another. Typically, monitoring is against agreed tolerances such as a minimum average bitrate across all streams from a single CDN. Thresholds are pre-defined based on the known consumer ecosystem served by the OTT service provider. Alarm thresholds are normally set to be as proactive as possible, to report signs of degradation that need to be addressed, rather than waiting reactively for outages.
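A tolerance check of this kind can be sketched in a few lines. The example below assumes two thresholds per CDN: a higher warning level that flags degradation early and a lower critical level. The function name, the state labels, the threshold values and the sample data are all hypothetical and would in practice be derived from the operator's own consumer ecosystem.

```python
from statistics import mean
from typing import Dict, List


def evaluate_cdns(
    bitrates_by_cdn: Dict[str, List[float]],
    warn_bps: float,
    critical_bps: float,
) -> Dict[str, str]:
    """Classify each CDN by the average bitrate across its sampled streams."""
    if warn_bps < critical_bps:
        raise ValueError("warning threshold must not be below critical")
    states: Dict[str, str] = {}
    for cdn, rates in bitrates_by_cdn.items():
        if not rates:
            # No samples is itself worth surfacing to the operator.
            states[cdn] = "no-data"
            continue
        average = mean(rates)
        if average < critical_bps:
            states[cdn] = "critical"
        elif average < warn_bps:
            # Proactive alarm: degraded, but not yet below the floor.
            states[cdn] = "warning"
        else:
            states[cdn] = "ok"
    return states


# Hypothetical per-stream delivered bitrates (bits per second) for three CDNs.
samples = {
    "cdn-a": [5.8e6, 6.0e6, 5.9e6],
    "cdn-b": [4.2e6, 3.9e6, 4.4e6],
    "cdn-c": [2.0e6, 2.4e6, 1.8e6],
}
states = evaluate_cdns(samples, warn_bps=4.5e6, critical_bps=3.0e6)
print(states)
```

The gap between the two thresholds is what makes the alarm proactive: the warning state gives operations time to act before the critical floor is reached.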

Raising an alarm against a threshold is straightforward. Correlating network-side service degradation with customer dissatisfaction is more complex. But this is the standard that must be achieved for OTT service providers to assure quality. Advanced solutions today can correlate leading (as opposed to trailing) performance indicators in real-time across various network domains, including data supplied by the client-side monitoring domains. The evasive action often relies on network re-routing to avoid the problematic routes for the future streams, which today means that client-side monitoring tools re-direct streams to another CDN. But is there a better way?

“Highly evolved OTT operators are often frustrated by a lack of visibility into their CDN services,” states Sergio Carulli, Chief Innovation Officer at MainStreaming. “CDNs should be transparent in order to support OTT operators’ quality assurance efforts. By adding CDN data to holistic service monitoring tools, the OTT operator can have a deeper understanding of the reasons for each customer’s experience. And if CDNs incorporate data from downstream network domains into their own stream management algorithms, they can make better decisions about how to manage the end customer’s QoE.”

The Future Of Network-side QoE Monitoring

Quality assurance is a never-ending task. New content formats, dynamic networks, changing devices and evolving customer expectations mean that OTT service providers need quality assurance to be a core competence.

OTT service providers need holistic service monitoring tools to understand the overall service they are providing to their audience. The good news is that SaaS solutions and cloud deployments now make this cost-effective and easily deployable. Live streams, VOD streams and VOD assets can all be routinely sampled with measurement rules defined ever more precisely over time. OTT service providers can really know their stream health and customer QoE.

The partnership between the OTT service provider and their monitoring systems, plus their CDN and ISP platform partners, will provide the ability to understand QoE from a network perspective. Today, CDNs are the primary video-centric platform partner to rely on, as they look upstream to the Origin and downstream to the ISPs. Their final touch and view of the video before it is received by the consumer, and their ability to act and proactively move a stream to a new network path, are fundamental to excellent QoE. CDNs already work with ISPs, but this relationship will evolve to become much closer and more proactive, particularly for the largest streamers delivering to the largest audiences.

We are on a path towards OTT at scale with many millions of concurrent streams from single OTT operators to their audience. Some OTT services already reach this size on a regular basis, but in the coming years many more OTT services will. Quality control is already a very real issue for many OTT operators, and it will only become more important.
