2018 NAB Show Event Channel

Preventing the Next Big Live-Streaming Failure

If there’s one thing that became very apparent during the Mayweather-McGregor pay-per-view experience, it’s that delivering live-streamed premium events to large-scale audiences is inherently difficult and notoriously unpredictable.

The class action lawsuit against Showtime takes this to a whole new level, clearly demonstrating that it’s not just about providing a good video experience - it’s about viewers missing out on a major bonding event that millions of people had been eagerly awaiting. Paying a lot of money for content that doesn’t arrive on time and in good shape obviously doesn’t sit well with viewers. After all, it’s not like a movie: there are no second chances, and no one is willing to watch it the following day.

In an ideal world, video arriving on time and intact would be a given. In a not-so-ideal world where things do go wrong, customer experience management is key. The challenge is knowing who was affected by which issues, and proactively working with those customers to mitigate those issues. Customers would like notification while a short-term problem is being solved, or in the worst case, they’d like an organization to proactively reach out so their experience and frustration isn’t exacerbated by having to chase a refund.

For those not so familiar with “how” video is processed and delivered (and there’s a very good argument to say you shouldn’t need to know in the slightest), it’s all about creating good quality content, and then getting video to where it needs to be, complete and on time. This sounds simple, but is actually incredibly complex, and involves many companies in the end-to-end delivery chain. There are the companies that produce and “shape” the content (picture quality, resolution, audio formats, ad-insertion markers, subtitles), and those that deliver the network packets containing the content. For live streaming, these companies are the national or international content delivery networks feeding into the broadband access networks such as cable, DSL, fiber, Wi-Fi or 3G/4G mobile. Mostly (but not always), these are all different companies.

The challenge of figuring out where the video feed is going wrong is highly complex, and takes a combination of passive monitoring (watching “on the wire”) and active testing to verify the availability of the streams in different regions. Ideally, we want to provide an early warning system for video problems such as bad picture quality, accessibility errors, buffering and outages. This includes testing immediately after the content is produced and packaged, and then periodically at multiple geographic locations after it leaves the content delivery networks (in data centers, on premise or in the cloud). Sampled coverage testing at the edge of access networks, whether broadband cable, Wi-Fi or cellular, must also be part of the matrix.
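The kind of early-warning logic described above can be sketched as a simple classifier over an active-test sample. The thresholds and field names below are illustrative assumptions, not values from any real monitoring product; a deployment would tune them against its own SLA targets.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; real values would come
# from the service's own quality targets.
MAX_TTFB_MS = 2000        # acceptable time to first byte for the manifest
MIN_VARIANTS = 3          # expected depth of the bitrate ladder

@dataclass
class ProbeResult:
    region: str            # where the active probe ran
    status_code: int       # HTTP status when fetching the stream manifest
    ttfb_ms: float         # time to first byte, in milliseconds
    variant_count: int     # bitrate variants advertised in the manifest

def classify(probe: ProbeResult) -> str:
    """Turn one active-test sample into an alert level."""
    if probe.status_code != 200:
        return "FAIL"          # stream not reachable at all: outage
    if probe.ttfb_ms > MAX_TTFB_MS or probe.variant_count < MIN_VARIANTS:
        return "DEGRADED"      # reachable, but slow or missing renditions
    return "OK"
```

A healthy sample such as `ProbeResult("us-east", 200, 350, 5)` classifies as `"OK"`, while a reachable but slow manifest classifies as `"DEGRADED"` - the early-warning state worth alerting on before viewers notice.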

Active Testing Keeps Everyone Honest

There are many reasons for active testing, with the most obvious being that you know there is an issue before your customers (or at least at the same time) – social media lights up very quickly for premium content issues. Other reasons include checking availability, start-up time and latency before and during a broadcast (especially for premium content/events), geo-locking/availability, authentication, encryption, and continuous sampled testing at different geographical locations on different networks during the event.

It also makes sense to test after configuration changes (which are happening all the time whether you know it or not), and certainly when new protocols, resolutions or network upgrades are instigated. If it’s not clear by now, it basically means you need to test continually – there is always something changing that you won’t be aware of, even if you own the delivery networks. Test, and then keep testing.
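“Continuous sampled testing at different geographical locations on different networks” reduces to a scheduling problem: within a fixed probe budget, revisit every location/network cell before its sample goes stale. Here is a minimal sketch of that scheduling core; the cell names and interval are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Each cell is a (geographic location, access network) pair; the value is
# the timestamp (in seconds) of the last probe. Names are illustrative.
Cell = Tuple[str, str]

def next_cells(last_probed: Dict[Cell, float], now: float,
               interval: float, budget: int) -> List[Cell]:
    """Return up to `budget` cells whose samples are stale, oldest first.

    This is the core of continuous sampled testing: every region/network
    combination gets revisited at least every `interval` seconds, while
    the total probe rate stays within a fixed budget.
    """
    stale = [c for c, t in last_probed.items() if now - t >= interval]
    stale.sort(key=lambda c: last_probed[c])   # most overdue first
    return stale[:budget]
```

Running this in a loop, with probe results fed into an alerting classifier, gives the "test, and then keep testing" behavior without probing every cell on every pass.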

Checking video availability at many geographical locations is key for several reasons:

  1. Knowing where the issue originated (was it the preparation stage, a third-party CDN, or the access network?) 
  2. Being able to mitigate future brand damage if the issue was not with the content provider. 
  3. Having the data to negotiate with the CDN and access network providers – the worst possible situation is not knowing what to fix before the next major event.
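Pinpointing where an issue originated follows from probing the chain in order: the first stage whose probe fails is the prime suspect, and everything upstream of it is exonerated. A minimal sketch, with stage names assumed for illustration:

```python
from typing import Dict, Optional

# Ordered stages of the delivery chain, from content preparation outward.
# Stage names are illustrative, not from any particular vendor's model.
STAGES = ["packaging", "origin", "cdn", "access_network"]

def first_failure(results: Dict[str, bool]) -> Optional[str]:
    """Given pass/fail probe results keyed by stage, return the first
    stage in chain order that failed, or None if the chain tested clean.

    Probing in chain order is what lets a content provider demonstrate
    that a regional outage began downstream of the content it handed off.
    """
    for stage in STAGES:
        if not results.get(stage, False):
            return stage
    return None
```

If packaging and origin probes pass but a regional CDN probe fails, the data points at the CDN - exactly the evidence needed for the negotiation described in point 3.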

Customer Experience Management

If consumers in a particular region or on a particular access network are hit by poor service, but the content provider is delivering perfectly packaged content into the network, then who is to blame? Ultimately the content provider brand is damaged, and, as they are the company charging the pay-per-view fee, they are the ones that receive the wrath of the customers. The next question is compensation. How much was the customer affected? Did they get a few “annoying” glitches caused by a particular issue in the delivery chain, or was the content totally unwatchable due to a major failure? Compensation needs to be appropriate.

Analytics from the end playback device will provide intelligence on the viewing experience, and potentially which networks the consumer was connected to, but this needs to be complemented with continuous monitoring immediately after the content preparation process, and then proactive testing of stream availability (at all bit rates) at multiple points along the delivery chain.

The ultimate solution and protection comes from combining the operational monitoring from the video delivery chain with experience analytics from the end client devices. Being able to map (in near-real-time) a video-impacting event to the actual viewing behavior caused by that event provides unprecedented impact analysis potential. This is also the tip of the iceberg for customer experience management executives who want to then dive deeper and map the affected customers to their internal databases to identify the high-revenue and VIP customers who were affected. Focusing on these customers has a twofold benefit: it concentrates attention on the premium customers generating most of the revenue, and it, in turn, benefits other customers suffering issues under lower loads, which are notoriously difficult to track down.
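The mapping of a delivery-chain event to affected viewing sessions is, at its core, a join on network and time overlap. The sketch below assumes simplified tuple shapes for events and sessions; a real system would join on far richer dimensions (CDN node, region, bitrate).

```python
from typing import Dict, List, Tuple

def sessions_affected(events: List[Tuple[float, float, str]],
                      sessions: List[Tuple[str, float, float, str]]
                      ) -> Dict[int, List[str]]:
    """Map each operational event to the client sessions it overlapped.

    events:   (start, end, network) of a detected delivery-chain incident
    sessions: (session_id, start, end, network) from client analytics
    Returns {event index: [affected session ids]}. Shapes are illustrative.
    """
    impact = {}
    for i, (ev_start, ev_end, ev_net) in enumerate(events):
        impact[i] = [sid for sid, s_start, s_end, s_net in sessions
                     if s_net == ev_net                        # same network
                     and s_start < ev_end and s_end > ev_start]  # time overlap
    return impact
```

The resulting session list is what would then be joined against internal customer databases to find the high-revenue and VIP viewers an incident actually touched.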

The potential benefits of linking the operational analytics with the client analytics are huge, and initial deployments are starting to happen now. As the impact and root-cause knowledge becomes greater and more real-time, the case for more automation and self-healing video networks also becomes stronger. The good news is that this is also happening due to advances in the use of dynamic orchestration for cloud and virtualization (especially network functions virtualization (NFV)). As functions in the delivery chain become virtualized, they are evolving to have advanced control and configuration capabilities, as well as new scaling capabilities leaning towards micro-architecture-based services. Next-generation video services will be able to “spin up” an encoder on demand for a new channel, or as a failover mechanism for an existing channel. It’s also possible to dynamically deploy the necessary monitoring as a micro-service, so the orchestration system knows in real-time when something is wrong and can take the appropriate action. In reality, this means that real-time monitoring is becoming an essential part of the system – you can’t take corrective action unless you know what’s happening. Driving self-healing services from Twitter feedback is not really practical.
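The self-healing pattern described above is essentially a reconciliation loop: monitoring reports which virtual functions are healthy, and the orchestrator emits corrective actions for the rest. This is a sketch of one pass of such a loop, not a real orchestrator API; the action tuple and field names are assumptions.

```python
from typing import Dict, List, Tuple

def reconcile(channels: Dict[str, str],
              healthy: Dict[str, bool]) -> List[Tuple[str, str, str]]:
    """One pass of a self-healing control loop (illustrative sketch).

    channels: {channel name: current encoder instance id}
    healthy:  {encoder instance id: health flag} from real-time monitoring
    Returns a list of corrective actions for the orchestrator to execute.
    """
    actions = []
    for channel, encoder in channels.items():
        if not healthy.get(encoder, False):
            # In an NFV deployment this is where the orchestrator would be
            # asked to instantiate a replacement virtual encoder function.
            actions.append(("replace_encoder", channel, encoder))
    return actions
```

The key point the article makes holds in the sketch: without the `healthy` map coming from real-time monitoring, the loop has nothing to act on.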

As these solutions are linked together, we will be able to rapidly migrate to a world where it’s possible to dynamically deploy monitoring as needed across the network for increasingly precise root cause analysis, and then instigate appropriate corrective action. The trigger source could be the existing monitoring system, artificial intelligence (AI) from cloud-based client analytics, or a trigger from equipment or a virtual function in the network itself. Whatever the trigger mechanism, the ability to diagnose root cause and analyze impact severity in near-real time will be a major factor in not only detecting, but also dynamically repairing, video delivery problems in the future.

The technology and capability to solve these challenges is here, and if the industry can work together to enable more transparency and interaction across the video delivery chain, we will be able to avoid, or at least rapidly mitigate, premium event problems for future viewers.

Stuart Newton, VP Strategy & Business Development, IneoQuest, a Telestream Company.
