Machine Learning (ML) For Broadcasters: Part 7 - ML Automates & Extends Content Management

Machine learning and other branches of AI are being used increasingly across content management, including classification of assets and generation of metadata as the basis for future actions. ML is also being used on live content, especially user generated material, for compliance and enforcing content moderation rules.

Content management has traditionally been labor intensive and therefore expensive, as well as restricted in scope. Those constraints have become more visible as content has proliferated in the streaming era, but they are being mitigated by the rise of AI, especially Machine Learning (ML), which is rapidly becoming essential for at least partially automating many of the tasks around classification and metadata generation.

As well as cutting costs, this is helping liberate archives and maximize the value of those assets through more effective targeting and recommendation, for example. Yet this is still very much a work in progress, and the full scope of AI and ML has yet to be exploited by even the most advanced broadcasters and content creators in the field.

It is worth emphasizing that the categories of AI and ML application we have identified for this series do overlap and that is especially true for content management, which includes MAM (Media Asset Management) as a subcategory dealing with the direct administration of content libraries. Content management in its larger sense deals with aspects relating to those assets throughout their lifecycle including certain enhancements such as addition of metadata. In this sense the lines between content management and other categories such as production are rather blurred, and as a result some of the applications of AI and ML defy easy categorization.

As one example, ML is used increasingly for creation of sporting highlights packages quickly after the event, as during the recent FIFA World Cup. The role of ML here is to identify actions on the field of play likely to be of interest, in the case of football including those involving skill, attempts on goal or major incidents on the pitch. This then comes under the headings of both content management and production.

Content management is also notable for employing both the latest ML techniques and some traditional rule-based AI methods, sometimes combining them for a given task. Today machine learning, based on various forms of neural network comprising hierarchies of nodes whose weights are tuned or “learnt” for specific predictive tasks, represents by far the most common form of AI. Indeed, the two have become almost synonymous.

Yet before continuing advances in computational power helped bring on the era of ML, the AI field had been treading water for around two decades under various names such as expert systems, rule-based AI, or symbolic AI. Many applications did indeed involve the application of rules, identifying data objects such as images or video frames by the symbols or objects contained within them.

This older field too has enjoyed a new lease of life alongside ML, and for content management it has been applied in metadata generation: identifying elements in audio, such as certain keywords or phrases, and in video, such as physical objects like balls. It can then be extended in the time dimension across frames to identify actions such as waving, running or dancing, and can hook up to ML to “learn” the sequences associated with those actions.

So good old symbolic AI, applied to speech and image recognition, can in principle be used to extract basic metadata from any content, such as keywords, phrases and image objects. Machine learning then enables deeper, more meaningful classification into content groups that can be applied in recommendation and targeting. This could involve feedback from the distribution loop, or even from social media, about how popular the content has been with different demographic groups. That popularity can be broken down by gender, age group, geography, and even by factors relating to individual users such as other known preferences.
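As a hedged illustration of this two-stage idea, the sketch below pairs a symbolic keyword-matching step with a toy scoring stage standing in for an ML classifier. The vocabularies, genre names and scoring scheme are all invented for the example, not taken from any real broadcaster's system.

```python
# Illustrative two-stage pipeline: symbolic keyword extraction from a
# transcript, followed by a toy per-genre score. The term lists below
# are invented examples.
SPORT_TERMS = {"goal", "penalty", "kick-off", "referee"}
NEWS_TERMS = {"election", "minister", "economy", "report"}

def extract_keywords(transcript: str) -> set:
    """Symbolic step: match known vocabulary terms in the transcript."""
    words = {w.strip(".,!?").lower() for w in transcript.split()}
    return words & (SPORT_TERMS | NEWS_TERMS)

def classify(keywords: set) -> dict:
    """Toy scoring stage standing in for an ML classifier: the
    fraction of each genre vocabulary present among the keywords."""
    return {
        "sport": len(keywords & SPORT_TERMS) / len(SPORT_TERMS),
        "news": len(keywords & NEWS_TERMS) / len(NEWS_TERMS),
    }

kw = extract_keywords("The referee awarded a penalty before the goal.")
scores = classify(kw)
```

In a real deployment the first stage would be a speech-to-text or object-recognition engine and the second a trained model, but the division of labor is the same: symbolic extraction feeds classification.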

For such video classification, content is sifted into various classes on the basis of actions, movements, specific objects contained within it such as a given actor, or features extracted from metadata like genre. The ML model is fed video frames as input, and the output is the probability of each class being represented in the video. This could be several classes for some content, just one for others, and none at all for the rest.

Having worked on frames in isolation, the model can then consider the spatio-temporal relationships between adjacent frames to identify those actions that cannot readily be divined from a single frame. That can lead to a stronger association between the video and the various classes, the aim being to assign either high probabilities, close to 100%, of belonging to a given class, or low ones near 0%. In some cases, human inspection may be required to resolve uncertainty where the ML model is unable to assign a high enough probability of a given video belonging to a particular class.
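This per-frame-then-pooling approach can be sketched in miniature as follows. A real system would run a trained neural network over pixel data and use a learned temporal model; here the per-frame class scores (logits) are simply supplied by hand, the temporal step is reduced to averaging across frames, and the class names are invented.

```python
import math

CLASSES = ["sport", "drama", "news"]  # illustrative class names

def softmax(logits):
    """Convert raw per-class scores into probabilities summing to 1."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_video(per_frame_logits):
    """Score each frame independently, then pool across frames so the
    video-level probabilities reflect all frames, not just one."""
    frame_probs = [softmax(logits) for logits in per_frame_logits]
    n = len(frame_probs)
    return [sum(p[i] for p in frame_probs) / n for i in range(len(CLASSES))]

# Three frames whose hand-written logits all lean towards "sport":
probs = classify_video([[2.0, 0.1, 0.3], [1.8, 0.2, 0.1], [2.2, 0.0, 0.4]])
```

The pooled output is still a probability distribution over the classes, which is what allows a confidence threshold to route uncertain videos to human inspection.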

Quality control (QC) is also fertile ground for both ML and symbolic AI, having traditionally been a labor-intensive process that therefore had to be applied sparingly, with limited scope for live content. It comes under two headings: firstly, basic technical QC, involving checks for compatibility with client devices and for anomalies that impact the viewing experience.

Then there is more subjective QC for assessing the higher-level experience, as might be expressed by MOS (Mean Opinion Score). Even the first category of QC task was once performed manually, but it is ripe for symbolic AI because it involves applying rules to identify technical anomalies that can be extracted automatically from the content. The second, more subjective form of QC is where ML comes in, matching content against training material assessed by humans and assigning quality scores accordingly.
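The first, rule-based category of QC lends itself to a simple sketch: fixed thresholds applied automatically to measurements extracted from the content. The thresholds and measurements below are illustrative assumptions chosen for the example, not broadcast standards.

```python
# Hypothetical rule-based technical QC: flag video frames whose mean
# luma reads as black, and audio windows whose peak level suggests
# silence. Both thresholds are invented for illustration.
BLACK_LUMA_THRESHOLD = 16      # 8-bit mean luma below which a frame is "black"
SILENCE_PEAK_THRESHOLD = 0.01  # normalised peak amplitude treated as silence

def qc_report(frame_lumas, audio_peaks):
    """Apply the rules element-by-element and return flagged indices."""
    black_frames = [i for i, y in enumerate(frame_lumas)
                    if y < BLACK_LUMA_THRESHOLD]
    silent_windows = [i for i, p in enumerate(audio_peaks)
                      if p < SILENCE_PEAK_THRESHOLD]
    return {"black_frames": black_frames, "silent_windows": silent_windows}

# Four frames and four audio windows, with one anomaly flagged in each:
report = qc_report([120, 10, 118, 5], [0.4, 0.005, 0.3, 0.35])
```

Because the rules are explicit, this style of check runs deterministically at line speed, which is what makes it viable for live content where manual QC never was.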

UK commercial broadcaster ITV has applied AI to automate the formerly labour-intensive process of marking segments such as color bars in content.

Compliance, which can be regarded as an extension of QC, is also suitable for ML, especially the variant known as supervised learning, where the model converges around specific data combinations defined by the user. This involves labelling data sets and training the model to classify outcomes that match those labels closely. Content compliance can require identification of specific scenes or events in video that might fall foul of regulations in a particular region, individual country, or even target audience segment. ML can identify such scenes and tag them so that they can be snipped out for those territories where they might cause offence.
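A minimal sketch of this supervised-learning workflow might look like the following, assuming scenes have already been reduced to feature vectors (imagine, say, violence and strong-language scores between 0 and 1). The features, labels and nearest-centroid classifier are all invented stand-ins; a production system would use far richer features and a trained neural network.

```python
# Toy supervised learning for compliance tagging: human-labelled scene
# feature vectors train a nearest-centroid classifier, which then tags
# new scenes. All numbers and labels are invented for illustration.
def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def train(labelled_scenes):
    """labelled_scenes: list of (feature_vector, label) pairs."""
    by_label = {}
    for vec, label in labelled_scenes:
        by_label.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in by_label.items()}

def classify_scene(model, vec):
    """Assign the label whose centroid is nearest (squared distance)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: dist(model[label], vec))

model = train([
    ([0.9, 0.8], "flagged"),    # e.g. high violence / strong-language scores
    ([0.1, 0.2], "compliant"),
    ([0.8, 0.9], "flagged"),
    ([0.2, 0.1], "compliant"),
])
tag = classify_scene(model, [0.85, 0.7])
```

Scenes tagged "flagged" would then carry timecodes allowing them to be cut automatically for territories where they breach local rules.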

Somewhat related to this is the field of segment marking, which has long been employed by broadcasters and content creators for various purposes, including cataloging and content repurposing. UK commercial broadcaster ITV has employed ML to mark 12 segment types in its content, including functional elements such as color bars, or the slates containing descriptions of content. It also includes more applied or creative segments like recaps, credits, program part segments, and break bumpers.

The break bumper, or simply bump, is usually a two-to-fifteen-second voiceover between a pause in the program and its commercial break, or vice versa, and marking it is useful for ad insertion and search. The model now recognizes segments with well over 95% accuracy, approaching 100% in some cases.

US cable and media giant Comcast, which owns Sky and NBCUniversal, is one of the few to go so far as to commercialize an internally developed AI application, although the company has form here, having been the original architect of the RDK operating platform. RDK has been adopted widely by major cable TV operators, especially in North America, as an alternative to Android TV.

Comcast is one of the few large broadcasters and media groups not just to develop its own AI-based video analysis and classification system but also to make it available as a service.

Comcast developed Video Artificial Intelligence (VideoAI) to generate actionable metadata around content assets, helping manage new content, improve advertising efficiency, and streamline workflow generally. It was adopted by NBCUniversal and Sky, as well as Comcast’s own pay TV service in the USA, with use extending to tagging key onscreen moments such as hard cuts, black frames, and transitions, before being recast for marketing as a SaaS. It is along the same lines as the ITV segment marking system, and although wider in scope, it indicates that early developers of AI systems in the content management area can recoup their investments by making those systems available, unless they see competitive advantage in keeping the innovations to themselves.
