Machine Learning (ML) For Broadcasters: Part 7 - ML Automates & Extends Content Management
Machine learning and other forms of AI are increasingly used across content management, including classifying assets and generating metadata as the basis for future actions. ML is also being applied to live content, especially user-generated content, for compliance and for enforcing content moderation rules.
Content management has traditionally been labor intensive, and therefore both expensive and restricted in scope. Those constraints have become more visible as content has proliferated in the streaming era, but they are being eased by the rise of AI, especially Machine Learning (ML), which is rapidly becoming essential for automating, at least partially, many of the tasks around classification and the associated metadata generation.
As well as cutting costs, this is helping liberate archives and maximize the assets’ value through more effective targeting and recommendation, for example. Yet this is still very much work in progress, and the full scope of AI and ML has yet to be exploited by even the most advanced broadcasters and content creators in the field.
It is worth emphasizing that the categories of AI and ML application we have identified for this series do overlap and that is especially true for content management, which includes MAM (Media Asset Management) as a subcategory dealing with the direct administration of content libraries. Content management in its larger sense deals with aspects relating to those assets throughout their lifecycle including certain enhancements such as addition of metadata. In this sense the lines between content management and other categories such as production are rather blurred, and as a result some of the applications of AI and ML defy easy categorization.
As one example, ML is used increasingly for creation of sporting highlights packages quickly after the event, as during the recent FIFA World Cup. The role of ML here is to identify actions on the field of play likely to be of interest, in the case of football including those involving skill, attempts on goal or major incidents on the pitch. This then comes under the headings of both content management and production.
Content management is also notable for employing both the latest ML techniques and some traditional rule-based AI methods, sometimes combining them for a given task. Today machine learning based on various forms of neural network, comprising layers of nodes whose connection weights are tuned or “learnt” for specific predictive tasks, is by far the most common form of AI. Indeed, the terms ML and AI have become almost synonymous.
Yet before continuing advances in computational power helped bring on the era of ML, the AI field had been treading water for around two decades under various names such as expert systems, rule-based AI, or symbolic AI. Many of its applications involved applying rules and identifying data items, such as images or video frames, by the symbols or objects contained within them.
This older field has also enjoyed a new lease of life alongside ML. In content management it has been applied to metadata generation, identifying certain keywords or phrases in audio and objects such as balls in video. The analysis can then be extended in the time dimension across frames to identify actions such as waving, running or dancing, and linked to ML to “learn” the sequences associated with those actions.
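As a rough illustration of the rule-based side, the sketch below tags a timed transcript with keywords. The `Cue` structure, the rule table and the sample lines are all hypothetical, and a real system would take its transcript from a speech-to-text stage that is not shown here.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the start of the asset
    end: float
    text: str     # transcript text, assumed to come from speech-to-text


# Hypothetical rule table: metadata tag -> phrases that trigger it.
RULES = {
    "goal": ["goal", "scores"],
    "penalty": ["penalty", "spot kick"],
}


def tag_cues(cues, rules=RULES):
    """Return (tag, start, end) for every cue containing a rule phrase."""
    compiled = {
        tag: re.compile(
            r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
            re.IGNORECASE,
        )
        for tag, phrases in rules.items()
    }
    hits = []
    for cue in cues:
        for tag, pattern in compiled.items():
            if pattern.search(cue.text):
                hits.append((tag, cue.start, cue.end))
    return hits


cues = [
    Cue(12.0, 15.5, "What a goal from the edge of the box!"),
    Cue(40.0, 43.0, "The referee points to the spot. Penalty!"),
    Cue(60.0, 62.0, "Play resumes after the break."),
]
metadata = tag_cues(cues)
```

Word-boundary matching keeps a word such as "goalkeeper" from triggering the "goal" tag, which is the kind of explicit rule that symbolic methods make easy to state and audit.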
So good old symbolic AI, applied to speech and image recognition, can in principle be used to extract basic metadata from any content, such as key words, phrases and image objects. Machine learning then enables deeper, more meaningful classification into content groups that can be applied in recommendation and targeting. This could involve feedback from the distribution loop, or even from social media, about how popular the content has been with different demographic groups. Popularity can be broken down by gender, age group, geography, and even by factors relating to individual users such as other known preferences, and the results fed back into targeting and recommendation.
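The demographic breakdown could be sketched as a simple aggregation. The event fields (`content_class`, `age_group`, `region`, `completed`) and the use of completion rate as a stand-in for popularity are assumptions made purely for illustration.

```python
from collections import defaultdict


def completion_by_group(events, key):
    """Completion rate per (content class, demographic value).

    `key` names the demographic field to break down by,
    for example "age_group" or "region".
    """
    totals = defaultdict(lambda: [0, 0])  # [completed views, all views]
    for event in events:
        bucket = totals[(event["content_class"], event[key])]
        bucket[0] += 1 if event["completed"] else 0
        bucket[1] += 1
    return {group: done / views for group, (done, views) in totals.items()}


# Invented viewing events; real data would come from playback analytics.
events = [
    {"content_class": "football", "age_group": "18-24", "region": "UK", "completed": True},
    {"content_class": "football", "age_group": "18-24", "region": "UK", "completed": False},
    {"content_class": "football", "age_group": "45-54", "region": "DE", "completed": True},
    {"content_class": "drama", "age_group": "45-54", "region": "UK", "completed": True},
]
by_age = completion_by_group(events, "age_group")
by_region = completion_by_group(events, "region")
```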
For such video classification, content is sifted into various classes, on the basis of actions, movements, specific objects contained within it such as a given actor, or features extracted from metadata like genre. The ML model is fed video frames as input and the output is then the probability of each class being represented in the video. This could be several classes for some content, just one for others, and none for some.
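One common way to obtain such per-class probabilities, assumed here rather than taken from any particular broadcaster’s system, is to pass each class score through its own sigmoid, so that a video can belong to several classes, one, or none. The class names, raw scores and 0.5 threshold below are invented.

```python
import math

CLASSES = ["sport", "news", "drama"]  # hypothetical class list


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def class_probabilities(scores):
    """Map raw model scores to independent per-class probabilities.

    Each class gets its own sigmoid, so the probabilities are not
    forced to sum to 1 as they would be with a softmax.
    """
    return {name: sigmoid(score) for name, score in zip(CLASSES, scores)}


def assigned_classes(probs, threshold=0.5):
    """Return every class whose probability reaches the threshold."""
    return [name for name, p in probs.items() if p >= threshold]


several = assigned_classes(class_probabilities([3.0, 2.0, -4.0]))
single = assigned_classes(class_probabilities([4.0, -3.0, -3.0]))
none = assigned_classes(class_probabilities([-3.0, -2.0, -4.0]))
```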
Having worked on frames in isolation, the model can then consider the spatio-temporal relationships between adjacent frames to identify those actions that cannot readily be divined from a single frame. That can then lead to a stronger association between the video and the various classes, the aim being to assign either high probabilities close to 100% of belonging to a given class, or low ones near 0%. In some cases, human inspection may be required to resolve uncertainty where the ML model is unable to assign a high enough probability of a given video belonging to a particular class.
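The routing of uncertain cases to a human can be sketched with two thresholds. For simplicity, this example just averages per-frame probabilities, which is a much cruder stand-in for the spatio-temporal modelling described above, and the threshold values are arbitrary assumptions.

```python
from statistics import mean


def triage(frame_probs, accept=0.9, reject=0.1):
    """Decide what to do with one video for one class.

    frame_probs: per-frame probabilities of the class (assumed input).
    Returns (decision, video_level_probability).
    """
    p = mean(frame_probs)  # simple temporal pooling, an assumption
    if p >= accept:
        return "assign", p
    if p <= reject:
        return "exclude", p
    return "human_review", p


confident_yes = triage([0.95, 0.97, 0.99])
confident_no = triage([0.02, 0.05, 0.01])
uncertain = triage([0.40, 0.70, 0.55])
```

Widening the gap between the two thresholds sends more content to human review, while narrowing it trades reviewer time for a higher risk of misclassification.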
Quality control (QC) is also fertile ground for both ML and symbolic AI, having traditionally been a labor-intensive process that therefore had to be applied sparingly, with limited scope for use on live content. It falls under two headings. The first is basic technical QC, such as checks for compatibility with client devices and for anomalies that impair the viewing experience.
The second is more subjective QC that assesses the higher-level experience, as might be expressed by a Mean Opinion Score (MOS). Even the first category was once performed manually, but it is ripe for symbolic AI because it involves applying rules to identify technical anomalies that can be detected automatically in the content. The more subjective QC is where ML comes in, matching content against training material that has been assessed by humans and assigning quality scores accordingly.
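A rule-based technical check of this kind might look like the sketch below, which flags runs of near-black frames. Representing a frame as a flat list of 8-bit luma samples, the luma threshold of 20 and the minimum run length of three frames are all assumptions chosen to keep the example small.

```python
def is_black(frame, luma_threshold=20):
    """Rule: a frame counts as black if its mean luma is below the threshold."""
    return sum(frame) / len(frame) < luma_threshold


def black_runs(frames, min_frames=3):
    """Return (first_index, last_index) for each run of black frames
    that lasts at least `min_frames` frames."""
    runs = []
    start = None
    for i, frame in enumerate(frames):
        if is_black(frame):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_frames:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the content.
    if start is not None and len(frames) - start >= min_frames:
        runs.append((start, len(frames) - 1))
    return runs


bright = [120, 130, 125, 128]  # invented luma samples
dark = [2, 0, 1, 3]
frames = [bright] * 2 + [dark] * 4 + [bright] + [dark] * 2 + [bright]
anomalies = black_runs(frames)
```

The two-frame dark run near the end is ignored because it falls below the minimum length, which shows how a rule can be tuned to separate genuine faults from brief intentional dips to black.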
Compliance, which can be regarded as an extension of QC, is also suitable for ML, especially the variant known as supervised learning, where the model is trained on examples labelled by the user. This involves labelling data sets and training the model to classify new content that closely matches those labelled examples. Content compliance can require identification of specific scenes or events in video that might fall foul of regulations in a particular region, individual country, or even target audience segment. ML can identify the scenes and tag them so that they can be cut for those territories where they might cause offence.
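Once scenes have been tagged, producing a territory-specific version is largely bookkeeping. The sketch below assumes scenes arrive as `(start, end, tag)` tuples in seconds and that each territory has a set of disallowed tags; the tag names and territory labels are hypothetical.

```python
def keep_intervals(duration, scenes, banned_tags):
    """Return the (start, end) intervals to keep after removing
    every scene whose tag is banned, merging overlapping cuts."""
    cuts = sorted((s, e) for s, e, tag in scenes if tag in banned_tags)
    kept = []
    cursor = 0.0
    for start, end in cuts:
        if start > cursor:
            kept.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        kept.append((cursor, duration))
    return kept


# Invented tags, as might be produced by a supervised classifier.
scenes = [
    (30.0, 45.0, "strong_language"),
    (120.0, 150.0, "violence"),
    (140.0, 160.0, "gambling"),
]
rules = {
    "territory_a": {"violence", "gambling"},
    "territory_b": {"strong_language"},
    "territory_c": set(),
}
versions = {
    territory: keep_intervals(600.0, scenes, banned)
    for territory, banned in rules.items()
}
```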
Somewhat related to this is the field of segment marking, which has long been employed by broadcasters and content creators for various purposes, including cataloging and content repurposing. UK commercial broadcaster ITV has employed ML to mark 12 segment types in its content, including functional elements such as color bars, or the slates containing descriptions of content. It also includes more applied or creative segments like recaps, credits, program part segments, and break bumpers.
The break bumper, or just bump, is usually a two- to fifteen-second voiceover placed between a pause in the program and its commercial break, or vice versa, and is useful for ad insertion and search. ITV’s model now recognizes segments with well over 95% accuracy, approaching 100% in some cases.
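How ITV’s system represents its output is not described, but one plausible form, assumed here, is a label per frame that is then merged into time-coded segments. The label names, the 25 fps frame rate and the duration filter for bumpers in this sketch are illustrative only.

```python
from itertools import groupby


def to_segments(labels, fps=25.0):
    """Merge consecutive identical per-frame labels into
    (label, start_seconds, end_seconds) segments."""
    segments = []
    index = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((label, index / fps, (index + length) / fps))
        index += length
    return segments


def plausible_bumpers(segments, min_s=2.0, max_s=15.0):
    """Keep break bumper segments whose duration falls in the usual range."""
    return [
        seg for seg in segments
        if seg[0] == "break_bumper" and min_s <= seg[2] - seg[1] <= max_s
    ]


# Invented per-frame labels for a ten-second clip at 25 fps.
labels = (
    ["color_bars"] * 50
    + ["slate"] * 25
    + ["program_part"] * 100
    + ["break_bumper"] * 75
)
segments = to_segments(labels)
bumpers = plausible_bumpers(segments)
```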
US cable and media giant Comcast, which owns Sky and NBCUniversal, is one of the few to go so far as to commercialize an internally developed AI application, although the company has form here, having been the original architect of the RDK operating platform. RDK has been adopted widely by major cable TV operators, especially in North America, as an alternative to Android TV.
Comcast developed Video Artificial Intelligence (VideoAI) to generate actionable metadata around content assets, helping to manage new content, improve advertising efficiency, and streamline workflows generally. It was adopted by NBCUniversal and Sky, as well as Comcast’s own pay TV service in the USA, with use extending to tagging key onscreen moments such as hard cuts, black frames, and transitions, before being repackaged and marketed as a SaaS offering. It is along the same lines as the ITV segment marking system, although wider in scope, and indicates that early developers of AI systems for content management have the potential to recoup their investments by making those systems available to others, unless they see competitive advantage in keeping the innovations to themselves.