Machine Learning (ML) For Broadcasters: Part 9 - Automating Workflows

Machine Learning is making an impact across all aspects of video workflow, especially personalization, QC, compliance and moderation.

Almost all aspects of workflow automation from content conception to final consumption and archiving are being transformed by Machine Learning (ML), along with some other techniques under the AI banner. The process is ongoing, indeed only just getting up to steam for many broadcasters and video service providers, with the greatest benefits still to come.

Yet workflow automation is as old as broadcasting itself, having rolled along through various techniques such as robotic tape handling and rule-based archiving. Such processes required some manual assistance and supervision, but that applies today to many tasks where ML is starting to be applied.

Indeed, the generation of ML models and the creation of their associated data sets is itself an active field of automation research. In other words, ML has scope for automating its own operation and maintenance.

Not all automation in video workflows requires ML, even today, and equally not all application of ML in workflow is for automation. ML comes into its own whenever there is a call for analysis of large data sets to recognize patterns or identify objects of some kind within them. ML algorithms can be very good at converging on subsets of data even when those cannot be defined by the user, and then recognizing them subsequently, under unsupervised learning.

There are then various tasks in video workflow that do not require ML for automation, being readily defined by rules that are then executed automatically. These rules may be driven by metadata and then combined with other external variables, such as date, time of day, or preferences of the user. This can be used for decisions around personalization, content delivery and advert targeting, which although not needing ML can be enriched by it, allowing finer grains of control.
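A rule of this kind can be sketched in a few lines without any ML. The example below is illustrative only: the `Asset` fields, the rating labels, and the 21:00 to 05:30 window for mature content are all assumptions made for the sketch, not values taken from any real scheduling system.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Asset:
    title: str
    genre: str
    rating: str  # hypothetical labels: "all" or "mature"


# Assumed window in which mature content may be offered (illustrative only).
MATURE_FROM = time(21, 0)
MATURE_UNTIL = time(5, 30)


def eligible(asset: Asset, now: time, viewer_genres: set[str]) -> bool:
    """Decide from metadata, time of day and user preferences alone."""
    if asset.rating == "mature":
        in_window = now >= MATURE_FROM or now < MATURE_UNTIL
        if not in_window:
            return False
    # Preference rule: only offer genres the viewer has opted into.
    return asset.genre in viewer_genres


drama = Asset("Late Drama", "drama", "mature")
print(eligible(drama, time(20, 0), {"drama"}))  # before the window opens
print(eligible(drama, time(22, 0), {"drama"}))  # inside the window
```

ML could enrich such a rule, for example by supplying a learned preference score in place of the fixed genre set, while the rule itself stays deterministic.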

Workflow can be regarded as the sequence of tasks involved as video proceeds through its life from concept to consumption, and the application of ML in some of these, such as encoding and delivery, has already been described in other articles in this series. But there are some areas, such as quality control (QC), that are involved right across the lifecycle and so can be regarded as integral to workflow.

QC can also be divided roughly in line with the communications stack, ranging from lower-level physical interconnections at the bottom to higher-level content and application issues at the top. So far it is mostly the lower levels that have been automated, through various tools that work without the help of ML. There are numerous live media analyzer software tools that perform quality checks at the signal level automatically, testing against guidelines such as the ETSI TR 101 290 Priority 1 and 2 checks to ensure compatibility and interoperability.
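To give a flavor of what such signal-level checks involve, here is a minimal sketch of two tests in the spirit of the TR 101 290 Priority 1 group: the sync byte check and the continuity counter check on an MPEG transport stream. It is a deliberately simplified illustration, not a conformant analyzer: it ignores sync-loss hysteresis, permitted duplicate packets, and the discontinuity indicator, all of which a real tool must handle. The function and helper names are invented for this example.

```python
TS_PACKET_SIZE = 188  # bytes per MPEG transport stream packet
SYNC_BYTE = 0x47
NULL_PID = 0x1FFF     # null packets carry no meaningful continuity counter


def check_transport_stream(data: bytes) -> dict:
    """Count sync byte and continuity counter errors (simplified)."""
    errors = {"sync_byte_error": 0, "continuity_count_error": 0}
    last_cc = {}  # last continuity counter seen per PID
    for offset in range(0, len(data) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        pkt = data[offset:offset + TS_PACKET_SIZE]
        if pkt[0] != SYNC_BYTE:
            errors["sync_byte_error"] += 1
            continue
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        has_payload = bool(pkt[3] & 0x10)  # adaptation_field_control payload bit
        cc = pkt[3] & 0x0F
        if pid == NULL_PID or not has_payload:
            continue  # counter only advances on packets carrying payload
        if pid in last_cc and cc != (last_cc[pid] + 1) % 16:
            errors["continuity_count_error"] += 1
        last_cc[pid] = cc
    return errors


def make_packet(pid: int, cc: int, sync: int = SYNC_BYTE) -> bytes:
    """Build a payload-only test packet (hypothetical helper for the demo)."""
    header = bytes([sync, (pid >> 8) & 0x1F, pid & 0xFF, 0x10 | (cc & 0x0F)])
    return header + bytes(TS_PACKET_SIZE - len(header))


stream = (make_packet(0x100, 0) + make_packet(0x100, 1)
          + make_packet(0x100, 3)             # counter skips 2
          + make_packet(0x100, 4, sync=0x00)) # corrupted sync byte
print(check_transport_stream(stream))
```

Checks like these are fully defined by rules, which is why this layer of QC was automated long before ML arrived.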

There are also monitoring services from major cloud providers and others for both broadcast and streaming video that perform checks not just at the signal level but also at the content level, although until recently the latter has been a labor-intensive process, which has limited its scope. Now, with a growing need for more in-depth checking of content in large amounts, and in near real time for live, automation of tasks such as checking for compliance or generating relevant metadata is becoming essential. This is where ML is increasingly being applied, in services that analyze and verify the content of HTTP Live Streaming (HLS) video streams quickly, within 15 seconds at an absolute maximum and preferably well under 10 seconds.

Compliance monitoring and moderation have become a particular challenge for live content, and one where ML in conjunction with other AI tools is increasingly being applied. Many of these tools evolved in the online world for moderating UGC (User Generated Content) as it is uploaded, which does have the luxury of a little time. In such cases the automated systems may work in conjunction with human moderators who have the final say, passing over content identified as possibly breaching rules on aspects such as hate speech or obscenities, or as being unsuitable in some way for its target audience.
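The division of labor between the automated system and human moderators can be sketched as a simple triage step. In this illustration the scores are assumed to come from some upstream classifier that is not shown, and the `flag_at` threshold of 0.5 is an arbitrary assumption; the point is only that the system passes suspect items on for review rather than making the final decision itself.

```python
from collections import deque


def triage(uploads, flag_at: float = 0.5):
    """Split uploads into auto-published items and a human review queue.

    `uploads` is an iterable of (item_id, score) pairs, where score is a
    hypothetical model estimate in [0, 1] that the item breaches the rules.
    """
    published = []
    review_queue = deque()  # humans work through this first in, first out
    for item_id, score in uploads:
        if score >= flag_at:
            review_queue.append(item_id)  # moderator has the final say
        else:
            published.append(item_id)
    return published, review_queue


published, queue = triage([("clip-1", 0.02), ("clip-2", 0.91), ("clip-3", 0.55)])
print(published)    # items released without human involvement
print(list(queue))  # items awaiting a moderator's decision
```

Lowering `flag_at` sends more content to humans and reduces the risk of missed breaches, at the cost of moderator time.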

Even for UGC and on-demand content, human-based moderation cannot scale economically to the volumes many service providers have to cope with today, while for live content there is not the time. Moderation has therefore become very fertile ground for ML, which can help sieve large and complex sets of UGC, with major providers of such services already claiming that at least 95% of the time previously spent moderating content manually can be recouped.

ML is well equipped to converge on agreed moderation norms, and there is scope for tuning models to suit varying regulations or audiences. ML models can be applied hierarchically, sifting content into different categories of moderation. There is also the ability to combine analysis of audio, for verbal aspects of moderation, with text, graphics and video, all of which are capable of infringing sensibilities, regulations or rules of censorship. This is a controversial area, but that is not the fault of ML, although there is scope for algorithmic bias in decision making.
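One way to picture this combination of modalities and audience tuning is the sketch below. Everything in it is an assumption made for illustration: the category names, the threshold values, and the choice to fuse modalities by taking the highest score per category. Real systems may weight or combine modalities quite differently.

```python
# Hypothetical per-audience thresholds; lower means stricter.
THRESHOLDS = {
    "general": {"hate_speech": 0.3, "obscenity": 0.3},
    "adult": {"hate_speech": 0.3, "obscenity": 0.8},
}


def fuse(modality_scores: dict) -> dict:
    """Take the worst-case score per category across audio, text and video."""
    fused = {}
    for scores in modality_scores.values():
        for category, score in scores.items():
            fused[category] = max(fused.get(category, 0.0), score)
    return fused


def breaches(modality_scores: dict, audience: str) -> list:
    """List the categories whose fused score meets the audience's threshold."""
    limits = THRESHOLDS[audience]
    fused = fuse(modality_scores)
    return sorted(c for c, s in fused.items() if s >= limits.get(c, 1.0))


scores = {
    "audio": {"obscenity": 0.6, "hate_speech": 0.1},
    "video": {"obscenity": 0.2},
    "text": {"hate_speech": 0.05},
}
print(breaches(scores, "general"))  # stricter profile
print(breaches(scores, "adult"))    # more permissive profile
```

The same model outputs lead to different outcomes for the two audiences, which is the sense in which moderation can be tuned without retraining.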

Algorithmic bias is certainly an issue for some ML applications. It occurs when flawed model training leads to unintended systematic errors in predictions or judgements, such as assigning some objects to the wrong category.

This is perhaps less of an issue for workflow automation, but another limitation of ML at present is in interpreting context, which is becoming increasingly desirable for categorizing video content and assigning metadata of value for targeting decisions. Some humans at least are good at judging when content is satirical, or when it indulges in conspiracy theories, but this is a hard problem at present for ML. It is a field of active research though, and progress is being made, which would lead to further automation of higher-level content categorization.

At a more general level, a major challenge for workflow automation is that to make the best use of ML, the whole process needs to be adapted and in some cases almost redesigned from the ground up. This is because ML is data driven and so workflow needs to be designed around coherent data sets conducive to analysis wherever possible.

To some extent workflow is evolving in this direction anyway, towards being more compartmentalized and served by smaller chunks of software delivered and maintained according to the microservices model featuring DevOps methods. DevOps is an abbreviated concatenation of development and operations, two stages of the software lifecycle that used to be separated into distinct silos. As a result, large chunks of code were delivered for live use after testing, which led to delays as bugs surfaced amid misunderstandings between developers and users or operators.

Under DevOps, code is developed and delivered in smaller components that are more easily understood and simpler to test independently. There is improved feedback between developers, operators and users, or at least that is the theory. This also helps inject ML into workflow at various stages to automate at an increasingly granular level.

The idea is that any point in the workflow where there is a process that generates data and consumes time can be automated with the help of ML. There are also more advanced areas where ML can save time and increase scope by harnessing computational power more effectively. Video rendering in animation, creation of special effects and gaming are areas where ML is starting to make an impact, intruding into areas previously thought to be the preserve of human creatives. There is a project, for example, at one long-established TV technology vendor to transfer live-action footage of actors or people with definable facial expressions into animation systems.

An extension of this is in generating genuine 3D representations from 2D video, which is complex and involves computationally heavy analysis of successive video frames as objects move and rotate between them. Such a process cannot be defined by rules but can be achieved with ML given enough computational power.

Such applications will be incorporated within future workflows. For now though there is plenty of work automating tasks around QC and compliance across the workflow with the help of ML, as well as for liberating value from content.

The latter can be achieved by making archived content more searchable as has already been done to varying extents by major broadcasters such as the BBC, although such projects are still ongoing. ML is also being applied for transforming content into different versions, such as extracting clips or highlights from recordings of sporting events in near real time. ML has been used in this way for several years but is continually being refined.

As one example, the annual Wimbledon Lawn Tennis Championships in the UK first applied ML at the 2018 event to automate tagging and assembly of two-minute highlight reels for online publication. The system rated each play on metrics such as crowd noise and player gesture to make it easier and faster for human editors to build more extensive highlights. By the 2022 championships, this had been refined to allow users to access personalized highlights based on players they were following, among other enhancements.
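The general idea of rating plays and assembling a reel within a time budget can be illustrated as follows. This is not the method used at Wimbledon, whose details are not given here: the weights, the normalized 0 to 1 scores, and the greedy selection are all assumptions chosen to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass
class Play:
    start_s: float         # position in the recording, in seconds
    duration_s: float
    crowd_noise: float     # assumed score in [0, 1] from audio analysis
    player_gesture: float  # assumed score in [0, 1] from video analysis


def excitement(play: Play, w_crowd: float = 0.6, w_gesture: float = 0.4) -> float:
    """Weighted rating of a play; the weights are illustrative only."""
    return w_crowd * play.crowd_noise + w_gesture * play.player_gesture


def build_reel(plays, budget_s: float = 120.0):
    """Greedily pick the highest-rated plays that fit a two-minute budget."""
    chosen, used = [], 0.0
    for play in sorted(plays, key=excitement, reverse=True):
        if used + play.duration_s <= budget_s:
            chosen.append(play)
            used += play.duration_s
    return sorted(chosen, key=lambda p: p.start_s)  # restore match order


plays = [
    Play(0, 60, 0.9, 0.8),
    Play(100, 50, 0.2, 0.1),
    Play(200, 60, 0.8, 0.9),
    Play(300, 30, 0.1, 0.1),
]
for play in build_reel(plays):
    print(play.start_s, round(excitement(play), 2))
```

Personalization of the kind described could be layered on by filtering or reweighting plays according to the players a user follows before the selection step.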

Such examples highlight how ML is entering the workflow in different places, often at first in conjunction with human operators, delivering savings as well as adding value for users. It is fast becoming an integral part of the workflow value chain essential for competitiveness both above and below the line.
