Machine Learning (ML) For Broadcasters: Part 10 - Automating Workflows For Compliance & Moderation

Machine learning and other AI techniques are increasingly being applied to many aspects of media content compliance and moderation, ranging from technical compliance with parameters such as loudness to vetting for hate speech.

Content moderation and broadcast compliance used to be background tasks that rarely broke through into public consciousness, but that all changed with the advent of digital social media, which has dragged video service providers of all forms along with it. The proliferation of live streaming, combined with increasing scrutiny of broadcasters and content providers, has created a perfect storm that can only be weathered with the help of heavy computation guided by machine learning and other branches of AI involved in natural language and video processing.

The field of compliance itself has expanded in scope and now includes technical aspects such as loudness and color matching. It also embraces adherence to an increasing array of government regulations relating to targeting and data privacy in particular, as well as the pressing need to moderate content for hate, abuse, incitement to violence, sexual explicitness, profanity, fake news, and other aspects, some of which are more subjective than others.

It is not the job of AI and ML to adjudicate on subjective matters, yet they have themselves become the target of regulatory action over issues such as algorithmic bias, which has been highlighted by several celebrated cases involving the technology and social media giants. Further pressure has come from legislation such as the UK’s Online Safety Bill, which has drawn a lot of flak for jeopardizing freedom of speech but will, in whatever form it is implemented, ratchet up the compliance burden for broadcasters further.

ML has been moving up the food chain of content compliance just as in other sectors, meaning that it is intruding more into areas of judgement that were previously the exclusive preserve of humans. At the level of technical compliance there is little controversy, and the challenges lie in achieving ever higher levels of accuracy and consistency. Solutions to some of these technical challenges have proved elusive in the past, as in ensuring consistent audio reproduction through loudness control.

Loudness has become an even greater challenge in the streaming era, having long been a specialist area for broadcasters, involving established measurement tools and instruments. Now there is an ever greater array of devices and acoustic environments where content is played back. One aim is to avoid sudden variations in audio volume either within content or between separate items in a stream, such as when an ad starts playing.
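A minimal sketch of how such sudden variations between items might be flagged, assuming each item is available as a list of floating-point samples between -1.0 and 1.0. It uses plain RMS level in dBFS, not a broadcast loudness measure such as ITU-R BS.1770, which adds frequency weighting and gating. The function names and the 3 dB step limit are illustrative assumptions, not values taken from any standard.

```python
import math


def rms_dbfs(samples):
    """Return the RMS level of samples (floats in -1.0..1.0) in dBFS."""
    if not samples:
        raise ValueError("samples must not be empty")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square)


def find_level_jumps(items, max_step_db=3.0):
    """Return indexes of items whose level differs from the previous
    item by more than max_step_db (an assumed, illustrative limit)."""
    levels = [rms_dbfs(item) for item in items]
    jumps = []
    for i in range(1, len(levels)):
        if abs(levels[i] - levels[i - 1]) > max_step_db:
            jumps.append(i)
    return jumps


# Hypothetical stream: quiet programme, louder ad, quiet programme again.
programme = [0.1, -0.1] * 100
advert = [0.5, -0.5] * 100
print(find_level_jumps([programme, advert, programme]))
```

In this toy stream both the start of the ad and the return to the programme are reported, since each transition changes the level by well over the assumed limit.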

Humans are still required for normalization, where the aim is to set levels high enough to overcome ambient noise from the electronics but not so high that they cause audio clipping where some of the waveform is cut off, degrading quality. ML can play a role by more intelligently matching levels to variations in the audio content, which can range from whispering to shouting, where traditional automated metering can go wrong.
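The trade-off between raising the level and avoiding clipping can be sketched as follows. This is a simple static-gain calculation under stated assumptions (float samples between -1.0 and 1.0, an RMS target of -20 dBFS and a peak ceiling of -1 dBFS, both chosen purely for illustration). It is not the ML-driven approach described above, only the baseline such an approach would try to improve on.

```python
import math


def normalization_gain_db(samples, target_rms_dbfs=-20.0, peak_ceiling_dbfs=-1.0):
    """Return the gain in dB that moves the RMS level towards the target
    without pushing the highest peak above the ceiling."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return 0.0  # silence: nothing to normalize
    mean_square = sum(s * s for s in samples) / len(samples)
    rms_db = 10.0 * math.log10(mean_square)
    peak_db = 20.0 * math.log10(peak)
    wanted_gain = target_rms_dbfs - rms_db   # gain needed to hit the target
    headroom = peak_ceiling_dbfs - peak_db   # gain available before clipping
    return min(wanted_gain, headroom)


def apply_gain(samples, gain_db):
    """Scale samples by a gain expressed in dB."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]


# Hypothetical whisper-level clip: plenty of headroom, so the target is reached.
whisper = [0.01, -0.01] * 50
louder = apply_gain(whisper, normalization_gain_db(whisper))

# Hypothetical clip with one loud transient: the peak ceiling limits the gain.
spiky = [0.9] + [0.01] * 99
safe = apply_gain(spiky, normalization_gain_db(spiky))
```

A single gain value for a whole item is exactly what struggles with content that swings from whispering to shouting, which is where content-aware level matching would come in.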

While this is work in progress with disagreements at the technical level, there is no fundamental conflict over objectives. Higher up the food chain comes content moderation, which opens various cans of worms, as well as raising challenges at the cutting edge of ML application. This has really been driven by the explosion in User Generated Content (UGC) dating back to the emergence of social media platforms such as YouTube and Facebook almost two decades ago. This first affected these big platforms but has since embroiled broadcasters and various service providers, including former print newspapers as they have gone online and featured increasing amounts of video in news coverage.

The controversies over what should be moderated and what should not may rage on, but are less strictly relevant for AI and ML, whose role is ultimately to serve their masters, whether these are broadcasters or regulators. However, ML is directly in the firing line when it comes to implementation and the question of bias or inconsistency.

Various examples have been quoted of ML algorithms failing to moderate content adequately, for instance by singling out items for the wrong reasons because they include words such as “liar” that might be libellous in certain circumstances, but do not by themselves denote, say, hate speech. One of the problems is that content moderation has to take account of local variations in dialect, linguistic usage, and culture that might render material inappropriate in one area but not in another. Such distinctions are not confined to audio but also extend to video, for example in the case of what might be deemed sexually explicit or unduly violent material. The application of ML for such video analysis is even more challenging.

Yet in principle deep learning has the potential to cope with nuances and regional distinctions through use of more diverse training sets that represent these variations. Then the models can converge on multiple points that enable them to distinguish between the different cases when analyzing content for moderation.

Therefore, absolute criticisms of ML for content moderation just because the models make a hash of some decisions today are premature. The technology will improve and its application in content moderation will mature, with the message now being not to place too much reliance on the models too soon, and to resist the hyperbole of the strongest AI advocates.

A fundamental point is that the ability to moderate live content in real time has only come about as a result of the phenomenal advances in computational power achieved in large part through reductions in chip process size, especially over the last decade. The role of ML lies in focusing that power more flexibly and adaptably on the task in hand than is possible with conventional software engineering and rule-based approaches.

In terms of broadcasting compliance and content moderation, ML already has great scope to automate content filtering and identify the more controversial material that still requires decisions by human specialists to determine whether to allow through.
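This division of labor between automated filtering and human specialists could be sketched as a simple triage step. The sketch assumes a hypothetical classifier that returns a score between 0.0 and 1.0 for how likely an item is to breach the rules; the function name and both thresholds are illustrative and would need tuning per service and per region.

```python
def triage(score, allow_below=0.2, block_above=0.9):
    """Route an item using a hypothetical classifier score in 0.0..1.0.

    Clear-cut items are handled automatically; everything in between
    is passed to a human specialist.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score < allow_below:
        return "allow"
    if score > block_above:
        return "block"
    return "human_review"


# Hypothetical scores for three items in a moderation queue.
for item_score in (0.05, 0.55, 0.97):
    print(item_score, triage(item_score))
```

Widening the gap between the two thresholds sends more material to human reviewers, which is the cautious setting while the models are still maturing.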

Again, this leaves the grey area of live UGC content, with some of the big tech companies, including Apple and Amazon, moving towards what they call proactive content moderation. This is an odd term to use, since it simply means making the decision over whether to block content before it is posted rather than in response to a complaint afterwards. That is surely the point of moderation, but the controversy arises over that question of censorship.

The same question has also arisen in the related area of video messaging, which overlaps with the UGC area and also with mobile news gathering. A key point here is that popular messaging apps such as Meta’s WhatsApp, Apple’s iMessage, and Signal from the eponymous company all support strong encryption, which they have claimed means that users can be confident no one will be able to read their messages, not even the service provider or app developer.

For example, Mark Zuckerberg, CEO of Facebook as it then was, insisted in 2018 in testimony before the US Senate that “We don’t see any of the content in WhatsApp.”

This has obvious implications for moderation, since if Zuckerberg’s statement were totally correct it would allow any user to send any message they liked irrespective of the content contained. In principle it would allow illicit services to be established inside closed user groups, to which no government, regulator, or law enforcement agency would have access.

As a result, this insistence on absolute protection of messages from eavesdropping or interception has been watered down. There are techniques for identifying illicit activity without decrypting the content, for example through tracking routes taken and users involved. But the big platforms have also been looking at ways of identifying illicit content outside the encryption domain, notably through client side scanning with perceptual hashing.

Perceptual hashing is the process of reducing images to a series of indexes or fingerprints that can subsequently be used to identify almost identical examples, such as a particular item of child sexual abuse material (CSAM). A fingerprint would usually comprise a string of alphanumeric characters that can then be compared against the fingerprints of future images to identify ones that match closely enough to infringe rules against distribution of CSAM.
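As an illustration of the principle only, the sketch below implements average hashing, one of the simplest perceptual hashing techniques; it is not the algorithm used by any particular platform. It assumes the image has already been converted to grayscale and downscaled to a small grid of pixel values, and the function names are chosen here for illustration.

```python
def average_hash(pixels):
    """Return a hex fingerprint for a small grayscale image.

    pixels is a 2D list of brightness values (for example an image
    downscaled to 8x8). Each bit records whether a pixel is brighter
    than the mean, so small changes in brightness or compression
    tend to leave the fingerprint largely unchanged.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    width = (len(flat) + 3) // 4  # hex digits needed for all bits
    return format(bits, "0{}x".format(width))


def hamming_distance(hash_a, hash_b):
    """Count the bits that differ between two hex fingerprints."""
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


# Hypothetical 2x2 images: an original, a slightly altered copy, and an inverse.
original = [[0, 255], [255, 0]]
altered = [[10, 250], [240, 5]]
inverse = [[255, 0], [0, 255]]

print(hamming_distance(average_hash(original), average_hash(altered)))
print(hamming_distance(average_hash(original), average_hash(inverse)))
```

A small distance between two fingerprints suggests near-identical images, while a large distance suggests unrelated ones; where to draw the line is a policy choice as much as a technical one.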

Apple announced plans in 2021 to incorporate this hashing software in iPhones with the aim of identifying cases of CSAM infringement before images were encrypted for uploading, in a form of client side scanning. In the event of matches exceeding a certain threshold of “hash hits”, Apple would intervene to decrypt the images and possibly inform a law enforcement agency. Clearly this breaches the promise of confidentiality, while in theory exposing users to accusations of child abuse that might be false when in fact they were merely taking photos of their own children in the bath, say. ML can in principle be employed to increase the accuracy of such detection without even needing examples for matching in a database, but Apple is naturally reluctant to incur the risk of false accusations and so does not exploit such potential yet.
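The idea of acting only once matches exceed a threshold can be sketched generically as below. Fingerprints are represented as integers, and the bit-distance tolerance and hit threshold are hypothetical values chosen for illustration; they do not reflect the parameters of Apple's or any other real system.

```python
def hamming_distance(a, b):
    """Count the bits that differ between two integer fingerprints."""
    return bin(a ^ b).count("1")


def count_hash_hits(image_hashes, known_hashes, max_distance=2):
    """Count images whose fingerprint is within max_distance bits
    of any fingerprint in the known set."""
    hits = 0
    for candidate in image_hashes:
        if any(hamming_distance(candidate, known) <= max_distance
               for known in known_hashes):
            hits += 1
    return hits


def exceeds_hit_threshold(image_hashes, known_hashes,
                          hit_threshold=3, max_distance=2):
    """Return True only when the number of hits exceeds the threshold,
    so that a single chance match does not trigger any action."""
    return count_hash_hits(image_hashes, known_hashes, max_distance) > hit_threshold


# Hypothetical 8-bit fingerprints, for illustration only.
known = {0b11110000, 0b00001111}
library = [0b11110001, 0b10101010, 0b00001111]
print(count_hash_hits(library, known))
```

Raising the threshold reduces the risk of false accusations from coincidental matches, at the cost of letting more genuine cases go undetected.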

While not directly relevant for many broadcasters, this example highlights the tensions that have to be negotiated in striking a balance between confidentiality, freedom of expression, and protection against content that can cause harm to individuals or groups. ML and AI can help implement this balance but cannot determine where it should lie.

What is certain is that AI based content moderation will become essential as broadcasters embrace mobile news gathering as an extension of their remote operations, in many cases from non-professionals who happen to be in the right place at the right time. This will involve filtering and perhaps editing such content automatically to meet basic quality guidelines, as well as for regulatory compliance.

Inevitably, ML will extend even further up the chain, aiming to classify content by tone where it will overlap with metadata creation, personalization, and regionalization. There will be increased ability to combine audio, graphics, captions and video more effectively in such classification, with potential for targeting and revenue generation, as well as compliance.

AI and ML have been criticized for making decisions or taking actions that are too opaque, which the system is itself unable to explain because it cannot articulate the statistical regressions or other mechanisms involved. This field, known as explainable AI or XAI, is the subject of research that will be highly relevant for content moderation, and perhaps for explaining those many cases where people and not just media assets have been inexplicably sanctioned, or “cancelled”, by major social media platforms.
