Speech-to-text functionality can help you adhere to accessibility regulations by captioning your content.
More so than many other technology trends, artificial intelligence (AI) has quickly become one of the main buzzwords in broadcast and media production. As a result, marketers in every area of the industry are now presenting products that integrate AI to deliver better production workflows. Increasingly though, the definition of AI has become somewhat cloudy – many automation-centric tools are being mislabeled as intelligent. Because automation is such a key technology within media and content production, many vendors are now blurring the definitions of the two to better market their applications while AI is such a talking point. So, before we start labeling everything AI-enabled, let’s first define exactly what artificial intelligence is.
Rebecca Lindley, product marketing at IPV.
The possibilities of AI
To be clear – this isn’t an attack on the lineup of AI-integrated products or concepts that can be seen on the trade show floor. Being able to integrate intelligence into video production is potentially one of the biggest advances the industry will have made in a long time.
We’re constantly hearing that broadcasters and content producers are under pressure to create more with less. This is the case whether aiming to improve the overall quality of programming to create a more engaging product or upping the quantity to keep up with the latest viewing platforms and increasing viewer expectations. To do this, resources need to be as streamlined as possible, operating as efficiently as they can in order to output what’s needed.
Artificial intelligence can be an answer to this quandary. If a production workflow can make decisions for itself then it has the potential to supplement resources for content producers, creating the efficient workflow that’s needed in today’s incredibly diverse media production market.
Speech to text for intelligent subtitling
More video content is being produced than ever – whether for broadcast, OTT or VoD delivery or short-form online distribution – so content creators need to make sure their output is accessible to everyone. In many cases, this means including subtitles or closed captions. But as the amount of available video increases, the cost of the resources needed to add these elements manually rises with it.
This is an aspect of post-production that AI can really have an impact on. By integrating an intelligent machine into the editing process, workflow management tools can listen to the audio and transcribe the speech into subtitles, which they then add to the workflow themselves. For added flexibility, subtitles can be burnt into the image or saved as a subtitle file for inclusion with online video players.
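As a rough illustration of the "saved as a subtitle file" step, the sketch below turns timed transcript segments into SubRip (SRT) cues. It assumes a speech-to-text engine has already produced the segments; the `Segment` class and both function names are hypothetical and not part of any particular product.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed phrase (hypothetical shape of a speech-to-text result)."""
    start: float  # seconds from the start of the asset
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render transcript segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text}\n"
        )
    return "\n".join(cues)
```

The resulting text can be written to a `.srt` file and loaded by most online video players, while a burnt-in version would instead be rendered onto the picture during export.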
Speech-to-text functionality not only helps save resources but also makes content producers’ media libraries much more accessible. In certain geographies, it can also help organizations comply with accessibility legislation.
Fusing different types of AI can be the best way of reliably tagging content. The red areas show where all AI sources have identified similar terms, meaning that those terms are more likely to be correct.
Image recognition for automated metadata
In a similar vein, AI-driven image recognition can make content identification and metadata tagging more efficient. When ingesting assets, adding content tags is commonplace (and if not, it should be). Tagging media with the correct pre-determined descriptors creates a much better search and discover workflow, allowing editors to find assets more easily during an edit.
With AI in place, this can be done by the ingest engine instead of by an operator. If taught correctly, a machine can review media during ingest and highlight certain aspects of the content to which it can add a tag. For example, when a broadcaster ingests a bank of assets from a music festival, the AI machine can identify the different musical artists in the footage and add their names to the metadata as required. This can be done to varying levels of detail – from the overall theme of an asset right down to identifying objects in every frame.
The overall benefit to users is the same as if this were done manually – faster search and discover processes. But the AI is able to perform the task more thoroughly, in finer detail – and much faster than if done by hand.
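Reliability matters when tagging is handed to a machine. One way to picture the idea of fusing several AI sources and trusting the terms they agree on is a simple vote count, sketched below. The function name, the source names and the two-vote threshold are all assumptions made for illustration, not a description of how any specific product works.

```python
from collections import Counter


def fuse_tags(sources: dict[str, set[str]], min_votes: int = 2) -> list[str]:
    """Keep tags proposed by at least `min_votes` independent AI sources."""
    votes = Counter()
    for tags in sources.values():
        # Normalize case and whitespace so "Guitar" and "guitar" count as one tag.
        votes.update({tag.strip().lower() for tag in tags})
    return sorted(tag for tag, count in votes.items() if count >= min_votes)


# Hypothetical output from three recognizers run on the same festival clip.
suggestions = {
    "speech": {"Guitar", "crowd"},
    "vision": {"guitar", "stage"},
    "ocr": {"stage", "banner"},
}
agreed = fuse_tags(suggestions)  # tags that at least two sources proposed
```

Raising `min_votes` trades coverage for confidence: fewer tags survive, but each has more independent support.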
AI’s usefulness potentially peaks in the live production space, where speed is perhaps the most important aspect of a production workflow. Walking around recent trade shows, you can see several live applications integrated with AI for things like camera operation or program direction.
As well as these acquisition-centric applications, an intelligent machine can be taught to recognize particular actions as they happen. This is something that could easily save resources during live operations where users need to clip replays and highlights within their media asset management system as content is being ingested.
For example, a particular sports-orientated user of IPV Curator creates a clip every ten seconds during the ingest of live NBA games. If an AI machine is taught to do the same, manual resources could be saved. A machine can recognize actions within the live stream of a basketball game – like a slam dunk, a foul or a free-throw – which it could intelligently clip, saving replays without anyone having to manually mark in and out points.
An automated metadata application could also be used, creating metadata-rich clips ready for use in live edits or online distribution.
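To make the clipping step concrete, here is a minimal sketch of how recognized events might be turned into in and out points with the event label attached as metadata. It assumes an upstream recognizer already emits a label, a timestamp and a confidence score; the `Event` class, the roll durations and the confidence threshold are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One detection from a hypothetical action recognizer."""
    label: str         # e.g. "dunk", "foul", "free-throw"
    time: float        # seconds into the live stream
    confidence: float  # recognizer score between 0 and 1


def clip_points(events, pre_roll=5.0, post_roll=3.0, min_confidence=0.8):
    """Return (label, mark_in, mark_out) for each sufficiently confident event."""
    clips = []
    for event in events:
        if event.confidence < min_confidence:
            continue  # leave doubtful detections for a human logger
        mark_in = max(0.0, event.time - pre_roll)  # never before the stream start
        mark_out = event.time + post_roll
        clips.append((event.label, mark_in, mark_out))
    return clips


detections = [
    Event("dunk", 120.0, 0.95),
    Event("foul", 2.0, 0.90),
    Event("free-throw", 300.0, 0.40),
]
replays = clip_points(detections)
```

Skipping low-confidence detections rather than clipping them reflects the idea that logging may be only partially automated, with an operator handling the uncertain cases.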
Events like dunks in basketball could eventually become recognizable by AI, meaning that logging could be at least partially automated.
Let’s get definitive – AI v automation
Within these examples, there’s no doubt that AI is responsible for output that previously wouldn’t have been possible. But even so, there’s still a lingering confusion between what’s defined as AI and what is in fact automation.
There are now media export tools labeled as AI-integrated which create multiple versions of an asset, each in a different file format, compression protocol or aspect ratio. This, however, isn’t necessarily artificial intelligence. A simple script can be written which tells the output what format an asset should be in, depending on the platform it will be delivered to. In this instance, the workflow is using a process-centric tool that’s been designed to execute a set of rules. There’s nothing ‘intelligent’ about it.
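The kind of "simple script" described here can be as small as a lookup table. The sketch below is an assumed example: the platform names and export settings are illustrative placeholders, not recommendations or any vendor's presets.

```python
# Illustrative rules only: each delivery platform maps to fixed export settings.
EXPORT_RULES = {
    "broadcast": {"container": "mxf", "aspect_ratio": "16:9"},
    "web": {"container": "mp4", "aspect_ratio": "16:9"},
    "social_vertical": {"container": "mp4", "aspect_ratio": "9:16"},
}


def export_settings(platform: str) -> dict[str, str]:
    """Look up the pre-defined export settings for a delivery platform."""
    try:
        # Return a copy so callers cannot alter the rule table by accident.
        return dict(EXPORT_RULES[platform])
    except KeyError:
        raise ValueError(f"no export rule for platform {platform!r}") from None
```

The function never inspects the content itself. The same input always produces the same output, and an unknown platform is simply an error, which is what marks this as automation rather than intelligence.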
But that’s not to say it’s without merit. This kind of automation is hugely beneficial in many workflows – and saves resources for content creators everywhere. In order to define something as intelligent, the technology must be able to do more. It must have the ability to make a decision that a human will have traditionally needed to make.
If a production tool receives an input, makes a change to it and outputs something pre-defined, it’s automated. If the same tool is able to review a piece of content, decide what should be done to it and then output something, then it’s intelligent.
For automated subtitling, the AI decides what words are being spoken and replicates them in a caption format. For automating metadata, a computer chooses what is present in a piece of video content and adds information as a tag. If deploying event recognition, the machine is deciding what within a livestream is an important event and selects that action, saving it as a ready-to-use asset.
There’s an argument that vendors shouldn’t employ the word automated to describe actions undertaken by AI in order to avoid confusion. But while there is a distinct difference in the technology behind the two things, they do work hand in hand because AI is taking automation to the next level. With artificial intelligence, we’re able to create automated processes that go far beyond rule-following scripts, saving time while avoiding mistakes.
The ability to make judgements for itself is what makes AI such an intriguing prospect for media production. But as with any industry-wide technology trend, we need to make sure we’re cutting through the marketing speak to learn about how AI can really be used to address content creators’ real-world challenges.