AI has helped production professionals do their job better and streamlined workflows, saving hours of time and effort in a myriad of ways.
There was a time when the mere mention of bringing artificial intelligence (AI) and machine learning into the media industry conjured visions of robots replacing humans. Today that is certainly not the case, although we might be getting close: I saw a robotic camera operator control the cameras for a national television news show from his converted kitchen table. On air, viewers never saw a difference from the programs they always watch.
After initial reluctance to adopt the technology, broadcasters have found that AI has actually helped production professionals do their jobs better and streamlined workflows, saving hours of time and effort in a myriad of ways. It's now part of many broadcasters' toolbox of production options, and many newsrooms see these fast-computing algorithms as the best production assistant they have ever worked with. The results have been extremely positive: single images stored on massive petabyte storage systems are found in seconds, and programs are played out from a server at precise times. All of this has led to more content, for TV and the web, when broadcasters need it most.
Operationally, AI solutions are simplifying and enhancing virtually every part of a TV newsroom's workflow, while cutting production costs, speeding up content syndication and significantly reducing the man-hours required for some of the most labor-intensive tasks. Here are just a few examples of how AI is helping the video production community specifically.
Cloud-Based AI Services
Virtually every cloud service provider now offers a suite of AI tools that can be selected on a per-usage basis, supporting everything from remotely switched live production to on-demand storage and processing power. For large-scale, multi-venue productions like the Olympics or a World Cup, the cloud has become invaluable as a scalable platform with virtually unlimited resources. Latency remains a challenge for live telecasts, but it is improving with every new project completed.
Advances in AI are giving broadcasters ways to offer greater coverage of live events remotely. AI-based technology has improved the accuracy of robotic cameras tracking talent as they move across the set. A company called Telemetrics (Allendale, NJ) has recently added Ultra-Wide Band (UWB) sensing, using AI software it developed, as a fourth layer of camera tracking capability (after automated facial, object and preset functionality). The company calls it reFrame Automatic Shot Correction technology. It trims a shot automatically, without the operator touching the control panel, to keep on-screen talent where you want them in the frame. It's particularly helpful when shooting speakers who fidget.
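To illustrate the general idea behind automatic shot correction, here is a minimal Python sketch. It assumes the tracker (a face detector or a UWB tag) already reports the talent's horizontal position as a fraction of frame width; `FramingRule` and `pan_correction` are hypothetical names, and the deadband-plus-gain logic is a common control pattern, not Telemetrics' published algorithm.

```python
from dataclasses import dataclass


@dataclass
class FramingRule:
    # HYPOTHETICAL parameters for illustration only
    target_x: float = 0.5   # desired horizontal position of talent (0 = left edge, 1 = right edge)
    deadband: float = 0.05  # drift smaller than this is ignored, so small fidgets cause no camera move
    gain: float = 0.5       # fraction of the out-of-band error corrected per update (smooths the move)


def pan_correction(talent_x: float, rule: FramingRule) -> float:
    """Return a normalized pan adjustment for one update (positive = pan right)."""
    error = talent_x - rule.target_x
    if abs(error) <= rule.deadband:
        return 0.0  # talent is close enough to the target framing: hold the shot
    # Correct only the part of the error that lies outside the deadband, scaled by the gain
    excess = abs(error) - rule.deadband
    return rule.gain * excess * (1.0 if error > 0 else -1.0)
```

The deadband keeps the camera still through small movements, while the gain spreads a larger correction over several updates so the reframe looks smooth rather than abrupt.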
Similar AI algorithms are used in Ross Video's Vision-AI-ry robotic camera tracking software, which can recognize the talent's race, gender and age, and even distinguish between different faces (as long as at least 50 percent of the person is visible in the image). That's not easy for a robotic camera head.
Taken together, the use of AI and machine learning to automate live production tasks offers an opportunity to bring viewers new content that would otherwise be prohibitively expensive to produce.
Storage And Clip Search
AI is helping to speed up audio and clip searches across petabytes of data by using face detection, object recognition, voice-to-text transcription, optical character recognition and other attributes. Using parallel processing and specialized algorithms originally designed for other industries (such as security and the military), these systems process and index content once; after that, finding an image across vast libraries takes seconds.
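Once AI engines have tagged content, searching it is essentially a lookup in an index rather than a scan of the media itself. The following Python sketch shows that idea as a toy inverted index; the detections are assumed to come from upstream face, object, OCR and speech engines, and `MetadataIndex` is a hypothetical class, not any vendor's API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A hit: (clip identifier, timecode in seconds)
Hit = Tuple[str, float]


class MetadataIndex:
    """Toy inverted index over AI-generated metadata (faces, objects, OCR text, speech)."""

    def __init__(self) -> None:
        # Key: (attribute kind, normalized value) -> every place that value was detected
        self._index: Dict[Tuple[str, str], List[Hit]] = defaultdict(list)

    def add(self, clip_id: str, timecode: float, kind: str, value: str) -> None:
        # kind is e.g. "face", "object", "ocr" or "speech"; values are stored lowercase
        self._index[(kind, value.lower())].append((clip_id, timecode))

    def search(self, kind: str, value: str) -> List[Hit]:
        # A dictionary lookup: the media itself is never rescanned at query time
        return sorted(self._index.get((kind, value.lower()), []))
```

For example, after `add("clip_001", 12.0, "object", "Helicopter")` and `add("clip_042", 3.5, "object", "helicopter")`, a search for `("object", "HELICOPTER")` returns both clips with their timecodes.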
A Costa Mesa, Calif.-based company called Veritone has developed a suite of cloud-based software tools, called aiWARE, that uses a proprietary machine-learning orchestration layer (“Conductor”). Serving as an aggregator of AI engines, the software not only runs multiple engines at once but also chooses the best available engine or engines, drawn from providers around the globe, to complete the desired task.
For example, in natural language processing tasks such as transcription, aiWARE can predict the accuracy of each transcription engine in a system based on the characteristics of the media being processed. Conductor then automatically selects the best engine to process that file. The software can also identify the best engine for each portion of a file, applying multiple engines when needed to fill accuracy gaps.
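The per-segment selection logic described above can be sketched as follows. The predicted accuracies are taken as given inputs (how aiWARE actually predicts them is not described here), and the engine names, the 0.90 threshold and the two-engine cap are hypothetical.

```python
from typing import Dict, List


def select_engines(predicted: List[Dict[str, float]],
                   threshold: float = 0.90,
                   max_engines: int = 2) -> List[List[str]]:
    """For each file segment, choose which engine(s) should process it.

    predicted[i] maps engine name -> predicted accuracy for segment i.
    If the best engine clears the threshold, it runs alone; otherwise the
    top-ranked engines (up to max_engines) all run so their outputs can be
    combined to fill the accuracy gap.
    """
    plan = []
    for scores in predicted:
        ranked = sorted(scores, key=scores.get, reverse=True)  # best predicted accuracy first
        if scores[ranked[0]] >= threshold:
            plan.append(ranked[:1])
        else:
            plan.append(ranked[:max_engines])
    return plan
```

With predictions of `{"engine_a": 0.95, "engine_b": 0.88}` for one segment and `{"engine_a": 0.70, "engine_b": 0.82, "engine_c": 0.60}` for the next, the first segment goes to `engine_a` alone, while the second falls below the threshold and is sent to `engine_b` and `engine_a`.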
Veritone and Quantum (San Jose, Calif.) now offer a joint solution, “aiWARE for Xcellis,” that leverages Quantum’s StorNext file system and its range of Xcellis storage solutions (cloud, LTO tape, SSD and HDD spinning disk). StorNext serves as the database for a storage library that can be searched quickly, although, as they say, the larger the library, the slower the search. The combination of the two allows users to apply AI to on-premises content that previously could not be leveraged for this purpose, and to add new content for analysis as the data is captured.
Media Asset Management
Beyond fast content retrieval, one of the most notable ways AI is reinventing media asset management (MAM) is through speech-to-text conversion. With speech-to-text, an AI-driven software system or platform scans an audio feed and breaks what it discerns into words, converting speech into text. This is important for broadcasters that must comply with government mandates to make captions available for every video program.
Primestream’s Workflow Suite and Xchange MAM system enable users to manage critical content through ingest, live production, postproduction and distribution workflows.
AI is particularly useful for analyzing large numbers of audio and video files, shortening the process of transcribing audio to text and saving time for both marketing and creative teams. This matters for MAM operations because the text extracted from audio and video files becomes searchable, so assets can be located even faster by searching for words spoken in them.
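Searching by spoken words works well because many speech-to-text engines can return a start time for each word, so a match can jump straight to the right moment in the clip. This Python sketch assumes such a word-timed transcript as input; `find_phrase` is a hypothetical helper.

```python
from typing import List, Tuple

# One transcript entry: (word as recognized, start time in seconds)
TimedWord = Tuple[str, float]


def find_phrase(transcript: List[TimedWord], phrase: str) -> List[float]:
    """Return the start time of every occurrence of phrase (case-insensitive, whole words)."""
    target = phrase.lower().split()
    if not target:
        return []
    # Normalize recognized words: lowercase and drop surrounding punctuation
    words = [w.lower().strip(".,?!") for w, _ in transcript]
    hits = []
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            hits.append(transcript[i][1])  # timecode of the phrase's first word
    return hits
```

Returning timecodes rather than just clip names is what lets an editor cue the exact moment a phrase was spoken.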
And the technology is now widely available. Users of Primestream’s MAM system, which features APIs integrated with AI algorithms, now have access to tools such as facial and object recognition, speech-to-text transcription, sentiment analysis and more in an integrated, software-driven platform. With it, users can automatically generate transcriptions from video content. Through integrations with AI engines, the company’s Xchange product creates an automated transcription workflow that replaces the time-consuming and inefficient process of sending content out to external transcription services.
Speaking of mandated captioning, the past year has seen a huge migration to cloud-hosted captioning workflows, simply because they make economic sense when hundreds of new titles must be processed at a time. Netflix, for one, has migrated to an automated system it developed in-house, making extensive use of AI to streamline the process of selecting and delivering a specific movie title. A considerable amount of internal research has also gone into the timing of the text to ensure readability. The OTT provider now also offers audio description on many of its titles.
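Caption readability is commonly checked as reading speed in characters per second. The sketch below flags cues that are too fast to read comfortably; the 17 characters-per-second default is an assumed limit for illustration (style guides set their own values), and none of this describes Netflix's actual system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def reading_speed(cue: Cue) -> float:
    """Characters per second, counting letters, punctuation and spaces but not line breaks."""
    duration = cue.end - cue.start
    if duration <= 0:
        raise ValueError("cue must have a positive duration")
    return len(cue.text.replace("\n", "")) / duration


def too_fast(cues: List[Cue], max_cps: float = 17.0) -> List[int]:
    """Return the indices of cues whose reading speed exceeds max_cps (ASSUMED default limit)."""
    return [i for i, cue in enumerate(cues) if reading_speed(cue) > max_cps]
```

A cue reading "Good evening." over two seconds comes in at 6.5 characters per second and passes; "Here are tonight's top stories." squeezed into one second runs at 31 and would be flagged for retiming.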
Under looming financial and content-demand pressures, large broadcasters have begun to understand that they need automation to manage the ever-increasing amount of material that must be captioned. The scalability of the cloud means broadcasters pay for cloud connection and processing only when they actually need it, and can turn the services off when they don’t. In addition, cloud-based captioning services have trained their engines on large databases of keywords and phrases, with thousands of parameters built up over a decade or more of captioning, so those engines now maintain a high degree of accuracy. That’s time (and money) saved to do other things.
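The pay-only-when-needed argument comes down to a break-even calculation. The sketch below compares a hypothetical always-on captioning system against per-hour cloud processing; both prices are invented placeholders, not quotes from any provider.

```python
def on_demand_cost(hours_used: float, rate_per_hour: float) -> float:
    """Monthly cost when paying only for the processing hours actually used."""
    return hours_used * rate_per_hour


def break_even_hours(fixed_monthly_cost: float, rate_per_hour: float) -> float:
    """Monthly usage above which a fixed, always-on resource becomes the cheaper option."""
    if rate_per_hour <= 0:
        raise ValueError("rate_per_hour must be positive")
    return fixed_monthly_cost / rate_per_hour


# HYPOTHETICAL prices, for illustration only
FIXED_MONTHLY = 3000.0  # always-on in-house system, per month
CLOUD_RATE = 10.0       # cloud captioning, per processing hour

if __name__ == "__main__":
    print(break_even_hours(FIXED_MONTHLY, CLOUD_RATE))  # 300.0 hours per month
    print(on_demand_cost(120, CLOUD_RATE))              # 1200.0 for a 120-hour month
```

In this made-up scenario, paying per hour is cheaper below 300 processing hours a month and the fixed system is cheaper above it.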
Improving Compression Efficiency
AI is also improving the delivery of content through better compression. A group of international technology vendors and broadcasters is developing standards to improve video coding. Calling itself MPAI (Moving Picture, Audio and Data Coding by Artificial Intelligence), the group believes that machine learning can improve the efficiency of the existing MPEG-5 Essential Video Coding (EVC) standard by about 25 percent.
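To put the roughly 25 percent figure in concrete terms, assume, as coding-efficiency gains are usually reported, that it means a 25 percent bitrate reduction at equal visual quality. The 8 Mbit/s stream below is a hypothetical example, not an MPAI result.

$$
R_{\text{AI}} = (1 - 0.25)\,R_{\text{EVC}}, \qquad R_{\text{EVC}} = 8~\text{Mbit/s} \;\Rightarrow\; R_{\text{AI}} = 0.75 \times 8~\text{Mbit/s} = 6~\text{Mbit/s}
$$

Under that assumption, the same 25 percent saving would apply to every stream delivered at that quality level.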
Leveraging AI and machine learning, the MPAI hopes to improve coding efficiency beyond what current codecs like EVC can do.
MPAI says it is focusing on a horizontal hybrid approach that combines AI-based algorithms with a traditional video codec, replacing one or more blocks of the traditional coding loop with machine-learning-based blocks.
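The hybrid idea is easiest to see as a coding loop with swappable stages. The toy Python sketch below runs a one-dimensional predict, quantize, reconstruct and filter loop; `learned_filter_stub` is a hypothetical placeholder where a trained network would sit, and a fixed 3-tap smoothing kernel stands in for it so the code runs. It illustrates the general plug-in structure only, not MPAI's actual architecture or any real codec's tools.

```python
from typing import Callable, List

Signal = List[float]
FilterStage = Callable[[Signal], Signal]


def quantize(residual: Signal, step: float) -> Signal:
    # Uniform scalar quantization followed by dequantization (the lossy step)
    return [round(r / step) * step for r in residual]


def identity_filter(block: Signal) -> Signal:
    # Conventional-codec stand-in: no in-loop filtering at all
    return list(block)


def learned_filter_stub(block: Signal) -> Signal:
    # HYPOTHETICAL placeholder for an ML-based in-loop filter; a real one would be a
    # trained network. A fixed 3-tap smoothing kernel stands in so the sketch runs.
    n = len(block)
    out = []
    for i in range(n):
        left = block[max(i - 1, 0)]
        right = block[min(i + 1, n - 1)]
        out.append(0.25 * left + 0.5 * block[i] + 0.25 * right)
    return out


def encode_sequence(frames: List[Signal], step: float, loop_filter: FilterStage) -> List[Signal]:
    """Toy predict -> residual -> quantize -> reconstruct -> filter loop.

    Returns the reconstructed frames a decoder would also produce.
    """
    reference = [0.0] * len(frames[0])
    recon_frames = []
    for frame in frames:
        prediction = reference  # toy inter prediction: the previous reconstructed frame
        residual = [x - p for x, p in zip(frame, prediction)]
        recon = [p + r for p, r in zip(prediction, quantize(residual, step))]
        recon = loop_filter(recon)  # the swappable block
        recon_frames.append(recon)
        reference = recon  # filtered output feeds the next prediction, hence "in-loop"
    return recon_frames
```

Swapping `identity_filter` for `learned_filter_stub` changes only one stage; because the filtered reconstruction becomes the reference for the next frame, that one substitution affects every frame that follows.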
It’s clear that, far from taking people’s jobs, AI has far-reaching implications that could affect virtually every part of the content production and delivery lifecycle. It’s increasing productivity and enabling staff to do more with the same resources.
Scanning the production landscape, AI doesn’t seem scary after all. Embrace it.