FCC Expands Caption Mandate As Automated Processing Takes Center Stage

On October 27, 2020, the Federal Communications Commission issued an order expanding its captioning mandate for broadcasters to include audio description requirements in 40 additional designated market areas (DMAs) over the next four years. The move builds on the Twenty-First Century Communications and Video Accessibility Act of 2010 (CVAA), which directed stations in the top 60 DMAs to provide what it calls “described programming.”

This means broadcasters and program creators in the top 100 U.S. markets must now provide both visual captioning and audio description or risk being fined.

Audio description makes video programming more accessible by inserting narrated descriptions of a television program’s key visual elements during natural pauses in the program’s dialogue. The action ensures that a greater number of individuals who are blind or visually impaired can be better connected, informed, and entertained by television programming.

The FCC’s expanded rules will ensure that audio description will become more widely available across the broadcast and OTT markets, and that’s good news for everyone involved. Taking it a step further, the Commission has said that in 2023 it will determine whether to continue expanding audio description requirements to an additional 10 DMAs per year beyond the top 100 DMAs.

Traditionally, captioning has been accomplished by a certified captioner working on a dedicated computer, but that’s changing. More recently, specialized software and the cloud have enabled automated transcription, converting the spoken word into visual text that is typically displayed in the lower third of the screen.

Automated Captioning Can Be Tricky

Captioning services can be set up to output different languages, with English and Spanish currently the most popular choices. However, it gets tricky when a guest speaks in a language your Automatic Speech Recognition (ASR) engine isn’t expecting, which can hurt accuracy when captioning live events, often because the ASR engine and its database aren’t mature enough to fully understand the words being spoken. For example, there have been cases where English-language ASR engines try to turn Spanish speech into English text, and the results are often horrible.
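As a rough illustration of one mitigation, a workflow can run a language-identification step on each audio chunk and route it to an ASR engine configured for that language rather than assuming everything is English. This sketch is not based on any vendor’s SDK; the engine names and the `detect_language()` stub are hypothetical placeholders.

```python
# Illustrative routing sketch: pick an ASR engine per audio chunk
# based on a language-identification step. Engine names and the
# detect_language() stub are hypothetical, not a real API.

ASR_ENGINES = {"en": "english-asr", "es": "spanish-asr"}
FALLBACK_ENGINE = "english-asr"  # used when language ID is inconclusive

def detect_language(audio_chunk: bytes) -> str:
    """Placeholder for a real spoken-language-ID model.
    For demonstration, the first byte of the chunk acts as a tag."""
    return {b"E": "en", b"S": "es"}.get(audio_chunk[:1], "unknown")

def route_to_engine(audio_chunk: bytes) -> str:
    """Choose an ASR engine per chunk instead of assuming English."""
    lang = detect_language(audio_chunk)
    return ASR_ENGINES.get(lang, FALLBACK_ENGINE)
```

In a real system the language-ID model would itself introduce latency and errors, so some deployments instead run two engines in parallel and keep whichever hypothesis scores higher.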

OTT providers like Netflix are now offering audio description features for many of its most popular titles.

Most automated captioning systems used today leverage “off-the-shelf” ASR engines from major providers like Amazon, Google, Speechmatics and a number of others. These large companies have the R&D teams required to develop these highly sophisticated ASR engines, so it is probably not a good idea to try and reinvent the wheel but pick the best technology for the job at hand. Many caption providers run quarterly evaluations of these technologies, focusing specifically on each technology’s performance for captioning workflows.
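Those evaluations typically compare engines on word error rate (WER), the standard ASR accuracy metric: the number of word substitutions, deletions, and insertions divided by the number of words in a reference transcript. A minimal sketch of the computation, using word-level Levenshtein distance:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference words,
    computed via Levenshtein edit distance over word tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution/match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

Production evaluations usually normalize text first (casing, punctuation, number formatting), since raw WER penalizes formatting differences that viewers may not care about.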

There’s More To It Than Just The Cloud

Services traditionally delivered by human captioners, such as captioning of pre-recorded material, audio description, and sign language translation, are now moving to the cloud, but the transformation is not complete. Cloud-based workflows work well for most applications, such as live captioning, but for many customers a human captioner remains familiar, reliable, and in some cases less expensive than a fully automated system. Audio description and sign language translation, for the most part, are not yet cloud based, but that will change with time and growing familiarity with these unique types of captioning.

Speed Vs. Accuracy

Clients face a tradeoff between caption speed and accuracy when setting up their systems, and they can adjust the balance for each title or group of programs. While the speed of caption transmission can be tuned to a client’s requirements, it’s sometimes necessary for broadcasters to compromise between the two. With a real-time automated system there is, on average, about 3-7 seconds of latency during a live event, comparable to what a human captioner can achieve. Speeding up processing reduces accuracy, because the system has less time and context to work out what is being said in the audio; allowing more latency makes it more accurate.
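One way to picture that tradeoff is as a holding buffer: the system keeps recognized words for a configurable latency window before committing them to air, giving the recognizer time to revise earlier guesses with later context. This class and its interface are illustrative only, not taken from any real captioning product.

```python
class CaptionBuffer:
    """Illustrative sketch: hold recognized words for `latency_s` seconds
    before finalizing, so later audio context can still revise them.
    A larger window means higher accuracy but slower captions."""

    def __init__(self, latency_s: float):
        self.latency_s = latency_s
        self.pending = []  # list of (timestamp, word) awaiting commit

    def add(self, timestamp: float, word: str) -> None:
        """Record a provisional word from the recognizer."""
        self.pending.append((timestamp, word))

    def flush(self, now: float) -> list:
        """Commit and return words that have aged past the latency window."""
        ready = [w for t, w in self.pending if now - t >= self.latency_s]
        self.pending = [(t, w) for t, w in self.pending
                        if now - t < self.latency_s]
        return ready
```

With `latency_s=3.0` the on-air delay matches the low end of the 3-7 second range quoted above; raising it buys the recognizer more context per word.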

News organizations are increasingly turning to automated live captioning systems to speed the process and ensure accuracy.

Cloud-Based Workflows Are The Future

The past year has seen a huge migration to cloud-hosted captioning workflows, simply because it makes economic sense when hundreds of new titles must be processed at a time. Netflix, for one, has migrated to an automated system it developed in-house, and a considerable amount of internal research has gone into the timing of the text to ensure readability. The OTT provider now also offers audio description on many of its titles.

Due to looming financial and content-demand pressures, large broadcasters have begun to understand they need automation to manage the ever-increasing amount of material that must be captioned. The scalability of the cloud means broadcasters pay for connection and processing costs only when they actually need them and can turn the services off when they don’t. In addition, cloud-based captioning services maintain a high degree of accuracy through large databases of keywords, phrases, and thousands of parameters built up over a decade or more of captioning.

Even some smaller TV stations are happy using an automated captioning system instead of the traditional human captioning model, because over time the savings are substantial and the accuracy is often as good as or better than what they were getting before. There are still situations where a human captioner works best, such as a special event or a specific foreign-language audience, but artificial intelligence and automation are catching up rapidly and the differences are shrinking every day.
