By mandating Audio Description in addition to traditional captioning, the FCC is making video programming more accessible.
On October 27, 2020 The Federal Communications Commission issued an order to expand its captioning mandate for broadcasters to include audio description requirements for 40 designated market areas (DMAs) over the next four years. The move came after the Twenty-First Century Communications and Video Accessibility Act of 2010 (CVAA) directed stations in the top 60 DMAs to provide what it calls “described programming.”
This means broadcasters and program creators in the top 100 markets in the U.S. must now include both visual and audio captioning or risk getting fined.
Audio description makes video programming more accessible by inserting narrated descriptions of a television program’s key visual elements during natural pauses in the program’s dialogue. The action ensures that a greater number of individuals who are blind or visually impaired can be better connected, informed, and entertained by television programming.
The FCC’s expanded rules will ensure that audio description will become more widely available across the broadcast and OTT markets, and that’s good news for everyone involved. Taking it a step further, the Commission has said that in 2023 it will determine whether to continue expanding audio description requirements to an additional 10 DMAs per year beyond the top 100 DMAs.
Traditionally, captioning has been accomplished with a certified captioner working on a dedicated computer but that’s changing. More recently, special software and the cloud has enabled the automated transcription and conversion of the spoken word into visual text, which is typically displayed at the lower third of the screen.
Automated Captioning Can Be Tricky
Captioning services can be set up to output different languages, with English and Spanish currently the most popular choices. However, it gets tricky when you have a guest speaking in a different language and your Automatic Speech Recognition (ASR) engine isn’t expecting it. This can cause some issues with accuracy during the captioning of live events. It may be due to the fact that the ASR engine and database isn’t mature enough to fully understand the words being spoken. For example, there have been issues where English-language ASR engines try to turn Spanish speech into English text, and the results are often horrible.
OTT providers like Netflix are now offering audio description features for many of its most popular titles.
Most automated captioning systems used today leverage “off-the-shelf” ASR engines from major providers like Amazon, Google, Speechmatics and a number of others. These large companies have the R&D teams required to develop these highly sophisticated ASR engines, so it is probably not a good idea to try and reinvent the wheel but pick the best technology for the job at hand. Many caption providers run quarterly evaluations of these technologies, focusing specifically on each technology’s performance for captioning workflows.
There’s More To It Than Just The Cloud
Legacy services like captioning of pre-recorded material, audio description, and sign language translation—delivered by human-generated captioning services—are now making the move to the cloud, but it has not been a total transformation. Cloud-based workflows are great for most applications, like live captioning, but for many people, a human captioner is familiar, reliable and less expensive than a fully automated system. Newer services like audio description and sign language translation, for the most part, are not cloud based at this point, but this will change with time and familiarity with this unique type of captioning.
Speed Vs. Accuracy
It should be noted that clients face a tradeoff between caption speed and accuracy that when setting up their systems and they can adjust accordingly for each title or group of programs. While they can control the speed of caption transmission, based on the client’s requirements, its sometimes necessary for broadcasters to make compromises regarding the trade-off between speed and accuracy. With a real-time automated system, on average there is about 3-7 seconds of latency during a live event. That’s comparable to what a human captioner can do. If you make the processing go faster, the accuracy goes down, because the system has less time and context to figure out what’s being said in the audio. If you make the time longer, it will be more accurate.
News organizations are increasingly turning to automated live captioning systems to speed the process and ensure accuracy.
Cloud-Based Workflows Are The Future
The past year has seen a huge migration into cloud-hosted captioning workflows, simply because it makes economic sense when having to process hundreds of new titles at a time. Netflix, for one, has migrated to an automated system it developed in-house. A considerable amount of internal research has also gone into the timing of the text to ensure readability. The OTT provider now also offers Audio Description captions as well on many of its titles.
Due to looming financial and content demand pressures, large broadcasters have begun to understand they need automation to manage the ever increasing amount of material that needs to be captioned. The scalability of the cloud ensures that broadcasters only pay for cloud connection and processing costs when they actually need it and can turn off the services when they don’t. In addition, cloud-based captioning services with large databases of keywords and phrases have set up thousands of parameters that have been built up over a decade or more of captioning to maintain a high degree of accuracy.
Even some of the smaller TV stations are happy using an automatic caption system instead of the traditional human captioning model because over time the savings are so large and the accuracy is often as good as or better than what they were getting before. There might be a situation where a human captioner would work best for a special event or when addressing a specific foreign language audience, but artificial intelligence and automation is catching up rapidly and the differences are getting smaller every day.
You might also like...
The venerable field of audio/visual (AV) packaging is undergoing a renaissance in the streaming age, driven by convergence between broadcast and broadband, demand for greater flexibility, and delivery in multiple versions over wider geographical areas requiring different languages and…
One of the key trends at IBC 2022 is virtualization and moving to cloud-native infrastructures. Manufacturers and users want to improve workflow efficiencies with whole cloud ecosystems and data.
Many annual NAB Shows have become milestones in TV broadcasting history. The presence of the 2022 NAB Show marked the first Las Vegas NAB Show since 2019.
One of the surprises from the latest research published by Nielsen was the significant rise in audiences watching live linear TV. Lockdown has not only sent SVOD viewing soaring through the roof but linear TV is expanding rapidly. One reason…
Online video captioning is critical for the deaf community at any time, but during a public health emergency like COVID-19, it has taken on a new significance, particularly as people stay at home.