By mandating Audio Description in addition to traditional captioning, the FCC is making video programming more accessible.
On October 27, 2020, the Federal Communications Commission issued an order to expand its accessibility mandate for broadcasters, extending audio description requirements to 40 additional designated market areas (DMAs) over the next four years. The move came after the Twenty-First Century Communications and Video Accessibility Act of 2010 (CVAA) directed stations in the top 60 DMAs to provide what it calls “described programming.”
This means broadcasters and program creators in the top 100 markets in the U.S. will be required to provide both captions and audio description or risk being fined.
Audio description makes video programming more accessible by inserting narrated descriptions of a television program’s key visual elements during natural pauses in the program’s dialogue. The action ensures that a greater number of individuals who are blind or visually impaired can be better connected, informed, and entertained by television programming.
The FCC’s expanded rules will make audio description more widely available across the broadcast and OTT markets, and that’s good news for everyone involved. Taking it a step further, the Commission has said that in 2023 it will determine whether to continue expanding audio description requirements by an additional 10 DMAs per year beyond the top 100 DMAs.
Traditionally, captioning has been accomplished with a certified captioner working on a dedicated computer, but that’s changing. More recently, special software and the cloud have enabled the automated transcription and conversion of the spoken word into visual text, which is typically displayed in the lower third of the screen.
Automated Captioning Can Be Tricky
Captioning services can be set up to output different languages, with English and Spanish currently the most popular choices. However, it gets tricky when a guest speaks in a different language and your Automatic Speech Recognition (ASR) engine isn’t expecting it. This can hurt accuracy during the captioning of live events, possibly because the ASR engine and its database aren’t mature enough to fully understand the words being spoken. For example, there have been cases where English-language ASR engines try to turn Spanish speech into English text, and the results are often very poor.
OTT providers like Netflix are now offering audio description features for many of their most popular titles.
Most automated captioning systems used today leverage “off-the-shelf” ASR engines from major providers like Amazon, Google, Speechmatics and a number of others. These large companies have the R&D teams required to develop such highly sophisticated ASR engines, so rather than trying to reinvent the wheel, it is probably better to pick the best available technology for the job at hand. Many caption providers run quarterly evaluations of these technologies, focusing specifically on each technology’s performance for captioning workflows.
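As a rough illustration of what such an evaluation might involve, the sketch below scores each engine's transcript against a human-verified reference using word error rate (WER). WER is a common accuracy measure for speech recognition, but the article does not say which metric caption providers actually use, so treat it as an assumption. The engine names and transcripts are hypothetical placeholders, not real vendor output.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def rank_engines(reference: str, outputs: dict[str, str]) -> list[tuple[str, float]]:
    """Return (engine, WER) pairs sorted from most to least accurate."""
    scored = [(name, word_error_rate(reference, text))
              for name, text in outputs.items()]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical reference transcript and engine outputs.
    reference = "the commission issued an order today"
    outputs = {
        "engine_a": "the commission issued an order today",
        "engine_b": "the commission issue an order",
    }
    for name, wer in rank_engines(reference, outputs):
        print(f"{name}: WER = {wer:.2f}")
```

A real evaluation would likely use many hours of representative program audio and normalize punctuation and numerals before scoring; this sketch only lowercases and splits on whitespace.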
There’s More To It Than Just The Cloud
Services traditionally delivered by human captioning providers, such as captioning of pre-recorded material, audio description, and sign language translation, are beginning to make the move to the cloud, but it has not been a total transformation. Cloud-based workflows are great for most applications, like live captioning, but for many people, a human captioner is familiar, reliable and less expensive than a fully automated system. Audio description and sign language translation, for the most part, are not cloud based at this point, but this will change with time and familiarity with these specialized services.
Speed Vs. Accuracy
Clients face a tradeoff between caption speed and accuracy when setting up their systems, and they can adjust the balance for each title or group of programs. While the speed of caption transmission can be set to suit the client’s requirements, it’s sometimes necessary for broadcasters to compromise on one to gain the other. With a real-time automated system, there is on average about 3-7 seconds of latency during a live event. That’s comparable to what a human captioner can do. If you make the processing go faster, the accuracy goes down, because the system has less time and context to figure out what’s being said in the audio. If you allow more time, it will be more accurate.
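The link between latency and context can be pictured with a toy model. The sketch below does not model any real ASR engine. It simply assumes that a caption for a word must be sent a fixed number of seconds after the word is spoken, and counts how many later words the system would have heard by that deadline. The function name, the word timings and the two latency settings are all hypothetical.

```python
def context_words_available(word_times: list[float],
                            latency_seconds: float) -> list[int]:
    """For each spoken word, count the later words heard before its
    caption must be sent (word time + latency budget)."""
    counts = []
    for i, spoken_at in enumerate(word_times):
        deadline = spoken_at + latency_seconds
        following = word_times[i + 1:]
        counts.append(sum(1 for t in following if t <= deadline))
    return counts


if __name__ == "__main__":
    # Hypothetical speaker delivering one word every half second.
    word_times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
    for latency in (1.0, 3.0):
        counts = context_words_available(word_times, latency)
        print(f"latency {latency}s -> following words available: {counts}")
```

In this toy model a longer latency budget never reduces the amount of following speech available for a given word, which is the intuition behind accepting a few extra seconds of delay in exchange for better accuracy.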
News organizations are increasingly turning to automated live captioning systems to speed the process and ensure accuracy.
Cloud-Based Workflows Are The Future
The past year has seen a huge migration to cloud-hosted captioning workflows, simply because it makes economic sense when hundreds of new titles have to be processed at a time. Netflix, for one, has migrated to an automated system it developed in-house. A considerable amount of internal research has also gone into the timing of the text to ensure readability. The OTT provider now also offers audio description on many of its titles.
Due to looming financial and content demand pressures, large broadcasters have begun to understand that they need automation to manage the ever-increasing amount of material that needs to be captioned. The scalability of the cloud ensures that broadcasters only pay for cloud connection and processing costs when they actually need the services and can turn them off when they don’t. In addition, cloud-based captioning services with large databases of keywords and phrases have set up thousands of parameters, built up over a decade or more of captioning, to maintain a high degree of accuracy.
Even some of the smaller TV stations are happy using an automatic caption system instead of the traditional human captioning model because over time the savings are so large and the accuracy is often as good as or better than what they were getting before. There might be a situation where a human captioner would work best, such as a special event or when addressing a specific foreign language audience, but artificial intelligence and automation are catching up rapidly and the differences are getting smaller every day.