Making Broadcast Captioning Accessible With ASR

From the devices we use to consume content, to the type of content we consume, and indeed our viewing habits, almost every aspect of media and, more importantly, our relationship with media has undergone a significant change in the last 10 to 15 years. As an example, in the UK adults aged 16 and over watched, on average, 34 minutes less television in 2017 than in 2010 (according to Statista). While the rise of streaming services and on-demand content is undoubtedly a contributing factor, a bigger influence is almost certainly the rise in social media consumption. In fact, in 2015, more than 370 years of video - just shy of 200 million minutes - were watched on Twitter worldwide on a daily basis. As is the case with any industry, such rapid change brings with it new challenges and the need for evolution and adaptation, and captioning is no exception.

Broadening our reach

From a technological perspective, captioning is as good as it’s ever been, but technological advancements across the board have led to an increase in people’s expectations; mobile phones are as capable as they’ve ever been; homes are now smart and connected; and cars are close to driving themselves. This mentality translates to all technologies, meaning that people’s expectations of captioning are higher than ever before.

This creates a constant need for innovation and improvement, so that captioning can be delivered more efficiently and to a high level of accuracy across the ever-expanding broadcast and digital landscape.

Speaking of increases, not only are expectation and demand growing, but the amount of content produced that requires captions, and the amount of content consumed, are higher than ever before, meaning that the focus should be firmly placed on the audience.

While regulated content will continue to require skilled captioners to meet regulatory demands, there is a wealth of content within unregulated markets that could benefit from technological advancements to reach a wider pool of people.

Browsing through your average Twitter, LinkedIn or Facebook feed reveals that the presence of captioned content can be hit and miss. It is simply too expensive to caption this media using the same methods used to meet regulatory demands. This means that the significant majority of SMEs lack the resources to caption their content across their social channels, negatively impacting engagement with their users.

According to multiple publishers (DigiDay), up to 85% of all videos on Facebook are watched without sound; when you consider that over 500 million people watch video on Facebook every single day (via Forbes), there is an incomprehensibly large audience that is currently not being catered to.

This becomes an even bigger problem when viewed through the lens of accessibility. With over 900 million people (or one in ten) estimated to suffer from disabling hearing loss by 2050, it is now more important than ever for all producers of video content to make sure accessibility remains at the forefront of their minds.

The challenge therefore lies not only in captioning accurately but also in captioning the sheer amount of content that’s uploaded across various social media platforms every minute. It’s imperative that we as an industry continue to add the necessary tools to be able to make more content available with better quality captioning across all media platforms - but how?

It goes without saying that the logistics of producing, managing and delivering those captions are a challenge of substantial complexity. One possible solution to the problem is Automatic Speech Recognition (ASR).

Huge advances in recent years have made the technology a realistic player in captioning for the very first time. This means it can be used in our workflows to drive up productivity, allowing us to cover more content than ever before, more quickly and more effectively.

At Red Bee Media, introducing the element of ASR has allowed us to create a solution that makes the production of accurate captions more cost-efficient. Using Speechmatics’ highly accurate ASR technology, we are able to apply captions to video content for online and social more efficiently and more accurately than before. This has also enabled us to transcribe thousands of hours of video content to make it easily searchable in a secure environment.

This does not apply only to new content; we can also draw on existing video content that needs repurposing, a process that is now significantly more efficient and easier to execute.


Captioning is at an inflection point; new technologies, new media, and new problems have created the perfect storm of issues, but also new opportunities. The industry is ripe for embracing new technological solutions, widening its reach to allow for the creation of better, more efficient captioning for the broadest possible audience.

Tom Wootton is product manager at Red Bee Media, part of Ericsson.
