Making Broadcast Captioning Accessible With ASR

From the devices we use to consume content to the type of content we consume and our viewing habits, almost every aspect of media and, more importantly, our relationship with media has undergone a significant change in the last 10 to 15 years. As an example, in the UK adults aged 16 and over watched, on average, 34 minutes less television in 2017 than in 2010 (according to Statista). While the rise of streaming services and on-demand content is undoubtedly a contributing factor, a bigger influence is almost certainly the rise in social media consumption. In fact, in 2015, more than 370 years of video - just shy of 200 million minutes - were watched on Twitter worldwide on a daily basis. As is the case with any industry, such rapid change brings with it new challenges and the need for evolution and adaptation, and captioning is no exception.

Broadening our reach

From a technological perspective, captioning is as good as it’s ever been, but technological advancements across the board have raised people’s expectations: mobile phones are as capable as they’ve ever been, homes are now smart and connected, and cars are close to driving themselves. This mentality carries over to all technologies, meaning that people’s expectations of captioning are higher than ever before.

This demands constant innovation and improvement to provide more efficient, highly accurate captioning across the ever-expanding broadcast and digital landscape.

Speaking of increases, not only are expectation and demand growing, but the amount of produced content requiring captions and the amount of content consumed are also higher than ever before, meaning that the focus should be firmly placed on the audience.

While regulated content will continue to require skilled captioners to meet regulatory demands, there is a wealth of content within unregulated markets that could benefit from technological advancements to reach a wider pool of people.

Browsing through your average Twitter, LinkedIn or Facebook feed reveals that the presence of captioned content can be hit and miss. The methods used to meet regulatory demands are simply too expensive to apply to this kind of media. This means that the significant majority of SMEs lack the resources to caption their content across their social channels, negatively impacting engagement with their users.

According to multiple publishers (DigiDay), up to 85% of all videos on Facebook are watched without sound. When you consider that over 500 million people watch video on Facebook every single day (via Forbes), there is an incomprehensibly large audience that is currently not being catered to.

This becomes an even bigger problem when viewed through the lens of accessibility. With over 900 million people (or one in ten) estimated to suffer from disabling hearing loss by 2050, it is now more important than ever for all producers of video content to keep accessibility at the forefront of their minds.

The challenge therefore lies not only in captioning accurately but also in captioning the sheer amount of content that’s uploaded across the various social media platforms every minute. It’s imperative that we as an industry continue to add the tools needed to make more content available with better quality captioning across all media platforms - but how?

It goes without saying that the logistics of producing, managing and delivering those captions are a challenge of substantial complexity. One possible solution to the problem is Automatic Speech Recognition (ASR).

Huge advances in recent years have made the technology a realistic player in captioning for the very first time. It can now be used in our workflows to drive up productivity, allowing us to cover more content than ever before, more quickly and more effectively.

At Red Bee Media, introducing the element of ASR has allowed us to create a solution that makes the production of accurate captions more cost-efficient. Using Speechmatics’ highly accurate ASR technology, we are able to apply captions to video content for online and social more efficiently and more accurately than before. This has also enabled us to transcribe thousands of hours of video content to make it easily searchable in a secure environment.

This is not limited to new content either; we can also draw on existing video content that needs repurposing, and doing so is significantly more efficient and easier to execute.


Captioning is at an inflection point; new technologies, new media, and new problems have created the perfect storm of issues, but also new opportunities. The industry is ripe for embracing new technological solutions, widening its reach to allow for the creation of better, more efficient captioning for the broadest possible audience.

Tom Wootton is product manager at Red Bee Media, part of Ericsson.
