Monitoring & Compliance In Broadcast: Accessibility & The Impact Of AI

The proliferation of delivery devices and formats increases the challenges presented by accessibility compliance, but it is also an area of rapid, AI-powered innovation.

Accessibility in video, audio and data is a fast-moving world, with new potential offerings arriving every day. Many of the legal requirements for accessibility are based on providing access for those who are hard of hearing, primarily related to closed captioning and subtitling.

The main difference between subtitling and closed captioning is that subtitling is used for translation between different languages, while closed captioning provides same-language captioning. The “rules” for these can differ. The distinction between subtitling and closed captions is often blurred when they are discussed, but they are different.

Other options are also increasingly becoming mainstream, such as full audio description, which includes contextual information suitable for those with sight loss, typically adding descriptions of the scene and of actions such as “door opens”. This is becoming more widely available, although there is less specific legislation in place for it. In addition, options such as sign language availability are appearing in online services such as video meetings.

Legislation

In the US, a number of legislative requirements are in place, particularly for closed captioning, spread across several regulatory Acts and regulatory bodies. These include Sections 504 and 508 of the Rehabilitation Act, the Americans with Disabilities Act (ADA), the 21st Century Communications and Video Accessibility Act (CVAA), the Web Content Accessibility Guidelines (WCAG), and FCC rules.

Broadly speaking, these state that both prerecorded video and live streams must have captions, with 99% accuracy and no paraphrasing. The original tone and intent of the speaker must be honored, and the captions must convey background noises “to the fullest extent possible”. These requirements can prove interesting when combined with the technical requirements needed to put them on the screen.

The FCC says captions should be accurate, synchronous, complete, and properly placed. In the US, text lines are typically placed at the bottom of the screen by default, but can be moved elsewhere if they would grossly interfere with the action. Recommendations include no more than 42 characters per line, and no more than two lines of text per subtitle. Recommended reading speeds vary around the globe, particularly where subtitles are in a different language from the original, but for closed captions 21 characters per second is the default.
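By way of illustration, here is a minimal sketch of a compliance check against those limits, assuming caption cues have already been parsed into text lines with start and end times in seconds. All names are illustrative rather than drawn from any standard library or FCC reference implementation.

```python
# A minimal sketch of a caption compliance check. The 42-character,
# two-line and 21 cps limits come from the recommendations discussed
# above; the Cue structure and function name are illustrative.

from dataclasses import dataclass

MAX_CHARS_PER_LINE = 42
MAX_LINES = 2
MAX_CHARS_PER_SECOND = 21.0

@dataclass
class Cue:
    start: float   # seconds
    end: float     # seconds
    lines: list[str]

def check_cue(cue: Cue) -> list[str]:
    """Return a list of human-readable compliance issues for one cue."""
    issues = []
    if len(cue.lines) > MAX_LINES:
        issues.append(f"{len(cue.lines)} lines (max {MAX_LINES})")
    for line in cue.lines:
        if len(line) > MAX_CHARS_PER_LINE:
            issues.append(f"line of {len(line)} chars (max {MAX_CHARS_PER_LINE})")
    duration = cue.end - cue.start
    chars = sum(len(line) for line in cue.lines)
    if duration > 0 and chars / duration > MAX_CHARS_PER_SECOND:
        issues.append(f"reading speed {chars / duration:.1f} cps "
                      f"(max {MAX_CHARS_PER_SECOND})")
    return issues

# Example: 48 characters shown for 1.5 seconds exceeds 21 cps.
print(check_cue(Cue(0.0, 1.5, ["This caption has rather too", "many characters to read."])))
```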

In some cases, technical placement and character limits can conflict with the requirement to avoid paraphrasing, so effective captioning and subtitling can be said to be as much an art as a science.

Automation

In the past five years, increasing levels of automation have come into captioning and subtitling operations. Traditionally, there were three main ways for humans to create captions and subtitles. The first is respeaking, essentially using speech-to-text, mainly for live events. The second is stenography, a system originally created for parliaments and other diplomatic meetings, in which specialist shorthand operators use dedicated keyboards to create the text. The third is straightforward typing on a standard keyboard, ensuring that the captions are in sync with the audio track; this is a time-consuming process used only for pre-recorded content, and it is often used for language subtitling, where multiple language versions may be required in today’s global market.

Required output from all of these systems, and from any automated system, will be in a variety of file delivery formats, generally specified individually by media organizations. For example, the BBC publishes detailed documentation in its Subtitle Guidelines, covering its delivery requirements for subtitle files. Subtitle and closed caption files are delivered as separate entities from the video and audio (although legacy “burnt-in” subtitles can still be found, mainly in archive video), primarily so viewers can choose whether to turn them on or off. There are multiple sidecar formats in use by media organizations: SCC, SRT, WebVTT, TTML, MCC and STL, to name but a few. Initiatives to simplify the whole subtitling and closed captioning workflow are constantly ongoing. The EBU has also done extensive work on broadening accessibility options.
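As a sketch of what moving between these sidecar formats involves, the snippet below converts well-formed SRT to WebVTT. Real delivery specifications (such as the BBC Subtitle Guidelines) impose far more than this; the sketch only shows the structural differences: a WEBVTT header, no cue index, and ‘.’ rather than ‘,’ as the millisecond separator.

```python
# A minimal sketch of SRT-to-WebVTT conversion, assuming a
# well-formed SRT input with blank lines between cue blocks.

def srt_to_webvtt(srt_text: str) -> str:
    out = ["WEBVTT", ""]
    for block in srt_text.strip().split("\n\n"):
        lines = block.splitlines()
        if len(lines) < 2:
            continue
        # lines[0] is the SRT cue index, which WebVTT does not require.
        timing = lines[1].replace(",", ".")
        out.append(timing)
        out.extend(lines[2:])
        out.append("")
    return "\n".join(out)

srt = """1
00:00:01,000 --> 00:00:03,500
[Door opens]

2
00:00:04,000 --> 00:00:06,000
Hello, and welcome back."""

print(srt_to_webvtt(srt))
```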

Automated machine learning and AI-based systems are coming into play for the generation, regeneration and automated output of subtitles and closed captions. When dealing with large volumes of data, AI offers cost and time savings; however, at the time of writing, it may not offer the same quality. Automated translation between languages is certainly widely available, but it can potentially lead to a loss of nuance and speaker intent.

For AI-based systems, extensive training is required to approach the levels that professional human operators can supply, and this training needs to be targeted. Basic live Automated Speech Recognition (ASR) is certainly within the grasp of a plethora of automated AI-based systems, as can be seen on many videos available on the web, but a short survey of general web content will soon reveal accuracy far below 99%. AI systems need to keep evolving and learning continuously.
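Accuracy in this context is commonly quantified as word error rate (WER) against a human reference transcript; a WER of 1% or less corresponds to the 99% figure above. Below is a minimal sketch of the standard edit-distance calculation; the function name and sample strings are illustrative.

```python
# Word error rate: Levenshtein (edit) distance over words between a
# reference transcript and the ASR output, divided by reference length.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

reference = "the door opens and the meeting begins"
hypothesis = "the door opened and the meeting begins"
print(f"WER: {word_error_rate(reference, hypothesis):.2%}")  # one substitution in seven words
```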

The more sophisticated ASR systems now available can work with accents, dialects and speech patterns, and, coupled with localization algorithms, can analyze idiom, slang and some cultural references. Where automated systems may still struggle is with context-specific or technical language, and when translating humor and jokes. Once technical parameters are established, AI systems can easily deal with on-screen requirements such as required fonts, characters per line and so on. In some ways, AI in accessibility systems is outstripping the legal requirements and offering more choice to viewers. While regulations and recommendations give a basic framework, AI technologies can speed up processing time and offer viewers more options, and not just at the production stage. AI-based algorithms in display technology can offer consumers the possibility to customize their captions, perhaps adjusting the font or subtitle position to suit themselves.
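Viewer-side customization of this kind is partly supported by the delivery formats themselves; WebVTT, for instance, defines per-cue settings for line position, alignment and size. The sketch below applies hypothetical viewer preferences as standard WebVTT cue settings; the function and parameter names are illustrative, not from any player API.

```python
# A minimal sketch of applying viewer caption preferences as WebVTT
# cue settings (line, align, size are part of the WebVTT spec; the
# defaults chosen here are illustrative).

def format_cue(start: str, end: str, text: str,
               line: str = "90%", align: str = "center", size: str = "80%") -> str:
    settings = f"line:{line} align:{align} size:{size}"
    return f"{start} --> {end} {settings}\n{text}\n"

# A viewer who prefers captions near the top of the screen:
print(format_cue("00:00:05.000", "00:00:08.000", "[Door opens]", line="10%"))
```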

Regulatory and industry bodies are devoting much time and energy to ensuring that accessibility models are fit for purpose, and that they are maintained and improved over time. Through consultations with charities and representatives of disabled communities, improvements are constantly in progress, and advances in AI learning offer opportunities to improve accessibility at a rapid pace.

Having said that, in the current state of play, review by experienced humans is still generally accepted as a wise move, particularly when subtitling between languages, where a phrase that is directly translated can cause offense, provoke laughter, or have no clear meaning in the second language. An increasing number of effective tools are available on the market for reviewing and editing AI-generated captions and subtitles; these themselves include AI assistance to make human review faster and more accurate. They can also be integrated into full workflows, enabling QC tools for verifying video and audio to work in tandem with tools specifically targeted at subtitle and caption production, while ensuring local compliance rules are maintained within the process. Automated systems combined with human review are particularly effective when regenerating captions, typically for a change of frame rate, a change of start timecode, or modifications for a change of video format.
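A minimal sketch of that retiming step follows, assuming cue times are held in seconds; the function name and the example values are illustrative. Conforming material timed against 25 fps video to 23.976 fps stretches its running time by a factor of 25/23.976, so every timestamp is scaled, and a start-timecode change is a simple offset.

```python
# A minimal sketch of caption regeneration for a frame rate change
# and a new start timecode, with cue times held in seconds.

def retime(cues, old_fps: float, new_fps: float, offset_seconds: float = 0.0):
    """Scale cue times for a frame rate change and apply a start offset."""
    scale = old_fps / new_fps
    return [(start * scale + offset_seconds,
             end * scale + offset_seconds,
             text)
            for start, end, text in cues]

cues = [(1.0, 3.5, "[Door opens]"), (4.0, 6.0, "Hello, and welcome back.")]
# Conform 25 fps material to 23.976 fps and shift to a 10:00:00:00 start.
for start, end, text in retime(cues, 25.0, 23.976, offset_seconds=36000.0):
    print(f"{start:.3f} --> {end:.3f}  {text}")
```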

As can be seen, AI in accessibility systems can offer many benefits at different points of the processing chain, not only in ensuring fundamental compliance with existing regulations, but also in offering huge advantages in speed, from first creation to distribution. While this discussion has mainly centered on subtitling and captioning, the potential for AI analysis in audio description and other accessibility technologies is increasing day by day, and AI involvement is proving hugely beneficial.
