Playout & Transmission Global Viewpoint – April 2018

The Unintended Benefits of Captioning

Once intended to aid the hearing-impaired, TV captioning (CC) has taken on a life of its own. Some viewers use CC to keep up with dialog and plot development without turning up the volume, while others use CC to translate or learn another language.

In the early 1970s, the ability to synchronize and overlay local computer graphics over an NTSC signal was an amazing technological milestone. During those days before ENG and portable video gear swept the industry, early computer-generated video overlay technology for TV production was being developed by CBS Labs (Vidifont) and Chyron (now ChyronHego) and their first products were very expensive. They were the leading edge of graphics cards, licensed custom-bitmapped video calligraphy, memory and storage.

At the same time, TV captioning debuted thanks to pioneering and testing at PBS, WETA and WGBH. The first captions were “open.” Open captions are part of the content and can’t be turned off.

The new video overlay and genlock technology that enabled keying computer video from expensive graphics cards for TV production also enabled closed captioning (CC) with DOS-level graphics. CC could be activated by the viewer, and it overlaid live TV content. Early closed captions required an expensive set-top box to demodulate, decode, generate, genlock and overlay the caption text, and then modulate a new TV signal with the captions. The CC box cost more than most TVs.

In 1990, the Americans with Disabilities Act was passed, and the Television Decoder Circuitry Act of 1990, implemented through FCC rules, required all new TV receivers 13 inches or larger to have the built-in ability to display closed captioning by July 1993. The Telecommunications Act of 1996 added built-in captioning for DTVs by July 2002. With such a depth of regulation and infrastructure in place, success was inevitable.

Captions or Subtitles?

Captioning includes subtitles and captions. Subtitles assume the viewer can hear the sound effects and music but not understand the dialog. Subtitles are often translations for people who don’t speak the language of the medium, such as foreign films.


Captions are used by almost everybody, especially in noisy locations such as gyms, airports and almost any place with live TV and a group of people. Image courtesy Wowza.

On the other hand, captions are designed specifically for people who can’t hear. In addition to the spoken words, captions may identify who is speaking, their manner of speaking, and describe any significant music or sound in the captioning text stream.

This article will refer to captions and subtitles as the generic “CC.” To us broadcast engineers, it's all simply data in a stream.

Globalization has increased the use of CC, because it increases the value of TV content by making it available in more than one language. Subtitles are the least expensive to create because they provide only dialog-to-text information. Captions include that plus SFX descriptions, which are usually annotated by a human.

A new industry

As it turns out, viewers don’t need to be deaf or disabled to benefit from CC. It is ubiquitous on TVs in airports, bars, hospitals, gyms and other places sensitive to noise pollution. Captioning can also be the basis for content search across media libraries. What was once intended to resolve a specific medical issue has resolved a number of TV-related societal challenges.
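To illustrate how caption text can drive content search, here is a minimal sketch. It assumes captions have already been extracted into timed text cues; the `Cue` structure, the `search_captions` helper, the asset IDs and the sample text are all hypothetical and are not taken from any particular captioning system.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    asset_id: str         # hypothetical media-library identifier
    start_seconds: float  # when the caption appears in the program
    text: str             # the caption text itself


def search_captions(cues, query):
    """Return (asset_id, start_seconds) for every cue containing the query.

    Matching is a simple case-insensitive substring test, not a real
    search engine; it only shows that caption text is searchable data.
    """
    needle = query.casefold()
    return [
        (cue.asset_id, cue.start_seconds)
        for cue in cues
        if needle in cue.text.casefold()
    ]


# Illustrative sample data, invented for this sketch.
library = [
    Cue("news-0401", 12.0, "Good evening, here are tonight's headlines."),
    Cue("news-0401", 95.5, "Turning to the weather forecast."),
    Cue("sports-0402", 30.0, "The weather delayed the match by an hour."),
]

hits = search_captions(library, "Weather")
for asset_id, start in hits:
    print(f"{asset_id} at {start:.1f}s")
```

Each hit points to a program and a time offset, which is what lets a caption stream double as an index into the video.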

In addition to government CC laws and requirements, viewer expectations for CC have increased with its ubiquity. Viewers have learned that CC can supplement the TV experience without having to turn up the volume.

The Process

The process of creating captioning can be either manual or automatic, and either real-time or off-line. The goal is to provide the most mistake-free captions with the least delay. Among people who caption manually, off-line captioning is considered entry level. On-line, real-time unscripted captioning pays the best money for captioning talent and skill because it requires the ability to keep up with different people speaking at rates up to a couple of hundred words per minute. Many who professionally live-caption began as trained court reporters.

Enhanced Electronic Newsroom Technique (ENT) Procedures and improvements in speech recognition software continue to drive much of the industry toward automated captioning. On-line, real-time captioning is expensive compared to speech recognition automation.

The value of on-line, real-time captioning is that its human-controlled accuracy protects the brand from embarrassing errors. The 24-hour cable channels each crank out about a quarter-million words/day and are not immune to an occasional CC accident. Some can be entertaining if you don’t work there.
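As a rough back-of-envelope check on those numbers, the sketch below converts a quarter-million words/day into an average speaking rate and estimates how many miscaptioned words a given accuracy would leave. It assumes speech is spread evenly across all 24 hours, and the 99 percent accuracy figure is purely an illustrative assumption, not a measured value from any captioning service.

```python
MINUTES_PER_DAY = 24 * 60


def average_words_per_minute(words_per_day: float) -> float:
    """Average rate if the daily word count is spread evenly over 24 hours."""
    return words_per_day / MINUTES_PER_DAY


def expected_errors_per_day(words_per_day: float, accuracy: float) -> float:
    """Words expected to be wrong per day at a given word-level accuracy."""
    return words_per_day * (1.0 - accuracy)


words_per_day = 250_000   # "about a quarter-million words/day"
assumed_accuracy = 0.99   # hypothetical figure, for illustration only

avg_wpm = average_words_per_minute(words_per_day)
errors = expected_errors_per_day(words_per_day, assumed_accuracy)

print(f"Average rate: {avg_wpm:.0f} words per minute")
print(f"Expected errors at {assumed_accuracy:.0%} accuracy: {errors:.0f} per day")
```

Under these assumptions the average works out to about 174 words per minute, in line with the couple of hundred words per minute that live captioners must sustain, and even 99 percent accuracy would leave roughly 2,500 wrong words a day.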
