Trint Provides Sound-to-Text Transcription with Karaoke Clarity

Trint transcribes the spoken word into written text and pairs the transcript with video cues, so users can check and correct the voice-to-text conversion.

After 30 years as an Emmy-winning reporter, Jeff Kofman knew as well as anyone that one of the greatest bottlenecks in video journalism was transcribing the audio recorded with his taped interviews.

So in December of 2014 he left his post with ABC News, and along with some open source engineers he met at the Mozilla Festival, an annual gathering defending Internet freedom, set out to see if they could use Artificial Intelligence to turn the spoken voice into written words.

“It took us 20 months to build Trint,” Kofman, who is now the company’s CEO, told me during our interview from London, “but we launched in September 2016 and we’ve experienced a quick uptake since then.”

There have been many attempts at speech-to-text software, but Trint succeeds where others have fallen short for two major reasons. First, thanks to its use of Artificial Intelligence, it is highly accurate. Second, and very cleverly, its display accompanies the text with a track of video thumbnails that can be used as a reference when unclear or ambiguous words have to be interpreted.

Trint presents the text along with an audio waveform and associated video samples

“Systems like Apple’s Siri can be pretty good, but if they are only 95% reliable that means they make mistakes,” Kofman said. “We wanted a mechanism that let you identify those mistakes immediately. Our solution was to marry a text editor with an audio/video player. So we glued the text created by AI to the associated visuals or audio waveform on a millisecond basis, sort of like following karaoke. So Trint gives you a file that is searchable by either text or visuals, letting you watch and hear the results at the same time.”
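The "karaoke" idea Kofman describes can be pictured as a transcript in which every word carries its own start and end time. The following is only a minimal sketch of that concept, not Trint's actual implementation; the `TimedWord` type, the `word_at` and `find` helpers, and the sample timings are all hypothetical and chosen for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class TimedWord:
    """One transcribed word with its position in the recording (hypothetical type)."""
    text: str
    start_ms: int
    end_ms: int


def word_at(words, position_ms):
    """Return the word being spoken at position_ms, or None during silence.

    Assumes words are sorted by start_ms and do not overlap.
    """
    starts = [w.start_ms for w in words]
    i = bisect_right(starts, position_ms) - 1
    if i >= 0 and position_ms < words[i].end_ms:
        return words[i]
    return None


def find(words, query):
    """Return the start time (ms) of every word matching query, ignoring case and punctuation."""
    q = query.lower()
    return [w.start_ms for w in words if w.text.lower().strip(".,!?") == q]


# Made-up timings for a short phrase, purely for illustration.
transcript = [
    TimedWord("We", 0, 180),
    TimedWord("launched", 180, 640),
    TimedWord("in", 700, 800),
    TimedWord("September", 800, 1400),
]
```

With a structure like this, a player could highlight `word_at(transcript, position)` as playback advances, and a text search such as `find(transcript, "september")` could jump the audio or video to the matching moment.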

This software, of course, has an additional benefit for those who are hearing impaired, since they can read what is being spoken with visual cues to help explain the context.

“We have also recently released a version for iPhone,” Kofman continued, “and even an integration for the Adobe Premiere Pro NLE that you can get from the Adobe store for free.”

The program allows Adobe Premiere Pro users to add subtitling directly from the embedded transcript

Naturally, that led me to want to get some input from the AP, and Derl McCrudden, the deputy managing editor of the Associated Press’s global digital and visual journalism, granted me an interview from his London office.

“Just for news alone, we put out 160 stories every day,” McCrudden told me, “which can total over 70,000 edits every year including sports, style, the arts, etc. So our reporting workload is tremendous and the help we get from a transcription software like Trint is invaluable.”

The AP is, of course, a B-to-B operation putting out white-labeled stories that other news operations regularly repurpose.

“We need to allow our people to do high volume work, and this kind of software relieves them of some of the drudgery of translating interviews,” McCrudden said. “Its speed and accuracy helps our people checking transcripts for editorial value, and enriches our reports for both text and video.”

He cited a recent example where an AP reporter was standing next to one of their video journalists shooting a Volkswagen press conference from its headquarters in Wolfsburg, Germany.

“While the video was being shot, the AP journalist was using Trint to transcribe the interview,” McCrudden told me. “By the time the press conference finished, the entire transcript was completed. That meant the video journalist could use it as the basis for the quick-turnaround story. But perhaps even more importantly, the entire transcript then lived in the cloud and our business reporter who was located in Frankfurt could access the transcription software's user interface and pick out the key quotes to complete his more refined in-depth report. We no longer had to wait for the transcription to come in from the field to get the news out in different formats.”

Currently, the software's Web site lists over a dozen languages and dialects it can handle, and the basket of babble is constantly growing.

“The non-English use of this transcription system is very valuable to us,” McCrudden said, “which means we can get the transcription as early as possible. Once we receive the text, our producers can translate it. This is facilitated by being able to see the video along with the words being spoken, and frankly the usability of the interface is one aspect that first attracted us to this transcription system.”

The company just released a dictionary that lets you add custom words to the system so it can recognize verbiage previously beyond its scope. The next goal? Tackling live speech without an intermediate delay. The limits of this application of AI have yet to be approached.
