Trint Provides Sound-to-Text Transcription with Karaoke Clarity

Trint transcribes the spoken word into written text and pairs the transcript with video cues, so users can quickly verify and correct the AI-generated voice-to-text conversion.

After 30 years as an Emmy-winning reporter, Jeff Kofman knew as well as anyone that one of the greatest bottlenecks in video journalism was transcribing the audio from his taped interviews.

So in December 2014 he left his post at ABC News and, together with open-source engineers he had met at the Mozilla Festival, an annual gathering devoted to defending Internet freedom, set out to see whether artificial intelligence could turn the spoken word into written text.

“It took us 20 months to build Trint,” Kofman, who is now the company’s CEO, told me during our interview from London, “but we launched in September 2016 and we’ve experienced a quick uptake since then.”

There have been many attempts at speech-to-text software, but Trint succeeds where others have fallen short for two major reasons. First, thanks to its use of artificial intelligence, it is highly accurate. Second, and very cleverly, its display accompanies the text with a track of video thumbnails that serves as a reference whenever unclear or ambiguous words need to be interpreted.

Trint presents the text along with an audio waveform and associated video samples

“Systems like Apple’s Siri can be pretty good, but if they are only 95% reliable that means they make mistakes,” Kofman said. “We wanted a mechanism that let you identify those mistakes immediately. Our solution was to marry a text editor with an audio/video player. So we glued the text created by AI to the associated visuals or audio waveform on a millisecond basis, sort of like following karaoke. So Trint gives you a file that is searchable by either text or visuals, letting you watch and hear the results at the same time.”

Of course, this software also offers an additional benefit for people who are hearing impaired, since they can read what is being said, with visual cues to help explain the context.

“We have also recently released a version for iPhone,” Kofman continued, “and even an integration for the Adobe Premiere Pro NLE that you can get from the Adobe store for free.”

The program allows Adobe Premiere Pro users to add subtitling directly from the embedded transcript

Naturally, that led me to seek input from the Associated Press (AP), and Derl McCrudden, the AP’s deputy managing editor for global digital and visual journalism, granted me an interview from his London office.

“Just for news alone, we put out 160 stories every day,” McCrudden told me, “which can total over 70,000 edits every year, including sports, style, the arts, etc. So our reporting workload is tremendous, and the help we get from transcription software like Trint is invaluable.”

The AP is, of course, a business-to-business (B-to-B) operation that distributes white-label stories, which other news organizations regularly repurpose.

“We need to allow our people to do high-volume work, and this kind of software relieves them of some of the drudgery of translating interviews,” McCrudden said. “Its speed and accuracy help our people check transcripts for editorial value, and it enriches our reports for both text and video.”

He cited a recent example in which an AP reporter was standing next to one of the agency’s video journalists, who was shooting a Volkswagen press conference at the company’s headquarters in Wolfsburg, Germany.

“While the video was being shot, the AP journalist was using Trint to transcribe the interview,” McCrudden told me. “By the time the press conference finished, the entire transcript was completed. That meant the video journalist could use it as the basis for the quick-turnaround story. But perhaps even more importantly, the entire transcript then lived in the cloud and our business reporter who was located in Frankfurt could access the transcription software's user interface and pick out the key quotes to complete his more refined in-depth report. We no longer had to wait for the transcription to come in from the field to get the news out in different formats.”

Currently, the company’s Web site lists over a dozen languages and dialects the software can handle, and the basket of babble is constantly growing.

“The non-English use of this transcription system is very valuable to us,” McCrudden said, “because it means we can get the transcription as early as possible. Once we receive the text, our producers can translate it. This is facilitated by being able to see the video along with the words being spoken, and frankly the usability of the interface is one aspect that first attracted us to this transcription system.”

The company has just released a custom dictionary that lets users add their own words so the system can recognize vocabulary that was previously beyond its scope. The next goal? Transcribing live speech without any intermediate delay. The limits of this application of AI have yet to be reached.
