Trint Provides Sound-to-Text Transcription with Karaoke Clarity

Trint transcribes the spoken word into written text and pairs the transcript with video cues, so users can check and correct the voice-to-text conversion.

After 30 years as an Emmy-winning reporter, Jeff Kofman knew as well as anyone that one of the greatest bottlenecks in video journalism was transcribing the audio recorded with his taped interviews.

So in December of 2014 he left his post with ABC News, and along with some open source engineers he met at the Mozilla Festival, an annual gathering defending Internet freedom, set out to see if they could use Artificial Intelligence to turn the spoken voice into written words.

“It took us 20 months to build Trint,” Kofman, who is now the company’s CEO, told me during our interview from London, “but we launched in September 2016 and we’ve experienced a quick uptake since then.”

There have been many attempts at speech-to-text transcription software, but Trint succeeds where others have fallen short for two major reasons. First, thanks to its use of Artificial Intelligence, it is highly accurate. Second, and very cleverly, its display accompanies the text with a track of video thumbnails that can be used as a reference when unclear or ambiguous words have to be interpreted.

Trint presents the text along with an audio waveform and associated video samples

“Systems like Apple’s Siri can be pretty good, but if they are only 95% reliable that means they make mistakes,” Kofman said. “We wanted a mechanism that let you identify those mistakes immediately. Our solution was to marry a text editor with an audio/video player. So we glued the text created by AI to the associated visuals or audio waveform on a millisecond basis, sort of like following karaoke. So Trint gives you a file that is searchable by either text or visuals, letting you watch and hear the results at the same time.”
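The karaoke-style pairing Kofman describes can be illustrated with a minimal sketch in which every transcribed word carries start and end times in milliseconds. This is not Trint's actual data format or API, which the article does not describe; the `Word` class and the `word_at` and `find_text` functions are hypothetical names chosen for illustration. The sketch shows the two lookups the quote implies: finding the word to highlight at a given playback position, and finding where in the recording a searched word is spoken.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """One transcribed word with its position in the recording (hypothetical format)."""
    text: str
    start_ms: int
    end_ms: int


def word_at(words: List[Word], position_ms: int) -> Optional[Word]:
    """Return the word being spoken at position_ms, or None during silence.

    Assumes words are sorted by start time and do not overlap.
    """
    starts = [w.start_ms for w in words]
    # Index of the last word that starts at or before the playback position.
    i = bisect_right(starts, position_ms) - 1
    if i >= 0 and position_ms < words[i].end_ms:
        return words[i]
    return None


def find_text(words: List[Word], query: str) -> List[int]:
    """Return the start times (ms) of every word matching query, ignoring case."""
    q = query.lower()
    return [w.start_ms for w in words if w.text.lower().strip(".,!?") == q]


# Illustrative timings only; these are not real transcription output.
transcript = [
    Word("We", 0, 180),
    Word("launched", 180, 640),
    Word("in", 640, 760),
    Word("September", 900, 1500),
]

current = word_at(transcript, 200)          # word to highlight at 0.2 seconds
jump_points = find_text(transcript, "september")  # where to seek the player
```

With timings like these, an editor that follows playback would highlight `current` as the audio plays, and a text search would seek the player to each entry in `jump_points`, which is the searchable, watch-and-hear behavior described in the quote.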

Of course, this software has an additional benefit for those who are hearing impaired, since they can read what is being spoken with visual cues to help explain the context.

“We have also recently released a version for iPhone,” Kofman continued, “and even an integration for the Adobe Premiere Pro NLE that you can get from the Adobe store for free.”

The program allows Adobe Premiere Pro users to add subtitling directly from the embedded transcript

Naturally, that led me to want to get some input from AP, and Derl McCrudden, the deputy managing editor of the Associated Press’s global digital and visual journalism, granted me an interview from his London office.

“Just for news alone, we put out 160 stories every day,” McCrudden told me, “which can total over 70,000 edits every year including sports, style, the arts, etc. So our reporting workload is tremendous and the help we get from a transcription software like Trint is invaluable.”

The AP is, of course, a B-to-B operation putting out white-labeled stories that other news operations regularly repurpose.

“We need to allow our people to do high volume work, and this kind of software relieves them of some of the drudgery of translating interviews,” McCrudden said. “Its speed and accuracy helps our people checking transcripts for editorial value, and enriches our reports for both text and video.”

He cited a recent example where an AP reporter was standing next to one of their video journalists shooting a Volkswagen press conference from its headquarters in Wolfsburg, Germany.

“While the video was being shot, the AP journalist was using Trint to transcribe the interview,” McCrudden told me. “By the time the press conference finished, the entire transcript was completed. That meant the video journalist could use it as the basis for the quick-turnaround story. But perhaps even more importantly, the entire transcript then lived in the cloud and our business reporter who was located in Frankfurt could access the transcription software's user interface and pick out the key quotes to complete his more refined in-depth report. We no longer had to wait for the transcription to come in from the field to get the news out in different formats.”

Currently, the software's website lists over a dozen languages and dialects it can handle, and the basket of babble is constantly growing.

“The non-English use of this transcription system is very valuable to us,” McCrudden said, “which means we can get the transcription as early as possible. Once we receive the text, our producers can translate it. This is facilitated by being able to see the video along with the words being spoken, and frankly the usability of the interface is one aspect that first attracted us to this transcription system.”

The company just released a dictionary that lets you add custom words to the system so it can recognize verbiage previously beyond its scope. The next goal? Tackling live speech without an intermediate delay. The limits of this application of AI have yet to be approached.
