Trint can transcribe the spoken word to written text with accuracy enhanced by video cues to produce highly accurate voice-to-text conversion.
After 30 years as an Emmy-wining reporter, Jeff Kofman knew as well as anyone that one of the greatest bottlenecks in video journalism was transcribing the audio recorded with his taped interviews.
So in December of 2014 he left his post with ABC News, and along with some open source engineers he met at the Mozilla Festival, an annual gathering defending Internet freedom, set out to see if they could use Artificial Intelligence to turn the spoken voice into written words.
“It took us 20 months to build Trint,” Kofman, who is now the company’s CEO, told me during our interview from London, “but we launched in September 2016 and we’ve experienced a quick uptake since then.”
There have been many attempts at speech-to-text translation softwares, but Trint succeeds where others have fallen short for two major reasons. First, thanks to the implementation of Artificial Intelligence, it is highly accurate. Second, and very cleverly, its display accompanies the text with a track of video thumbnails that can be utilized as a reference in case unclear or multi-meaning words have to be interpreted.
Trint presents the text along with an audio waveform and associated video samples
“Systems like Apple’s Siri can be pretty good, but if they are only 95% reliable that means they make mistakes,” Kofman said. “We wanted a mechanism that let you identify those mistakes immediately. Our solution was to marry a text editor with an audio/video player. So we glued the text created by AI to the associated visuals or audio waveform on a millisecond basis, sort of like following karaoke. So Trint gives you a file that is searchable by either text or visuals, letting you watch and hear the results at the same time.”
This, of course, this software has an additional benefit for those who are hearing impaired since they can read what is being spoken with visual cues to help explain the context.
“We have also recently released a version for iPhone,” Kofman continued, “and even an integration for the Adobe Premiere Pro NLE that you can get from the Adobe store for free.”
The program allows Adobe Premiere Pro users to add subtitling directly from the embedded transcript
Naturally, that lead me to want to get some input from AP, and Derl McCrudden, the deputy managing editor of the Associate Press’s global digital and visual journalism, granted me an interview from his London office.
“Just for news alone, we put our 160 stories every day,” McCrudden told me, “which can total over 70,000 edits every year including sports, style, the arts, etc.. So our reporting workload is tremendous and the help we get from a transcription software like Trint is invaluable.”
The AP is, of course, a B-to-B operation putting out white labeled stories that other news operations regularly re-purpose.
“We need to allow our people to do high volume work, and this kind of software relieves them of some of the drudgery of translating interviews,” McCrudden said. “Its speed and accuracy helps our people checking transcripts for editorial value, and enriches our reports for both text and video.”
He cited a recent example where an AP reporter was standing next to one of their video journalists shooting a Volkswagen press conference from its headquarters in Wolfsburg, Germany.
“While the video was being shot, the AP journalist was using Trint to transcribe the interview,” McCrudden told me. “By the time the press conference finished, the entire transcript was completed. That meant the video journalist could use it as the basis for the quick-turnaround story. But perhaps even more importantly, the entire transcript then lived in the cloud and our business reporter who was located in Frankfurt could access the transcription software's user interface and pick out the key quotes to complete his more refined in-depth report. We no longer had to wait for the transcription to come in from the field to get the news out in different formats.”
Currently, the software's Web sit lists over a dozen languages and dialects it can handle, and the basket of babble is constantly growing.
“The non-English use of this transcrition system is very valuable to us,” McCrudden said, “which means we can get the transcription as early as possible. Once we receive the text, our producers can translate it. This is facilitated by being able to see the video along with the words being spoken, and frankly the usability of the interface is one aspect that first attracted us to this transcryption system.”
The company just released a dictionary that lets you add custom words to the system so it can recognize verbiage previously beyond its scope. The next goal? Tackling live speech without an intermediate delay. The limits of this application of AI have yet to be approached.
You might also like...
Philo T. Farnsworth was the original TV pioneer. When he transmitted the first picture from a camera to a receiver in another room in 1927, he exclaimed to technicians helping him, “There you are – electronic television!” What’s never been quoted but lik…
NASCAR Productions, based in Charlotte NC, prides itself on maintaining one of the most technically advanced content creation organizations in the country. It’s responsible for providing content, graphics and other show elements to broadcasters (mainly Fox and NBC), as w…
Behind the more than 100 television cameras and an arsenal of the most advanced broadcast technology ever assembled, the anchors reporting the 53rd Super Bowl will concentrate on the ancient art of storytelling.
During Super Bowl LIII, the football action will be on the field. But a lot of the action will be enhanced by incredible new graphics, some virtual, that CBS is using to super charge the screen.
We editors, color graders and graphics artists are an opinionated group and that’s a good thing because with the speed technology is changing we need open communication among ourselves.