Quantum is launching aiWARE for Xcellis, an on-premise version of Veritone’s cloud-based artificial intelligence platform. This solution enables organizations to apply cognitive analytics to video and audio content libraries without the cost and hassle of moving their large media libraries to the cloud.
Here, we talk with Quantum's senior director of product marketing, Dave Frederick, about the development. We begin by asking whether the industry in general is confusing customers by badging solutions as artificially intelligent when they are perhaps better described as advanced analytics.
Dave Frederick, Quantum: Perhaps. We see customers looking for three levels of awareness. The first level is storage awareness with respect to infrastructure, and it answers questions about storage capacity, how is it allocated, etc. The second level is file awareness. This is awareness about what’s being stored — files and directories — in the capacity, what’s being archived, when was a file touched last, who changed a file and when, etc. The third level is content awareness — the knowledge of what files contain, what content looks like, what’s being said, etc.
The first two levels of awareness are usually generated by systems that automatically collect and store information and report it when needed. The third level can be human-generated if necessary. Or, it can be generated with artificial intelligence (AI).
AI offers an automated way of generating content-related information that traditionally could be extracted only through human interaction. AI typically relies on some training to recognize the material it’s evaluating. Some AI engines are capable of improving their performance by inspecting more data. This is referred to as machine learning.
AI isn’t needed to analyze disk utilization or find duplicate or similarly named files. Rather, AI is about discovering the attributes of content and generating metadata that enhances the utilization of the content itself.
The new solution is initially available with optical character recognition (OCR), object recognition and transcription engines, which customers can use to extract additional value from their on-premise video and audio content. Can you explain what OCR is and how it works?
DF: OCR engines can identify readable text in an image or video and convert that text into encoded characters that can be used in compute processes such as word processing, spreadsheets, etc. Use cases for OCR analysis include documenting on-screen graphics to identify names of people, reading scoreboards or license plates, or capturing and transcribing text from any scene that includes readable content.
How in practice is aiWARE applied to video and audio assets?
DF: Video and audio represent data that can only be evaluated and cataloged by watching or listening to it. Customers are using AI to transcribe audio into a searchable text file so that they can quickly find specific content within an entire library of files. Once that text index is created, further inspection might take the form of OCR (described above), object detection (finding known objects), object recognition (finding objects that look like a specific example), facial detection (detecting the presence of a face on the screen), facial recognition (identifying the person on the screen), speaker separation (splitting an interview or conversation into separate speakers), and more. While it’s cost-effective to run transcription against an entire library and beneficial to create an index with the resulting data, more advanced AI engines typically will be applied only to a subset of files to save time and money.
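As an illustration of why transcripts make a library searchable, here is a minimal Python sketch of a word index built over transcript text. It is not aiWARE or Xcellis code: the function names `build_index` and `search`, the file names and the transcript strings are all hypothetical, and a real deployment would take its transcripts from a transcription engine.

```python
from collections import defaultdict
import re

WORD_PATTERN = re.compile(r"[a-z0-9']+")

def build_index(transcripts):
    """Map each lowercase word to the set of files whose transcript contains it."""
    index = defaultdict(set)
    for filename, text in transcripts.items():
        for word in WORD_PATTERN.findall(text.lower()):
            index[word].add(filename)
    return index

def search(index, query):
    """Return the files whose transcripts contain every word in the query."""
    words = WORD_PATTERN.findall(query.lower())
    if not words:
        return set()
    # Start from the files matching the first word, then narrow down.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical transcripts, standing in for the output of a transcription engine.
transcripts = {
    "interview_01.mp4": "The mayor announced the new stadium budget today.",
    "game_recap.mp4": "The home team won at the stadium last night.",
}
index = build_index(transcripts)
matches = search(index, "stadium budget")
```

Searching for "stadium budget" narrows the library to the one file that mentions both words. That smaller set is the kind of subset on which the more expensive engines could then be run.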
Can you elaborate on the orchestration element (multiple engines sequentially processing the same data and access to cloud-based AI engines for additional processes when desired) by explaining how this works and what this means for organizations?
DF: When the content being sought requires a series of machines for content analysis, multiple analyses can be linked so that each benefits from the information already gleaned by the others. This model reduces the overall time and cost of processing as multiple engines home in on the ultimate result. Orchestration allows these operations to be scheduled and run automatically in sequence.
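The idea of chaining engines so that each one builds on earlier results can be sketched in a few lines of Python. This is only an assumption-laden illustration, not Veritone's or Quantum's orchestration API: the engine functions (`transcribe`, `tag_keywords`, `flag_for_object_recognition`) are hypothetical stubs that return canned data, and `orchestrate` simply runs them in order.

```python
from typing import Any, Callable, Dict, List

Metadata = Dict[str, Any]
Engine = Callable[[Metadata], Metadata]

def transcribe(meta: Metadata) -> Metadata:
    # Stub for a speech-to-text engine; a real engine would analyze the audio.
    return {**meta, "transcript": "welcome back to the stadium"}

def tag_keywords(meta: Metadata) -> Metadata:
    # Builds on the transcript produced by the previous engine.
    words = meta.get("transcript", "").split()
    return {**meta, "keywords": [w for w in words if len(w) > 4]}

def flag_for_object_recognition(meta: Metadata) -> Metadata:
    # Marks the asset for a costlier engine only if a keyword of interest appears.
    wanted = "stadium" in meta.get("keywords", [])
    return {**meta, "needs_object_recognition": wanted}

def orchestrate(asset: str, engines: List[Engine]) -> Metadata:
    """Run the engines in sequence, passing accumulated metadata to each one."""
    meta: Metadata = {"asset": asset}
    for engine in engines:
        meta = engine(meta)
    return meta

result = orchestrate(
    "game_recap.mp4",
    [transcribe, tag_keywords, flag_for_object_recognition],
)
```

Because every engine receives the metadata accumulated so far, a cheap early step such as transcription can decide whether a more expensive later step runs at all.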
By 2020, how will AI have improved, perhaps to perform things it cannot do now?
DF: Besides becoming more accurate, AI engines will also get faster. This is important in that AI will be used more and more often at the time of ingest or even at initial capture to create metadata in real time. The resulting metadata could be used throughout the entire production, post-production and delivery process. We will also see specialized engines that understand different topics or fields, such as medical, legal, financial, etc.