Quantum is launching aiWARE for Xcellis, an on-premise version of Veritone’s cloud-based artificial intelligence platform. This solution enables organizations to apply cognitive analytics to video and audio content libraries without the cost and hassle of moving their large media libraries to the cloud.
Here, we talk with Quantum's senior director of product marketing, Dave Frederick, about the development. We begin by asking whether the industry in general is confusing customers by badging solutions as artificially intelligent when in fact they are perhaps better described as advanced analytics.
Dave Frederick, Quantum: Perhaps. We see customers looking for three levels of awareness. The first level is storage awareness with respect to infrastructure, and it answers questions about storage capacity, how is it allocated, etc. The second level is file awareness. This is awareness about what’s being stored — files and directories — in the capacity, what’s being archived, when was a file touched last, who changed a file and when, etc. The third level is content awareness — the knowledge of what files contain, what content looks like, what’s being said, etc.
The first two levels of awareness are usually generated by systems that automatically collect and store information and report it when needed. The third level can be human-generated if necessary. Or, it can be generated with artificial intelligence (AI).
AI offers an automated way of generating content-related information that traditionally could be extracted only through human interaction. AI typically relies on some training to recognize the material it’s evaluating. Some AI engines are capable of improving their performance by inspecting more data. This is referred to as machine learning.
AI isn’t needed to analyze disk utilization or find duplicate or similarly named files. Rather, AI is about discovering the attributes of content and generating metadata that enhances the utilization of the content itself.
The new solution is initially available with optical character recognition (OCR), object recognition and transcription, so that organizations can extract additional value from their on-premise video and audio content. Can you explain what OCR is and how it works?
DF: OCR engines can identify readable text in an image or video and convert that text into encoded characters that can be used in compute processes such as word processing, spreadsheets, etc. Use cases for OCR analysis include documenting on-screen graphics to identify names of people, reading scoreboards or license plates, or capturing and transcribing text from any scene that includes readable content.
How in practice is aiWARE applied to video and audio assets?
DF: Video and audio represent data that can only be evaluated and cataloged by watching or listening to it. Customers are using AI to transcribe audio into a searchable text file so that they can quickly find specific content within an entire library of files. Once a text directory is created, further inspection might take the form of OCR (described above), object detection (finding known objects), object recognition (finding objects that look like a specific example), facial detection (the presence of a face on the screen), facial recognition (the person on the screen), speaker separation (splitting an interview or conversation into separate speakers), and more. While it’s cost-effective to run transcription against an entire library and beneficial to create an index with the resulting data, more advanced AI engines typically will be applied only to a subset of files to save time and money.
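As a rough illustration of the searchable index described above, the following Python sketch builds a simple word-to-file lookup from transcripts. It is not aiWARE's API: the function names, file names and sample transcript text are all hypothetical, and a production system would use a proper search engine rather than an in-memory dictionary.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")

def build_index(transcripts):
    """Map each lowercase word to the set of files whose transcript contains it."""
    index = defaultdict(set)
    for name, text in transcripts.items():
        for word in WORD.findall(text.lower()):
            index[word].add(name)
    return index

def search(index, query):
    """Return the files whose transcripts contain every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical transcripts, standing in for the output of a transcription engine.
transcripts = {
    "interview_01.wav": "The mayor spoke about the new stadium budget.",
    "news_0412.mxf": "Weather delayed the stadium opening ceremony.",
}
index = build_index(transcripts)
hits = search(index, "stadium budget")
```

Here `hits` contains only the file whose transcript mentions both words, which is the kind of narrowing that lets the more expensive engines run on a subset of the library.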
Can you elaborate on the orchestration element (multiple engines sequentially processing the same data and access to cloud-based AI engines for additional processes when desired) by explaining how this works and what this means for organizations?
DF: When the content being sought requires a series of machines for content analysis, multiple analyses can be linked so that each benefits from the information already gleaned by others. This model reduces the overall time and cost of processing as multiple engines home in on the ultimate result. Orchestration allows these operations to be scheduled and run automatically in sequence.
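The sequencing idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how aiWARE is implemented: the engine functions are hypothetical stand-ins that read precomputed fields from a dictionary, where real engines would analyze the media itself.

```python
def transcribe(asset, metadata):
    # Stand-in for a transcription engine: the "audio" here is already text.
    return {"transcript": asset["audio_text"]}

def detect_faces(asset, metadata):
    # Stand-in for facial detection: reports only whether any face is present.
    return {"has_faces": bool(asset["faces"])}

def recognize_faces(asset, metadata):
    # The costlier engine reads the earlier result and skips assets with no faces.
    if not metadata.get("has_faces"):
        return {"people": []}
    return {"people": sorted(asset["faces"])}

def orchestrate(asset, engines):
    """Run engines in order, passing each one the metadata gathered so far."""
    metadata = {}
    for engine in engines:
        metadata.update(engine(asset, metadata))
    return metadata

# Hypothetical asset description used only for this illustration.
asset = {"audio_text": "Welcome back to the show.", "faces": ["Guest B", "Host A"]}
result = orchestrate(asset, [transcribe, detect_faces, recognize_faces])
```

Because each engine receives the accumulated metadata, a later step can skip work that an earlier step has shown to be unnecessary, which is where the time and cost savings would come from.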
By 2020, how will AI have improved, perhaps to perform things it cannot do now?
DF: Besides becoming more accurate, AI engines will also get faster. This is important in that AI will be used more and more often at the time of ingest or even at initial capture to create metadata in real time. The resulting metadata could be used throughout the entire production, post-production and delivery process. We will also see specialized engines that understand different topics or fields, such as medical, legal, financial, etc.