Quantum is launching aiWARE for Xcellis, an on-premise version of Veritone’s cloud-based artificial intelligence platform. This solution enables organizations to apply cognitive analytics to video and audio content libraries without the cost and hassle of moving their large media libraries to the cloud.
Here, we talk with Quantum's senior director of product marketing, Dave Frederick, about the development. We begin by asking whether the industry in general is confusing customers by badging solutions as artificially intelligent when they are perhaps better described as advanced analytics.
Dave Frederick, Quantum: Perhaps. We see customers looking for three levels of awareness. The first level is storage awareness with respect to infrastructure, and it answers questions about storage capacity, how is it allocated, etc. The second level is file awareness. This is awareness about what’s being stored — files and directories — in the capacity, what’s being archived, when was a file touched last, who changed a file and when, etc. The third level is content awareness — the knowledge of what files contain, what content looks like, what’s being said, etc.
The first two levels of awareness are usually generated by systems that automatically collect and store information and report it when needed. The third level can be human-generated if necessary. Or, it can be generated with artificial intelligence (AI).
AI offers an automated way of generating content-related information that traditionally could be extracted only through human interaction. AI typically relies on some training to recognize the material it’s evaluating. Some AI engines are capable of improving their performance by inspecting more data. This is referred to as machine learning.
AI isn’t needed to analyze disk utilization or find duplicate or similarly named files. Rather, AI is about discovering the attributes of content and generating metadata that enhances the utilization of the content itself.
The new solution is initially available with optical character recognition (OCR), object recognition and transcription, to help organizations extract additional value from their on-premise video and audio content. Can you explain what OCR is and how it works?
DF: OCR engines have the ability to identify readable text in an image or video and convert that text into encoded characters that can be used in compute processes such as word processing, spreadsheets, etc. Use cases for OCR analysis include documenting on-screen graphics to identify names of people, reading scoreboards or license plates, or capturing and transcribing text from any scene that includes readable content.
How in practice is aiWARE applied to video and audio assets?
DF: Video and audio represent data that can only be evaluated and cataloged by watching or listening to it. Customers are using AI to transcribe audio into a searchable text file so that they can quickly find specific content within an entire library of files. Once a text directory is created, further inspection might take the form of OCR (described above), object detection (finding known objects), object recognition (finding objects that look like a specific example), facial detection (finding the presence of a face on the screen), facial recognition (identifying the person on the screen), speaker separation (splitting an interview or conversation into separate speakers), and more. While it’s cost-effective to run transcription against an entire library and beneficial to create an index with the resulting data, more advanced AI engines typically will be applied only to a subset of files to save time and money.
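To make the idea of a searchable transcript index concrete, here is a minimal Python sketch. It is not aiWARE's API: the file names, the sample transcripts and the `build_index` and `search` helpers are all hypothetical, and the transcripts are assumed to have already been produced by a transcription engine.

```python
import re
from collections import defaultdict


def build_index(transcripts):
    """Map each lowercase word to the set of files whose transcript contains it."""
    index = defaultdict(set)
    for name, text in transcripts.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the files whose transcripts contain every word in the query."""
    words = re.findall(r"[a-z0-9']+", query.lower())
    if not words:
        return set()
    matches = [index.get(word, set()) for word in words]
    return set.intersection(*matches)


# Hypothetical transcripts, standing in for the output of a transcription engine.
transcripts = {
    "interview_01.mp4": "The final score was three to one",
    "newscast_07.mp4": "Tonight's top story is the election",
}
index = build_index(transcripts)
hits = search(index, "final score")
```

Once the index exists, a query touches only the index and never the media files themselves, which is why indexing an entire library up front can pay off.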
Can you elaborate on the orchestration element (multiple engines sequentially processing the same data and access to cloud-based AI engines for additional processes when desired) by explaining how this works and what this means for organizations?
DF: When the content being sought requires a series of machines for content analysis, multiple analyses can be linked so that each benefits from the information already gleaned by others. This model reduces the overall time and cost of processing as multiple engines home in on the ultimate result. Orchestration allows these operations to be scheduled and run automatically in sequence.
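The chaining described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how aiWARE is implemented: the three engines are hypothetical stubs, the `audio_stub` field stands in for real audio, and the watched keywords are invented for the example. Each engine receives the metadata gathered so far and returns it with its own findings added, so later engines can build on earlier results.

```python
from typing import Callable, Dict, List

Metadata = Dict[str, object]
Engine = Callable[[Metadata], Metadata]


def transcribe(meta: Metadata) -> Metadata:
    """Stub transcription: copies placeholder text standing in for decoded audio."""
    return {**meta, "transcript": meta.get("audio_stub", "")}


def tag_keywords(meta: Metadata) -> Metadata:
    """Use the transcript produced earlier to flag watched-for keywords."""
    watched = {"score", "goal"}  # hypothetical keywords of interest
    words = set(str(meta.get("transcript", "")).lower().split())
    return {**meta, "keywords": sorted(watched & words)}


def flag_for_ocr(meta: Metadata) -> Metadata:
    """Mark the asset for a costlier OCR pass only if keywords were found."""
    return {**meta, "needs_ocr": bool(meta.get("keywords"))}


def orchestrate(asset: Metadata, engines: List[Engine]) -> Metadata:
    """Run the engines in order, passing accumulated metadata along the chain."""
    meta = dict(asset)
    for engine in engines:
        meta = engine(meta)
    return meta


pipeline = [transcribe, tag_keywords, flag_for_ocr]
result = orchestrate(
    {"file": "match.mp4", "audio_stub": "What a goal by the home side"},
    pipeline,
)
skipped = orchestrate(
    {"file": "weather.mp4", "audio_stub": "Rain is expected tomorrow"},
    pipeline,
)
```

In this sketch the second asset is never flagged for OCR, which mirrors the point about applying the more advanced engines only to a subset of files.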
By 2020, how will AI have improved, perhaps to perform things it cannot do now?
DF: Besides becoming more accurate, AI engines will also get faster. This is important in that AI will be used more and more often at the time of ingest or even at initial capture to create metadata in real time. The resulting metadata could be used throughout the entire production, post-production and delivery process. We will also see specialized engines that understand different topics or fields, such as medical, legal, financial, etc.