Audio Global Viewpoint – May 2021
AI Archiving

With the rapid demise of VTR support and growing concerns over tape storage longevity, the race is on to digitize media assets and store them efficiently. But information retrieval is only as efficient as the classification behind it; otherwise we'd never be able to find the assets in the first place. So how will AI help with this?
Metadata and classification are the essence of tagging media assets during digitization. The need to digitally store media is well understood, and whether it's stored on-prem or in the cloud becomes a "risk versus cost" business decision that is often made by people outside the engineering team.
To understand the power of metadata and classification, we should think one step ahead and look at the world from the point of view of the producer or editor. They may want GVs (general views) of a summer's day in New York during the 1980s, and the accuracy with which they find those GVs depends on the metadata tagging. Historically, this relied on somebody sitting in front of a VT machine, manually typing metadata into a text file or database with the appropriate timecode entries for the duration of the media asset.
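To make that manual tagging concrete, here is a minimal sketch of what a single tagged entry might look like in Python. The field names and timecode format are purely illustrative assumptions, not an industry-standard archive schema.

```python
from dataclasses import dataclass

# Hypothetical shape for one tagged entry; the fields are illustrative,
# not a standard archive schema.
@dataclass
class AssetTag:
    timecode_in: str       # e.g. "01:02:03:04" (HH:MM:SS:FF)
    timecode_out: str
    description: str
    keywords: list[str]    # search terms a producer or editor might use

gv = AssetTag(
    timecode_in="01:02:03:04",
    timecode_out="01:02:45:12",
    description="GVs, New York street scene, summer, 1980s",
    keywords=["New York", "GV", "summer", "1980s"],
)
```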
Neural networks (NNs) are a subset of machine learning (ML), which in turn sits within the much larger field of AI. There are many other types of ML, such as Bayesian classification and random forest regression, but NNs have made massive advances in recent years in both efficiency and accuracy, especially for image processing.
Image classification is something humans excel at. We can instantly recognize a face, or a danger such as a car speeding towards us. But we are very bad at maintaining high levels of concentration during repetitive tasks, such as classifying media assets. We've recognized for many years that computers are very efficient at making objective measurements but not so good at making subjective decisions. For example, auto-QC can easily detect whether a video frame is out of gamut, but it cannot so easily tell the difference between a sunrise and a sunset.
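As an illustration of such an objective measurement, here is a minimal sketch of a gamut check in Python. It assumes 8-bit R'G'B' frames held as numpy arrays; the legal limits and the 1% tolerance are illustrative stand-ins for the EBU R103-style rules a real QC tool would apply.

```python
import numpy as np

LEGAL_MIN, LEGAL_MAX = 16, 235   # illustrative 8-bit video-legal limits

def out_of_gamut_ratio(frame: np.ndarray) -> float:
    """Fraction of pixels outside the legal range.

    frame: (height, width, 3) uint8 array of R'G'B' samples.
    """
    illegal = (frame < LEGAL_MIN) | (frame > LEGAL_MAX)
    # A pixel is out of gamut if any of its three components is illegal.
    return float(np.any(illegal, axis=-1).mean())

frame = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
if out_of_gamut_ratio(frame) > 0.01:   # flag frames with >1% illegal pixels
    print("QC flag: frame exceeds gamut tolerance")
```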
ML and NNs are not magic; they are data-led decision-making algorithms. In other words, instead of writing a program that tries to anticipate every classification for every possible input, we use training data to teach the NN the classifications we need, so that when unseen data is later presented, the ML model can use its prior knowledge (derived from its learning phase) to provide the metadata and classification tagging.
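The learn-then-predict pattern can be shown in a few lines. The sketch below uses scikit-learn's RandomForestClassifier (one of the ML types mentioned above); the feature vectors and labels are random placeholders standing in for real per-frame features and human-supplied tags.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Learning phase: "other" data teaches the model the classifications we need.
# X_train stands in for per-frame feature vectors (e.g. colour histograms);
# y_train stands in for labels supplied by a human cataloguer.
X_train = np.random.rand(500, 64)
y_train = np.random.choice(["sunrise", "sunset", "other"], size=500)

model = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)

# Inference phase: unseen data is classified using the prior knowledge
# captured during the learning phase.
X_unseen = np.random.rand(10, 64)
print(model.predict(X_unseen))
```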
The idea is that we present many different images of a sunset so the ML algorithm can learn a generic representation of sunsets, with the expectation that when it is shown a sunset image it has never seen, it will identify it with a high level of certainty. This is analogous to how humans learn. I haven't seen every possible image of an apple, but based on the knowledge I gained by learning what an apple looks like (I wasn't born with it), I can reliably identify most, if not all, apples, even ones I've never seen before. This is the power of NNs.
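For images specifically, an NN-based version of this idea is often built by fine-tuning a network pre-trained on generic images. The sketch below, using PyTorch and torchvision, retrains only the final layer of a pre-trained ResNet-18 to separate "sunset" from "not sunset"; the batch of images is a random placeholder, and the dataset loading a real system would need is omitted.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a network that has already "seen" millions of generic images,
# then retrain its final layer for our two classes.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)   # 0 = not sunset, 1 = sunset

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step on a placeholder batch.
images = torch.randn(8, 3, 224, 224)   # stand-in for real archive frames
labels = torch.randint(0, 2, (8,))
optimizer.zero_grad()
loss = loss_fn(model(images), labels)
loss.backward()
optimizer.step()
```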
ML can also be used for advanced methods of auto-QC. Humans can generally detect an error caused by tape dropout, but standard QC would find this difficult. ML-based QC, given appropriate training, would be able to detect it.
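One plausible way to build such a detector, offered here as an assumption rather than the only approach, is anomaly detection: train an autoencoder on clean archive frames and flag frames it reconstructs poorly, since dropout streaks it never saw during training produce a high reconstruction error. A minimal sketch of that idea:

```python
import torch
import torch.nn as nn

# Tiny autoencoder for 64x64 grayscale patches. In practice it would be
# trained on clean frames first; the training loop is omitted here.
autoencoder = nn.Sequential(
    nn.Flatten(),
    nn.Linear(64 * 64, 128), nn.ReLU(),   # encoder
    nn.Linear(128, 64 * 64),              # decoder
    nn.Unflatten(1, (1, 64, 64)),
)

def dropout_score(patch: torch.Tensor) -> float:
    """Mean reconstruction error for a (1, 1, 64, 64) patch."""
    with torch.no_grad():
        return nn.functional.mse_loss(autoencoder(patch), patch).item()

patch = torch.rand(1, 1, 64, 64)
if dropout_score(patch) > 0.05:   # illustrative threshold
    print("QC flag: possible tape dropout")
```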
The demand for archiving is now massive. Although we could sit somebody in front of a VT machine and have them manually generate metadata, wouldn't it be far more efficient to spend that time building an ML solution (probably based on NNs), automating the process, and having the archiving run efficiently 24/7?