Software Infrastructure Global Viewpoint – November 2021

Retrieving Media Liabilities

As data storage expands beyond all comprehension, it seems like we are focused on storing every media asset we can get our hands on, but data storage is only as important as data retrieval.

The reason I make this point is that we cannot go on storing data forever. It’s not that we lack the storage capacity, as cloud technology has shown with its potential for yottabytes (and beyond) of storage; it’s the efficiency of retrieval that concerns me.

I believe data storage is turning into the engineer’s cupboard, where for years we’ve stored all the equipment we decommissioned and the bits of electronics that may “come in useful one day”. And just like the engineer’s cupboard, we run the risk of storing so much media that it becomes practically useless: if we can’t remember what we’ve stored, we can’t retrieve it.

To overcome the retrieval challenge, highly efficient engineers would make an index of everything they had stored, and some I’ve worked with even had a barcoding system. However, the memory of said equipment was only as good as the storage index, which was usually so abstract that it was out of date before the metaphorical ink was dry on the paper. And it had to be abstract, otherwise the index would take forever to search, which would itself make the equipment harder to retrieve.

It seems that, despite our best efforts, human memory is exceptionally poor, and even a maintained index will leave a lot of information undiscovered for years, at least until an audit is done or said engineer leaves the building and somebody discovers their hoard.

We do have an emerging technology that may well help with this conundrum: machine learning (ML). Vision and audio ML are capable of scanning media to create metadata classification tags. In effect, metadata is just an abstract proxy for the original media, and it forms the basis of an index.

Metadata certainly helps us retrieve clips and tracks once the media is stored in a computer resource, and parsing it is often an automated task that needs no human involvement. As ML engines improve and their learning expands, media can be continually reprocessed to find new metadata, thus providing more information for the indexing, retrieval, and search tools.
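To make the idea of metadata as an index concrete, here is a minimal sketch in Python of an inverted index that maps tags to media assets. Everything in it is an assumption for illustration: the `TagIndex` class, the asset IDs, and the tags are hypothetical, and the tags are hard-coded where a real system would take them from a vision or audio ML engine.

```python
from collections import defaultdict


class TagIndex:
    """Inverted index: each metadata tag maps to the asset IDs that carry it."""

    def __init__(self):
        self._assets_by_tag = defaultdict(set)

    def add_tags(self, asset_id, tags):
        """Record tags for an asset; safe to call again after reprocessing."""
        for tag in tags:
            self._assets_by_tag[tag.lower()].add(asset_id)

    def search(self, *tags):
        """Return the sorted asset IDs that carry every requested tag."""
        if not tags:
            return []
        # .get() avoids creating empty entries for tags never seen before.
        matches = [self._assets_by_tag.get(tag.lower(), set()) for tag in tags]
        return sorted(set.intersection(*matches))


index = TagIndex()

# First pass: tags standing in for the output of a hypothetical ML tagger.
index.add_tags("clip-001", ["stadium", "crowd"])
index.add_tags("clip-002", ["studio", "interview"])

# Later pass: an improved engine reprocesses clip-001 and finds a new tag.
index.add_tags("clip-001", ["goal celebration"])

print(index.search("crowd", "goal celebration"))
```

The point of the sketch is that reprocessing only ever adds to the index, so a clip that was invisible to one search term can become retrievable later without the media itself being touched.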

But even so, how much data can be efficiently tagged, and how much media should we store for the future? I suppose the real problem is that we don’t know, so we just keep on storing media until somebody looks at the cloud storage costs and gulps. It’s just like the engineer’s cupboard, when somebody discovers a massive amount of real estate that is being unnecessarily provisioned.

It’s fair to say that ML is continuing to develop to provide untold opportunities for broadcasters and innovators, and we’re certainly in the information storage phase. But storage is only as effective as retrieval, so we must keep one eye on what we’re trying to achieve so that we’ll be ready for the next phase – knowing which media assets to delete so they don’t become media liabilities!