Software Infrastructure Global Viewpoint – January 2022

Data Binging

Storage has turned a corner: it’s free! Well, maybe not quite, but the proliferation of cloud providers and petabytes of on-prem disk has made data storage more affordable than ever. But are there unintended side effects of our newfound freedom to store anything we want?

There was a time when data storage, or backing store as we used to call it, was so expensive that only a privileged few could afford it. Huge disk drives the size of a washing machine stored an entire megabyte of data, and six-foot-tall tape drives whirred away in the background like props from a 1970s sci-fi film.

We’ve always traded access speed against cost, often leaving high-volume applications with slow retrieval. And as new technologies develop, the previous generation moves down the pecking order to become the third and fourth tiers of long-term repository, keeping constant downward pressure on costs.

I believe the challenge with storage is now not how much we can store, or for how long, but deciding what we should throw away, or more bluntly, delete.

The first question to address is: should we delete anything at all? Why not just store every piece of data we record? Films, channel outputs, monitoring information, all of this can be easily streamed to storage devices and kept for as far into the future as we dare to think. Storage is only going to get cheaper, so why not keep everything?

For me, it’s not the monetary cost of storage that is the challenge, but the cost of retrieval. After all, data storage is only as effective as data retrieval: what’s the point of storing massive amounts of data if we don’t know it’s there? Intelligent search engines have helped enormously with this task as they can tag the data, and as we add AI to the mix, classification becomes more effective, leading to more efficient retrieval.

However, I fear that this strategy risks a non-agile way of thinking, where we too easily adopt the attitude of “keep it because it might come in useful one day”. Furthermore, in our clamor to make television programs against high-pressure timescales, we run the risk of storing everything simply because it is the lower-risk strategy. But at what cost?

In years to come I can imagine our successors looking at these enormous data archives and wondering where on earth to start deleting, because they’ve run out of space or the data has simply become too inefficient to store. To alleviate this, maybe we should pay more attention to classifying the data during ingest, adding much greater granularity to the descriptions and tags so that, in years to come, the decisions on what to keep become much easier.
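To make the idea concrete, here is a minimal sketch of what ingest-time classification with a retention hint might look like. All names here are illustrative assumptions, not any real archive system’s API: each asset carries tags assigned at ingest, and a simple policy maps a category to a retention window, so the “what do we delete first?” decision can be made by machine rather than by a bewildered successor.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

@dataclass
class IngestRecord:
    """Metadata captured when an asset is ingested (field names are illustrative)."""
    asset_id: str
    ingested: date
    tags: dict = field(default_factory=dict)  # e.g. {"category": "news", "series": "..."}

def retention_review_due(record: IngestRecord, policy: dict) -> bool:
    """Return True once the asset's retention window has lapsed.

    `policy` maps a category tag to a keep-period in days; anything
    untagged falls back to the "unclassified" entry, which in practice
    would be the longest (and most expensive) window.
    """
    category = record.tags.get("category", "unclassified")
    keep_days = policy.get(category, policy["unclassified"])
    return date.today() - record.ingested > timedelta(days=keep_days)

# A hypothetical policy: news clips reviewed after a year,
# unclassified material kept essentially forever.
policy = {"news": 365, "unclassified": 36500}

old_clip = IngestRecord("clip-001", date(2000, 1, 1), {"category": "news"})
fresh_clip = IngestRecord("clip-002", date.today(), {"category": "news"})
```

The point of the sketch is the asymmetry it exposes: the richer the tags applied at ingest, the smaller the "unclassified" bucket that must be kept indefinitely by default.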
