Closing In On Methods For Long Term Archiving

As the amount of data in the world keeps multiplying exponentially, a Holy Grail in research is finding a way to reliably preserve that data for the ages. Researchers are now closing in on methods to make data permanent. The problem is that there is no way to be absolutely sure any of them will work far into the future.

Microsoft predicts that by 2023, over 100 zettabytes of data, including movies, television programming and audio, will be stored in the cloud. That staggering amount of data requires a fundamental rethinking of how large-scale storage systems operate.

In 2016, Microsoft began a partnership with the University of Southampton Optoelectronics Research Centre in the UK to tackle the archiving issue. It is called Project Silica. The project is designed to store cold data — or data that is infrequently accessed. It doesn’t need to sit on a server for instant use.

Through the project, Microsoft is testing glass as a long-term storage medium. Recently, it ran an experiment with Warner Brothers to store a copy of the 1978 film Superman on a piece of glass measuring 7.5 cm x 7.5 cm x 2 mm.

Microsoft Project Silica senior optical scientist Patrick Anderson loads the system to write data to glass. Photo by Jonathan Banks

The glass contains 75.6 GB of data plus error redundancy codes. It is said to be the first test of the new archiving technology for long term storage of films and television programs.

Theoretically, the glass storage could last thousands of years. If it works, a studio like Warner Brothers, which houses some 20 million film assets in temperature-controlled warehouses, would have an extra level of protection.

Glass has long been used to preserve audio programming, going back to the radio drama days. In World War II, metal record platters were banned due to metal shortages and glass was substituted for recording. Though glass lasts a long time, it is also delicate. Everyone who has worked with glass discs has opened boxes to find the platters shattered.

However, Microsoft’s methods are different. Project Silica uses lasers similar to those used for LASIK eye surgery to burn small geometrical shapes, known as voxels, into the glass. Each voxel encodes multiple bits, and the data is written in multiple layers. For the Warner Brothers experiment, 74 layers were used for the Superman film.
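As a rough back-of-envelope check on the figures quoted in this article (75.6 GB, 74 layers, a piece of glass measuring 7.5 cm x 7.5 cm x 2 mm), the short Python sketch below estimates the average amount of data per layer and per cubic millimeter. It assumes decimal gigabytes and that the data is spread evenly across the layers and the full volume of the glass. Microsoft has stated neither, and the variable names are purely illustrative.

```python
# Back-of-envelope figures for the Superman glass, using numbers quoted in the article.
# Assumptions (not from Microsoft): 1 GB = 10**9 bytes, and the data is spread
# evenly over all layers and over the whole volume of the glass.

DATA_BYTES = 75.6e9      # 75.6 GB of film data (error redundancy codes excluded)
LAYERS = 74              # layers written for the film
WIDTH_MM, HEIGHT_MM, THICKNESS_MM = 75.0, 75.0, 2.0

volume_mm3 = WIDTH_MM * HEIGHT_MM * THICKNESS_MM
bytes_per_layer = DATA_BYTES / LAYERS
bytes_per_mm3 = DATA_BYTES / volume_mm3

print(f"Glass volume: {volume_mm3:,.0f} mm^3")
print(f"Average per layer: {bytes_per_layer / 1e9:.2f} GB")
print(f"Average density: {bytes_per_mm3 / 1e6:.2f} MB per mm^3")
```

Under those assumptions the averages come out at roughly 1 GB per layer and a little under 7 MB per cubic millimeter, which shows how far the density would have to rise before glass competes with the DNA figures quoted for other projects.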

Once the data for the program is embedded in the glass, the content is accessed by shining light through the glass and capturing the data with microscope-like readers. The Warner Brothers film was checked bit by bit and it was flawless.
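The article does not describe how the bit-by-bit check was carried out. As a minimal illustration of the idea only, the Python sketch below compares a master copy with a read-back copy and counts the bit positions that differ. The function name `count_bit_errors` and the sample data are hypothetical and are not part of any Project Silica tooling.

```python
def count_bit_errors(original: bytes, readback: bytes) -> int:
    """Return the number of bit positions at which the two byte strings differ."""
    if len(original) != len(readback):
        raise ValueError("length mismatch: the copies cannot be compared bit by bit")
    # XOR leaves a 1 in every position where the bits differ; count those 1s.
    return sum(bin(a ^ b).count("1") for a, b in zip(original, readback))


# Hypothetical sample data standing in for the film master and two read-backs.
master = b"Superman (1978)"
good_copy = bytes(master)
bad_copy = bytearray(master)
bad_copy[0] ^= 0b00000101      # flip two bits in the first byte

print(count_bit_errors(master, good_copy))        # 0 differing bits
print(count_bit_errors(master, bytes(bad_copy)))  # 2 differing bits
```

A flawless read-back corresponds to a count of zero. For files of film size, a practical check would stream the data in chunks or compare cryptographic checksums rather than hold both copies in memory.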

Microsoft senior optical scientist James Clegg reads data with a specialized microscope. Photo by Jonathan Banks

So what about the fragility of glass? Microsoft said it did extensive tests to make sure that Project Silica storage media was not easily damaged. The glass was baked in hot ovens, submerged in boiling water, microwaved and scratched with steel wool. But all glass still breaks. Apple’s iPhone screens are supposed to be the toughest glass in the world and the screens still easily break when dropped. Only time will tell if the Project Silica glass is tough enough.

Also, there is a question of whether or not the readers for such discs will still be manufactured a thousand years into the future. Technology changes and companies go out of business. It is anybody’s guess how this will play out.

Microsoft’s own cloud, called Azure, already has a major interest in safekeeping vast amounts of both hot and cold data. Azure still uses tape, which has to be checked frequently and re-copied to maintain data integrity. Glass could one day be a more secure solution to safekeep data for the company and its customers.

Much work remains to be done on Project Silica. Read and write operations need to be unified into a single device, and the amount of data stored on one piece of glass needs to increase. But the company is betting that the future of long-term archiving is in glass.

Microsoft also has a parallel project using DNA molecules for archival storage. The beauty of DNA is that it can archive an exabyte per cubic millimeter and has a life of over 500 years. But how will it be read far into the future?
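To put the DNA density figure in perspective, the Python sketch below works out how much DNA volume the 100 zettabytes mentioned earlier in this article would occupy at one exabyte per cubic millimeter. It assumes decimal units (1 ZB = 10^21 bytes, 1 EB = 10^18 bytes) and ignores any overhead for error correction, indexing or packaging, so it is an idealized lower bound rather than a real system estimate.

```python
# Idealized volume estimate using only figures quoted in the article.
# Assumptions: decimal units, no overhead for redundancy or packaging.

ZETTABYTE = 10**21
EXABYTE = 10**18

cloud_data_bytes = 100 * ZETTABYTE          # predicted cloud data
dna_density_bytes_per_mm3 = 1 * EXABYTE     # one exabyte per cubic millimeter

volume_mm3 = cloud_data_bytes // dna_density_bytes_per_mm3
volume_cm3 = volume_mm3 / 1000              # 1 cm^3 = 1,000 mm^3

print(f"{volume_mm3:,} mm^3 of DNA, or {volume_cm3:,.0f} cm^3")
```

On those idealized numbers, the entire predicted cloud would fit in about 100 cubic centimeters, roughly the volume of a small cup.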

Others are also researching long-term archiving. Group 47 was formed in 2008 to secure the patents, designs and manufacturing processes for DOTS, a technology developed by the Eastman Kodak Company.

DOTS (Digital Optical Technology System) is a 100-year archival technology that is non-magnetic, chemically inert and immune to electromagnetic fields, including electromagnetic pulse (EMP). The storage media can be kept in normal office environments or in extremes ranging from 15 to 150 degrees F.

DOTS data is stored on a phase-change medium composed of a metallic alloy sputtered onto an archival polyester base. To tackle reader availability in the future, DOTS is a true visual, “eye-readable” method of storing digital files. With sufficient magnification, the human eye can actually see the digital information.

A “Rosetta Leader” specification calls for microfiche-scale, human-readable text at the beginning of each tape, explaining how the data is encoded and how to actually construct a reader. Because the information is visible, any high-magnification camera can read it.

Long-term archival systems are incredibly complex because computer operating systems, hardware, software and technology as a whole are constantly changing. What works today may not work tomorrow, much less 1,000 years from now.

And perhaps most problematic of all, how does anyone living in today’s world know how long anything will last? It’s a major problem with no easy solutions. 
