The Perfect “Forever” Archival Storage Medium? Don’t Count On It!

Let’s say you or your company own a very valuable artistic work, a film like Casablanca, perhaps. How would you go about protecting the digital master file of Casablanca against data corruption, fire, theft or an unforeseen natural disaster?

As with many things in our lives today, the answer is not so simple, nor is it permanent. The creation of a very long-term, highly reliable archival storage medium is the subject of a vast amount of fast-paced research these days. What’s state-of-the-art in storage today may not be a year from now.

At the same time, our need to archive a vast amount of data is exploding. Though specific estimates vary, there’s a consensus that the rate of growth is about 50 percent each year. There’s no recession in the digital data storage business. Storage is needed in just about every human activity, but in this article we will focus exclusively on media and entertainment.
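To put that growth figure in perspective, here is a minimal Python sketch of the compounding arithmetic. It assumes the 50 percent annual rate quoted above holds steady, which is only an approximation; the function names are illustrative.

```python
import math


def growth_factor(annual_rate: float, years: int) -> float:
    """How many times larger the archive is after `years` of compound growth."""
    return (1 + annual_rate) ** years


def doubling_time_years(annual_rate: float) -> float:
    """Years needed for the stored volume to double at a steady annual rate."""
    return math.log(2) / math.log(1 + annual_rate)


if __name__ == "__main__":
    rate = 0.50  # the roughly 50 percent per year cited in the article
    print(f"Doubling time: {doubling_time_years(rate):.2f} years")
    print(f"Growth over 10 years: {growth_factor(rate, 10):.1f}x")
```

At that rate the volume of data doubles in under two years and grows more than fifty-fold in a decade, which is why per-terabyte cost and density dominate archive planning.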

In the media business, there are essentially four kinds of data: online, nearline (a contraction of the term “near-online”), archival and offline. Online is the data being used at any given moment on your computer. It might be a news story or a commercial that is being edited.

Nearline Storage

Nearline storage is simply a copy of online data on another storage medium for quick access. For example, it could be a duplicate of online footage for a continuing news story, copied to a USB memory stick for use by an editor later in the week. Nearline is a sort of compromise between online and archival storage.

Archival storage consists of a dense memory system used to store and rapidly retrieve digital data. It can scale in size from a small system serving a single television station to a massive, highly secure global archival bank for the largest motion picture company. The storage media for archival systems can be any combination of magnetic tape, optical media, hard disks, flash media or the cloud.

Future Technologies

Those are today’s options for data storage. What comes in the future may include far more exotic technologies like holograms, etched sheets of steel or storage in DNA. Research on new technologies is continuing at a fierce pace.

Many vendors make various kinds of storage equipment, while others create middleware that allows broadcast production systems to connect and automatically control storage hardware. The largest of the middleware companies is Front Porch Digital with its DIVArchive system. Its competitors include MasstechSGL and others.

Brian Campanotti is an expert in the field of archiving. For the past decade, he has been chief technology officer at Front Porch Digital. Before that he was founder and president of Masstech, now his competitor, and originally a senior broadcast engineer at CBC Television, where he worked on early automation and server technology.

Archive Exchange Format

Campanotti has been a leader in establishing AXF (Archive Exchange Format), a new open format that will ensure that various archiving systems are interoperable with each other. SMPTE is expected to standardize AXF later this summer. About 50 companies have been participating in the development of the new standard.

Front Porch Digital’s software can interact with any archival storage medium, said Campanotti. The most frequently used archiving format for media and entertainment, he said, is magnetic tape, both LTO 6 tape cartridges and a more expensive enterprise-grade system from Oracle called StorageTek T10000D cartridge technology.

“These (magnetic tape) systems are massive, but have a low cost of ownership. The discs are not spinning all the time. No cooling is required. The data tape sits inside a robot and doesn’t consume any power. It’s not susceptible to viruses or malware attacks,” said Campanotti. “Tapes are pretty close to being as fast as (hard) disks. What takes time is the physical movement of the robot to put the tape into the drive. From data tape, it might take 30 seconds to a minute to go from archive to online.”

The differences between LTO and T10000D drives are reliability, speed, performance and capacity. LTO cartridges are not enterprise grade, while T10000D cartridges are. An LTO cart, with a life of 15 to 30 years, holds up to 6.25 terabytes of data with a transfer rate of up to 400 MB/sec compressed. A single T10000D cart, with a life expectancy of 30 years, holds 8.5 terabytes with a maximum compressed data rate of up to 800 MB/sec. On those figures, the T10000D is twice as fast as LTO technology.
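As a rough illustration of what those numbers mean in practice, the Python sketch below works out how long it would take to stream a full cartridge at the quoted maximum rate and how many cartridges an archive would need. It takes the capacities and rates quoted above at face value and assumes decimal units (1 TB = 1,000,000 MB); the 1,000 TB archive size and the class and method names are hypothetical, chosen only for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Cartridge:
    name: str
    capacity_tb: float  # capacity as quoted in the article
    rate_mb_s: float    # maximum compressed transfer rate as quoted

    def hours_to_stream(self) -> float:
        """Hours to read or write one full cartridge at the maximum rate."""
        seconds = self.capacity_tb * 1_000_000 / self.rate_mb_s
        return seconds / 3600

    def cartridges_for(self, archive_tb: float) -> int:
        """Whole cartridges needed to hold an archive of the given size."""
        return math.ceil(archive_tb / self.capacity_tb)


LTO = Cartridge("LTO 6", capacity_tb=6.25, rate_mb_s=400)
T10000D = Cartridge("T10000D", capacity_tb=8.5, rate_mb_s=800)

if __name__ == "__main__":
    archive_tb = 1000  # hypothetical one-petabyte archive
    for cart in (LTO, T10000D):
        print(
            f"{cart.name}: {cart.cartridges_for(archive_tb)} cartridges, "
            f"{cart.hours_to_stream():.1f} hours to stream each one"
        )
```

Under these assumptions a full LTO cartridge takes a little over four hours to stream and a T10000D cartridge just under three, so even at peak rates restoring a large archive is measured in drive-hours. Real-world throughput depends on how compressible the content is, and already-compressed video may not reach the quoted compressed rates.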

Hard disks are also part of the archiving solution, but are rarely used for the entire system. The Clipper Group, a computer industry analyst firm, found in a study that hard disks cost on average 26 times more than tape-based archival systems. The study found that the cost of energy alone for the average disk-based system exceeds the entire cost of operation for tape systems. Disks also take up four times the floor space.

Even more expensive is flash media. While some media and entertainment customers are experimenting with flash, it’s more suitable for small transactional functions than large media storage. “For media and entertainment, where the files are much bigger, flash doesn’t offer as much benefit in terms of storage,” Campanotti said. “Disks are just as fast for large files.”

To simplify archival storage for end users, Front Porch Digital launched LYNX, its first cloud-based archival service, in 2012. LYNX is an integrated cloud storage solution for managing assets on a global scale from any device and any networked location. It can be used as a supplement by existing DIVArchive customers or as a standalone subscription service for customers who want to avoid owning any archival equipment.

Essentially, Front Porch Digital created a huge DIVArchive system and opened data centers in Denver, Toronto, Madrid and Geneva. It will open another in the Asia-Pacific region later this year.

New storage media are coming from a variety of sources, both major industry players and small start-ups.

Sony is selling its ODA (Optical Disc Archive), which is based on Blu-ray technology. Cartridge sizes range from 300 GB to 1.5 TB. Sony claims a 50-year shelf life, compared to the 15 to 30 years for a cart like LTO. Basic Sony drives work with both Macs and PCs.

Cuneiform Technologies of Dover, Delaware, is demonstrating a new storage medium based on stainless steel roll film that is estimated to last 1,200 years. The company has been seeking funding to develop the read/write technology and guarantee the reading of any media for 500 years.

Finally, the European Molecular Biology Laboratory (EMBL) and the European Bioinformatics Institute (EBI) are doing promising research with DNA to store data. DNA data lasts for tens of thousands of years. The researchers claim they can store at least 100 million hours of high-definition video in a tiny cup of DNA.

Storing Data With DNA

Reading data from DNA is fairly straightforward, using standard equipment, but writing it has until now been a major hurdle. Current methods make it possible to manufacture DNA only in short strings. Both writing and reading DNA are prone to errors, particularly when the same DNA letter is repeated.

Nick Goldman and Ewan Birney of EMBL-EBI set out to create a code that overcomes both problems. They broke up the code into lots of overlapping fragments going in two directions, with indexing information showing where each fragment belongs in the overall code. They created a coding scheme that didn’t allow repeats.
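The “no repeats” idea can be sketched in a few lines of Python. This is an illustrative simplification, not the researchers’ actual code: it assumes each byte is first turned into six base-3 digits (a fixed-width conversion swapped in here for simplicity), and then each digit selects one of the three DNA letters that differ from the previous letter. The overlapping, indexed fragments described above are left out, and all function names are hypothetical.

```python
BASES = "ACGT"
TRITS_PER_BYTE = 6  # 3**6 = 729 >= 256, so six base-3 digits cover one byte


def bytes_to_trits(data: bytes) -> list[int]:
    """Convert each byte to six base-3 digits, most significant first."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(TRITS_PER_BYTE):
            byte, digit = divmod(byte, 3)
            digits.append(digit)
        trits.extend(reversed(digits))
    return trits


def trits_to_bytes(trits: list[int]) -> bytes:
    """Inverse of bytes_to_trits."""
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for digit in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + digit
        out.append(value)
    return bytes(out)


def encode(data: bytes, start: str = "A") -> str:
    """Map data to DNA letters so that no letter repeats its predecessor."""
    prev = start  # assumed "virtual" letter before the first real one
    letters = []
    for trit in bytes_to_trits(data):
        choices = [b for b in BASES if b != prev]  # three letters, never prev
        prev = choices[trit]
        letters.append(prev)
    return "".join(letters)


def decode(dna: str, start: str = "A") -> bytes:
    """Recover the original bytes from a string produced by encode()."""
    prev = start
    trits = []
    for letter in dna:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(letter))
        prev = letter
    return trits_to_bytes(trits)


if __name__ == "__main__":
    message = b"Casablanca"
    dna = encode(message)
    print(dna)
    print(decode(dna) == message)
```

Because every letter is chosen from the three that differ from the one before it, runs of identical letters cannot occur, which is the property the scheme relies on to avoid the error-prone repeats.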

They then asked Agilent Technologies, Inc, a California-based company, to encode an MP3 audio file of Martin Luther King’s “I Have a Dream” speech; a JPEG photo of EMBL-EBI; a PDF of Watson and Crick’s seminal paper, “Molecular structure of nucleic acids”; and a TXT file of all of Shakespeare's sonnets. They added a file that described the encoding.

Nick Goldman of EMBL-EBI looks at synthesised DNA

“The result looked like a tiny piece of dust,” said Emily Leproust of Agilent. Agilent mailed the sample to EMBL-EBI, where the researchers were able to sequence the DNA and decode the files without errors.

“We’ve created a code that's error tolerant using a molecular form we know will last in the right conditions for 10,000 years or possibly longer,” said Goldman. “As long as someone knows what the code is, you will be able to read it back if you have a machine that can read DNA.”

But before you get your hopes up that DNA or another medium will allow you to archive data forever, Campanotti offers a bit of advice from his long experience in archiving.

“I don’t think there will ever be a solution where you can store data and go back thousands of years to recover that data,” said Campanotti, who noted that technology never stands still. “If I have important data, I would try storing it in several kinds of media: DNA, stainless steel, whatever, and hope that one of them survives.”

“Technology comes in five- to ten-year cycles. It is improving that fast. For my active archive, I would move my data every five or ten years to the latest technology. Why wouldn’t I?”
