While the data that organizations need to manage keeps getting bigger and more complex, its ephemeral nature has not changed. Data’s usefulness wanes, and eventually it is discarded, archived on media that decays over time, or left in a storage format that becomes obsolete and unreadable.
This process has been the same since the days of punch cards. A range of computer scientists, chemists, and materials experts are thinking big and long about this problem, and they have come up with ideas that could help our analytics descendants a century from now, or even a million years or more.
Most companies simply plan for data to age out and be discarded, but a few organizations do not have that option. Universities and research facilities, for example, keep research data for decades or longer. Governments collect economic and demographic data that needs to stay accessible essentially forever—same with satellite data. Medical records are sometimes stored 75 years or longer.
If you happen to be responsible for ensuring the viability of data for all eternity, your options are limited. Today, the standard process for keeping digital data viable and accessible is to migrate and reformat it to more modern media and systems every three to 10 years. This is an expensive, time-consuming process, and consequently organizations often wait too long. As a result, some or all of the data becomes unrecoverable. As organizations continue to increase the rate at which they collect data, long-term storage maintenance and viability will become a bigger issue.
Requirements for very long-term storage are tough. The recording technology can’t depend on today’s reading hardware still being available. The medium must be durable enough to tolerate a wide range of environmental conditions. And the cost of the technology and resources needed to record the data must be low. The fewer storage media you need for your terabytes or petabytes of data, for example, the lower the cost and management overhead.
The good news: Several promising storage options are emerging for applications where the data needs to be accessible permanently—or reasonably close to it. All address both the physical decay of the media and the long-term readability of the recording format. The bad news: Only one of them is a commercial product.
A rundown of the long-view storage innovation pipeline:
- 100 Years: Digital Optical Technology System (DOTS), now on the market, is a metal alloy tape that optically records microscopic, human-readable images of documents, similar to microfiche but with much higher density. Data is read with a camera-based reader. If, after 70 or 80 years, that reader is lost or broken, the tape can still be read with a microscope. The DOTS technology was originally developed by Kodak, which sold the technology to Group 47.
- 1 Million Years: 5D glass crystals, or “memory crystals” (think the Kryptonian crystals from Superman), hold data recorded via self-assembled nanostructures created with a laser in fused quartz. “5D” refers to five dimensions: the size, orientation, and three-dimensional position of the nanostructures. The 5D technology was developed at the University of Southampton in the United Kingdom.
- 10 Million Years: A sapphire hard disk made of two 8-inch sapphire disks fused together with a platinum layer between them holds images or text inscribed on the platinum in human-readable format. This technology is designed more for applications such as sending messages to future generations (hazardous site warnings, for example) than for traditional data storage. The sapphire disk was developed by a consortium of experts led by the French nuclear waste management agency ANDRA. A commercial version of this technology would be quite expensive: The prototype cost €25,000.
- 100 Million Years: A fused silica glass plate holds binary-encoded, optically readable data on a thin layer of quartz glass. It currently holds about 40MB of data per square inch. Its developers at Hitachi expect it to last indefinitely unless the glass breaks. Hitachi unveiled this technology in 2012, but has not said when it might become commercially available.
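To put that last density figure in perspective, a back-of-the-envelope calculation shows how much quartz glass a modern dataset would require at roughly 40MB per square inch. This is an illustrative sketch only; the function name and the decimal (rather than binary) unit conversion are the author’s assumptions, not Hitachi specifications.

```python
# Illustrative arithmetic: area of quartz glass needed at ~40 MB per
# square inch, the density figure cited for Hitachi's prototype.
DENSITY_MB_PER_SQ_INCH = 40

def sq_inches_needed(dataset_mb: float) -> float:
    """Square inches of glass required to hold a dataset of the given size in MB."""
    return dataset_mb / DENSITY_MB_PER_SQ_INCH

# Using decimal units: 1 TB = 1,000,000 MB; 1 PB = 1,000 TB.
TB_IN_MB = 1_000_000

for label, size_mb in [("1 TB", TB_IN_MB), ("1 PB", 1000 * TB_IN_MB)]:
    print(f"{label}: {sq_inches_needed(size_mb):,.0f} sq in")
```

At this density a single terabyte needs about 25,000 square inches of glass, which underscores why the technology is aimed at archival permanence rather than bulk capacity.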
Ten years ago, few in data management imagined the scale at which some organizations collect and analyze data today. If data scientists and analysts continue to discover new value in historical data, who knows what the long-term storage requirements might be 10 years from now?
Michael Nadeau is the publisher of Data Informed. Follow him on Twitter: @menadeau.