What do cave drawings, word-of-mouth, folklore, animal skin, wood and electrons all have in common? They’ve all been used to preserve information – information deemed essential for daily function of individuals.
Although recording and storing information is hardly new, there is one aspect of information being generated and stored today that demands examination – our extreme dependency upon that information. Data informs us (media, broadcast, education); it guides us – both literally (GPS) and figuratively (courts, compliance, law); it connects us (social media); it feeds us (agriculture innovation); it heals us (medical, genomic, biological); it expands our horizons (space, oceanic, atom particle); it moves us (transportation); it records us (video surveillance, photography, social media). I could point to virtually every aspect of our existence, but I think you get the point. And that is the point – digital information is how we perpetuate modernity and society itself. I refer to this information collectively as “society’s genome.”
One might argue that the ability of data to perform these varied functions, and our dependency on those functions, has led to an explosion of new technologies to store and protect this information into the future, and opened a bonanza for storage vendors. But that argument would be hard to win. Instead of expanding, storage technology dedicated to long-term digital preservation has been in a constant state of contraction. “Consolidation” has been the battle cry of modern-day storage, and it’s been well heeded – maybe too well. There were more than 70 manufacturers of tape drives in the 1990s; today there are 3. The disk drive market is similar, having shrunk from dozens of manufacturers to 3 (with only 2 of those in the enterprise storage area).
We can’t point solely to the manufacturers for this consolidation. “Put it in the cloud” has been a tagline to the consolidation battle cry. Not only are we seeing a consolidation to single-source storage devices, we’re also seeing a consolidation of the geographic locations in which this information lies. So what’s the problem?
Here’s another, and more alarming, aspect of digital information today – it’s under constant attack.
We’re engaged in conversations on storage with some of the largest data centers and data repositories in the world, from cloud providers to high performance computing centers to governmental agencies around the globe. One of the most common concerns I hear is that of protecting data against cyberattack. Some of these facilities are logging thousands of attempted attacks a day. That’s not hard to believe, with studies quoting anywhere from 400,000 to over 1 million attacks (worldwide) on a daily basis.
When malware embedded in the firmware of just 2 disk drive manufacturers could take out 90% of all data stored online, we have a problem to consider. When an attack on a single cloud location could take out multiple organizations, we have a problem to consider. The more we consolidate, the more we find ourselves drinking from a communal cup of information. If one of us has a cold, we’re all going to catch it. If the virus is more serious than a cold, so are the consequences.
Some of these challenges are beyond our control. A great example was published recently by wired.com (https://www.wired.com/2016/08/quadroot-android-vulnerability-qualcomm/). Chipsets manufactured by Qualcomm suffered from a set of vulnerabilities referred to collectively as “QuadRooter.” These same chipsets are used by multiple manufacturers of Android devices, including LG, HTC, OnePlus and Google Nexus devices. It’s estimated that 900 million Android smartphones are affected. A compromised device would give an outsider root access to your phone, including its photos, data and GPS coordinates. The article compares it to “giving someone the keys to your house, then holding the door open for them while they make off with the jewels.” Qualcomm has released patches for the vulnerabilities, but it’s extremely difficult to get those patches from the manufacturers (who’ve released multiple revisions and models of the phones) down to the end users.
Once those vulnerabilities are addressed, new ones will appear. And that’s where we all lack control – no matter how strong the lock, determined forces can figure out a way to break in. But here’s where we do have control – we can ensure we’re able to restore any data that’s destroyed or stolen.
Heed the new battle cry for storage – Diversify. That’s a technique we’ve seen used by nature for the last 5 billion years (give or take).
Nature has endured longer than any of the aforementioned methods of passing along information and may have a thing or two – or three – to teach us about preservation. While nature has also played a role in destruction – hurricanes, floods, earthquakes, fires – its greatest accomplishment has been the ability to ensure that organisms survive such threats and that their genomes are passed along.
What does nature know about cyberattack? Nothing. That’s where you come in. Translated from nature to the data center, 3 basic principles are all within your control and will give the greatest assurance possible that your organization’s genome will survive disasters of any type.
The importance of genetic diversity is well documented and easily observed in nature. By identifying a lack of genetic diversity in threatened species, biologists can now determine which species are at a higher risk of extinction more reliably than with previous methods, such as measuring habitat change. It’s genetic diversity that allows organisms to better adapt to environmental changes. Given the environmental changes in the storage world – cyberattack and consolidation – it’s worth examining how “genetic diversity” can be introduced by using multiple forms of storage. Disk, tape and even optical all have different strengths and weaknesses. We usually compare those attributes by cost and latency alone. In light of surviving cyberattacks, we see that an attack on disk would rarely be able to target offline tape and vice versa. This is an increasingly important way to consider the value of multiple forms of storage for important data.
Anyone who’s blown on a dandelion after it’s bloomed has experienced nature’s second key to long-term survival – geographic dispersal. Plants produce millions of seeds that are well dispersed by wind, mammals or birds. Each seed has an infinitesimal chance at survival, yet plants thrive due to multiple “copies” geographically dispersed. Cloud storage can be considered one form of geographic dispersal, but there are caveats to make that hold true. It’s common for cloud storage to be very close to the data centers it supports. When using the cloud for offsite storage, one must consider where that site is physically located. Even with an approach such as “3 copies in the cloud,” virtually all cloud storage is stored on disk, violating the prior law of genetic diversity. The same could be said for offsite replication from disk to disk. Deploying an in-house private cloud and putting archived copies in a remote public cloud, or spinning a copy to tape that can be shipped away, makes data preservation far more likely and improves restoration options as well.
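The first two principles can be turned into a quick audit of where copies of a dataset actually live. The following is a minimal Python sketch, not a description of any product: the `StoredCopy` record, the `diversity_gaps` function, and the site and medium names are all hypothetical and exist only to illustrate the checks.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StoredCopy:
    """One copy of a dataset (hypothetical record for illustration)."""
    name: str
    medium: str            # e.g. "disk", "tape", "optical"
    site: str              # physical location, not a logical bucket name
    offline: bool = False  # True if the copy is unreachable from the network


def diversity_gaps(copies: List[StoredCopy]) -> List[str]:
    """Return the diversity principles that this set of copies violates."""
    gaps = []
    if len({c.medium for c in copies}) < 2:
        gaps.append("all copies share one storage medium")
    if len({c.site for c in copies}) < 2:
        gaps.append("all copies share one physical site")
    if not any(c.offline for c in copies):
        gaps.append("no copy is offline")
    return gaps


# "3 copies in the cloud": dispersed across sites, but every copy is online disk.
cloud_only = [
    StoredCopy("primary", "disk", "region-a"),
    StoredCopy("replica-1", "disk", "region-a"),
    StoredCopy("replica-2", "disk", "region-b"),
]

# Private cloud plus a remote public cloud plus a tape shipped offsite.
mixed = [
    StoredCopy("private-cloud", "disk", "region-a"),
    StoredCopy("public-cloud", "disk", "region-b"),
    StoredCopy("vault-tape", "tape", "offsite-vault", offline=True),
]

print(diversity_gaps(cloud_only))
print(diversity_gaps(mixed))
```

In this sketch the cloud-only layout passes the geographic check but is flagged for relying on a single medium and having no offline copy, while the mixed layout reports no gaps.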
The final mechanism of nature’s approach we point to is that of error correction. During cell reproduction, multiple proteins work together to perform an operation similar to that of a checksum in data centers. The integrity of genetic information is verified as cells reproduce and these proteins can either repair or destroy the cell if significant DNA damage is detected. While a checksum in the data center can be used to detect the malicious replacement of a file within an archive, its greatest advantage is to confirm the information we’ve worked so hard to protect is accurate. And if it’s not, the advantage of multiple copies is at hand.
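As a rough illustration of that last point, the sketch below records a SHA-256 digest when data is first archived, then checks each stored copy against it and falls back to whichever copy still matches. The function names and the in-memory "disk" and "tape" copies are assumptions made for the example; a real archive would read the data from its storage systems.

```python
import hashlib
from typing import Dict, Optional


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def first_intact_copy(copies: Dict[str, bytes], expected_digest: str) -> Optional[str]:
    """Return the name of the first copy whose digest matches, or None."""
    for name, data in copies.items():
        if sha256_hex(data) == expected_digest:
            return name
    return None


# Digest recorded at archive time (hypothetical data).
original = b"society's genome"
recorded_digest = sha256_hex(original)

# Later: the disk copy has been altered, the tape copy is untouched.
copies = {
    "disk": b"society's gen0me",
    "tape": original,
}

print(first_intact_copy(copies, recorded_digest))  # prints "tape"
```

The check only detects damage; it is the existence of a second copy, ideally on a different medium and at a different site, that makes recovery possible.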
Understanding nature’s approach to preserving an organism’s information – biological genomes – and applying it to preserving your organization’s information – society’s genome – might just give you a 5-billion-year track record as well… give or take.
Bob Cone is the Director of Sales Training and Product Enablement for deep storage technology vendor Spectra Logic. He is also the co-author of “Society’s Genome: Genetic Diversity’s Role in Digital Preservation.” Follow them @SpectraLogic.