With all the attention companies pay today to collecting ever more data to analyze, too few business and IT leaders are considering the critical issues of backup and disaster recovery and the big data question: What if it breaks?
Any time a company loses data, it creates difficulties. But the stakes are higher when big data is involved, simply because of its sheer volume, variety and velocity. If a natural disaster strikes or your data is corrupted and lost, the cost to your business could prove devastating. And if a cybercriminal breaches your defenses, the damage to your revenues, and to your reputation, could be just as calamitous.
And what would it cost you if your Web site or online service provider suddenly went down, even for a couple of hours? Amazon is estimated to have lost about $30,000 a minute when its site went down for two hours in June 2008, according to media reports at the time. Just imagine the loss today, when Amazon's annual revenues (more than $61 billion at the end of 2012) are roughly three times what they were at the end of 2008.
These stakes are reason enough for IT chiefs to re-evaluate their companies’ data-recovery plans to address any loss of big data. Here are six points to consider when evaluating or establishing data-recovery policies and procedures that cover big data.
1. Each situation is different. Remember that every disaster-recovery plan is company- and industry-specific; you must consider your own needs and business requirements. Use proven Business Impact Analysis (BIA) techniques to assess the impact of losing your big data application and its data. For some companies, big data is a mission-critical application that requires high levels of uptime and data retention (a large bank's fraud-detection application, for example). For others, the big data app doesn't require that level of uptime (say, a retailer's consumer-sentiment application drawing on Twitter and Facebook).
2. No overhaul required. Follow the general principles of data recovery and don’t abandon your existing framework for business continuity and information lifecycle governance. For some businesses, big data will prove a business-critical issue but, for others, it won’t be as crucial. Determine the proprietary nature—your intellectual property—of that data. And just as your standard disaster-recovery process involves understanding the impact of losing data on your business, big data requires the same levels of specificity.
3. Use the 80/20 rule. Admit it: not all those petabytes of data are that critical. So if your big data is breached or lost, which portion do you most need to keep secure? (It's costly to "insure" so much data.) This is where you can apply the 80/20 rule, which says that 20 percent of your data accounts for 80 percent of the value. It won't always be a perfect measure, since losing even a small amount of the remaining data could still have an outsized effect, say, on the output of an analytical process or on a company's reputation. But if you use the familiar rule, your job is to identify the 20 percent of data that's crucial to protect.
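One way to make the 80/20 exercise concrete is to rank datasets by their estimated business value and protect the smallest set that covers most of that value. Here is a minimal sketch; the dataset names and dollar-value weights are invented for illustration, and in practice the values would come from your BIA.

```python
# Hypothetical 80/20 triage: given an estimated business value per dataset,
# return the smallest set of datasets (most valuable first) whose combined
# value reaches the coverage target.

def critical_subset(dataset_values, coverage=0.80):
    """Datasets to protect first, ordered by descending value."""
    total = sum(dataset_values.values())
    protected, covered = [], 0.0
    for name, value in sorted(dataset_values.items(),
                              key=lambda kv: kv[1], reverse=True):
        if covered / total >= coverage:
            break
        protected.append(name)
        covered += value
    return protected

# Invented example values (relative units, not real figures).
datasets = {
    "fraud_models": 500, "customer_master": 300,
    "clickstream_raw": 80, "social_sentiment": 70, "debug_logs": 50,
}
print(critical_subset(datasets))
```

Here two of five datasets cover 80 percent of the value, which is exactly the kind of shortlist a recovery plan can prioritize.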
4. Consider the value window. IT leaders must gauge how long the big data they’re storing will be of value. It depends, of course, on what you’re gathering and how you’re using it. If it’s weather information, then perhaps you retain data for years. If it’s social media data that tracked one event, you probably don’t need to retain much but the essentials.
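The value window can be encoded as a simple retention policy per data class. This is only a sketch; the class names and retention periods below are illustrative assumptions, not recommendations.

```python
from datetime import datetime, timedelta

# Hypothetical retention windows per data class: long-lived reference data
# (weather history) versus short-lived event data (social media around one
# event). The specific periods are invented for illustration.
RETENTION = {
    "weather_history": timedelta(days=365 * 10),
    "event_social_media": timedelta(days=30),
}

def expired(data_class, created, now):
    """True if a record is past its class's value window and can be purged."""
    return now - created > RETENTION[data_class]

now = datetime(2013, 6, 1)
print(expired("event_social_media", datetime(2013, 1, 1), now))  # past window
print(expired("weather_history", datetime(2013, 1, 1), now))     # still valuable
```

A purge job driven by a table like this keeps the recovery footprint, and thus the cost of protecting it, proportional to what the data is still worth.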
5. Choose the right medium. What format and media will you use to store your data? Disk or tape? Cloud or on-premises? Raw or de-duplicated? The key factor in this choice is restore speed. The least expensive method, offsite tape, de-duplicated, carries the tax of waiting days to restore your data. Can you wait that long?
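A back-of-the-envelope restore-time estimate makes the trade-off tangible. The throughput figures below are illustrative assumptions, not vendor specifications; plug in the numbers from your own environment.

```python
# Rough restore-time estimate for comparing media. Sustained throughput
# numbers here are invented placeholders, not measured rates.

def restore_hours(data_tb, throughput_mb_per_s):
    """Hours to restore data_tb terabytes at a sustained MB/s throughput."""
    return data_tb * 1_000_000 / throughput_mb_per_s / 3600

# Example: 100 TB restored at assumed tape vs. disk throughput.
for medium, mbps in [("offsite tape", 100), ("local disk", 1000)]:
    print(f"{medium}: {restore_hours(100, mbps):.0f} hours")
```

At the assumed tape rate, 100 TB takes well over a week to restore, which is exactly the "days of waiting" tax the cheaper option carries.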
6. Choose the right format. Is the data cleansed or raw? Summarized? Aggregated or non-aggregated? Here is a simple example that shows the power of summarizing: if you gather one sample of data every second for a year, you have more than 31 million records. Condensing that stream into hourly summaries (average, min/max and standard deviation, four values per hour) leaves about 35,000 records, a reduction of roughly 99.9 percent; daily summaries cut it to fewer than 1,500 records, a reduction of more than 99.99 percent. Do you really need all of that detail?
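The arithmetic behind that example is easy to reproduce, and worth rerunning for your own sampling rate and summary granularity. This sketch assumes one sample per second and four retained statistics (average, min, max, standard deviation) per summary period.

```python
# Record-count reduction from summarizing a one-sample-per-second feed.
# Assumes four statistics kept per summary period (avg, min, max, std dev).

def reduction_pct(raw_records, summary_periods, stats_per_period=4):
    """Percentage of records eliminated by summarizing."""
    summarized = summary_periods * stats_per_period
    return 100 * (1 - summarized / raw_records)

raw = 365 * 24 * 3600  # 31,536,000 per-second samples in a year
print(f"hourly: {reduction_pct(raw, 365 * 24):.2f}% fewer records")
print(f"daily:  {reduction_pct(raw, 365):.3f}% fewer records")
```

The exact percentages shift with the sampling interval and the number of statistics you keep, but the lesson holds: summarization shrinks the recovery problem by orders of magnitude.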
If this seems overwhelming, that's understandable, but don't fret. Big data recovery will be one of the major IT topics in the years ahead; that's inevitable, given the gargantuan databases more businesses are mining to grow their revenues and bottom lines. Look for seminars, dialogue about best practices, rules of thumb and, yes, that 80/20 rule.
Michael de la Torre is vice president of Product Management at SunGard Availability Services, a company that developed disaster-recovery services for banks and other companies in 1978.