I have a confession to make. Anytime I hear the term “big data” I simply zone out and hear the sound of Charlie Brown’s teacher (“Wah wah wah wah…”).
The buzz around big data has reached what I hope is its fever pitch after quite a buildup. According to Google Trends, we are at the high point of searches associated with the term “big data,” and we have been for a few months. It’s not clear how long this will last. And although theories differ on this, I believe that February 2010 was the tipping point. The Economist published a feature called “The Data Deluge,” and according to Google Trends, just nine short months later searches for big data began their meteoric rise. The most notable occurrence related to big data in November 2010 was the first acquisition that specifically called out big data as the reason, when EMC bought Isilon Systems, a maker of network-attached storage systems.
“Big Data is a term used to describe the massive amount of data produced by a new generation of applications in markets such as life sciences (e.g. gene sequencing), media and entertainment (e.g. online streaming), and oil and gas (e.g. seismic interpretation) to name a few,” the companies’ press release noted at the time. This is where the hype cycle began.
Lack of Clarity Impedes Progress
While the press release noted some interesting use cases, what is missing from such statements, and from the broader conversation about the term “big data,” is clarity. A big part of the issue is the fact that there is not a universally accepted definition. The references to Volume, Variety and Velocity aside, there is not a standard definition that allows for the inevitable change implied in the word “big” or what you’re supposed to do with all of that big data. Recently I have read proposed definitions that address the elusive change associated with big data. That is a good start if we ever want to really take advantage of what big data is or could be to an organization.
The reality is that big data has always existed; we just didn’t have a term for it. It was all the data we had that we didn’t know what to do with. Yes, it feels like we all now produce a zettabyte of data every time we breathe, but why should that matter? What’s different now? These are the questions that I ask myself when a client says to me, “How do we use big data?”
Debunking Big Data
Here’s the truth behind taking real advantage of big data. The barriers to success are significant. They cannot be fixed by a product, tool or system alone. In other words, you can’t buy your way into big data value. In order to take real advantage of big data you have to invest in the people and the processes associated with the data, big or small.
Big data has big problems associated with it. The size of the data, the speed at which it comes in and the variety of data we are dealing with present a set of issues unique to big data (versus “regular” data, hence the hype). The management and, more specifically, the governance of these data require solid processes to profile and define the data, and that represents only the level of effort needed to get started. Then you have to apply business rules (and ETL: extract, transform and load processes) and store the data in a way that promotes usage. These last two activities can be accelerated by a product or system.
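To make "profile the data" a little more concrete, here is a minimal sketch in Python of what a first profiling pass might look like. It is an illustration under stated assumptions, not a recommended tool: the `profile` function, the field names and the sample records are all hypothetical and invented for this example.

```python
from collections import Counter


def profile(records):
    """Summarize each field: missing values, distinct values, and the types seen."""
    # Collect every field name that appears in any record.
    fields = sorted({key for row in records for key in row})
    report = {}
    for field in fields:
        values = [row.get(field) for row in records]
        # Treat None and the empty string as missing.
        present = [v for v in values if v not in (None, "")]
        report[field] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "types": dict(Counter(type(v).__name__ for v in present)),
        }
    return report


# Hypothetical sample records, invented for illustration only.
records = [
    {"patient_id": 1, "age": 42, "state": "MN"},
    {"patient_id": 2, "age": None, "state": "mn"},
    {"patient_id": 3, "age": "57", "state": "WI"},
]

report = profile(records)
for field, stats in report.items():
    print(field, stats)
```

Even on three made-up rows, the summary surfaces the kinds of questions that people and process have to answer before any product can help: a missing age, an age stored as text rather than a number, and "MN" and "mn" counted as two different states.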
What About Value?
Even before getting started, though, there is an important step that you have to take: you have to determine the value associated with these data. I know that everyone is jumping on the big data train, and it’s tempting to be part of the “in-crowd,” but as your mother said, “If someone asked you to jump off a bridge, would you?” Said another way: If someone asked you to invest a large capital budget because “everyone else is doing it,” would you? I hope the answer is “of course not!”
The investment in big data should start with a serious look at your data, with proposed uses tied to specific value. Then it should include a people and process investment to align that value to the organization’s strategic focus.
Then (and only then) should you consider a capital investment in a product. Case studies of organizations that truly used big data in the way it was intended demonstrate the same thing in each case: the organization had invested in the people and processes required to take true advantage of data, and it had a very specific use case associated with the data.
When it comes to data, big and small, there is no easy button. We still have to stay heads down and focus on the hard stuff first. To take full advantage of the data, careful planning is needed to ensure that the data is clean, consistent, well-defined and contextually appropriate. When that milestone has been reached, the products associated with big data can help you reach past the hype and achieve value for your organization.
Laura Madsen, the leader for the healthcare practice at Lancet Software, a BI consulting firm in Minneapolis, is the author of Healthcare Business Intelligence: A Guide to Empowering Successful Data Reporting and Analytics.