James Gleick notes, “When information is cheap, attention becomes expensive.” This is increasingly true as we move into a world that collects an unprecedented amount of information yet can’t capture all that the information has to offer.
The “big” portion of big data implies that it is all about data collection. But today, collection is easy – and prolific. And without turning that big data into useful data, the collection is nothing more than a stockpile. This is where companies that are embracing things like business intelligence and data mining are still struggling to find meaning.
The flood of data highlights the dominating role that information has in our business and our personal lives. Organizations must accept that they now are in the business of information. They turn to analytics to make important business decisions and find their competitive advantage. And it doesn’t stop there. The central role of data in business will give rise to an entirely new market of companies, whose only job is to collect information and sell its insights.
Big data processes have created a new set of challenges, and valuable tools are emerging in the market to address them. The emphasis will be less on volume and more on value. Here are the top five big data challenges that organizations will face during the rest of this year:
Complete and Comprehensive
Psychology presents us with the concept of standard cognitive biases. These biases affect every decision we make. Examples include the anchoring and availability heuristics, which surface regularly in face-to-face meetings. Decisions made during a meeting depend heavily on the most memorable recent meeting or conversation, even if it was just a conversation in the hall one hour earlier. This phenomenon can carry into data insights as well. If the dataset does not include all relevant data, or if the data is not timely or is skewed toward a particular point of view, the insights will be slanted. An example in marketing is mistaking higher user activity in one product than in another for interest level or success – it might just reflect the nature of the user for which that product was designed. An example in IT is measuring the response time of servers one by one and concluding that on-premises infrastructure performs better than cloud. In this case, the cloud servers could be intentionally small and set up in a large, distributed grid for faster scaling. In both cases, the metric and the type of data collected lead to misleading results.
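The server comparison above can be made concrete with a small sketch. The figures below are invented purely for illustration (they are not measurements), and the `Fleet` type and both helper functions are hypothetical names. The sketch also assumes, for simplicity, that each server handles one request at a time.

```python
from dataclasses import dataclass


@dataclass
class Fleet:
    name: str
    servers: int        # number of servers in the deployment
    latency_ms: float   # average time one server takes to answer one request


def per_server_winner(a: Fleet, b: Fleet) -> str:
    """Pick the fleet whose individual servers respond faster."""
    return a.name if a.latency_ms < b.latency_ms else b.name


def fleet_throughput(fleet: Fleet) -> float:
    """Requests per second for the whole fleet (one request per server at a time)."""
    return fleet.servers * (1000.0 / fleet.latency_ms)


def throughput_winner(a: Fleet, b: Fleet) -> str:
    """Pick the fleet that serves more requests per second overall."""
    return a.name if fleet_throughput(a) > fleet_throughput(b) else b.name


# Illustrative, made-up numbers: a few large servers vs. many small ones.
on_prem = Fleet("on-premises", servers=4, latency_ms=20.0)
cloud = Fleet("cloud", servers=40, latency_ms=50.0)

print(per_server_winner(on_prem, cloud))   # judged one server at a time
print(throughput_winner(on_prem, cloud))   # judged as a whole deployment
```

With these assumed numbers, the one-by-one metric favors the on-premises servers, while the fleet-level metric favors the cloud grid. The data is identical in both cases; only the choice of metric changes the conclusion.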
To avoid this, organizations have to embrace some strategies around their data collection. You can’t say, “Collect what is easily available.” You have to acknowledge dependencies and collect data that might be harder to obtain as well. What are the interrelationships between data? Do you have the entire matrix of data? What unfair biases might you be introducing into this data?
Information Governance
As new technologies arise, it is very easy just to jump in. And we all hope that the growth of new tools and technology will let us get past the red tape. But as long as there is litigation, IP risk, and competition, information governance needs to be considered. For most organizations, information governance is well established for things like documents and even raw data. But what about the insights? The dashboards and insights gathered from big data processes are discoverable and just as risky as any other content the organization owns – especially when that data could be used in a high-profile decision. Beyond security, organizations also forget to ask who can see what. Even though raw data might be secure, potentially sensitive information could be exposed in a nice “public” visual. Treat big data insights as a separate piece of content with its own governance. Security trimming at the data level often does not carry over to the data once it is assimilated into dashboards and insights.
Stream of Data
Big data is continuous. This means the analysis also is continuous and creates continuity between data that was never possible before. However, this also creates a challenge when it comes time to report on the data – especially for organizations that embrace agile technology practices but still rely on waterfall business processes. The continuous insights do not fit neatly into existing business processes. Organizations will need to establish a new way to view information, and methods for evaluating it, that go beyond slides in a management meeting.
The Business of Data
From the massive consumer-facing web application to the government-appointed organization that builds and manages power lines, all organizations are in the business of data. However, when top management does not accept this, big data gets split awkwardly between two separate practices: IT and some business analysis function. This also means that big data usually does not get the attention it deserves in terms of implementation, yet the demand for results does not diminish. This contradiction frustrates data scientists and makes it hard to move into data insights. Organizations will need to realize that they are in the business of information and transform the organization to fit that reality.
Riding Coattails
Organizations do not like to be guinea pigs, and when it comes to brand new, barely understood technology, it is easy to turn to those who already have seen value. The belief is that big data is the same for all companies, perhaps varying only by industry. At the same time, very few organizations will profess that their business or activities are the same as other organizations; they believe they are unique. This creates the contradiction of “unique” companies seeking standard implementations. There is no one-size-fits-all for big data, if only for one simple reason: Everyone is asking the same questions, but in different ways. This issue gets amplified when you realize that big data is perceived in any of three ways: as KPIs only, as data cubes, or as data mining. From one end of this spectrum to the other, the variation in complexity is huge. Thus, when comparing one big data solution to another, you might not be discussing the same approach or desired goal. It is better for organizations to learn the general big data concepts and then apply those concepts to real business problems.
“Information is not knowledge, and knowledge is not wisdom.” – James Gleick
How we get to real value from data is not solely in the tools that collect and visualize. The tools themselves are a conduit. They are not the answer. However, if you start with some clear objectives and a strategy, modern-day services provide cutting-edge tools for real-time anomaly detection, steady state analysis, clustering, and dynamic visualizations.
Trevor Parsons, Ph.D., is co-founder and Chief Scientist at Logentries.