In popularizing the concept of big data, the analyst community and the media have described it in terms that span some collection of Vs—Volume, Velocity, Variety, as well as other suggested V-words such as Value and Variability. However, it is necessary to look beyond what is effectively a marketing definition to understand the concept’s core intent. Big data is fundamentally about solving business problems whose resource requirements (for data management space, computation resources, or immediate, in-memory representation needs) exceed the capabilities of traditional computing environments as currently configured within the enterprise.
But to best understand what “big data” truly can mean to your organization, it is worth considering the market conditions that have enabled its apparently growing acceptance as a viable option to supplement the intertwining of operational and analytical business applications in light of exploding data volumes. This is the first in a planned series of articles designed to:
- Characterize what is meant by “massive” data volumes,
- Review the relationship between the speed of data creation and delivery and the integration of analytics into real-time business processes,
- Explore reasons that the traditional data management framework cannot deal with growing data variability, and
- Qualify the quantifiable measures of value to the business.
That last point is probably the most important, especially when the forward-looking stakeholders in an organization need to effectively communicate the business value of embracing big data platforms, and correspondingly, big data analytics. For example, a business justification might show how incorporating a new analytics framework can be a competitive differentiator. Companies that build customer up-selling profiles from limited data samples are at a disadvantage compared with enterprises that build comprehensive customer models from all available customer data, with the aim of increasing revenues while enhancing the customer experience.
The Quest for Business Agility
The user demand for insight that is driven by ever-increasing data volumes must be understood in the context of organizational business drivers to help your organization appropriately adopt a coherent information strategy as a prelude to deploying big data technology. Corporate business drivers may vary by industry as well as by company, but reviewing some existing trends for data creation, use, sharing, and the demand for analysis may reveal how evolving market conditions bring us to a point where adoption of big data can become a reality.
Business drivers are about agility in utilization and analysis of collections of datasets and streams to create value: increase revenues, decrease costs, improve the customer experience, reduce risks, and increase productivity. The data explosion bumps up against the requirement for capturing, managing, and analyzing information. Some key trends that drive the need for big data platforms include:
Increased data volumes being captured and stored. According to the 2011 IDC Digital Universe Study, “In 2011, the amount of information created and replicated will surpass 1.8 zettabytes, … growing by a factor of 9 in just five years.” The scale of this growth surpasses the reasonable capacity of traditional relational database management systems, or even typical hardware configurations supporting file-based data access.
Increased data volumes pushed into the network. According to Cisco’s annual Visual Networking Index Forecast, by 2016, annual global IP traffic is forecast to be 1.3 zettabytes. This increase in network traffic is attributed to the increasing number of smartphones, tablets and other Internet-ready devices, the growing community of Internet users, the increased Internet bandwidth and speed offered by telecommunications carriers, and the proliferation of Wi-Fi availability and connectivity. More data being funneled into wider communication channels creates pressure for capturing and managing that data in a timely and coherent manner.
Growing variation in types of data assets for analysis. As opposed to the more traditional methods for capturing and organizing structured data sets, data scientists seek to take advantage of unstructured data accessed or acquired from a wide variety of sources. Some of these sources may reflect minimal elements of structure (such as Web activity logs or call detail records), while others are completely unstructured or even limited to specific formats (such as social media data that merges text, images, audio, and video content). To extract usable signal out of this noise, enterprises must enhance their existing structured data management approaches to accommodate semantic text and content-stream analytics.
Alternate and unsynchronized methods for facilitating data delivery. In a structured environment, there are clear delineations of the discrete tasks for data acquisition or exchange, such as bulk file transfers via tape and disk storage systems, or via file transfer protocol over the Internet. Today, data publication and exchange are full of unpredictable peaks and valleys, with data coming from a broad spectrum of connected sources such as websites, transaction processing systems, and even “open data” feeds and streams from government sources and social media networks like Twitter. This creates new pressures for rapid acquisition, absorption, and analysis while retaining currency and consistency across the different data sets.
Rising demand for real-time integration of analytical results. There are more people—with an expanding variety of roles—who are consumers of analytical results. The growth is especially noticeable in companies where end-to-end business processes are augmented to fully integrate analytical models to optimize performance.
As an example, a retail company can monitor real-time sales of tens of thousands of SKUs at hundreds of retail locations, and log minute-by-minute sales trends. Delivering these massive data sets to a community of different business users for simultaneous analyses provides insight and capabilities that did not exist in the past. Among other uses, it allows buyers to review purchasing patterns and make more precise decisions about the product catalog, product specialists to consider alternate ways of bundling items together, inventory professionals to allocate shelf space more efficiently at the warehouse, and pricing experts to adjust prices instantaneously at different retail locations, directly at the shelf. The most effective uses of intelligence demand that analytical systems process, analyze, and deliver results within a defined time window.
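The minute-by-minute logging in the retail example can be pictured as a simple windowed aggregation. The sketch below is only an illustration under stated assumptions: the event layout (timestamp, store, SKU, units), the function names, and the sample values are all hypothetical, and a production system would run this kind of roll-up on a streaming or distributed platform rather than in a single process.

```python
from collections import defaultdict
from datetime import datetime


def minute_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its minute."""
    return ts.replace(second=0, microsecond=0)


def sales_per_minute(events):
    """Sum units sold per (store, SKU, minute).

    `events` is an iterable of (timestamp, store_id, sku, units) tuples.
    This record layout is hypothetical, not taken from any real system.
    """
    totals = defaultdict(int)
    for ts, store_id, sku, units in events:
        totals[(store_id, sku, minute_bucket(ts))] += units
    return dict(totals)


# Hypothetical sample events, for illustration only.
events = [
    (datetime(2012, 6, 1, 9, 0, 5), "store-001", "SKU-123", 2),
    (datetime(2012, 6, 1, 9, 0, 40), "store-001", "SKU-123", 1),
    (datetime(2012, 6, 1, 9, 1, 10), "store-001", "SKU-123", 4),
    (datetime(2012, 6, 1, 9, 0, 20), "store-002", "SKU-123", 3),
]

trend = sales_per_minute(events)
```

Each key in `trend` identifies one store, one SKU, and one minute, so buyers, inventory staff, and pricing experts could each query the same roll-up from their own angle.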
Technology Trends Lowering Barriers to Entry
Enabling business process owners to take advantage of analytics in many new and innovative ways has always appeared to be out of reach for most companies. And the expanding universe of created information has dangled broad-scale analytics capabilities beyond the reach of everyone but the largest corporations. What makes the big data concept so engaging is that emerging technologies enable a broad-scale analytics capability with a relatively low barrier to entry. Facets of technology for business intelligence and analytics have evolved to a point at which a wide spectrum of businesses can deploy capabilities that in the past were limited to the largest firms with equally large budgets. Consider these four aspects:
- Application development: A simplified application execution model encompassing a distributed file system, application programming model, distributed database, and program scheduling is packaged within Hadoop, an open source framework for reliable, scalable, distributed and parallel computing.
- Commoditized platform: Innovative methods of creating scalable and yet elastic virtualized platforms take advantage of clusters of commodity hardware components (either cycle harvesting from local resources or through cloud-based utility computing services) coupled with open source tools and technology.
- Big data management: Alternate models for data management (often referred to as NoSQL, or “Not Only SQL”) provide a variety of methods for managing information to best suit specific business process needs, such as in-memory data management (for rapid access), columnar layouts (for faster query response), and graph databases (for social network analytics).
- Utility computing: The ability to deploy systems like Hadoop on virtualized platforms allows small and medium businesses to utilize cloud-based environments that, from both a cost accounting and a practical perspective, are much friendlier to the bottom line.
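To give a feel for the application programming model mentioned in the first aspect above, here is a minimal single-process sketch of the map, shuffle, and reduce pattern that Hadoop popularized. It deliberately does not use the Hadoop API: the function names and the sample records are hypothetical, and a real framework would distribute each phase across a cluster and handle scheduling and failures.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (key, value) pairs for one input record."""
    sku, units = record
    yield sku, units


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical (SKU, units sold) records, for illustration only.
records = [("SKU-1", 2), ("SKU-2", 5), ("SKU-1", 3)]
totals = run_job(records)
```

The appeal of the model is that the developer writes only the map and reduce logic, while partitioning the data and coordinating the work is left to the framework.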
The business drivers make big data analytics attractive to all types of organizations, while the market conditions make it practical. This is not to say that implementing these technologies and business processes is a completely straightforward task. There is a steep learning curve for developing big data applications, especially when going the open source route, which demands an investment in time and resources to ensure the big data analytics and computing platform is ready for production.
That means that another question remains: is it reasonable? In other words, when evaluating the feasibility of adopting big data technologies, have you considered whether your organization faces business challenges whose resource requirements exceed the capability of the existing environment? In our next article, we will explore the types of business problems that are suited to a big data solution.
David Loshin is the author of several books, including Practitioner’s Guide to Data Quality Improvement and the upcoming second edition of Business Intelligence: The Savvy Manager’s Guide. As president of Knowledge Integrity Inc., he consults with organizations in the areas of data governance, data quality, master data management and business intelligence.