Editor’s note: This article is the third in a series examining issues related to evaluating and implementing big data analytics in business.
There are numerous aspects of big data techniques and technologies, each with its own technical appeal: hardware analytical appliances, software tools that employ commodity hardware, NoSQL and graph-based data management systems, analytics tools, as well as the many components and programming frameworks encompassed within Hadoop. And while market conditions have enabled individuals within an organization to “test drive” a combination of these techniques, these new technologies need to win adoption in a broader enterprise setting to deliver value.
Given this, managers must answer two essential questions:
First: What is the process for piloting technologies to determine their feasibility and business value, and engaging business sponsors and socializing the benefits of a selected technique?
And second: What must happen to bring big data analytics into an organization’s system development lifecycle to enable its use?
The Culture Clash Challenge
These questions become much more relevant when you realize that the expectations for usability of big data clash with the traditional approaches for data collection and aggregation in the data warehouse. On the one hand, enterprises have invested considerable time and resources in developing an architecture for querying, reporting, and analysis. On the other hand, users are increasingly dissatisfied with the dimensional shackles that have become the foundation of the data warehouse. One recurring concern of the data scientist is that the information technology practitioners who have organized the data in a data warehouse to support rapid report generation and fast query responses may have filtered out potentially interesting bits of information.
So while data warehouses are good for streaming data into spreadsheets, they are limited when it comes to undirected data discovery—creating demand from analysts for “access to the raw data” in a big data environment instead of funneling requests through a business intelligence team assigned to query the data warehouse.
This dynamic reveals a tension between the traditional data warehouse approach and the rising demand for new analytical tools that enable more and different queries from more users than before. Within that dynamic, organizations need to find a way to evaluate whether the new technologies are valuable and how they can be incorporated into the information management structure and effectively deployed. The complexity of the challenge becomes more apparent once you consider some aspects of adoption of the big data techniques:
- Steep Learning Curve: While some of the technology is basically “plug and play,” much involves a steep learning curve. It is easy to download open source software for MapReduce or a graph-based database system. But it is much harder to develop applications that use these platforms unless the developer has some experience in high performance parallel code development and data distribution.
- Data Lifecycle Changes: The data lifecycle demands of big data analytics differ from those of data systems supporting traditional transaction processing, as well as from those of data warehouses and data marts, which typically deliver results based on static structured data sets. A prime example is the desire to stream live data directly into a big data analytical application for real-time integration, while data warehouses are often just populated with static data sets extracted from existing front-end systems.
- Existing Infrastructure: A decade (or more) of investment in the traditional data warehouse and business intelligence framework has institutionalized certain approaches to data management. Yet decisions about the existing infrastructure (such as whether to keep extracting data from sources to load into the warehouse or to adopt newer federation and virtualization approaches that allow the data to remain in its original source) affect access to synchronized datasets as well as the usability of the variety of data sources expected to feed big data analytics.
- Data Intent: Most data instances are created for specific purposes, but big data applications seek to repurpose data for analysis. The original data intent may differ drastically from an array of potential uses, and this implies the need for greater governance for data control, quality, and semantic consistency.
- Size and Duration: The desire to acquire and use massive datasets has direct implications for the “pure” aspects of data management. The transitory characteristics associated with rapid turnaround of numerous data streams conflict with the desire to retain very large datasets in anticipation of the potential for new analyses to be performed in the future. This tension will force enterprises to make investment and capital acquisition decisions to support data persistence and retention.
Involving the Right Decision-Makers
Given these challenges, how can organizations plan to support big data? More to the point: Who in the organization needs to be involved in the process of acquiring, proving, and then deploying big data solutions, and what are their roles and responsibilities?
In any technology adoption cycle, it is incumbent upon the key stakeholders in the organization to make sure that the business process owners, the information consumers, the technical infrastructure innovators, the application developers and the enterprise architects all work together in an environment that can continue to satisfy existing reporting needs yet is flexible enough for exploratory work.
We can look at a general sequence of tasks (see Figure 1 below) to help us consider how to manage the transition into a production development process to take advantage of the business value big data techniques can provide. The sequence starts with recognizing this opportunity, then defining expectations, and piloting, vetting and assessing big data technology before moving into production.
Figure 1: Managing the Process for Big Data Analytics Adoption
Executing this sequence in alignment with organizational needs requires people who can champion new technologies while also retaining a critical eye to differentiate between hype and reality. Below are some of the roles that support organizational alignment during the consideration, evaluation, and decision-making process for assessing the value proposition of big data:
- Business Evangelist. This individual understands the types of performance barriers imposed by the existing technical infrastructure and understands that ongoing reviews of emerging technology may create efficiencies that do not currently exist within the organization. The job of the business evangelist is to socialize the value of exploring the use of new techniques among the business process owners and solicit their input to understand their current and future needs to guide the selection of technologies to review and possibly pilot.
- Technical Evangelist. The technical evangelist understands the emerging technology and the science behind new methods, and can identify where the technology can potentially improve the application environment, either by improving the performance of existing processes or by enabling new capabilities.
- Business Analyst. This is the person who engages the business process owners and solicits their needs and expectations. This process identifies some of the key quantifiable measures for evaluating the business benefits of the new technology and frames the technical requirements for any pilot project.
- Big Data Application Architect. While the vendor community suggests that these new programming frameworks simplify the process of developing applications for big data platforms, any solution that is designed without a firm background in parallel and distributed computing is prone to fundamental flaws that will impede performance. Make sure that any pilot development is designed by an application architect with reasonable experience in high-performance computing.
- Application Developer. Identify the technical resources with the right set of skills for programming and testing parallel and distributed applications.
- Program Manager. Lastly, and perhaps most importantly, engage a program manager with project management expertise to plan and oversee any pilot development to make sure that it remains aligned with organizational expectations, remains within an allocated budget, and is properly documented to ensure that the best practices can be captured and migrated into production.
Each of these roles fills an essential and complementary purpose. The evangelist roles are critical in establishing recognition of the value proposition. The business and technical evangelists must work with the program manager and the architects in mapping out a strategy for engaging the business users, understanding how the new techniques can deliver greater value in a more efficient or rapid manner, and agreeing on the success measures and criteria for the inevitable go/no-go decision. The business analysts must communicate their understanding of the business challenges to the application developers, and both must be governed by the program manager to reduce the risk of wasted investment in technology development purely for technology’s sake.
Having a well-planned and managed program for technology evaluation in the context of well-defined business needs will simplify organizational alignment: it casts all speculative technical development in the context of demonstrating (or not) the creation of business value. Soliciting clear business requirements and specifying corroborating success criteria and measures enables a level of trust that test-driving new technology like big data analytics is not purely an intellectual exercise, but rather is a repeatable process for injecting new ideas into the corporation.
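For teams that want to make the success criteria explicit before a pilot starts, the go/no-go check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names, the two sample measures, and all of the numbers are hypothetical and do not come from any real pilot or tool.

```python
from dataclasses import dataclass


@dataclass
class SuccessCriterion:
    """One quantifiable measure agreed on before the pilot (hypothetical structure)."""
    name: str
    target: float
    measured: float
    higher_is_better: bool = True

    def met(self) -> bool:
        # A criterion is met when the measured value reaches the agreed target.
        if self.higher_is_better:
            return self.measured >= self.target
        return self.measured <= self.target


def go_no_go(criteria, budget_allocated, budget_spent):
    """Return (decision, unmet criterion names, within_budget flag)."""
    unmet = [c.name for c in criteria if not c.met()]
    within_budget = budget_spent <= budget_allocated
    decision = "go" if not unmet and within_budget else "no-go"
    return decision, unmet, within_budget


# Hypothetical pilot results, for illustration only.
pilot_criteria = [
    SuccessCriterion("nightly report runtime (minutes)", target=60,
                     measured=45, higher_is_better=False),
    SuccessCriterion("data sources available to analysts", target=8,
                     measured=6),
]

decision, unmet, within_budget = go_no_go(
    pilot_criteria, budget_allocated=100_000, budget_spent=90_000
)
print(decision, unmet, within_budget)
```

Writing the measures down in this form before the pilot begins gives the program manager a record of what was agreed with the business process owners, so the eventual decision rests on those measures rather than on enthusiasm for the technology.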
David Loshin is the author of several books, including Practitioner’s Guide to Data Quality Improvement and the second edition of Business Intelligence—The Savvy Manager’s Guide. As president of Knowledge Integrity Inc., he consults with organizations in the areas of data governance, data quality, master data management and business intelligence. Email him at email@example.com.