Today’s progressive business leaders are demanding that their companies integrate existing enterprise resource planning (ERP) and customer relationship management (CRM) systems with external sources, such as social media and machine sensor data, all while tapping into cloud computing systems to deliver actionable insights quickly and cost-effectively.
It’s clear why: In this age of big data, new technologies and methods are evolving quickly and providing organizations with new options to acquire, store, organize, and analyze large volumes of diverse data. The opportunity is here to manage data in a way that reduces the overall cost associated with a traditional data warehouse, while aggressively exploiting new analytical capabilities. Advanced analytics (analytics that forecast future events and behaviors, allowing businesses to conduct what-if analyses to predict the effects of potential changes in business strategies) can turn poor business decisions, made using haphazard guesswork, into well-thought-out and successful business decisions that improve operational efficiencies, drive revenue, and provide a competitive advantage in the market.
However, while the opportunity is ripe, an undesirable side effect of acquiring massive (and growing) amounts of new data is the difficulty of managing that growth. For some organizations, analysis that used to take minutes now takes hours to complete, assuming that one can receive results at all. The sheer volume of data (much of which is unstructured) that some organizations have amassed has created a situation where organizations require new and more effective methods to identify relevant data and to extract actionable insights out of the most pertinent information possible.
While big data and supporting technologies are in their relative infancy, they are evolving and providing organizations the opportunity to begin data optimization initiatives that use technologies like Hadoop to offload data storage to less expensive commodity hardware, while also increasing the ability to learn and discover new insights without boundaries. Advanced analytics is a natural outgrowth of that optimization.
A Slimmed-Down Data Warehouse, in the Cloud
In advanced analytics, tools and modeling are set in a more meaningful context, and operational analytics are derived from a trimmed-down data warehouse that contains only the data needed for the core business. Ultimately, this model provides an opportunity to better leverage the optimized environment to support improved analytics and influence organizational change.
How does optimized data create new opportunities for an organization? Cloud computing is one way. Companies can use the cloud for targeted analytics (for example, a sandbox environment for cleansing a customer or product master dataset). An activity such as this requires unloading a very large dataset, running analytics to identify the “real” customer or product before flushing the existing customer or product data, and integrating the new datasets. Such an initiative involves further complexities, of course. But delivering this type of capability through the cloud is another example of putting an emerging technology to work for advanced analytics.
Another way to create optimizations and efficiencies is to put the data warehouse on a diet. The old rule of thumb, which is likely still true, is that organizations actively use only 5 percent of the data in the data warehouse. Data warehouse designers did, and still do, “land grabs” on system data because predicting what a user will need in the next six months is impossible, and re-architecting a data warehouse to include the data once it’s known to be required is cost-prohibitive. So, why not grab it all?
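As a rough illustration of the "identify the real customer" step in such a sandbox, here is a minimal Python sketch. It assumes a deliberately simple matching rule (normalized name plus email) and keeps the most recently updated record from each group of duplicates. The field names, the sample records, and the `consolidate` helper are all hypothetical; a real master data cleanse would use far richer matching logic.

```python
from collections import defaultdict

def normalize(record):
    """Build a crude match key from name and email (hypothetical fields)."""
    name = " ".join(record["name"].lower().split())
    email = record["email"].strip().lower()
    return (name, email)

def consolidate(records):
    """Group records sharing a match key; keep the most recently updated one."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalize(rec)].append(rec)
    # ISO-formatted date strings sort chronologically, so max() picks the newest.
    return [max(group, key=lambda r: r["updated"]) for group in groups.values()]

# Invented sample data standing in for an unloaded customer master extract.
customers = [
    {"id": 1, "name": "Acme  Corp", "email": "Ops@Acme.example", "updated": "2012-03-01"},
    {"id": 2, "name": "acme corp", "email": "ops@acme.example ", "updated": "2013-06-15"},
    {"id": 3, "name": "Globex", "email": "info@globex.example", "updated": "2011-11-30"},
]

golden = consolidate(customers)
```

Here records 1 and 2 collapse into a single surviving customer, which is the kind of result that would then be integrated back in place of the flushed data.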
Instead, deploying Hadoop as a building block of your advanced analytics platform provides an opportunity for optimizing operational aspects of the traditional data warehouse and enhancing its overall value. This model makes efficient use of large volumes of data to rapidly produce insight where traditional analytics environments would fail.
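One way to picture the "diet" is as a periodic scan for warehouse tables that nobody has queried recently, which then become candidates for offloading to cheaper Hadoop storage. The Python sketch below assumes the warehouse can report a last-queried date and size per table. The table names, sizes, dates, and the 180-day threshold are invented for illustration and are not drawn from any particular product.

```python
from datetime import date

def archive_candidates(tables, today, max_idle_days=180):
    """Return names of tables whose last query is older than max_idle_days."""
    return [
        t["name"]
        for t in tables
        if (today - t["last_queried"]).days > max_idle_days
    ]

# Hypothetical usage metadata; a real warehouse would supply this from its logs.
tables = [
    {"name": "sales_current", "last_queried": date(2013, 6, 1), "size_gb": 120},
    {"name": "clickstream_2009", "last_queried": date(2011, 1, 15), "size_gb": 4200},
    {"name": "sensor_raw_2010", "last_queried": date(2012, 2, 3), "size_gb": 2600},
]

today = date(2013, 6, 30)
cold = archive_candidates(tables, today)
# Total volume that could move off the warehouse under this assumed rule.
cold_gb = sum(t["size_gb"] for t in tables if t["name"] in cold)
```

The threshold is a policy decision rather than a technical one, which is why the governance questions about ownership and data growth matter as much as the tooling.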
Integrating Hadoop with Business in Mind
Integrating a Hadoop framework into an organization’s infrastructure is no different from integrating other technologies. There needs to be a well-considered approach to what questions Hadoop will answer, how ready the organization is for Hadoop, and how the data will be managed.
The first question to ask is not, “How will I use Hadoop?” The first question is, “Do I understand enough about Hadoop and do I have problems (real business use cases) that Hadoop will help me address?” A few use cases that come to mind are as follows:
• “I need a method of optimizing my infrastructure and infrastructure spend for collecting and managing very large quantities and disparate types of data.”
• “I need an archive solution to trim down the sheer volume of data in my data warehouse so that I can run critical analytics in a timely manner.”
• “I need a sandbox platform where my marketing analysts can develop data models from a broad range of data.”
Other important considerations include having clear answers to fundamental questions such as:
• Who will own this?
• How will my data governance model manage unconstrained data growth?
• How will I know what quality, reliable data looks like?
• How will I integrate it into my existing analytics capabilities?
The data explosion is happening. New technologies and methods are here, and they are good enough to make a positive difference in your competitive position—if you choose to take advantage of them now.
Scott H. Schlesinger is senior vice president, head of business information management at Capgemini of North America. Email him at firstname.lastname@example.org. Follow him on Twitter: @sschlesinger922. Dorman Bazzell is enterprise analytics practice leader at Capgemini. Email him at email@example.com.