Big data’s potential business impacts are extensive. For businesses in all industries, big data analysis can have far-reaching applications in customer engagement, operational efficiency, competitive response, and other critical areas. In fact, in a recent global survey of 1,144 business and IT professionals, 63 percent of respondents said they are gaining a competitive advantage by using big data and analytics in their organizations.
But in today’s high-stakes “need it now” business environment, traditional data analysis just doesn’t cut it. By “traditional,” I’m referring to production operational reporting on relatively stable datasets maintained in databases that were primarily optimized for transactional computing, not analytics. Traditional data analysis platforms are ill-suited for the agile requirements of today’s dynamic, always-on business operations. Built to aggregate and analyze transactional data sets on customers, finances, and other aspects of business operations, traditional transactional databases are often unable to handle the unstructured and other non-relational data sources essential to digital marketing and other key new initiatives. In addition, many traditional analytic databases are optimized only for batch data ingest, processing, and delivery, which makes them non-starters in the age of real-time, always-on business.
What enterprises truly need is a big-data platform that is optimized for agile analytics, which refers to the ability to collect, align, analyze, and deliver actionable intelligence in real time from a wide and ever-changing range of data sources. The ideal big-data platform must go beyond the “one size fits all” databases of the past, blending disparate “fit-for-purpose” technologies – relational database management systems (RDBMS), Hadoop, in-memory, NoSQL – to do justice to the dynamic diversity of modern business.
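To make the “fit-for-purpose” idea concrete, here is a minimal sketch of a polyglot routing layer. Everything in it is illustrative and hypothetical (the class names, the workload labels, the `handles` convention are my own, not any particular product’s API): the point is simply that a thin layer can direct each workload to the component platform best suited to it, rather than forcing everything through one general-purpose database.

```python
# Illustrative sketch only: all class and workload names are hypothetical.
# Each "store" stands in for one fit-for-purpose component platform.

class RelationalStore:      # stands in for an RDBMS: structured, transactional
    def handles(self, workload): return workload == "transactional"

class HadoopStore:          # stands in for Hadoop: batch analytics over raw files
    def handles(self, workload): return workload == "batch_analytics"

class InMemoryStore:        # stands in for an in-memory platform: low-latency access
    def handles(self, workload): return workload == "low_latency"

class DocumentStore:        # stands in for a NoSQL store: flexible, unstructured data
    def handles(self, workload): return workload == "unstructured"

class PolyglotRouter:
    """Routes each named workload to the first store that claims it."""

    def __init__(self, stores):
        self.stores = stores

    def route(self, workload):
        for store in self.stores:
            if store.handles(workload):
                return type(store).__name__
        raise ValueError(f"no store registered for workload: {workload}")

router = PolyglotRouter([RelationalStore(), HadoopStore(),
                         InMemoryStore(), DocumentStore()])
print(router.route("unstructured"))   # -> DocumentStore
```

In a real platform the routing decision is far richer (cost, data locality, SLAs), but the design choice is the same: specialization per workload behind a single common interface.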
Above all, the big-data platform must have the agility to operate cost-effectively at any of the following:
• Any scale of business: Business operates at every scale from breathtakingly global to intensely personal. You should be able to acquire a low-volume data platform and modularly scale it out to any storage, processing, memory and I/O capacity you may need in the future. Your platform should elastically scale up and down as requirements oscillate. Your end-to-end infrastructure should also be able to incorporate platforms of diverse scales—petabyte, terabyte, gigabyte—with those platforms specialized to particular functions and all of them interoperating in a common fabric.
• Any speed of business: Business moves at crazy rhythms that oscillate between lightning fast and painfully slow. You should be able to acquire a low-velocity data platform and modularly accelerate it through incorporation of faster software, faster processors, faster disks, faster cache and more DRAM (dynamic random access memory) as your need for speed grows. You should be able to integrate your data platform with a stream computing platform for true real-time ingest, processing and delivery. And your platform should also support concurrent processing of diverse latencies, from batch to streaming, within a common fabric.
• Any scope of business: Business manages almost every type of human need, interaction and institution. You should be able to acquire a low-variety data platform—perhaps an RDBMS dedicated to marketing—and be able to evolve it as needs emerge into a multifunctional system of record supporting all business functions. Your data platform should have the agility to enable speedy inclusion of a growing variety of data types from diverse sources. It should have the flexibility to handle structured and unstructured data, as well as events, images, video, audio and streaming media with equal agility. It should be able to process the full range of data management, analytics and content management workloads. It should serve the full scope of users, devices and downstream applications.
Just as important, the agile big-data platform must be able to support these complex, shifting requirements while simplifying the ongoing chores of accessing, deploying, managing, and evolving it all as a unified business resource. Agility depends on providing a common set of tools, interfaces, security features, governance capabilities, metadata definitions, and other key infrastructure that act as a bridge among diverse fit-for-purpose component platforms (for example, an RDBMS-based data warehouse, an in-memory data mart, a Hadoop staging layer).
To achieve these ambitious objectives, enterprises must provision their agile big-data infrastructure with the following six core capabilities:
1. Leverage data discovery and exploration tools. These tools enable data scientists to create the right analytic model and computational strategy. Traditional approaches required data to be physically moved to a central location before it could be discovered. With big data, this can be expensive and impractical. Big data platforms are able to discover data “in place,” supporting the indexing, searching, and navigation of different sources of big data and facilitating discovery of a diverse set of data sources.
2. Run analytics closer to the data. Unlike traditional warehouses, new analytic architectures can run both data processing and complex analytics on the same platform, thereby boosting system performance.
3. Manage and analyze unstructured data. Typically, data is classified on the basis of its type—structured, semi-structured, or unstructured. Existing infrastructures encountered barriers that prevented the seamless correlation and holistic analysis of this data. Big data analytics platforms are able to manage, store, and retrieve both unstructured and structured data.
4. Analyze data in real time. Acting on streaming data as it arrives—rather than waiting for a batch cycle—lets you support streaming analytics and scale effectively as data volumes grow.
5. Employ a rich library of analytical functions and tools. These reduce time-to-analysis through a robust set of accelerators, prebuilt libraries of analytic functions, and tooling that speeds up development and visualization.
6. Integrate and govern all data sources. A big data analytics platform embraces principles related to data quality, security, governance, master data management, data integration and information lifecycle management.
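Capability 4—analyzing data in real time—is the easiest of the six to show in miniature. Below is a hedged, self-contained sketch (not any vendor’s API; the class, the 60-second window, and the simulated latency values are all my own illustration) of the core pattern behind streaming analytics: maintain a sliding time window of recent events and update a rolling statistic on every arrival, instead of batching records for later processing.

```python
from collections import deque
import time

class SlidingWindowAggregator:
    """Illustrative sliding-window aggregator: retains only events from the
    last `window_seconds` and recomputes a rolling statistic on each arrival,
    rather than batching records for later processing."""

    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first

    def ingest(self, value, timestamp=None):
        """Add one event, then evict anything older than the window."""
        now = timestamp if timestamp is not None else time.time()
        self.events.append((now, value))
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()

    def rolling_average(self):
        """Average over the events currently in the window; None if empty."""
        if not self.events:
            return None
        return sum(v for _, v in self.events) / len(self.events)

# Simulated stream of response-latency readings (hypothetical values).
agg = SlidingWindowAggregator(window_seconds=60)
agg.ingest(120, timestamp=0)    # old event: falls out of the window below
agg.ingest(80, timestamp=70)
agg.ingest(100, timestamp=75)
print(agg.rolling_average())    # only the two recent events remain -> 90.0
```

Production stream-computing platforms add partitioning, fault tolerance, and exactly-once semantics on top, but the windowed-aggregation idea is the same.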
By following these six imperatives, you can continually optimize your agile big-data infrastructure to handle both analytic and transactional applications. These principles enable integrated systems to provide greater performance, reliability and simplicity. In turn, this will allow companies to reduce operational IT costs and achieve more powerful business results from their big-data investments.
James Kobielus is an IBM Big Data Evangelist. You can follow him on Twitter @jameskobielus.