For many organizations, data silos are the result of separating analytical and operational, or production, systems for performance reasons and to control costs. But advances in enterprise computing, including falling costs, are eliminating the need for these silos and changing how organizations architect and deploy applications.
Data Informed spoke with Jack Norris, Senior Vice President, Data and Applications at MapR Technologies, about how converged data platforms enable organizations to leverage all of their data and better manage relevant event streams and data flows, and the real-time impact this can have on the bottom line.
Data Informed: What were the reasons that organizations traditionally separated analytical and operational systems?
Jack Norris: When relational databases first emerged, companies ran into problems when they tried to perform analytics on production systems. For example, customer purchase analysis had to take a back seat to critical financial systems such as accounts receivable. What eventually emerged were operational data stores and data warehouses with data structures specifically geared to analytics. These specialized structures were driven by the underlying exponential cost curve of data analytics: the larger the data set, the more expensive it became to process. Organizations were incented to sample data, reduce data sizes, and do a great deal of processing and refining to shrink the data set being analyzed.
Is the arrangement of separate analytical and operational systems still in common use?
Jack Norris: Yes, it is extremely common today for organizations to run operational and analytic processes on different systems. In fact, specialized applications dictate the data structure required to support a specific analytical application or process. The result is a proliferation of data silos across organizations, where a typical company may have hundreds of them. This is a significant challenge for administrators, who must manage the many data structures and the ETL processes that move data between silos. According to Gartner, the number one data-management issue is the proliferation of these data silos.
However, we are seeing the emergence of new approaches that combine operational data with analytics. Companies are now using high-frequency decisioning applications to increase revenues, reduce operational costs, and mitigate risks. These applications are not simply reporting on what happened; these are real-time applications that impact business as it happens.
What has changed to mitigate the concerns about converging these systems?
Jack Norris: The first thing that has changed is that new architectures are emerging that alleviate the cost concerns of analyzing a lot of data. In the past, we were rewarded for sampling data. Now, highly distributed scale-out models make costs grow linearly with data size. Instead of sampling data, we can cost-efficiently look at the entire data population and identify anomalies. These analytics can also run much faster, so we can do these calculations in real time, which can augment production decisions and enable new applications that complement web interactions, ad auctions, or even transactional systems like credit cards.
It’s also important to note that we are not changing the existing transactional systems. We are seeing new applications leverage real-time data feeds from these transactional systems in a converged platform that performs analytics and drives high automation.
What are the business advantages of a converged data approach? How does this help organizations to realize value from their data?
Jack Norris: The converged platform collapses many disparate functions into one cluster. With a single converged platform, you can handle event streaming, perform database transactions including advanced NoSQL, conduct deep analytics using Hadoop or Spark, and store data reliably with long-term web-scale storage. Rather than coordinate across different systems for each of these activities, you can now do everything on one platform, which reduces cluster sprawl and administrative burdens. By eliminating processing delays and latency, you can perform end-to-end analytics on a real-time basis. The nature of these analytics isn’t about looking at the past or predicting the future. It’s about making decisions at that moment that can impact the business.
Examples of these types of applications include advertising platforms that conduct billions of ad auction events per day, automated fraud-detection applications that protect online and point-of-sale transactions, manufacturing applications that continuously monitor operations for quality issues and actively prevent failures, and telecom networks that incorporate analytics to improve their operations and customize services.
What are the IT advantages of a converged data approach?
Jack Norris: A converged data layer dramatically simplifies the management, control, and security of data within an organization. It also simplifies processing and support for the multiple applications operating on the data. Ultimately, this translates into increased data agility for a business. It's our contention that the source of competitive advantage in the future will not be having the most data; it will belong to the company that can understand its data flows in context and take the most appropriate business action, fastest. A converged data platform reduces cluster sprawl and complex data movement processes, which dramatically lowers administration costs. A converged data approach also helps IT respond to the needs of the business by eliminating separate silos, increasing productivity and, ultimately, improving profitability.
What tools/technologies can enable optimal performance of a converged data environment?
Jack Norris: A converged data platform can act as a bridge to integrate existing systems with the rich, vibrant open-source development that provides many new options to analyze and process data.
If a converged data platform can easily plug into existing systems – for example, through NFS, POSIX, standard management interfaces like REST, and standard database access methods like ODBC – it enables new applications to easily leverage existing environments. It also provides an extremely effective cost lever for offloading data from more expensive legacy storage to a more cost-effective and scalable platform.
The second area of integration is the rich big data ecosystem. The open-source community is advancing rapidly, not only with technologies such as Hadoop and Spark, but also with graph databases, machine learning tools, time-series databases, and more. A converged data platform that easily integrates with these developments will let organizations benefit from them quickly and improve performance.
Changing a company’s entire technology architecture is a daunting proposition. What are some tips/best practices for organizations looking to make the move to a converged data environment?
Jack Norris: Gone are the days when organizations had the luxury of time to perform a complete re-architecture that might take many months or even years. The approach organizations must take is incremental: a series of projects, each generating its own ROI, whose end result is a new architecture. The first step toward a converged data platform is typically an offload project, moving data from a data warehouse, mainframe, or enterprise storage, where storing a TB of data costs orders of magnitude more than on a Hadoop platform. If the Hadoop platform includes converged enterprise storage features, it can also serve as a long-term persistence store. Because data no longer has to be stored separately, the returns are even higher. This first phase generates tremendous ROI without requiring additional application development. The next step is to roll out new applications that take advantage of the available data sources. Organizations can then move through a series of applications that generate quick returns as they progress toward operational analytics. As discussed earlier, these applications operate in real time and impact business as it happens, resulting in more significant ROI.
Are there cultural hurdles to making this change in addition to the technological challenges? How can organizations best address these?
Jack Norris: With a converged data platform, there is no need to "rip and replace" existing systems. It's more about a platform that can keep pace with data growth and data proliferation. A converged data platform can complement existing systems and processes, which, from a cultural and deployment standpoint, is beneficial because it isn't seen as competing with existing approaches. The pursuit of new applications is another positive area. In general, organizations should focus on the use cases that generate the quickest and highest returns. Starting with use cases that provide a quick payback and then tackling applications that generate high ROI is a great way to phase in the applications running on the converged data platform.