Gartner estimates that the data integration tool market was worth approximately $2.4 billion in constant currency at the end of 2014, an increase of 6.9 percent from 2013. That growth rate is above the average for the enterprise software market as a whole, as data integration remains critically important for addressing a widening range of problems and emerging requirements.
Emerging requirements in social media and cloud computing have accelerated the proliferation of data beyond the firewall, and with it the challenge of integrating that disparate data into a single virtual or physical repository, such as a data services layer (virtual) or a data lake (physical). However, the growth of data volumes has outpaced the speed of data integration, driving the need for more agile, real-time data integration methods such as data virtualization.
Why Use Data Integration?
In the above-referenced report, Gartner said, “Enterprises’ need to improve the flexibility of their information infrastructure is intensifying their focus on data integration activities.” This should come as no surprise, as typical information infrastructures within enterprises are complex for a number of reasons. Companies have deployed off-the-shelf software for CRM and ERP; they have built their own custom applications to meet unique requirements, for example, for compliance; and they procure data from third-party providers such as Dun & Bradstreet. As a result, the data have become increasingly distributed, siloed within individual systems, and often duplicated. But sales, marketing, finance, human resources, and other functions need information across these different systems to carry out their daily work. Rather than forcing business users to pull data from the disparate systems themselves, IT teams have used data integration technology to move data from one system into another, or into a central repository such as a data warehouse.
Enterprises continually add new systems or replace existing ones to modernize their infrastructure, automate their operations, or provide a better customer experience. Hence, data integration will never go out of vogue. The growth in the data integration market, according to Gartner, is driven by a number of recent trends, including the evolution of cloud computing and the need for a hybrid approach to integrating cloud and on-premises applications.
It’s not news that data growth has exploded in recent years. A few years ago, only companies like Google and Yahoo needed big data systems to store the pages they indexed. To store that information, these organizations procured large numbers of servers, an approach that was unaffordable for most enterprises. Thanks to Yahoo’s efforts in finding a low-cost alternative, Hadoop has driven down the cost of storing and processing very large volumes of data for all enterprises. Today, Hadoop has expanded beyond web indexing into general big data processing and into concepts such as data lakes, which store all enterprise data in a central place.
One of the core promises of big data is to help organizations find meaningful patterns among very large volumes of interconnected information. To quote a customer at a large pharmaceutical company, “We have the cure for cancer, but it’s lost somewhere in the data.” I have also met physicians at various technology conferences who have transitioned into technology roles for the sole purpose of finding disease patterns among clinically correlated data. They are tasked with ensuring that scientists and researchers are able to analyze data from multiple sources – on-premises systems, the cloud, partner organizations, third-party data providers, social media, and so on.
All these new technology paradigms are driving the need for agile, real-time data integration across an ever-growing number of data sources. Traditional data integration methods that physically move data from one system to another using batch processing, such as extract, transform, and load (ETL), no longer meet the requirements of business users who need to integrate data rapidly to find meaningful results.
Data Virtualization as a Solution for Rapid, Agile Data Integration
One method of real-time data integration that is gaining rapid adoption is data virtualization. Data virtualization presents relevant, interrelated data in real time and in consistent formats, irrespective of the underlying database systems, structures, and storage. Because data virtualization doesn’t replicate or store data, it delivers complete, business-critical information at a fraction of the cost and time. Business users benefit from faster access to data, and IT teams continue to deliver data without disruption – and without worrying about changes to underlying systems.
Data virtualization accomplishes this by connecting to different data sources, whether structured or unstructured, on-premises or in the cloud, combining the related data, and then making it available to consuming systems in whatever format they prefer. It does all of this without physically storing or replicating the data, hence “virtual.” More importantly, the entire process happens in real time.
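The mechanics described above can be sketched in a few lines of Python. This is an illustrative toy, not any vendor's implementation: the source functions and the `VirtualView` class are assumptions invented for the sketch, and a real data virtualization product adds query pushdown, caching, security, and many source connectors. The point it demonstrates is that the join is computed on demand at query time, and no combined copy of the data is ever stored.

```python
# A toy "virtual view": two live sources are queried and joined per request,
# with no materialized copy of the combined data. All names here are
# illustrative assumptions, not a real product's API.

def fetch_crm():
    # Stand-in for a live query against a CRM system.
    yield {"customer_id": 1, "name": "Acme Corp"}
    yield {"customer_id": 2, "name": "Globex"}

def fetch_erp():
    # Stand-in for a live query against an ERP system.
    yield {"customer_id": 1, "open_orders": 3}
    yield {"customer_id": 2, "open_orders": 0}

class VirtualView:
    """Joins rows from two sources on a shared key, recomputed per query."""

    def __init__(self, left_source, right_source, key):
        self.left_source = left_source
        self.right_source = right_source
        self.key = key

    def query(self):
        # Build a lookup table from one side, then stream the other side
        # through it; rows exist only for the lifetime of the request.
        lookup = {row[self.key]: row for row in self.right_source()}
        for row in self.left_source():
            match = lookup.get(row[self.key])
            if match is not None:
                yield {**row, **match}

view = VirtualView(fetch_crm, fetch_erp, key="customer_id")
for combined in view.query():
    print(combined)
```

Each call to `query()` reaches back to the sources, so consumers always see current data; swapping a source means replacing one fetch function, which hints at why adding a new source is cheap compared with rebuilding an ETL pipeline.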
Because data is integrated on the fly, there is no need to develop the extensive mappings and transformations required by ETL. As a result, delivering the same data through data virtualization takes less than one-sixth of the time it takes through ETL. For the same reason, adding a data source is easy, taking only a few days with data virtualization as opposed to months with ETL. Because virtualizing data requires little heavy programming, there is no need for an army of developers, which yields significant cost savings. In addition, there is no storage cost, as the data is not physically stored.
For these reasons, it is no surprise that data virtualization is gaining both traction and importance. IT teams favor data virtualization because it is cost-effective, offers incremental functionality and time-to-value, and supports a growing interest in self-service.
Many organizations have adopted data virtualization to provide business users with the agility and responsiveness they demand. For example, one of the world’s largest heavy equipment manufacturers is using data virtualization to combine sensor data from its machines with owner information in its back-office systems to provide proactive, predictive maintenance. This enables the company to provide superior customer service and to differentiate itself from low-cost equipment manufacturers. In another case, a non-profit organization is using data virtualization to abstract away the technical vagaries of big data and give its business users familiar SQL access, driving rapid adoption of big data technologies without any training.
Data virtualization is being adopted worldwide by organizations looking to optimize their data integration capabilities because it delivers business-critical information at a fraction of the cost and time of physical data-integration approaches.