Apps are a good fit for the cloud, as evidenced by the thriving ecosystem on Salesforce.com alone. After all, cloud apps are easy for business users to adopt: they are immediately available and ready to access, with no waiting for IT to install them. They can also be easily tried and purchased via flexible subscription and pricing options, which makes them easy on the departmental budget. Cloud apps enable IT teams to focus on adding value to the business by extending and customizing the software, rather than maintaining it and fixing it when it breaks.
Is There an App for That?
Despite the massive acceptance of cloud apps, there has been much debate on whether data-integration-in-the-cloud would ever be as successful as apps-in-the-cloud. The challenge, of course, lies in the physical and logical distances that need to be traversed to integrate the data necessary to run today’s business. Data sources are getting exponentially larger, as are the associated costs and complexities related to relocating the data. Also, as new technologies get added to the mix, it becomes increasingly challenging to bridge the gap with older legacy systems. Many of these legacy systems were designed for highly specific purposes and sometimes employ proprietary access methods using individual applications that are often out-of-date. This challenge is compounded during mergers and acquisitions when organizations need to bridge a greater number of heterogeneous systems in a relatively short time.
In addition to these potential language barriers, older, on-premises sources can present other access limitations. For instance, the data often has to be moved to a data warehouse (via batch-oriented extract, transform, load (ETL) processes) before it can be accessed, which makes on-premises sources particularly challenging to integrate with cloud-based sources.
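The batch ETL pattern described above can be sketched in a few lines. This is a deliberately minimal illustration, not any vendor's API; the table and field names are invented. The key point it shows is that consumers query a copy of the data, which is stale until the next batch runs.

```python
# A minimal sketch of batch ETL: rows are extracted from a source,
# transformed, and loaded into a warehouse copy before anyone can
# query them. All names and data here are hypothetical.

source_system = [
    {"id": 1, "amount_cents": 1250},
    {"id": 2, "amount_cents": 990},
]
warehouse = []

def run_nightly_batch():
    """Extract every row, transform the units, load into the warehouse."""
    extracted = list(source_system)                      # extract
    transformed = [{"id": r["id"],
                    "amount": r["amount_cents"] / 100}   # transform
                   for r in extracted]
    warehouse.extend(transformed)                        # load

run_nightly_batch()
# Queries see the warehouse copy, not the live source,
# so results are stale until the next batch runs.
```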
Security is another issue, as moving data across the wire leaves it susceptible to loss or compromise through hacking. In many cases, however, this issue can be surmounted, since the entire web, including e-commerce, financial, and government applications, already runs on a foundation of security. Unfortunately, to bring data integration to the cloud, the burden of providing that security will likely fall to the app developers.
Data in transit is also susceptible to sudden losses of network bandwidth, which can cause unexpected delays. If data integration moved to the cloud, users would be highly intolerant of such delays. Business users are fairly comfortable querying a single unified system, such as a data warehouse, but imagine a user receiving a message such as "The data you requested is in transit," followed by a progress bar.
All of the Data at Your Fingertips
The only way to bring data integration to the cloud in a way that surmounts all of these obstacles is for the data to stay exactly where it is, while still providing users with a way to run queries across the entire heterogeneous data set. The latest advances in data virtualization have made this possible.
Data virtualization establishes a layer of intelligence between consumers of the data and all the myriad systems in which the data is stored. These data “consumers,” which consist of both people and applications, can then send automated requests for data. The virtualization layer contains no data; instead, it contains the metadata for how to “speak” to all the different source systems. It also knows the format and schema that the data is in, as well as each system’s security requirements and any other details necessary for accessing each individual source system. A user needs only to query the data virtualization layer, and the layer will do the work of querying the individual sources before passing the results back to the user. This happens in real time, since no data needs to be transferred by a batch or any other type of replication process, and all the complexities related to accessing the sources are hidden from the user.
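The architecture described above can be sketched in a few dozen lines. This is a simplified, hypothetical illustration, not a real platform's API: the class name, the in-memory "sources," and the connector functions are all invented. What it shows is the essential idea that the layer holds only metadata (how to reach each source), fans a query out to every source in real time, and merges the results, while the data itself never moves in advance.

```python
# A minimal sketch of a data virtualization layer. The "sources" here
# are hypothetical in-memory lists standing in for real databases and
# applications; the layer itself stores no data, only connectors.

class DataVirtualizationLayer:
    def __init__(self):
        # Metadata only: source name -> a callable that knows how to
        # "speak" to that source in its own terms.
        self.connectors = {}

    def register_source(self, name, query_fn):
        """Record how to query a source; no data is copied here."""
        self.connectors[name] = query_fn

    def query(self, predicate):
        """Fan the query out to every source and merge the results."""
        results = []
        for name, query_fn in self.connectors.items():
            for row in query_fn(predicate):
                results.append({**row, "_source": name})
        return results

# Two hypothetical heterogeneous sources with different schemas.
crm_rows = [{"customer": "Acme", "region": "EMEA"},
            {"customer": "Globex", "region": "APAC"}]
erp_rows = [{"customer": "Acme", "open_orders": 3}]

layer = DataVirtualizationLayer()
layer.register_source("crm", lambda p: [r for r in crm_rows if p(r)])
layer.register_source("erp", lambda p: [r for r in erp_rows if p(r)])

# One query against the layer reaches both sources in real time;
# the underlying data stays exactly where it is.
acme = layer.query(lambda row: row.get("customer") == "Acme")
```

A user querying `layer` never needs to know how many sources exist or what formats they use; that complexity stays behind the virtualization layer.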
Modern data virtualization platforms feature sophisticated query-optimization algorithms, and they are responsible for managing hundreds of simultaneous queries and sub-queries across multiple sources. These query-optimization techniques are geared toward maximum efficiency for all queries, which is accomplished by taking full advantage of the processing capabilities of each source and moving only the data that is absolutely necessary.
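One core optimization technique, often called predicate pushdown, can be illustrated with a small sketch. The code below is hypothetical (a simulated source rather than a real database), but it demonstrates the principle: when the filter is executed inside the source, only the rows that satisfy it ever travel to the virtualization layer.

```python
# A sketch of predicate pushdown: executing the filter at the source
# so that only matching rows cross the network. The source, data, and
# counter below are illustrative, not a real platform's API.

rows_transferred = 0

def remote_source(rows, predicate=None):
    """Simulated source: applies any pushed-down predicate locally."""
    global rows_transferred
    selected = [r for r in rows if predicate is None or predicate(r)]
    rows_transferred += len(selected)  # only these rows "cross the wire"
    return selected

orders = [{"id": i, "amount": i * 10} for i in range(1000)]

# Without pushdown: fetch everything, then filter at the layer.
rows_transferred = 0
fetched = remote_source(orders)
big_no_pushdown = [r for r in fetched if r["amount"] > 9900]
moved_without = rows_transferred   # all 1,000 rows were moved

# With pushdown: the source filters first, so far fewer rows move.
rows_transferred = 0
big_pushdown = remote_source(orders, lambda r: r["amount"] > 9900)
moved_with = rows_transferred      # only the matching rows were moved
```

Both approaches return the same answer; the difference is how much data has to move to produce it, which is exactly the cost the optimizer is trying to minimize.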
Security also becomes far easier to manage, since the data virtualization layer provides a single point of access to all of the sources. In addition to knowing the language of each source and the format in which its information is stored, the data virtualization layer also holds the "keys" to each: the credentials necessary for accessing the source, along with full knowledge of the source's security policies. Modern data virtualization platforms enforce the security policy on behalf of each system, preserving role-based permissions across schemas, rows, and potentially even individual cells.
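Role-based enforcement at the layer can be sketched as follows. The roles, policy table, and data here are entirely hypothetical; the point is that the same query returns different rows, and even different columns, depending on the requesting user's role, with the policy applied by the layer rather than by each consumer.

```python
# A sketch of role-based security enforced at the virtualization layer:
# a per-role policy controls which rows and which columns (cells) a
# role may see. All names and data below are invented for illustration.

SALARIES = [
    {"name": "Ann", "dept": "sales", "salary": 90000},
    {"name": "Bob", "dept": "eng",   "salary": 120000},
]

POLICY = {
    # role -> (row filter, columns the role is allowed to see)
    "hr":      (lambda row: True,                 {"name", "dept", "salary"}),
    "manager": (lambda row: row["dept"] == "eng", {"name", "dept"}),
}

def secure_query(role):
    """Apply the source's security policy for this role at the layer."""
    row_filter, allowed = POLICY[role]
    return [{k: v for k, v in row.items() if k in allowed}
            for row in SALARIES if row_filter(row)]

# HR sees every row including salaries; the engineering manager sees
# only engineering rows, with the salary column filtered out.
hr_view = secure_query("hr")
manager_view = secure_query("manager")
```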
Integration in the Cloud
Data virtualization is a natural technology for bringing data integration to the cloud and it is poised to make data-integration-in-the-cloud just as successful as apps-in-the-cloud. By enabling seamless, real-time access to data without the need for replication, data virtualization may be the missing link to helping organizations easily achieve the benefits of managing data integration in the cloud.
Ravi Shankar is the Chief Marketing Officer at Denodo. He is a recognized luminary in the data management field and has more than 25 years of experience in the enterprise software industry driving product marketing, demand generation, field marketing, communications, social marketing, customer advocacy, and partner and solutions marketing. To learn more visit www.denodo.com.