While big data analytics has become all the rage over the past few years, another technology has also entered the mainstream: data virtualization.
Data virtualization is the process of abstracting disparate data sources behind a single data access layer that delivers integrated information as data services to users and applications in real time or near real time. Stated in terms that IT leaders and integration architects can use with their business colleagues: data virtualization ensures that data is well integrated across systems so that enterprises can harness big data for analytics and operations.
Today, data virtualization tools have matured enough that corporations are adopting them to lower the costs of traditional integration (custom code, ETL, and data replication) and to add flexibility for data warehouse prototyping and extensions. Because data virtualization exposes complex big data results as easy-to-access REST (representational state transfer) data services, these tools also make it possible to integrate data between enterprise and cloud applications.
The technology simplifies data access in three steps: connecting and abstracting sources, combining them into canonical business views, and publishing those views as data services. In this way it resembles server, storage, and network virtualization: it simplifies the appearance of what is being managed for users while, under the covers, it employs technologies for abstraction, decoupling, performance optimization, and the efficient use (and re-use) of scalable resources.
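The three-step pattern can be sketched in a few lines of code. This is a minimal illustration, not any vendor's actual API: the class and field names (CsvSource, JsonSource, customer_id, and so on) are hypothetical, and real products handle query pushdown, security, and optimization that this toy omits.

```python
import json

# Step 1: adapters connect to and abstract each source behind a
# common interface (rows() returning a list of dicts).
class CsvSource:
    def __init__(self, text):
        self._text = text

    def rows(self):
        header, *lines = self._text.strip().splitlines()
        keys = header.split(",")
        return [dict(zip(keys, line.split(","))) for line in lines]

class JsonSource:
    def __init__(self, text):
        self._text = text

    def rows(self):
        return json.loads(self._text)

# Step 2: combine sources into one canonical business view
# (here, customers joined with their orders on customer_id).
def canonical_customer_view(crm, orders):
    orders_by_cust = {}
    for o in orders.rows():
        orders_by_cust.setdefault(o["customer_id"], []).append(o)
    return [
        {"customer_id": c["customer_id"],
         "name": c["name"],
         "orders": orders_by_cust.get(c["customer_id"], [])}
        for c in crm.rows()
    ]

# Step 3: publish the view as a data service payload (JSON, as a
# REST endpoint would return it).
def customer_service(crm, orders):
    return json.dumps(canonical_customer_view(crm, orders))

crm = CsvSource("customer_id,name\n1,Acme\n2,Globex")
orders = JsonSource('[{"customer_id": "1", "total": "250"}]')
print(customer_service(crm, orders))
```

Note that neither source is copied or replicated; the canonical view is computed on demand from the live adapters, which is the essential difference from ETL-style integration.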
Unlike hardware virtualization, data virtualization deals with information and its semantics – any data, anywhere, any type – which can have a more direct impact on business value.
With enterprise analytics, you need both big data and access to that data to create real value. Big data involves distributed computing across standard hardware clusters or cloud resources, using open source technologies such as Hadoop alongside cloud services like Amazon S3 and Google BigQuery. Data virtualization can be part of this picture, too. In its report, “Data Virtualization Reaches Critical Mass,” Forrester Research says, “Integration of big data expands the potential for business insight” and cites this potential as a driver for data virtualization adoption.
Data virtualization can help organizations extract value from large data volumes efficiently, performing intelligent caching while minimizing needless replication. It has also enabled companies to access many data source types by integrating them with traditional relational databases, multi-dimensional data warehouses and flat files, so that BI users can run queries against the combined data sets. For example, a leading crop insurer has used data virtualization to expose its big data sources and integrate them with its transactional, CRM and ERP systems, delivering an integrated view of sales, forecasts and agent data to its sales team. Using data virtualization, the insurer developed these complex reports much faster and with fewer staff resources than in the past.
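The "intelligent caching" idea above can be approximated with an ordinary memoization cache: the result of an expensive federated query is kept keyed by its parameters, so repeated requests are served from the cache rather than by re-copying or re-querying the underlying sources. The function name and its stub body below are purely illustrative.

```python
import functools

# Counts how many times the "expensive" query actually runs.
call_count = {"n": 0}

@functools.lru_cache(maxsize=128)
def federated_sales_report(region):
    # Stand-in for a query that would fan out to several back-end
    # sources (CRM, ERP, warehouse) and combine the results.
    call_count["n"] += 1
    return ("report-for", region)

federated_sales_report("midwest")   # runs the federated query
federated_sales_report("midwest")   # served from the cache
```

A production data virtualization layer would add invalidation and freshness policies on top of this, but the principle is the same: cache results, not copies of the source data.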
Data virtualization represents a straightforward way to deal with the complexity, heterogeneity and volume of information coming at us, while meeting the needs of the business community for agility and near real-time information. IT will need to adapt to this reality or become less relevant as business owners increasingly drive technology decisions.
Suresh Chandrasekaran is senior vice president at Denodo Technologies, a provider of data virtualization tools. He has served in product management and marketing roles at Vitria, Alta Vista, Compaq and as a management consultant at Booz Allen. He has an MBA from the University of Michigan. Email him at firstname.lastname@example.org.