Data analytics is getting a lot of attention these days. It’s hard to avoid the term “big data”, and tales of untold riches buried in the unstructured “dark” data that surrounds us are common fare in the blogosphere.
Less heralded but no less important is the problem of efficiently accessing all that data, especially when it’s stored in multiple places.
The ultimate vision is unified information access: use one tool to access all enterprise data, wherever it’s stored. This “single pane of glass” idea is seductive. What company wouldn’t want to break down silos, pool data and correlate its way to valuable new information? Several hard realities keep this vision floating somewhere beyond the realm of purchasable technology, however.
First, it’s just plain difficult. There’s been much progress in recent years around indexing multiple sources of data. And several vendors offer semantic analytics for pulling information from multiple sources without indexing. But extending that capability across an entire enterprise takes a lot of effort and infrastructure—the kind of investment that’s usually hard to justify.
“When people talk about enterprise search, they’re talking about a certain scale,” said Katey Wood, an analyst at market research and consulting firm Enterprise Strategy Group (ESG). In addition, a large-scale solution “doesn’t mean that we’re going to be able to search everything through a single pane of glass, which is sometimes what it sounds like from what the vendors say.”
Second, there just isn’t much demand for general, enterprise-wide unified information access. “People talk about enterprise search as this holy grail of being able to access anything in your enterprise instantaneously,” said Wood. “Ultimately you’re using search as a tool in order to do something—the end goal is not just to search things, at least not if you’re being realistic.”
It’s clear that there’s no one-size-fits-all enterprise search technology. A given enterprise search product is usually better suited for some applications than others, said Seth Grimes, an industry analyst, consultant and organizer of the Sentiment Analysis Symposium. “Focused solutions always outperform general solutions,” he said. “The market for search solutions that are adapted to particular verticals or business problems or types of data or styles of results delivery is going to remain much larger than the market for generalized unified access.”
Unified information access makes it possible to draw a single set of results from data originating in disparate forms and systems via a single query interface, said Grimes. Such access “does not presume what the tool does with the results,” he said. Many of the tools are focused on business intelligence, he said. Other common applications include electronic discovery, pharmaceutical research and intellectual property research.
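To make the idea concrete, here is a minimal Python sketch of a single query interface that pulls one ranked result set from two differently shaped sources. Everything in it is a hypothetical illustration, not any vendor's design: the `Result` type, the two toy sources, the fixed relevance scores and the `unified_search` function are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Result:
    source: str   # which system the hit came from
    title: str
    score: float  # relevance score, assumed comparable across sources


def search_documents(query: str) -> List[Result]:
    # Hypothetical unstructured source: titles mapped to free text.
    docs = {
        "Q3 supplier review": "Supplier delays hurt Q3 shipments",
        "Holiday memo": "The office is closed on Friday",
    }
    return [Result("documents", title, 1.0)
            for title, body in docs.items()
            if query.lower() in body.lower()]


def search_database(query: str) -> List[Result]:
    # Hypothetical structured source: rows with named fields.
    rows = [{"name": "Acme contract", "notes": "preferred supplier"}]
    return [Result("database", row["name"], 0.8)
            for row in rows
            if query.lower() in row["notes"].lower()]


SOURCES: Dict[str, Callable[[str], List[Result]]] = {
    "documents": search_documents,
    "database": search_database,
}


def unified_search(query: str,
                   sources: Dict[str, Callable[[str], List[Result]]]) -> List[Result]:
    # Send the same query to every source and merge the hits into one list,
    # ranked by score regardless of where each hit originated.
    hits: List[Result] = []
    for search in sources.values():
        hits.extend(search(query))
    return sorted(hits, key=lambda r: r.score, reverse=True)


results = unified_search("supplier", SOURCES)
for r in results:
    print(r.source, r.title, r.score)
```

The sketch stops at returning the merged list, which mirrors Grimes' point: unified access does not presume what an application does with the results.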
“As a lot of these companies have matured they go from being pure enterprise search into having search-based applications,” said Wood.
Beyond choosing a technology based on how it matches up with particular applications, organizations looking at enterprise search need to consider the scale of investment they’re willing to make. “How much infrastructure do you want to put under it driving the thing, which is what’ll make it faster, and then how much tuning do you want to do to it?” said ESG’s Wood.
Forrester Research divides the enterprise search market into three categories: specialized search vendors, integrated search vendors and detached search vendors.
Specialized Search Vendors
Specialized search vendors target specific use cases, subsections of the enterprise, or vertical markets. They include: Attivio, Cambridge Semantics, Coveo, Oracle Endeca, Exalead, Recommind, Sinequa, and IBM Vivisimo.
All the specialized search vendors provide access to data across different data sources, but differ in how they do it. Some companies provide an index that spans data sources. Others use analytics to find related data from multiple sources without having to build an index.
Attivio, Coveo, IBM Vivisimo, Oracle Endeca and Recommind crawl various structured and unstructured data sources and build an index much like Google and other Web search engines. Attivio focuses on custom business intelligence applications. Coveo focuses on content management and customer intelligence applications. Recommind focuses on search and information management applications, principally in e-discovery and e-governance.
Oracle Endeca’s faceted index, which classifies each item of data in multiple ways, provides highly navigable data views for business intelligence, particularly e-commerce. IBM Vivisimo provides document-oriented results clustering, principally for supply chain, R&D and big data applications.
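A faceted index can be sketched in a few lines of Python. This is a generic illustration of the technique, not a description of how Oracle Endeca implements it; the product records, facet names and helper functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical catalog records; each one is classified along several facets.
products = [
    {"id": 1, "brand": "Acme", "color": "red", "category": "shoes"},
    {"id": 2, "brand": "Acme", "color": "blue", "category": "shoes"},
    {"id": 3, "brand": "Zenith", "color": "red", "category": "hats"},
]


def build_facet_index(items, facets):
    # Map facet -> value -> set of ids of the items that carry that value.
    index = defaultdict(lambda: defaultdict(set))
    for item in items:
        for facet in facets:
            index[facet][item[facet]].add(item["id"])
    return index


def filter_by_facets(index, all_ids, **selected):
    # Narrow the result set by intersecting the id sets of the chosen values.
    ids = set(all_ids)
    for facet, value in selected.items():
        ids &= index[facet].get(value, set())
    return ids


def facet_counts(index, facet, ids):
    # Count how many of the current results fall under each value of a facet.
    return {value: len(members & ids)
            for value, members in index[facet].items()
            if members & ids}


index = build_facet_index(products, ["brand", "color", "category"])
all_ids = [p["id"] for p in products]

red_items = filter_by_facets(index, all_ids, color="red")
print(red_items)
print(facet_counts(index, "brand", red_items))
```

Because every record is filed under each of its facets, a user can narrow by color, then see how the remaining results break down by brand or category, which is the kind of navigation e-commerce sites rely on.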
Cambridge Semantics and Exalead both use semantic analysis, a basic form of artificial intelligence that infers some degree of meaning from data. Cambridge Semantics uses Semantic Web technology to integrate structured and unstructured data for custom business intelligence applications. Exalead uses semantic analytics to unify data for custom search applications, particularly in engineering.
Sinequa combines semantics and indexing. Its Unified Information Access platform uses natural language processing and semantic analytics to sort and combine data from multiple sources and then builds an index for search-based applications, principally for business intelligence.
Of these companies, Attivio, Coveo, Exalead, Oracle Endeca and Sinequa have the most advanced Unified Information Access capabilities, said Grimes.
Integrated Search Vendors
Integrated search vendors provide search tools as part of broader information management offerings. They include: HP Autonomy, IBM and Microsoft.
Autonomy integrates and analyzes structured and unstructured data for a wide range of search, analytics and information management applications. IBM also combines search and analytics in its Content Analytics product. Microsoft’s enterprise search technology is tied to the company’s SharePoint content management platform.
Autonomy’s IDOL search engine is the undergirding technology for a number of applications, and when Autonomy acquires a company it puts that company’s technology on the IDOL platform, said Wood. Most of Autonomy’s business is either in particular applications such as contact centers and e-discovery or in accessing particular forms of information: just text, just audio or just video, said Grimes. “They’re beyond Microsoft in providing UIA, but UIA is not a key focus,” he said.
FAST, prior to its acquisition by Microsoft in 2008, was one of the earliest search vendors working on Unified Information Access, said Grimes. “The unified access capability is one of several left by the wayside in Microsoft’s SharePoint-ization of FAST,” he said. Several of FAST’s core employees went on to found Attivio.
Detached Search Vendors
Detached search vendors focus on ease-of-use and ease-of-installation. They include: Fabasoft, Google and ISYS.
Fabasoft offers a search appliance that’s focused on document and content management. Google offers a drop-in search appliance. ISYS offers document-centric enterprise search as an appliance or add-on to a content management system like Microsoft’s SharePoint.
The Google appliance isn’t the most cost-effective means of enterprise search because its scalability is limited, said Wood. When an installation reaches its limit, you have to buy another box.
In addition to the packages sold by the vendors above, enterprises have access to a robust open source enterprise search platform in the Apache Software Foundation’s Lucene/Solr project. Several of the enterprise search vendors use Lucene/Solr in their technology stacks. There are also consulting firms that specialize in Lucene/Solr installations. Among the best known of these is LucidWorks.
In the end, an organization should think less about building a grand unified information access infrastructure and more about finding the best search technology for its needs, said Wood. “What is the use case that you want to support? What is the business process you want to support? What are your goals and requirements?” she said. “Choose your goals, figure out what’s going to suit you best and have that in line with your IT requirements and how much you’re willing to spend on it.”
Eric Smalley is a freelance writer in Boston. He is a regular contributor to Wired.com. Follow him on Twitter at @ericsmalley.