As the maker of the best-known commercial software for the open-source Hadoop platform, Cloudera first branded itself as corporate America’s most accessible option for linking thousands of servers to analyze petabytes of data. Now, with last week’s announcement of its fourth-generation release (which includes an enhanced open-source distribution of Hadoop, CDH4, and an improved integrated software package, Cloudera Enterprise 4.0), the Palo Alto, Calif.-based company hopes to spread the word beyond early adopters to the mainstream.
These days, Hadoop offers clear value to many online companies, including Facebook, Yahoo and Twitter, which use it not only to process data related to advertising, inventory, and customer profiles, but also to feed data into live, consumer-facing functions such as messaging services and news updates. But while the new Cloudera release makes great strides in integrating Hadoop into the software dashboards of its customers, industry observers say it remains to be seen whether it will make Hadoop the data-integration solution of choice for a new swath of enterprise customers in other industries.
To understand the hurdles facing Cloudera, it helps to know a bit about Hadoop’s history. Hadoop’s sudden arrival as a major player in the big data enterprise universe belies its grass-roots origins as a pet project of its founder, Doug Cutting, as noted in a profile of Hadoop published by Wired last fall. Cutting first developed Hadoop, named after his son’s stuffed elephant toy, in 2004 as an open-source alternative to search engines such as Google and Yahoo. Cutting was later hired by Yahoo, which moved its entire search infrastructure to Hadoop in early 2008. While Cutting envisioned Hadoop primarily as a search engine technology, others at Yahoo saw its potential as an all-purpose technology tool.
The obvious appeal of Hadoop was the speed with which it could create a “webmap,” a search engine’s index of web pages and their metadata. But another key feature was its ability to keep running even when an individual server failed—this seemingly minor aspect attracted companies with concerns about reliability and security.
By spring 2009, when Yahoo held its second annual Hadoop developer summit, Facebook and eBay had already adopted the technology for functions beyond search. So had IBM Research and Microsoft. In March 2009, Cloudera was launched by a team including former executives from Facebook, Yahoo and Oracle. The company hired Doug Cutting away from Yahoo later that year. Last year, Hortonworks, a Cloudera competitor, was co-founded by Cutting’s former boss at Yahoo.
Cloudera’s sales pitch for its latest enterprise solution is “end-to-end monitoring for the CDH stack,” according to Omer Trajman, Cloudera’s vice president for technology solutions. Among the specific improvements is a high-availability feature, which uses two redundant NameNode machines, rather than the typical one, to allow fast recovery if one machine fails. Trajman also cites increased security and scalability, simplified configuration and deployment, and truly global customer support. “Now Hadoop can be a first-class citizen,” he says. “And the guy with the pager can sleep better at night.”
Cloudera hopes the new release will persuade risk-sensitive industries that have been hesitant to adopt Hadoop, such as life sciences, manufacturing, oil and gas refining, and power generation, to reconsider.
But Robin Bloor, chief analyst at the Bloor Group, fears that Cloudera is overhyping Hadoop’s potential. “It’s a good release,” he says of CDH4, “but Hadoop doesn’t have a lock on big data.”
Bloor argues that Hadoop is best suited for ETL (“extract, transform and load”) functions, to prepare data for use in reporting or analytics, “especially if you use the cloud.” But even today, many large corporations remain reluctant to transfer the bulk of their vital databases to remote servers. Bloor also believes that many of today’s Hadoop users rely on a much smaller number of servers than they used to, making the latest Cloudera release more reiterative than innovative.
Cloudera’s Trajman disagrees. “We’re setting the stage for the next era of data management,” he says. “Hadoop is a piece of that puzzle, and now a lot of its capabilities are built-in.”
Alec Foege is a writer and independent research professional based in Connecticut.