Apache Hadoop, the open-source software used by clusters of computers to crunch big data, is gathering momentum, and so is competition in the market.
“There’s a lot of competition across the stack, from the application vendors to the distribution vendors to the hardware vendors to the cloud service providers, because the market is growing at a cumulative annual rate of roughly 55 to 60 percent,” said Vin Sharma, director for Hadoop planning and marketing at Intel.
Intel recently shook up the Hadoop market when it announced a collaboration with and equity investment in Hadoop distributor Cloudera. Intel said the move made it the largest “strategic shareholder” in the company, as well as landing it a seat on Cloudera’s board of directors.
The move also ignited a storm of punditry about consolidation in the market, as the deal meant Intel would be cutting the cord to its own distribution of Hadoop. With Intel out of the picture, the number of distros would drop to those offered by five companies – Hortonworks, Cloudera, MapR, Pivotal, and IBM.
Intel’s investment is also seen as another sign that Hadoop has hit a key point in its development and is now ready for enterprise deployment.
“Between 2004 and 2008, the market was in its visionary stage,” Sharma said. “Between 2008 and 2012, there have been a number of people that have made big data operational, so we’re now poised on the cusp of it really taking off because enterprise adoption is starting to catch up.”
According to Mike Gualtieri, an analyst with Forrester Research, there’s a sweet spot for Hadoop in every large enterprise because it empowers them as never before.
“Now enterprises can do what Google and Yahoo can do when it comes to data management and analyzing large data sets,” he said. “And they can do it cheaply because it’s on commodity hardware.
“We think adoption will accelerate this year,” he added.
As adoption accelerates, competition – especially among big-three distros Cloudera, Hortonworks, and MapR – will heat up. “For these three companies, the competition is cutthroat,” Gualtieri said. “They’re all competing to be the most desirable distribution.”
While the three companies all offer open-source Hadoop, each has its own twist to differentiate itself from the others. “Some may suggest that open source is the key point of differentiation in this space,” said Clarke Patterson, senior director for product marketing at Cloudera. “But open source, by its very nature, is undifferentiated. So in many ways it’s just table stakes.”
“Interestingly,” he added, “when surveyed, customers say that open source is only one of many considerations and, in fact, the importance of open source wanes in comparison to other things such as performance, support, and stability.”
Of the three leaders in the market, Hortonworks is considered the closest to its open-source roots. The company’s business model emulates that of another open source powerhouse, Linux. “We sell support and services around Hadoop,” explained Jim Walker, director of product marketing at Hortonworks. “We’re going to provide a distribution completely for free, and we’re going to make sure that our flavor of Hadoop is always the closest to the Apache trunk.”
Forrester’s Gualtieri added: “Hortonworks wants all innovation to be driven through the open-source community. If there’s something like a management console that the community doesn’t have, then Hortonworks will drive that through the community.”
While that kind of innovation adds to the purity of the Hortonworks distro, it can delay the process of providing Hadoop users with tools they are clamoring for. That’s where Cloudera distinguishes itself from Hortonworks. Although Cloudera’s Hadoop distro is open source and free, and the company also makes money through maintenance and support, it will package additional software with its distro that hasn’t been vetted by the open-source community.
“Whatever gaps Cloudera sees in Apache open source, they won’t wait,” Forrester’s Gualtieri said. “They’ll start innovating right away.”
“They did that with Impala, which is their SQL engine,” he added, “because the Apache project Hive wasn’t designed for interactive SQL queries. That woke up the open-source community, and they started working on a version of Hive that is 100 times faster than prior versions.”
Cloudera’s open-source-plus-extras approach has drawn fire from Hadoop purists, criticism that the company rejects.
“This is commonly misconstrued as creating a proprietary Hadoop distribution, but the fact remains Cloudera is as open when it comes to Hadoop as anyone else,” Cloudera’s Patterson said. “What Cloudera does add, however, is capabilities that are not available in the Hadoop community today.
“Things like fine-grained security and comprehensive audit tracking simply do not exist,” he added, “so we build it in so organizations can quickly adopt this technology with confidence.”
Patterson also noted that Cloudera is making significant investments in enterprise support. “No other distribution does this, instead they’d rather put the burden on their engineering teams or partners to deliver this service,” he said.
MapR, which departs further from the tenets of pure open source than its competitors, also has an eye on the enterprise.
“MapR is working on making the internals of Hadoop enterprise fresh in terms of high availability, performance, scalability, and efficiency,” Forrester’s Gualtieri explained. “MapR is doing things to make Hadoop stronger in production environments, whereas Cloudera is innovating to allow Hadoop to connect and play well with other data management platforms within an enterprise.”
Not only are the distro makers battling each other for business, but they’re also battling each other for developers to write apps for their Hadoop flavors. “Application developers can be hugely important for a distro,” Intel’s Sharma said.
“The entire ecosystem will be reaching out to developers to help it build applications on top of Hadoop so Hadoop becomes much like an operating system,” he added. “We see that emerging very rapidly now.”
One tactic to attract developers is to make a distro friendly to the skills an app writer may already have. For example, Cloudera’s Impala is designed so developers with SQL skills can build BI apps on it. By the same token, Hortonworks’ partnership with Red Hat makes it easier for JBoss developers to build on that distro. And MapR’s recently announced App Gallery includes documentation developers need to certify their apps against MapR.
“Developers are an important audience for us,” said Jack Norris, chief marketing officer for MapR. “One of the things we focus on is making it easier to experiment with our product.” One way MapR does that is by offering developers a “sandbox” running in a virtual environment with integrated tutorials.
“You can also use standard tools – anything that can interact with enterprise storage through a standard interface can be used with a MapR cluster,” he added.
He also noted that MapR’s enterprise features have attracted developers who initially worked in other Hadoop distros. “We’ve seen developers who have started with Apache and moved to MapR when it became time for serious production,” Norris said.
However, Jeff F. Kelly, principal research contributor for The Wikibon Project, an open source research and advisory firm, noted, “Application development on Hadoop is generally still a work in progress.
“But that’s where the real value is,” he continued. “We hope to see more start-ups emerging that use all the capabilities Hadoop offers, but [that also] focus on building applications that tackle business problems rather than on Hadoop distribution and other platform-level offerings.”
While the future seems bright for Hadoop, Forrester’s Gualtieri cautions that as the distro makers battle for customers, they need to pay attention to their flanks. “These companies could be threatened by Hadoop becoming a service in the cloud,” he noted.
“The other threat is that Hadoop will start to be bundled with operating systems,” he added. “That would really be disruptive.”
He recalled Netscape’s web server business in the 1990s. “They were selling web servers for $50,000, $60,000 a pop,” he said. “How do you get a web server now? It’s built into the operating system. That’s where Hadoop could go.”
John P. Mello Jr. is a freelance writer specializing in business and technology subjects, including consumer electronics, business computing, and cyber security. Follow him on Twitter: @jpmello.