With Gartner predicting that 65 percent of packaged analytic applications will have Hadoop embedded by 2015, developers can no longer afford to ignore this increasingly prevalent open-source programming framework.
At the same time, working with Hadoop requires specialized skills, sophisticated infrastructure for large-scale data analysis and a spare cluster or two for experimenting – not exactly realistic expectations for Hadoop newbies with limited know-how.
To address this knowledge deficit, a growing number of vendors are offering Hadoop sandboxes: testing environments that run a single-node implementation of Hadoop, typically packaged in a virtual machine. They allow developers to test Hadoop’s features and functionality without affecting live servers, mission-critical systems or existing data.
Continuuity is the latest company to attempt to democratize big data, with a sandbox built around AppFabric, an application-serving platform that promises fast and easy development and deployment of big data apps.
The Palo Alto-based company in February unveiled the public beta of its Continuuity Developer Suite and Continuuity Developer Sandbox. Users can sign up for 90-day free access to the self-service sandbox, a dedicated 8-core virtual machine in the cloud that runs a scaled-down version of Hadoop and HBase, as well as the entire Continuuity AppFabric developer platform.
By “abstracting the unnecessary complexity of big data infrastructure,” Todd Papaioannou, Continuuity’s co-founder and CEO, says the company’s Developer Sandbox lets “developers immediately begin exploration and development. Sandboxes remove the need for new or dedicated hardware, specialized environments and cumbersome configuration, which are unnecessary during the early stages of development.”
Hortonworks is also banking on developers’ desire for consequence-free Hadoop experimentation with the January 2013 release of its Hortonworks Sandbox. Built on the Hortonworks Data Platform, the free, self-contained virtual machine is pre-configured with Hadoop and also includes access to instructional demos, videos and hands-on tutorials.
While sandboxes enable vendors like Continuuity and Hortonworks to get their products into the hands of developers, many big data players are emphasizing the educational upside of Hadoop free play. EMC offshoot Greenplum, in San Mateo, Calif., for example, offers Greenplum Analytics Workbench. The 1,000-node cluster, with 24 petabytes of physical storage and 48 terabytes of memory, comes loaded with Hadoop and its most basic components, including the Hadoop Distributed File System, MapReduce, Pig, Hive, HBase and Mahout, as well as Greenplum Database. Not only can developers experiment with their code in the cloud, but Greenplum is also working with the Apache Software Foundation to ensure that developers’ discoveries, from favorable hardware optimizations to technical bugs, are available to the open source community.
Similarly, BigDataUniversity.com is an online educational website run by IBMers on a volunteer basis. The site provides more than 68,000 users with big data-related courses and databases. Courses primarily consist of hands-on labs, many of which can be accessed in the cloud thanks to sponsorships from Amazon Web Services.
“In the world of Hadoop, it’s not the products that are missing,” says Leon Katsnelson, a program director for big data and cloud computing at IBM in Toronto. “It’s the experience and knowledge that’s missing. So we give you the lessons, we give you the environment, we give you the products to learn on and we give you the data so that you’re able to do hands-on labs without struggling with downloading, compiling, running, debugging, or fixing.”
All of this is not only helping to drive greater Hadoop adoption but also changing developers’ approach to learning. “IT education is undergoing a tremendous amount of change,” says Katsnelson. “More and more people are providing materials and learning aids online rather than driving people through structured courses.”
Cindy Waxer, a contributing editor who covers workforce analytics and other topics for Data Informed, is a Toronto-based freelance journalist and a contributor to publications including The Economist and MIT Technology Review. She can be reached via Twitter @Cwaxer.