GridGain, ScaleOut Develop In-Memory Accelerators for Hadoop

November 7, 2013 5:00 am

SAS and SAP are teaming up to create in-memory computing tools that help companies process data more quickly, and IBM is promoting its own in-memory accelerator for the DB2 database. As tech giants continue to develop their in-memory offerings, smaller players like ScaleOut and GridGain are also drawing attention to these alternatives to traditional data processing technologies. All of this could help establish Hadoop as a mainstream option for the enterprise.

“I see GridGain and ScaleOut’s technology being used as accelerators for Hadoop,” says Madan Sheina, a lead analyst with research firm Ovum. “Both allow MapReduce code to execute over in-memory data rather than data stored in the Hadoop Distributed File System, but with the added benefit of fast update capabilities. That potentially opens up Hadoop as a platform for a new set of more operationally focused application workload processing.”


For instance, in early October, ScaleOut Software released a new version of its in-memory data grid. Dubbed hServer V2, the platform lets organizations run Hadoop MapReduce on live data without having to install or manage the Hadoop software stack. Because hServer V2 acts as a self-contained Hadoop MapReduce engine, the standard open source Hadoop distribution doesn’t have to be installed, and data can be processed faster.

“Instead of using Hadoop’s stock batch scheduler, we’re able to speed up Hadoop MapReduce by using our internal data-parallel computing platform,” says ScaleOut CEO Bill Bain. “That allows us to schedule MapReduce jobs in a second rather than 30 seconds.”

Eliminating this kind of “overhead” promises to be a boon for industries that rely on real-time data processing for decision making, according to Bain. To demonstrate, ScaleOut recently ran a side-by-side comparison of Hadoop and hServer V2 on a real-world use case: using real-time analytics to compute alerts for a hedge fund’s portfolio based on a simulated market feed. By letting continuous market data updates flow through the engine in real time, hServer V2 helped the hedge fund repeatedly analyze its portfolio and alert traders to price changes “on a second by second basis,” says Bain. He cites online recommendation engines and fraud detection as other use cases.
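To make the hedge fund example more concrete, here is a minimal sketch of the general pattern Bain describes: the portfolio stays resident in memory, and each market update triggers a MapReduce-style pass that maps positions to price-move alerts and reduces those alerts by trader. Everything in it is an illustrative assumption (the Position type, the 2% alert threshold, the sample prices and traders); it does not use ScaleOut's hServer API and runs on a single machine rather than a grid.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    # Hypothetical record type: one trader's holding in one symbol.
    trader: str
    symbol: str
    shares: int


# Hypothetical in-memory "grid": positions stay in memory and are
# re-analyzed every time a market tick updates the price table.
positions = [
    Position("alice", "ACME", 100),
    Position("alice", "GLOBX", 50),
    Position("bob", "ACME", 200),
]


def map_phase(position, prices, last_prices, threshold):
    """Emit (trader, message) pairs for positions whose price moved by at least threshold."""
    old = last_prices.get(position.symbol)
    new = prices[position.symbol]
    if old is None or old == 0:
        return []  # no previous price yet, so no change to report
    change = (new - old) / old
    if abs(change) >= threshold:
        pnl = (new - old) * position.shares
        return [(position.trader, f"{position.symbol} moved {change:+.1%}, P&L {pnl:+.2f}")]
    return []


def reduce_phase(pairs):
    """Group alert messages by trader."""
    alerts = defaultdict(list)
    for trader, message in pairs:
        alerts[trader].append(message)
    return dict(alerts)


def analyze(prices, last_prices, threshold=0.02):
    # The map phase runs serially here; a data-parallel engine would run it
    # on each node that holds part of the in-memory data.
    pairs = []
    for position in positions:
        pairs.extend(map_phase(position, prices, last_prices, threshold))
    return reduce_phase(pairs)


if __name__ == "__main__":
    # Simulated market feed: each dict is one tick of prices.
    feed = [
        {"ACME": 10.00, "GLOBX": 20.00},
        {"ACME": 10.05, "GLOBX": 19.00},
        {"ACME": 10.50, "GLOBX": 19.10},
    ]
    last = {}
    for tick in feed:
        print(analyze(tick, last))
        last = tick
```

The point of the sketch is that the data never leaves memory between ticks, so each re-analysis is just another pass over the resident positions rather than a new batch job that reloads data from disk.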

GridGain is another in-memory computing provider appealing to enterprises by crunching and retrieving massive amounts of data stored in a computer’s main memory rather than on a disk-based storage system. Released in early October, GridGain 5.2 circumvents the time-consuming and complicated process of copying data and transferring it into the Hadoop Distributed File System (HDFS) for analysis.

Like ScaleOut, GridGain pulls data directly from servers’ hard drives for analysis. However, unlike ScaleOut, GridGain offers “different in-memory products for different use cases,” says GridGain CEO and founder Nikita Ivanov. “Instead of having one analytic block that does everything, we have individual products for different payloads.”

In-Memory HPC 5.2, for example, is designed for the high-performance computing systems common at financial institutions, which typically don’t deal with large data sets but must perform complex calculations across thousands of databases. GridGain’s In-Memory Database 5.2, on the other hand, is better suited to typical transactional processing of static information such as consumer credit scores and purchase history.

Regardless of approach, though, the drivers behind in-memory computing’s growing mainstream acceptance are the same. “The economic affordability of large sets of memory is what’s driving today’s renaissance of in-memory computing,” says Ivanov. “All of a sudden, we can finally afford what we always knew would be the ultimate frontier.” For example, Ivanov says a terabyte of dynamic random-access memory (DRAM) in a cluster cost around $25,000 three years ago. Fast forward to today, and “for the price of a fully loaded Tesla car, you can buy enough RAM to hold all the working datasets of Twitter,” he says.

Another factor driving adoption of in-memory computing tools is the desire to run Hadoop on both the front and back ends of a business, so that the same technology that processes, for example, a website’s transactional data also runs the back-end data warehouse. “I can’t tell you how many businesses we talk to who have one system on the front end and Hadoop on the back end,” says Ivanov. “They’re a complete zoo of systems.”

Cindy Waxer, a contributing editor who covers workforce analytics and other topics for Data Informed, is a Toronto-based freelance journalist and a contributor to publications including The Economist and MIT Technology Review. She can be reached at or via Twitter: @Cwaxer.

Home page image of race car by Marion Doss via Flickr. Used under Creative Commons license.
