Hadoop may be built for big data, but it’s not known for supporting the slicing and dicing of real-time information. Now a number of companies are working to change that by connecting in-memory technologies to the popular open source distributed computing framework so that enterprises can deploy real-time analytics.
Vendors such as ScaleOut Software and SAP are offering in-memory computing tools that enable data to be accessed and analyzed at lightning-fast speeds. Typically, static data is stored on a disk-based storage system and then copied and transferred to the Hadoop Distributed File System (HDFS) for analysis. In-memory computing, on the other hand, eliminates this time-consuming step by storing data in memory so that information can be retrieved automatically from HDFS for real-time analysis.
Research firm Gartner sees growing adoption of the technology. It reports that in 2012, 10 percent of large and medium-sized organizations adopted in-memory computing in some capacity. By 2015, Gartner projects that figure will rise to 35 percent.
SAP has been working to gain mainstream acceptance for in-memory systems with SAP HANA, a software platform that processes transactional and analytical data fully in-memory. On May 7, the company unveiled a cloud-based service for its SAP Business Suite running on HANA. Rather than spend “a significant amount of time shuffling data between disk and memory,” Yuvaraj Raghuvir, SAP Labs’ senior director, says, SAP HANA makes sure “all the data is loaded in memory so that all the processing works with the data directly; there’s no go-and-fetch data from the disk. The data is fully expected to be available in memory all the time so that analytical processing can work instantaneously.”
ScaleOut Software is another vendor that’s banking on in-memory computing to boost the performance of business applications and accelerate data analysis. ScaleOut’s in-memory processing middleware, hServer, stores and analyzes live, “fast-changing online data directly in Hadoop so that such data doesn’t have to be copied into HDFS and then moved to memory,” says William Bain, ScaleOut’s CEO.
Other vendors and researchers are pursuing different approaches to speed up queries to Hadoop. For example, Cloudera’s Impala tool lets users query data in real time by bypassing MapReduce and accessing data directly through an open-source, real-time query engine for HDFS. Others, like Drawn-to-Scale, NuoDB and TransLattice, are working to update relational database technologies to handle the big data workloads that are a core Hadoop strength.
Taken together, these moves are a push on the accelerator pedal that’s likely to drive greater adoption of Hadoop. ScaleOut’s Bain points to Facebook as an example of a company that transfers static data from its online database into HDFS so that data analysis can be conducted offline.
That’s in sharp contrast, however, to the “stock trading firms, hedge funds, investment banks and brokerage firms that need to be able to analyze what’s in their portfolio during the trading day while things are changing very rapidly,” says Dave Brinker, ScaleOut’s COO.
But financial outfits aren’t the only ones that stand to benefit from in-memory computing in a Hadoop environment. Consider, for example, the airline that has just learned of a cancellation and wants to be able to inform its passengers immediately in order to begin rerouting them to their destination. Online retailers can also take advantage of in-memory processing when examining live clickstream data to offer customers buying incentives while they shop.
Nevertheless, Brinker warns, there are many who are waiting to see demonstrations of such use cases in business settings. “A lot of people want to use Hadoop in real-time scenarios but they don’t believe it’s particularly good at that right now,” he says.
Cindy Waxer, a contributing editor who covers workforce analytics and other topics for Data Informed, is a Toronto-based freelance journalist and a contributor to publications including The Economist and MIT Technology Review. She can be reached at firstname.lastname@example.org or via Twitter @Cwaxer.
Editor’s note, May 10, 2013: The original version of this story was updated to include a reference to SAP’s unveiling of its cloud-based version of SAP Business Suite for HANA.