The current hype around big data can be traced back to 2005, when the tools for big data became available and the economics of data storage using these tools turned favorable. One of the big reasons storing big data in Hadoop became popular was that Hadoop turned commodity servers into highly available shared storage that was an order of magnitude cheaper than even the cheapest network-attached storage on the market for that volume of data.
The hype around the potential of big data led to the realization by even non-technology companies that harnessing big data for customer insights, business insights, or new trends would lead to growth opportunities. And that led us to where we are today: batch analytics with big data becoming an acceptable mainstream technology.
There are two major shifts underway that will change the nature of big data analytics:
- Companies are realizing that data analysis needs to produce results that are real time and drive automated, split-second business decision making for the best possible customer experience and customer retention.
- The changing nature of the underlying economics of memory technologies will make this analysis increasingly cost-effective.
Big Data Insights Need to Be Real Time
As businesses worldwide compete to acquire more customers, sell more products, and drive more growth, new big data analytics tools can help them predict customer behavior, deliver more interactive experiences, and delight the customer better than their competitors. Winners of this race are likely to be companies that not only master their data but do so at lightning speed. Growth will be the reward for companies that mine insights and catch the newest trends fastest.
The increasing popularity of in-memory technologies is testimony to this trend. Nothing can beat the speed of RAM when it comes to data processing, and traditional disk-based analytics are likely to fall out of favor or be limited to slower, longer-term batch analyses. The one thing that usually holds businesses back when it comes to using memory to speed up analytics is the cost of memory (i.e., DRAM), and that is about to change.
Economics of Memory Are Being Disrupted
Innovations in the field of Storage Class Memory (SCM) and with 3D XPoint (pronounced “three-D cross point”) technology are fundamentally changing the economics of in-memory computing. Compared with the more widely available flash memory, which is 10 times cheaper but three orders of magnitude slower, SCM and 3D XPoint will be three times cheaper but only a single order of magnitude slower than DRAM. See the chart below.
Another advantage of the SCM/3D XPoint technologies is the elimination of storage engines and filesystems when they are used as memory extenders. These technologies maintain the byte-access semantics of DRAM and thus dramatically reduce storage access overhead, but being 10 times slower might not be acceptable to most in-memory applications. The new SCM/3D XPoint chipsets will allow applications to decide which part of the data will be stored on the DRAM part of the chipset and which part on flash.
Analytics jobs that previously were too expensive to run in memory (i.e., in DRAM) can be easily justified given the cost of new technologies that can optimize dataset distributions between fast (DRAM) and slow (SCM/3D XPoint) memory. Judicious use of these emerging memory technologies can bring performance very close to pure RAM. The first chipsets with these technologies will become available for production in mid-2016. However, to fully utilize their benefits, in-memory databases and platforms will have to be redesigned to store data in two forms of memory, fast and slow. Developers and architects can then start analyzing petabytes of data in memory, cost-effectively.
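To make the tradeoff concrete, here is a rough sketch of the arithmetic in Python. It uses only the relative figures quoted above (SCM/3D XPoint roughly three times cheaper and ten times slower than DRAM). The function names and the example split are illustrative assumptions, not measurements or a vendor model.

```python
# Relative figures from the article, normalized so that DRAM = 1.0.
DRAM_COST = 1.0                  # cost per unit of capacity
DRAM_LATENCY = 1.0               # time per access
SCM_COST = DRAM_COST / 3         # "three times cheaper"
SCM_LATENCY = DRAM_LATENCY * 10  # "a single order of magnitude slower"


def _check_fraction(value, name):
    """Reject values that cannot be a fraction of a dataset or of accesses."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0 and 1, got {value}")


def blended_cost(dram_fraction):
    """Relative cost of a tiered store, compared with keeping everything in DRAM.

    dram_fraction is the share of the dataset (by capacity) kept in DRAM;
    the rest is assumed to live in SCM.
    """
    _check_fraction(dram_fraction, "dram_fraction")
    return dram_fraction * DRAM_COST + (1 - dram_fraction) * SCM_COST


def average_latency(dram_hit_rate):
    """Average access time relative to DRAM.

    dram_hit_rate is the share of accesses served from DRAM; it is usually
    higher than dram_fraction because hot data is placed in the fast tier.
    """
    _check_fraction(dram_hit_rate, "dram_hit_rate")
    return dram_hit_rate * DRAM_LATENCY + (1 - dram_hit_rate) * SCM_LATENCY


if __name__ == "__main__":
    # Hypothetical workload: 20% of the data is hot and serves 95% of accesses.
    print(f"relative cost:    {blended_cost(0.2):.2f}")
    print(f"relative latency: {average_latency(0.95):.2f}")
```

Under that hypothetical split, the tiered store costs about 47 percent of an all-DRAM deployment while average access time rises to about 1.45 times that of DRAM. The result depends heavily on how skewed the access pattern is, which is why data placement matters.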
With the economics of memory becoming more favorable toward processing more and more workloads in-memory, it is but a matter of time before all time-bound analysis moves to memory.
So What Should You Do Today?
As organizations start expecting analytics to move at the same speed as business, they should start to organize their applications around data and decisions driven by data. Decisions that need to be made quickly need data analysis served at the appropriate speeds. For example, when a user is interacting with your application and you want to customize her experience (maybe you want to position an offer or an ad, or display her recent activity), timeliness is essential. Analytics that enable these types of experiences have to run at Internet speeds: given user expectations of responses in less than 100 ms and Internet round-trip times of 50 ms, your database accesses have to complete in under 1 ms.
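The latency budget behind that figure can be sketched in a few lines of Python. The 100 ms response target and 50 ms round-trip time come from the paragraph above. The 20 ms reserved for application logic and the 10 ms "slow store" access time are assumptions chosen purely for illustration.

```python
import math


def server_budget_ms(response_target_ms, network_round_trip_ms):
    """Time left for server-side work once the network round trip is paid."""
    budget = response_target_ms - network_round_trip_ms
    if budget <= 0:
        raise ValueError("network round trip consumes the whole response budget")
    return budget


def max_sequential_db_calls(budget_ms, app_overhead_ms, db_call_ms):
    """How many back-to-back database accesses fit into the remaining budget."""
    if db_call_ms <= 0:
        raise ValueError("db_call_ms must be positive")
    remaining = budget_ms - app_overhead_ms
    if remaining <= 0:
        return 0
    return math.floor(remaining / db_call_ms)


if __name__ == "__main__":
    budget = server_budget_ms(100, 50)  # figures from the article
    app_overhead = 20                   # assumed: rendering, business logic, etc.

    fast = max_sequential_db_calls(budget, app_overhead, db_call_ms=1)
    slow = max_sequential_db_calls(budget, app_overhead, db_call_ms=10)
    print(f"server budget: {budget} ms")
    print(f"1 ms accesses that fit:  {fast}")
    print(f"10 ms accesses that fit: {slow}")
```

With these assumptions, a request can afford about 30 sequential 1 ms accesses but only 3 accesses at 10 ms each. Because personalizing a single page often takes many lookups, per-access latency needs to stay at or below the millisecond range.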
Once you start thinking about data in this way, it will become clear which data needs to be available at what speed and what analysis needs to be completed in real time. This will help you segment your data strategy and craft an approach in which you can make the performance-cost tradeoffs and leverage the optimizations enabled by the upcoming new memory technologies. Your architecture decisions should also weigh technologies that have been designed to take advantage of the new memory technologies, and how well they fit your decision-making paradigm.
Yiftach Shoolman is an experienced technologist, having held leadership engineering and product roles in diverse fields, from application acceleration, cloud computing, and software-as-a-service (SaaS) to broadband networks and metro networks. He was the founder, president, and CTO of Crescendo Networks (acquired by F5, NASDAQ: FFIV), the vice president of software development at Native Networks (acquired by Alcatel, NASDAQ: ALU), and part of the founding team at ECI Telecom's broadband division, where he served as vice president of software engineering.
Yiftach holds a Bachelor of Science in Mathematics and Computer Science and has completed studies for a Master of Science in Computer Science at Tel-Aviv University. Contact Yiftach on Twitter and LinkedIn.