As we shift to a digital economy, companies need to innovate faster than ever before. To maintain and advance this pace of innovation, adoption of in-memory technology is soaring. In-memory delivers the fastest, most responsive results in computing today.
A modern, in-memory solution, such as an in-memory database, is scalable across servers, supports memory as well as flash and disk for persistence, and is easily deployed on site or across clouds.
Yet certain misconceptions remain about the cost, ease of use, and scalability of in-memory solutions. Many of these myths are rooted in architectures long since surpassed by modern approaches. In dispelling five of the most common myths surrounding in-memory solutions, we will see how companies are getting ahead by using real-time applications and in-memory databases.
Myth 1: In-memory Solutions Use Memory Only and Do Not Use Disk
In-memory databases use a combination of memory and disks, or in many cases flash or SSDs. The disks or SSDs are used in multiple ways. First, they persist every transaction via logging. Second, they store point-in-time snapshots of the dataset, which also serve as backups.
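As a hedged illustration, the logging-plus-snapshot pattern might be sketched like this in Python. The class, methods, and file formats are invented for the example; real databases use far more sophisticated log and snapshot formats:

```python
import json
import os

class InMemoryStore:
    """Toy key-value store: the primary copy of the data lives in
    memory; every write is appended to a log on disk before it is
    acknowledged, and snapshots capture the full state at a moment
    in time (illustrative sketch, not any product's actual design)."""

    def __init__(self, log_path="wal.log", snap_path="snapshot.json"):
        self.data = {}              # primary copy lives in memory
        self.log_path = log_path
        self.snap_path = snap_path

    def put(self, key, value):
        # Persist the transaction to the log before acknowledging.
        with open(self.log_path, "a") as log:
            log.write(json.dumps({"op": "put", "key": key,
                                  "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())  # force the entry onto stable storage
        self.data[key] = value      # apply to the in-memory copy

    def snapshot(self):
        # Capture the dataset at a moment in time; doubles as a backup.
        with open(self.snap_path, "w") as snap:
            json.dump(self.data, snap)
        open(self.log_path, "w").close()  # logged entries are now redundant
```

After a snapshot, the log can be truncated because the snapshot already contains every change the log recorded, which keeps recovery time bounded.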
The power of in-memory solutions surfaces when the solutions are architected from the ground up with a focus on memory. This enables ingest speeds and capacities far greater than what a disk or even an SSD system can achieve, and corresponding instant analytics for complex queries.
Ingest combined with analytics helps companies handle the foundational workloads necessary within the Internet of Things.
Myth 2: In-memory Solutions Don’t Keep Data Persistently
In-memory databases that integrate disks or SSDs use this storage to retain a persistent copy of the data. Should a server lose power, wiping its memory, the data can be recovered quickly from disk.
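A recovery pass under that model can be sketched as replaying the log on top of the last snapshot. The function and record formats here are illustrative assumptions, not any particular product's:

```python
import json
import os

def recover(snap_path, log_path):
    """Rebuild the in-memory dataset after a power loss: load the
    most recent snapshot, then replay transactions logged since it
    was taken (toy sketch with an invented log record format)."""
    data = {}
    if os.path.exists(snap_path):
        with open(snap_path) as snap:
            data = json.load(snap)          # restore the snapshot state
    if os.path.exists(log_path):
        with open(log_path) as log:
            for line in log:                # replay each logged transaction
                entry = json.loads(line)
                if entry["op"] == "put":
                    data[entry["key"]] = entry["value"]
                elif entry["op"] == "delete":
                    data.pop(entry["key"], None)
    return data
```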
Many in-memory solutions also support replicating data across nodes for high availability. Data is then persisted in two places and memory-resident on two servers.
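A minimal sketch of the synchronous replication idea, with invented class names, under the assumption that a write is acknowledged only once both servers hold it:

```python
class Replica:
    """Standby server holding a memory-resident copy of the data."""
    def __init__(self):
        self.data = {}

class Primary:
    """Toy primary server: each write is applied locally and shipped
    to the replica before being acknowledged, so the data survives
    the loss of either server (illustrative sketch only)."""
    def __init__(self, replica):
        self.data = {}
        self.replica = replica

    def put(self, key, value):
        self.data[key] = value           # apply locally in memory
        self.replica.data[key] = value   # ship to the standby
        return "ack"                     # acknowledge once both copies exist
```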
The persistence of in-memory solutions, both in high-availability and recoverability modes, lets companies use these products in operational environments.
Once in operation, companies can deploy applications that serve end users in real time, delivering experiences with up-to-the-second accuracy.
Myth 3: In-memory Solutions Require Putting the Entire Data Set in Memory
Older in-memory solutions may have required a memory footprint equal to the dataset size. Now, many in-memory databases offer the option to retain data on disk or SSDs in conjunction with memory. For example, all data may reside on disk or SSDs, with memory used as a cache. This allows for efficient retention and analytics of far larger data sets.
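One hedged sketch of this arrangement, assuming a simple least-recently-used caching policy (real products use far more elaborate caching and storage layouts; the class here is invented for illustration):

```python
from collections import OrderedDict

class CachedStore:
    """Toy disk-backed store with an in-memory LRU cache, so the
    full dataset can exceed available memory (illustrative sketch)."""

    def __init__(self, capacity=2):
        self.disk = {}               # stands in for data on disk/SSD
        self.cache = OrderedDict()   # hot subset kept in memory
        self.capacity = capacity

    def put(self, key, value):
        self.disk[key] = value       # all data resides on disk
        self._cache(key, value)      # recent writes stay hot in memory

    def get(self, key):
        if key in self.cache:        # memory hit: no disk access needed
            self.cache.move_to_end(key)
            return self.cache[key]
        value = self.disk[key]       # miss: fetch from disk, warm cache
        self._cache(key, value)
        return value

    def _cache(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
```

With a capacity far smaller than the dataset, the working set stays memory-resident while cold data is served from disk on demand.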
In-memory solutions that combine media types also reduce ETL, as short- and longer-term data can be stored within a single system.
When real-time and historical data are mixed, the richness of the analytics grows, allowing for more accurate modeling and predictions and, ultimately, more finely tailored results.
Myth 4: In-memory Databases are Limited to a Single Server’s Memory
As distributed systems became accessible to developers and enterprises, this myth dissolved. Today, it is possible to aggregate many low-cost servers into a single system, deployed in your data center or in the cloud. The clustering capabilities are accomplished with basic Ethernet networking infrastructure and resiliency built into the cluster.
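A toy illustration of aggregating several servers into one system by hash-partitioning keys across nodes. Real clusters also handle rebalancing, replication, and failover, which this invented sketch omits:

```python
import hashlib

class Cluster:
    """Toy sketch of a distributed in-memory system: each key is
    routed to one of several nodes by hashing, so total memory
    capacity is the sum of the nodes' capacities (illustrative)."""

    def __init__(self, num_nodes=3):
        self.nodes = [dict() for _ in range(num_nodes)]  # one dict per server

    def _node_for(self, key):
        digest = hashlib.md5(key.encode()).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value   # the write lands on one shard

    def get(self, key):
        return self._node_for(key)[key]    # the same hash routes the read
```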
Cloud instances now offer large memory footprints, such as 244GB at Amazon Web Services and 448GB at Microsoft Azure. Last October, Amazon pre-announced an upcoming 2TB memory instance. With active data sets often in the few-hundred-gigabyte to terabyte range, it becomes easy to scale memory capacity across just a few cloud instances.
With large memory footprints available, developers can build systems capable of rapidly scaling to large user bases.
Myth 5: In-memory Databases are Expensive
With DRAM prices dropping every year and an ongoing technology supply battle between server and cloud vendors, it’s a good time to be building new solutions. Whether companies choose to deploy in their own data centers or in the cloud, in-memory options are as affordable as ever. And with the option to distribute across low-cost servers, the pricing can be extremely cost effective for performance-driven workloads.
As pricing continues to drop for in-memory solutions, use will reach across more industries and a greater number of applications.
Comparing Today’s In-memory Offerings
In-memory computing crosses the application layer, the messaging layer with tools such as Kafka, the data-processing layer with technologies such as Spark, and the in-memory database arena with products like SAP HANA and MemSQL. There are other in-memory offerings from Business Intelligence vendors that typically act as a cache, and in-memory analytics options from conventional database vendors that store a copy of data in memory.
One frequent differentiating characteristic of in-memory solutions is how they treat memory. Approaches can be roughly broken into the following categories:
Memory first. Memory-first architectures capture transactions in memory and persist them to disk. Because transactions are acknowledged from memory, ingest rates can reach orders of magnitude beyond what a disk-based system can accomplish. These databases can also operate in a mode where every transaction is persisted to disk before being acknowledged.
Memory only. These systems make the assumption that all data is or should be in-memory. While they can provide phenomenal performance for active data sets, they do not leave room to unite real-time and historical data in a single system.
Memory after. Traditional databases that add in-memory options as an afterthought can drive some benefits of real-time analytics, but they suffer on ingest as each transaction must traverse a conventional disk path. This means that memory-after approaches are unlikely to suffice in high-ingest rate workloads, such as those for the Internet of Things.
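The ingest difference between the memory-first and memory-after write paths can be caricatured as follows. The cost model is deliberately simplistic and the function names are invented; the point is only that a memory-first design can amortize disk writes across a batch, while a memory-after design pays the disk cost on every transaction:

```python
DISK_WRITE_COST = 1  # stand-in for one synchronous disk write

def memory_first_ingest(rows):
    """Capture rows in memory; persistence happens via a batched
    log flush, so disk cost is amortized across the whole batch."""
    table, log, disk_ops = {}, [], 0
    for key, value in rows:
        table[key] = value          # transaction captured in memory
        log.append((key, value))    # buffered for the log
    disk_ops += DISK_WRITE_COST     # one batched flush of the log
    return table, disk_ops

def memory_after_ingest(rows):
    """Each transaction traverses the disk path before the
    in-memory copy is updated, so disk cost scales with row count."""
    table, disk_ops = {}, 0
    for key, value in rows:
        disk_ops += DISK_WRITE_COST  # synchronous write per transaction
        table[key] = value           # memory copy populated afterward
    return table, disk_ops
```

Both paths end with the same data durable and in memory; they differ in how many synchronous disk operations stand between an incoming row and its acknowledgment.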
Matching the Right In-memory Database to the Application
Perhaps the most interesting aspect of picking the right in-memory solution is understanding the shift in transactions across today’s digital economy.
In the early days of computing, transactions represented the immediate movement of money in finance and accounting, of inventory in goods, and of people in human resources. In each of these cases, the individual transaction was sacred and could not be lost under any circumstance.
This is still the case today, and it’s why many in-memory solutions build in persistence, high-availability, snapshots, backups, and remote replication.
However, mobile connectivity and the Internet of Things are spawning a new class of transactions from apps, sensors, and all kinds of metrics, in which the aggregate of the transactions matters more than any individual one. In these instances, companies need to push the limits of the technology, capturing high volumes of data in memory while simultaneously analyzing that data in real time. These capabilities are best served by a memory-first architecture.
Gary Orenstein leads marketing at MemSQL across marketing strategy, growth, communications, and customer engagement. Prior to MemSQL, Gary was the Chief Marketing Officer at Fusion-io, where he led global marketing activities. He also served as Senior Vice President of Products at Fusion-io during the company’s expansion to multiple product lines. Prior to Fusion-io, Gary worked at infrastructure companies across file systems, caching, and high-speed networking. Earlier in his career, he served as the vice president of marketing at Compellent. Gary holds a bachelor’s degree from Dartmouth College and a master’s in business administration from The Wharton School at the University of Pennsylvania.