Thanks to big data, database architects are rock stars again. New databases are emerging from dozens of young companies created to push the limits of price and performance for big-data applications. For data analytics pros, this fertile period in database technology means more options and potentially a more varied computing environment.
It wasn’t that long ago that the classic relational database management system, the anchor to many business applications, was the undisputed king. But as Internet giants Google, Amazon, and their ilk pushed the limits of web-scale computing, it became clear that giant relational databases running on pricey servers were neither effective nor economical. Their moves to distributed architectures ushered in Hadoop and NoSQL databases for handling very large volumes of data that don’t need to conform to the neat rows and columns of a relational store.
Now, many database startups are taking advantage of emerging technology to design a new crop of databases that target very specific use cases—the opposite of the one-size-fits-all relational database approach. IDC database analyst Carl Olofson estimates that by the end of this year more than 20 companies will release new database products that didn’t exist two years ago.
Businesses, which became more comfortable with non-relational databases during the NoSQL wave, are being drawn to these new technologies for their optimized performance in certain applications and their lower costs.
“You can now do interesting things that you couldn’t do before and the economics have tipped the scales to the point where people say, ‘Wow, if I deploy that, it’d be really powerful,’” Olofson said. A bank, for example, could use a new type of database to process stock ticker prices and run analytics in real time to ensure that a trade adheres to a portfolio’s rules.
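The bank scenario Olofson describes can be sketched in a few lines. This is an illustrative toy, not any vendor's API: the rule here, a hypothetical maximum position size per symbol, stands in for whatever compliance rules a real portfolio would carry.

```python
# Toy sketch of real-time trade compliance checking (hypothetical rule,
# not any vendor's implementation): each incoming trade is validated
# against the portfolio's limits before it is allowed through.

def check_trade(portfolio, trade, max_position=10_000):
    """Return True if the trade keeps the portfolio within its position limit."""
    held = portfolio.get(trade["symbol"], 0)
    return abs(held + trade["qty"]) <= max_position

portfolio = {"ACME": 9_500}
assert check_trade(portfolio, {"symbol": "ACME", "qty": 400})        # within limit
assert not check_trade(portfolio, {"symbol": "ACME", "qty": 1_000})  # would breach
```

The point of the new databases is that checks like this can run against a live feed of ticker prices at market speed, rather than against a batch-loaded copy of the data.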
Enterprises have had specialized databases in house for years, of course. A Teradata data warehouse could sit alongside back-office systems running on Oracle and SQL Server or MySQL for other applications. But the need to make sense of growing volumes of data and advances in servers, storage, and networking are creating openings for new databases.
Bringing Computational Power to Commodity Hardware
San Francisco-based startup MemSQL, for example, has built an in-memory database that can process large amounts of data rapidly and allow people to analyze that information in real time.
One of its customers is a large social gaming website that studies how games perform in real time to keep customers playing. It analyzes tens of millions of entries a second by comparing performance to different time periods, explained Nikita Shamgunov, the CTO and co-founder of MemSQL.
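The kind of period-over-period analysis the gaming site runs can be sketched as windowed counting. This is an illustrative sketch, not MemSQL's implementation; the event timestamps, game identifiers, and window size are made up.

```python
# Illustrative sketch of period-over-period analytics (not MemSQL's code):
# bucket game-play events into fixed time windows, then compare the
# current window's count against the previous one.

from collections import Counter

def events_per_window(events, window_secs=60):
    """Count (timestamp, game_id) events per fixed time window per game."""
    counts = Counter()
    for ts, game in events:
        counts[(ts // window_secs, game)] += 1
    return counts

events = [(5, "g1"), (30, "g1"), (70, "g1"), (80, "g1"), (95, "g1")]
counts = events_per_window(events)
prev, curr = counts[(0, "g1")], counts[(1, "g1")]
assert (prev, curr) == (2, 3)  # plays rose from 2 to 3 window-over-window
```

At tens of millions of entries a second, the engineering challenge is doing exactly this kind of aggregation continuously, in memory, without a batch-loading step.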
Other companies offer specialized analytical databases, but MemSQL’s engineers saw a need for one designed from the ground up to take advantage of falling memory prices. The architecture supports thousands of queries per second and, because the system is both transactional and analytical, there’s no need for batch loading with traditional ETL tools, Shamgunov said.
Another startup created to take advantage of exponential improvements in hardware is Aerospike in Mountain View, Calif., formerly known as Citrusleaf. Its transactional database is optimized specifically for eight-core Linux servers with flash storage and applications that need to handle tens or hundreds of thousands of transactions per second.
“We saw that multi-core processor servers with flash storage was really what was changing at the hardware level and that presented a massive opportunity,” said co-founder and CTO Brian Bulkowski. The company plans to introduce tools for analytics around its database as well, he added.
Bringing this level of computational power to commodity hardware makes completely new applications accessible to businesses. A data-focused startup, such as one that analyzes huge volumes of social media comments, can launch on the backs of a few servers rather than more expensive hardware, said Bulkowski.
Other companies are optimized for deployment in the cloud. Metamarkets provides very fast analytics and visualization tools for businesses. The service, which uses a custom in-memory columnar database, runs entirely on Amazon Web Services. Customers either ship their data to Amazon’s data storage service or Metamarkets analyzes a stream of data sent from a customer’s source, explained CEO Michael Driscoll.
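Why a columnar database helps with this kind of analytics can be shown with a small sketch. The schema and data below are hypothetical, not Metamarkets' format: the idea is simply that an aggregate over one field reads a single contiguous column instead of scanning every full record.

```python
# Illustrative sketch of row vs. columnar layout (hypothetical schema):
# totaling one field from a row store touches every record, while a
# column store keeps each field as one array that can be scanned alone.

rows = [
    {"ts": 1, "campaign": "a", "clicks": 3},
    {"ts": 2, "campaign": "b", "clicks": 5},
    {"ts": 3, "campaign": "a", "clicks": 2},
]

# Row store: scan whole records to total one field.
total_row = sum(r["clicks"] for r in rows)

# Column store: the same data pivoted so each field is one array.
columns = {k: [r[k] for r in rows] for k in rows[0]}
total_col = sum(columns["clicks"])

assert total_row == total_col == 10
```

Kept in memory, those per-field arrays are what let a columnar engine answer aggregate queries fast enough to drive interactive visualizations.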
Many technology providers, both startups and incumbent tech companies, are converging around Hadoop and providing the services and tools to make it easier to use and more enterprise-friendly.
There’s even a venture fund called the Data Collective dedicated solely to big-data startups. “Silicon Valley is full of them. You can’t turn a corner without running into a big-data startup,” said Peter Goldmacher, a San Francisco-based analyst for Cowen and Company.
Innovations to Support Demanding Applications
For the most part, startups with new databases are serving early technology adopters who have the most demanding applications and have run into trouble with traditional databases. Some products are so advanced that mainstream businesses don’t yet have the use cases and in-house data science skills to take full advantage of them, Goldmacher said.
He added that business strategy expertise is also a factor. “Part of what’s hindering adoption is that you need guys to figure this stuff out. People have to have a higher level understanding of what the business is trying to support,” he said. “When they do, they come up with really innovative stuff.”
A Selection of Innovative Database Players

| Company | Product |
| --- | --- |
| Aerospike | High-volume transactional database optimized for eight-core Linux servers with flash storage |
| MemSQL | In-memory database that takes advantage of falling memory prices to eliminate the extract, transform, and load (ETL) process |
| Metamarkets | Fast analytics service built on an in-memory columnar database running in the cloud |
| Actian | Column-oriented analytical database optimized for speed on server clusters |
| Paradigm4 | Analytical database designed for mathematical or technical computing on massive datasets |
| NuoDB | Shared-nothing database with a peer-to-peer approach designed for scalable cloud applications |
As time goes on, the technology industry will likely converge around a handful of database types optimized for particular use cases, industry executives and analysts said. One type could fit the bill for applications that analyze a rapid flow of unstructured data, whereas others may be optimized for analyzing massive data sets. That means big companies may add one or two special-purpose databases to the ones they already have.
“We’re moving away from the one-size-fits-all database and what’s emerging is more of an ecosystem,” said Metamarkets’ Driscoll.
To fit into existing corporate computing environments, database startups need to support standard protocols. MemSQL, for example, improves performance by converting standard SQL queries into C++ objects and the database can work with third-party MySQL-compliant clients.
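The compile-once idea behind converting SQL into C++ objects can be illustrated with a toy analogy. This is not MemSQL's code generator: it just shows a parameterized query being "compiled" once into a reusable callable, with later executions hitting the cached plan.

```python
# Toy analogy for query compilation and plan caching (not MemSQL's code):
# the first time a parameterized SQL string is seen, build an executable
# plan for it; repeated executions reuse the cached plan.

plan_cache = {}

def get_plan(sql):
    """Return a cached executable plan for a parameterized SQL string."""
    if sql not in plan_cache:
        # Stand-in for real compilation to C++: build the callable once.
        plan_cache[sql] = lambda table, x: [r for r in table if r["id"] == x]
    return plan_cache[sql]

table = [{"id": 1}, {"id": 2}, {"id": 1}]
sql = "SELECT * FROM t WHERE id = ?"
plan = get_plan(sql)
assert plan(table, 1) == [{"id": 1}, {"id": 1}]
assert get_plan(sql) is plan  # second lookup reuses the compiled plan
```

Because MemSQL speaks the MySQL wire protocol, existing MySQL-compliant client libraries can issue queries like the one above without modification.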
Companies that use cloud-based analytics services don’t even need to worry about the choice of database, since it’s part of a packaged service. As with other software services, though, they do need to manage service levels, security, and uptime, said Driscoll.
Consolidation among special-purpose database companies has already started. Greenplum, Vertica, Netezza, and Aster Data Systems, for example, each built analytical databases and were acquired by EMC, HP, IBM, and Teradata, respectively.
Even with all the activity in databases and related tools, more work needs to be done, said MemSQL’s Shamgunov. He predicts future data stores will be able to handle all manner of data types, work with SQL and languages for unstructured data, and quickly deliver answers from thousands of terabytes of data.
“It’s a pretty exciting time. There’s plenty of work for people who understand data,” he said.
Martin LaMonica is a technology journalist in the Boston area. Follow him on Twitter @mlamonica.