A new crop of high-performing relational databases is entering the market to reach the millions of data professionals who are fluent in SQL. These new databases keep the interactive query language while changing how transactional data is stored, to address the performance limitations of traditional relational databases.
Competing with Hadoop and NoSQL, these new databases, such as NuoDB, Splice Machine, VoltDB, Cloudera’s Impala and Hadapt, are focused on scalability and predictability, with the ability to perform with distributed data in the cloud or on-premise.
Matt Aslett, the research manager for data management and analytics at 451 Research, said the new database market represents an unprecedented level of innovation.
Several new entries into the market are reimagining the transactional database while preserving ACID compliance. ACID stands for atomicity, consistency, isolation and durability, the four traits that guarantee reliable database transactions.
“Maintaining the predictable performance at scale is the inherent issue. Scalability can be achieved, distributed environments can be supported, but the relational database tends to fall down in terms of that,” Aslett said. “What is so interesting and exciting about this space right now: A lot of people are thinking anew about how to solve these problems, but with foundational SQL and ACID transactions, stuff that has been proven.”
With so many developers and data professionals familiar with SQL, and business intelligence tools and other applications written for SQL, the interactive query language isn’t going anywhere. Many new database creators are looking for ways to connect SQL to highly scalable distributed file systems such as Hadoop.
Barry Morris, CEO of NuoDB, said so many professionals know SQL that it makes more sense to create scalable and predictable databases using SQL than to teach every professional new programming and query writing skills.
“Every one of the global 2000 companies has got SQL everywhere,” Morris said. “They’ve got thousands of employees they’ve trained, they’ve got tools, and they’ve got business processes and applications. Everything is very dependent on SQL. If it’s good enough for that, don’t change it.”
NuoDB, which launched for general availability on Jan. 15, has targeted those SQL users by coming up with a new architecture for a relational, ACID-compliant database for cloud or on-premise deployments. The database takes advantage of in-memory technology to avoid data caching, and it keeps the data close to the application layer to boost performance. The transaction layer and the storage engine are separated, so while the data is responding to interactive SQL queries, transactions are recorded to any number of permanent storage nodes, either in the cloud or on-premise.
Traditional relational databases are like a library, where each patron must check out books at the front desk so a proper record can be kept of each transaction. According to Morris, NuoDB’s architecture is more like a sports contest, where different data key values are the players interacting on the field while permanent storage nodes are reporters in the press box keeping track of who scores, when and how.
Morris said the company calls this database architecture concept “emergence,” where bits of data, called atoms, react to one another like a flock of birds moving together in flight. NuoDB holds several patents for its new emergent technology; Morris said the company built the new database from scratch with 12 rules for the future of cloud data management in mind.
“Let’s think about the Web 10 years out, when there are 50 billion devices and we’ve got billions more users, most of them on mobile,” he said. “When we’ve got free bandwidth, when your motor car and your television set are on the web, and there are going to be millions more applications. On the back end of that, there are going to be databases. Identify what the requirements of that system are, and then let’s compare that to what people are actually building, and we think that we’ve cracked it.”
Splice Machine also uses SQL to connect to a distributed file system, relying on Hadoop and HBase as its storage layer for availability, scalability and fault tolerance, according to Monte Zweben, the company’s CEO.
The Splice Machine database is transactional, and by using SQL it gains interactivity. While that’s key for data scientists looking to do iterative exploration, it’s even more important for new big data applications, he said.
“We believe there is a new class of application that is emerging that is generating orders of magnitude more data than the previous generations,” Zweben said. “Your response time needs to be very quick in an application setting, and more importantly it needs to be transactionally safe.”
Zweben pointed to an e-commerce shopping cart and a record of medical treatments as examples of applications where data transactions have to be faithfully recorded.
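To make the shopping cart case concrete, here is a minimal sketch of what “transactionally safe” means in practice. It is not Splice Machine’s code: it uses SQLite from Python’s standard library as a stand-in for any ACID-compliant SQL database, and the table names (`inventory`, `order_lines`) and the functions `make_store` and `checkout` are hypothetical, chosen only for illustration. The point is atomicity: either every item in the cart is recorded and deducted from stock, or none is.

```python
import sqlite3


def make_store():
    """Build a throwaway in-memory store (hypothetical schema for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, stock INTEGER NOT NULL)")
    conn.execute("CREATE TABLE order_lines (order_id INTEGER, sku TEXT, qty INTEGER)")
    conn.executemany("INSERT INTO inventory VALUES (?, ?)", [("book", 5), ("lamp", 1)])
    conn.commit()
    return conn


def checkout(conn, cart, order_id):
    """Deduct stock and record order lines as one all-or-nothing transaction."""
    try:
        # Used as a context manager, the connection commits if the block
        # finishes normally and rolls back if an exception escapes it.
        with conn:
            for sku, qty in cart.items():
                cur = conn.execute(
                    "UPDATE inventory SET stock = stock - ? "
                    "WHERE sku = ? AND stock >= ?",
                    (qty, sku, qty),
                )
                if cur.rowcount != 1:
                    # No row matched: unknown item or not enough stock.
                    raise ValueError(f"insufficient stock for {sku}")
                conn.execute(
                    "INSERT INTO order_lines (order_id, sku, qty) VALUES (?, ?, ?)",
                    (order_id, sku, qty),
                )
        return True
    except ValueError:
        # The rollback has already undone any partial changes.
        return False
```

If a second customer tries to buy the last lamp after it has sold, the whole checkout fails and the book that was deducted a moment earlier in the same cart is restored, so the stored data never reflects half an order.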
Zweben said NoSQL databases that sought to fix performance issues without including SQL have “thrown the baby out with the bathwater.”
“The entire IT community has dedicated many, many years of application development and data analytics and business intelligence utilizing SQL,” he said. “There are many tools out there for it, and there are a great deal of skillsets and organizations out there that are well trained.”
There are several other projects looking to connect SQL to distributed systems. Cloudera, a top Hadoop distributor, announced its Impala project in October; it uses SQL to query Hadoop. Hadapt also allows SQL queries on data stored on Hadoop.
The open source Apache project Drill is working on interactive queries for Hadoop.