Choosing an alternative to the traditional relational database can almost feel like ending a long relationship or choosing to root for the Yankees after growing up as a Red Sox fan. But businesses would do well to explore emerging database technologies, both for specific projects and with an eye towards the future.
The combination of Web-scale computing, powerful commodity hardware, and big data has ushered in an array of new products that are shaking up the database and analytics field. The result is far more choice, but also a potentially confusing environment for making technology decisions.
NoSQL, or “not only SQL,” databases have become viable alternatives to relational databases, particularly for applications that store unstructured or semi-structured data. There is also a crop of new relational databases, sometimes called NewSQL databases, designed to run on many cheap servers rather than on a single appliance or one very large server.
Right now, it’s mostly the leading-edge technology adopters who have jumped on the Hadoop and NoSQL wave, particularly for Web-facing applications with high volumes of data. eBay, for example, has been running the open-source distributed database system Cassandra for more than a year across dozens of nodes to enhance online shopping with more customized data for its users.
Implications of New Database Technologies for Analytics
But more mainstream IT organizations should have these emerging technologies on their radar screens because they represent a potentially major shift in analytics, analysts say. Instead of the relational database undergirding most enterprise applications, computing systems are becoming more mixed with multiple specialized data stores emerging.
“In the last year, we’ve seen a big shift toward an acceptance to look at alternatives to relational databases,” said Matthew Aslett, analyst at consultancy the 451 Group. “For new applications and new projects, particularly Web-facing ones that are perhaps not mission critical, organizations are looking at them as options for their next-generation database platforms.”
Relational databases (think of MySQL, Oracle, IBM DB2, Microsoft SQL Server, Teradata and Sybase) are mature, have a rich ecosystem of third-party products and vendor support, and all speak SQL, so training is typically not an issue. So why bother looking around?
One of the primary drivers is economic. Many of the newer products are open source and designed to run on clusters of commodity hardware. This means administrators can add more servers or storage devices to scale out, rather than buy a more expensive high-end server. Hadoop, after all, was specifically designed by Internet giants, led by work at Yahoo and research from Google, to run Web searches in data centers filled with racks of commodity servers.
But a powerful draw toward NoSQL databases is the flexibility they can bring over the relational model of storing data in rows and a predetermined number of columns.
Craigslist is a MySQL shop but it decided to pull the plug on MySQL for its archive database because of the trouble engineers had making changes to postings, according to engineer Jeremy Zawodny, who worked extensively with MySQL as an engineer at Yahoo.
Altering a table to, for example, change the number of photos in a posting was difficult and time consuming. The company decided to go with the open-source document database MongoDB because it offered more flexibility to make changes. The shift eases administration and frees people to work on improvements to the system, such as adding real-time analytics, Zawodny said in a video produced by MongoDB.org.
“There are parts of the other systems we have that may benefit from adopting more of a document model than a relational model. We even look at our main database and some of us squint at that and say, ‘Why is even that relational?’” he said. “It kind of opens up your mind to say, ‘What else can we improve by doing this?’”
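To make the Craigslist example concrete, here is a minimal sketch of the two approaches to allowing one more photo per posting. It is an illustration only, not Craigslist's actual schema: the `postings` table, its `photo1`/`photo2` columns, and the sample data are all hypothetical, and an in-memory SQLite database stands in for MySQL while a plain Python dictionary stands in for a MongoDB document.

```python
import sqlite3

# Relational approach (hypothetical schema): one column per photo slot.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE postings ("
    "id INTEGER PRIMARY KEY, title TEXT, photo1 TEXT, photo2 TEXT)"
)
conn.execute(
    "INSERT INTO postings (title, photo1, photo2) VALUES (?, ?, ?)",
    ("Used bike", "a.jpg", "b.jpg"),
)

# Allowing a third photo means changing the table definition itself.
# On a very large archive table, this kind of change can be slow.
conn.execute("ALTER TABLE postings ADD COLUMN photo3 TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(postings)")]

# Document approach (a dict standing in for a stored document):
# photos live in a list, so adding one needs no schema change.
posting = {"title": "Used bike", "photos": ["a.jpg", "b.jpg"]}
posting["photos"].append("c.jpg")
```

In the relational version the structure of every row changes at once; in the document version only the one posting that needs a third photo is touched.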
Don’t Unplug That Relational Database
Growing interest in alternative databases does not necessarily mean unplugging relational databases. Instead, Hadoop and NoSQL databases will often perform a specific task, perhaps one that wasn’t done before, in concert with existing systems. Companies could start to collect network or Web log data, for example, to improve the performance of their ecommerce application or analyze unstructured social media data.
“Many of these systems coexist where Hadoop acts as a landing strip for lower density data and then the data is put into a structured database or queried directly using something like HiveQL. Some typical use cases are around customer churn, fraud, product optimization, and capacity planning, but we’re seeing more each day,” said Tony Cosentino, an analyst at Ventana Research.
The right approach is to understand the business problem that needs solving and then look at which technology maps best, rather than surveying the bewildering array of options, analysts say.
Other important considerations are product maturity and the skills required to implement and manage new technologies. A number of companies have formed to offer support and services around open-source products, and a number of new products support SQL or a SQL-like language for queries and analytics.
For the most part, tying Hadoop and NoSQL databases into existing computing systems is not a major problem because there are connectors from Hadoop into all existing databases and support from established vendors, said Aslett. But more sophisticated applications, such as one that requires real-time data movement, will require custom work, he said.
The decision to go with a NoSQL database should be driven primarily by the data model and whether you will benefit from the flexibility to make changes over time, said Max Schireson, the president of 10Gen, which provides support services for MongoDB. Using a document database to store purchase orders, for example, makes it easier to later alter the required fields, or add a Twitter handle to documents that contain contact information. Learning a new product that fits the business problem is better than using a relational database for a task it’s not designed to do, Schireson argued.
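Schireson's point about evolving fields can be sketched in a few lines. This is a rough illustration using plain Python dictionaries in place of a real document database; the contact records, the `twitter` field name, and the helper functions `missing_fields` and `twitter_handles` are hypothetical and not part of any MongoDB API.

```python
# Contact documents stored before anyone thought of Twitter handles.
contacts = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "Grace", "email": "grace@example.com"},
]

# A newer document simply carries the extra field; older ones are untouched.
contacts.append(
    {"name": "Linus", "email": "linus@example.com", "twitter": "@linus_example"}
)


def missing_fields(doc, required):
    """Return the required field names that a document lacks."""
    return [field for field in required if field not in doc]


def twitter_handles(docs):
    """Collect handles from whichever documents happen to have one."""
    return [doc["twitter"] for doc in docs if "twitter" in doc]


# The list of required fields is application logic, so it can change
# later without rewriting the documents already stored.
required_now = ["name", "email"]
required_later = ["name", "email", "twitter"]

gaps_now = [missing_fields(doc, required_now) for doc in contacts]
gaps_later = [missing_fields(doc, required_later) for doc in contacts]
handles = twitter_handles(contacts)
```

The trade-off is that the application, rather than the database schema, becomes responsible for coping with documents that lack a field.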
Using a database, whether it’s relational or not, that is built to run on large clusters means IT organizations can support petabyte-size data sets. That means businesses can take on big data applications on a cheaper and more flexible hardware platform.
Moving to a different technology architecture inherently carries risks, such as lower-than-hoped performance or higher costs than anticipated, but the products and services are developing quickly. Because these technologies are maturing, more companies are willing to try alternative technologies for some applications, said 451 Group’s Aslett.
“There are a number of things coming together that mean people can really do this at a professional level rather than hack something together themselves,” he said.
Martin LaMonica is a technology journalist in the Boston area. Follow him on Twitter @mlamonica.