I don’t think I am going out on a limb by saying that NoSQL is a very powerful and flexible – and very useful – technology. By powerful, I mean that it can easily handle terabytes of data, can scale to billions of users, and can perform a million ops per second. By flexible, I mean that, unlike relational databases, NoSQL seamlessly handles semi-structured data like JSON, which is quickly becoming the standard data format for web, mobile, and IoT applications. And NoSQL performs a real mission-critical service: It is an operational database that directly supports some of the largest applications in existence.
So given that, my question is, “While NoSQL use is growing very rapidly, why isn’t it everywhere yet?” Why isn’t NoSQL already the standard database for web, mobile, and IoT applications? Why are people still force-fitting JSON into relational databases?
The simple answer is that NoSQL lacks a comprehensive and powerful query language. And I get it because, really, how useful is big data if you can’t effectively query it? And how powerful and dynamic can your applications on NoSQL be if they can’t easily access and utilize the data?
NoSQL Needs an Effective Query Capability
The lack of comprehensive query for NoSQL databases is a main reason why organizations are still force-fitting JSON data into relational models and watching their applications creak and groan under the strain of skyrocketing amounts of data and numbers of users. Developers are wedging JSON data into tables at great expense and complexity so that they can retain the query capabilities of SQL/RDBMS.
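To make the cost of that wedging concrete, here is a minimal Python sketch using the standard-library sqlite3 module. The order document, the table layout, and the helper names save_order and load_order are all hypothetical, chosen only for illustration. Even for one small document, the application has to split the JSON across two tables on write and stitch it back together on read.

```python
import json
import sqlite3

# Hypothetical order document, as a web or mobile app might produce it.
order = {
    "order_id": 1001,
    "customer": "Ada",
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute(
    "CREATE TABLE order_items (order_id INTEGER, position INTEGER, sku TEXT, qty INTEGER)"
)


def save_order(conn, doc):
    """Shred one JSON document into rows in two relational tables."""
    conn.execute(
        "INSERT INTO orders VALUES (?, ?)", (doc["order_id"], doc["customer"])
    )
    conn.executemany(
        "INSERT INTO order_items VALUES (?, ?, ?, ?)",
        [
            (doc["order_id"], position, item["sku"], item["qty"])
            for position, item in enumerate(doc["items"])
        ],
    )


def load_order(conn, order_id):
    """Reassemble the JSON document from its rows."""
    header = conn.execute(
        "SELECT order_id, customer FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()
    item_rows = conn.execute(
        "SELECT sku, qty FROM order_items WHERE order_id = ? ORDER BY position",
        (order_id,),
    ).fetchall()
    return {
        "order_id": header[0],
        "customer": header[1],
        "items": [{"sku": sku, "qty": qty} for sku, qty in item_rows],
    }


save_order(conn, order)
round_tripped = load_order(conn, 1001)
print(json.dumps(round_tripped))
```

Every new nested field in the document means another column or table plus matching changes to both helpers, which is the translation layer a document database is meant to remove.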
If we could start from scratch, how would we design a query solution for NoSQL? What would it need to do?
I will admit that the list that follows is long, but that’s because we need to make sure we get this right. There have already been attempts at creating a query language for NoSQL, and all have fallen short. Either they lack core functionality, which renders them ineffective, or they are simply an API that can require hundreds of lines of Python to perform a simple lookup.
A query language for NoSQL must enable you to do the following:
- Query data where it lies. There should be no requirement to manipulate data in order to query it. Your query language should work with the data model you create, not the other way around.
- Understand the relationships between data. Without commands such as JOIN, you essentially would be forced to combine all your data into a single JSON document, losing valuable insight into the relationships between data.
- Both reference and embed data. You should be able to choose the structure that is more applicable to your project, and not be forced into making and then maintaining copies of data in order to query documents.
- Create new queries without modifying the underlying data model. No one wants to have to anticipate every query required prior to designing the data model. And no one wants to go back and alter the data model when query requirements evolve.
- Support secondary indexing. Again, the query language must be flexible. You should be able to query and manipulate data where it lies, without the requirement to create and maintain multiple copies in multiple formats.
- Avoid the impedance mismatch between the data format and the database structure that occurs when storing JSON in a relational table. Removing the complex translation layer would streamline application development. By extension, you need to process query results as JSON so that your application can consume the information directly.
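To illustrate the embed-versus-reference choice and what JOIN contributes, here is a small Python sketch. The documents, the key format, and the function join_orders_to_customers are hypothetical, and the join is written by hand in application code. It is roughly the work a query language with JOIN would do for you, not any particular database's API.

```python
import json

# Embedded model: the customer is copied into the order document.
embedded_order = {
    "order_id": 1001,
    "total": 30.0,
    "customer": {"name": "Ada", "city": "London"},
}

# Referenced model: each order points at a customer document by key,
# so customer data is stored once instead of once per order.
customers = {
    "cust::42": {"name": "Ada", "city": "London"},
}
orders = [
    {"order_id": 1001, "customer_key": "cust::42", "total": 30.0},
    {"order_id": 1002, "customer_key": "cust::42", "total": 12.5},
    {"order_id": 1003, "customer_key": "cust::99", "total": 5.0},  # no such customer
]


def join_orders_to_customers(orders, customers):
    """Hand-written equivalent of an inner JOIN on customer_key."""
    joined = []
    for order in orders:
        customer = customers.get(order["customer_key"])
        if customer is None:
            continue  # inner-join semantics: drop orders with no matching customer
        joined.append(
            {
                "order_id": order["order_id"],
                "total": order["total"],
                "customer": customer,
            }
        )
    return joined


results = join_orders_to_customers(orders, customers)

# The results are already JSON-shaped, so the application can use them directly.
print(json.dumps(results, indent=2))
```

Without a JOIN in the query language, every application either carries a loop like this one or falls back to the embedded model and keeps the copies in sync itself.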
And, perhaps most importantly, the query language for NoSQL must be easy for developers to use.
Developers want to focus on their application and would prefer everything else to just go away. They don’t want to learn a whole new language. They won’t adopt a whole new tool. Developers want a query language that is easy and familiar. And while the query language for NoSQL must provide value, it also must be simple, recognizable, and accessible via the tools and frameworks that developers already use. You know, kind of like SQL.
Why Not Just Use SQL for NoSQL?
Let’s explore that idea. Why might SQL work for NoSQL?
- SQL is powerful and yet simple – expressive and easy to read.
- SQL is flexible – able to fully express the relationships between data.
- SQL is proven – 40 years of history supporting the largest implementations of relational databases.
- SQL is known – millions of developers already use it either directly or through their development framework of choice.
- SQL has a whole ecosystem of tools and frameworks built around it – data is accessible via standard ODBC/JDBC drivers to enable seamless transfer and plug in to standard reporting and BI solutions.
The Ideal Solution for NoSQL Query Is SQL Itself
NoSQL is both powerful and flexible, but in order for NoSQL to become ubiquitous and used as the standard operational database for web, mobile, and IoT applications, it needs a powerful, flexible query language. And that query language must be SQL. It can’t be a token subset of SQL; it must enable the full power of SQL. And while there will be some required modifications to support the JSON data format, they must be minimal to enable adoption and reduce the learning curve.
Adding SQL to NoSQL is not the only requirement for NoSQL to become ubiquitous, but if we are able to marry the flexibility of JSON and the scalability and performance of NoSQL with the full power and familiarity of SQL, it will be a big step forward. As funny as it may sound, what NoSQL needs most right now is actually SQL itself.
Timothy Stephan leads the product marketing team at Couchbase. He previously held senior product positions at VMware, Oracle, Siebel Systems, and several successful startups.