
Graph Databases and the Connections in Data [Updated]

As the NoSQL sector continues to attract attention, graph databases are generating real and lasting excitement. In fact, interest in this sector has grown by a whopping 500 percent in the last two years alone. Forrester Research has reported that graph databases – the fastest-growing category in database management systems – will reach more than 25 percent of enterprises by 2017.

Despite their market momentum, graph databases still strike some people as mysterious. Yet they rest on intuitive principles that resemble tasks we perform every day: if you have ever worked out a route on a mass transit map or followed a family tree, you have manually run your own graph-based query. Relational database management systems, by comparison, have a steeper learning curve.
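To make the transit-map comparison concrete, here is a minimal sketch in plain Python of the kind of query a rider runs by eye. The station names and the `TRANSIT_MAP` layout are invented for illustration, and a real graph database would express this in its own query language rather than in hand-written search code.

```python
from collections import deque

# Hypothetical transit map: each station lists the stations one stop away.
TRANSIT_MAP = {
    "Airport": ["Central"],
    "Central": ["Airport", "Museum", "Harbor"],
    "Museum": ["Central", "University"],
    "Harbor": ["Central", "University"],
    "University": ["Museum", "Harbor", "Stadium"],
    "Stadium": ["University"],
}


def shortest_route(graph, start, goal):
    """Breadth-first search: return a route with the fewest stops, or None."""
    queue = deque([[start]])  # each queue entry is a partial route
    visited = {start}
    while queue:
        path = queue.popleft()
        station = path[-1]
        if station == goal:
            return path
        for neighbor in graph.get(station, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # the goal cannot be reached from the start


route = shortest_route(TRANSIT_MAP, "Airport", "Stadium")
print(" -> ".join(route))
```

The search simply follows connections outward from the starting station, one stop at a time, which is what a finger tracing lines on a map does.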

In fact, it’s likely that you have come across a product or service powered by a graph database within the last few hours. Many everyday businesses have created new products and services and re-imagined existing ones by bringing data relationships to the fore. That’s because graph databases model, store, and query both data and their relationships, which is crucial for next-generation applications that feature use cases such as real-time recommendations, graph-based search, and identity and access management.

For example, Walmart, which deals with almost 250 million customers weekly through its 11,000 stores across 27 countries and through its retail websites in 10 countries, wanted to understand the behavior and preferences of online buyers with enough speed and depth to make real-time, personalized, “you may also like” recommendations. By using a graph database, Walmart is able to connect masses of complex buyer and product data quickly to gain insight into customer needs and product trends.
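As a rough illustration of how a “you may also like” recommendation can fall out of connected data, the sketch below walks from a customer to the products they bought, on to other customers who bought the same products, and then to what those customers bought. The customers, products, and scoring rule are all hypothetical. This is not Walmart’s system, and a production graph database would run the traversal as a query rather than in application code.

```python
from collections import Counter

# Hypothetical data: each (customer, product) pair is a "bought" relationship.
PURCHASES = [
    ("alice", "tent"), ("alice", "sleeping bag"), ("alice", "lantern"),
    ("bob", "tent"), ("bob", "sleeping bag"), ("bob", "camp stove"),
    ("carol", "tent"), ("carol", "camp stove"),
    ("dave", "lantern"),
]


def build_graph(purchases):
    """Index the relationships in both directions for fast traversal."""
    bought = {}     # customer -> set of products
    bought_by = {}  # product -> set of customers
    for customer, product in purchases:
        bought.setdefault(customer, set()).add(product)
        bought_by.setdefault(product, set()).add(customer)
    return bought, bought_by


def recommend(customer, bought, bought_by, limit=3):
    """Rank unseen products by how many purchase paths lead to them."""
    own = bought.get(customer, set())
    scores = Counter()
    for product in own:
        for other in bought_by[product]:
            if other == customer:
                continue
            # Products the other customer bought that this one has not.
            for candidate in bought[other] - own:
                scores[candidate] += 1
    return [product for product, _ in scores.most_common(limit)]


bought, bought_by = build_graph(PURCHASES)
print(recommend("carol", bought, bought_by))
```

With this toy data, the call for "carol" ranks "sleeping bag" ahead of "lantern", because more purchase paths connect her to it. The point is that the recommendation comes from following relationships, not from scanning every customer and product.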

Zephyr Health, a San Francisco-based software company offering a data analytics platform for pharmaceutical, biotech, and medical device companies, sought to enable customers to unlock more value from their data relationships. Doing so would enable pharmaceutical companies, for example, to find the right doctors for a clinical trial by understanding relationships among a complex mix of public and private data such as specialty, geography, and clinical trial history.

Traditional SQL databases were not up to the task: they don’t handle data relationships well, and most NoSQL databases don’t handle them at all. Nor is either well equipped to handle data that’s always changing – such as streams of new information coming in from doctor surveys.

Zephyr turned to a graph database for its capability and scale. Graph databases are designed to easily model and navigate networks of data with extremely high performance.

To fully appreciate the value of the graph, consider that early adopters of graph databases such as Facebook and LinkedIn became household names and unrivaled leaders in their sectors.

A “graph” can be thought of as a whiteboard sketch: When you draw on a whiteboard with circles and lines, sketching out data, you are drawing a graph. Graph databases store and process data within the structure you have drawn, providing performance advantages and making it easy to evolve the data model.

The Seven Bridges Puzzle

Far from being a recent data handling development, graph theory is nearly 300 years old and can be traced to Swiss mathematician Leonhard Euler. Euler was looking to solve an old riddle known as the “Seven Bridges of Königsberg.” Set on the Pregel River, the city of Königsberg included two large islands connected to each other and the mainland by seven bridges. The challenge was to map a route through the city that would cross each bridge exactly once. Euler realized that by reducing the problem to its basics, eliminating all features except landmasses and the bridges connecting them, he could develop a mathematical structure that proved such a walk was impossible.
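Euler’s argument can be checked in a few lines. The sketch below encodes the four landmasses and seven bridges (the labels "north bank", "island 1", and so on are placeholders chosen for this example) and applies the standard rule that grew out of his work: a connected graph can be walked crossing every link exactly once only if zero or two of its nodes touch an odd number of links.

```python
from collections import Counter

# The seven bridges, each written as a pair of landmasses (labels are placeholders).
BRIDGES = [
    ("north bank", "island 1"), ("north bank", "island 1"),
    ("south bank", "island 1"), ("south bank", "island 1"),
    ("north bank", "island 2"), ("south bank", "island 2"),
    ("island 1", "island 2"),
]


def degrees(edges):
    """Count how many bridge ends touch each landmass."""
    count = Counter()
    for a, b in edges:
        count[a] += 1
        count[b] += 1
    return count


def is_connected(edges):
    """Check that every landmass can be reached from every other one."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    if not adjacency:
        return True
    start = next(iter(adjacency))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor in adjacency[node] - seen:
            seen.add(neighbor)
            stack.append(neighbor)
    return len(seen) == len(adjacency)


def has_euler_walk(edges):
    """True if some walk crosses every edge exactly once."""
    odd = [node for node, degree in degrees(edges).items() if degree % 2 == 1]
    return is_connected(edges) and len(odd) in (0, 2)


print(dict(degrees(BRIDGES)))
print(has_euler_walk(BRIDGES))
```

All four landmasses touch an odd number of bridges (five for the first island, three for each of the others), so the check returns False, which matches Euler’s conclusion. Removing any single bridge leaves exactly two odd landmasses, and the same check then returns True.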

Today’s graphs are based entirely on Euler’s design: each landmass is now called a “node” (or “vertex”), and each bridge a “link” (also known as a “relationship” or “edge”). With graph databases, end users do not need to know anything about graph theory to experience immediate practical benefits.

Everyday Use

Graphs are a vital part of our online lives, powering everything from social media sites – including Twitter and Facebook – to the retail recommendations on eBay. Online dating also owes much of its success to the way graphs can analyze even the most complex relationships, looking not only at location and personal details, but also passions, hobbies, and attitudes, and relationships between all of those things, to identify potential matches. In addition, enterprise efforts in fraud detection, master data management, and network and IT operations are vastly improving thanks to relationship-based insights rooted in graph database usage.

Interest in the graph will continue to grow. The real-time nature of a graph database makes it an excellent platform for unlocking business value from data relationships that simply can’t be identified using traditional SQL or most NoSQL databases. The uses and applications for graph databases seem endless, and it’s exciting to consider what innovations they will continue to power as the world unlocks the value of data relationships.

Emil Eifrem is CEO of Neo Technology and co-founder of Neo4j, the world’s leading graph database. Before founding Neo, he was the CTO of Windh AB, where he headed the development of highly complex information architectures for Enterprise Content Management Systems. Committed to sustainable open source, he guides Neo along a balanced path between free availability and commercial reliability. Emil is a frequent conference speaker and author on NoSQL databases, and tweets at @emileifrem.