As data volumes continue to expand in concert with the maturing Internet of Things, organizations are finding it more difficult to connect the dots. In the deluge of information created by the fast-growing number of data sources, data that arrive from disparate sources frequently end up quarantined in silos, making associations and relationships that exist between and among data all but impossible to recognize.
Graph databases’ ability to handle relationships better than SQL and NoSQL databases is making graphs the preferred choice for dating sites and social networking companies like Facebook and LinkedIn, which thrive on discovering how a person is connected to other people or groups. Forrester Research estimates that more than 25 percent of enterprises will use graph databases by 2017.
Another key feature of graph databases is their ability to store unstructured data, whereas traditional relational databases primarily handle structured data.
Data Informed spoke with Jesse Shaw, a senior consulting software engineer at LexisNexis Risk Solutions, about graph databases and the insights that they can deliver. Shaw will present Buried Alive! Massive Graph Analytics in 20 Lines or Less at the Data4Decisions conference in Raleigh, NC, Wednesday.
Data Informed: What factors are driving the growing interest in graph databases?
Jesse Shaw: Graph databases allow a much finer-grained data model that matches more closely with the problems they are being used to solve. They provide a better real-world approximation of knowledge and a simplified query structure.
How do graph databases differ from traditional databases?
Shaw: A traditional relational database relies on a series of keys to link together tables that originally were designed to remove repeated information in order to save on storage costs as well as to leverage computing power. But because storage and processing power are nearly unlimited, the focus now is on extracting knowledge from data, and a graph database facilitates this extremely well.
What is a key advantage of graph databases?
Shaw: A well-structured graph database is a clear model focused on efficient domain knowledge extraction. For example, say that you wanted to produce a list of automobile owners where the selection criteria were drivers under 21 years of age, who have owned at least two cars, in two different states. In that instance, a graph database structure would contain three entities, or nodes: a person, a car, and an address. The relationships would tie the person to the car and the person to the address. And this, therefore, creates an inferred relationship between the car and address.
With a relational database model, all three relationships would need to be defined, and the resulting nested JOIN structure would be complex. In the world of graph databases, on the other hand, the missing relationship would easily be inferred, with the query reading like a filter.
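To make the automobile-owner example concrete, here is a minimal sketch in plain Python rather than in any particular graph database product. The sample people, cars, and addresses, the edge labels OWNED and LIVED_AT, and the function names are all invented for illustration. The sketch also assumes that "in two different states" means the person has lived at addresses in at least two distinct states, which is one possible reading of the criteria.

```python
from collections import defaultdict

# Hypothetical sample data: every name and value below is invented.
nodes = {
    "p1": {"type": "person", "name": "Ana", "age": 19},
    "p2": {"type": "person", "name": "Ben", "age": 20},
    "p3": {"type": "person", "name": "Cy", "age": 34},
    "c1": {"type": "car"}, "c2": {"type": "car"}, "c3": {"type": "car"},
    "c4": {"type": "car"}, "c5": {"type": "car"}, "c6": {"type": "car"},
    "a1": {"type": "address", "state": "NC"},
    "a2": {"type": "address", "state": "VA"},
    "a3": {"type": "address", "state": "NC"},
    "a4": {"type": "address", "state": "GA"},
}

# Only two kinds of relationship are stored: person-to-car and
# person-to-address. No car-to-address edge exists anywhere.
edges = [
    ("p1", "OWNED", "c1"), ("p1", "OWNED", "c2"),
    ("p1", "LIVED_AT", "a1"), ("p1", "LIVED_AT", "a2"),
    ("p2", "OWNED", "c3"), ("p2", "OWNED", "c4"),
    ("p2", "LIVED_AT", "a1"), ("p2", "LIVED_AT", "a3"),
    ("p3", "OWNED", "c5"), ("p3", "OWNED", "c6"),
    ("p3", "LIVED_AT", "a2"), ("p3", "LIVED_AT", "a4"),
]

# Adjacency list: node id -> list of (edge label, target node id).
adjacency = defaultdict(list)
for source, label, target in edges:
    adjacency[source].append((label, target))


def neighbors(node_id, label):
    """Return the nodes reachable from node_id over edges with this label."""
    return [t for edge_label, t in adjacency.get(node_id, []) if edge_label == label]


def young_multi_state_owners(max_age=21, min_cars=2, min_states=2):
    """Names of people under max_age with enough cars and enough states."""
    matches = []
    for node_id, props in nodes.items():
        if props["type"] != "person" or props["age"] >= max_age:
            continue
        cars = set(neighbors(node_id, "OWNED"))
        states = {nodes[a]["state"] for a in neighbors(node_id, "LIVED_AT")}
        if len(cars) >= min_cars and len(states) >= min_states:
            matches.append(props["name"])
    return sorted(matches)


print(young_multi_state_owners())  # prints ['Ana']
```

The query is a filter over each person's immediate neighborhood. The link between cars and addresses is never stored; it falls out of the fact that both hang off the same person node.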
What sorts of insights do graph databases deliver that traditional databases struggle with or simply cannot deliver?
Shaw: Graph databases perform exceptionally well when delivering complex query results where answers are spread over many tables, especially when inter-table relationships have not been explicitly set. Recall the example of the automobile owner list I mentioned previously. This example highlights graph traversal through two explicit relationships and one inferred relationship to answer the question posed. To further differentiate a graph from a relational database, consider adding the following condition to that query: for any input address, give me the addresses of former roommates who are under 21 years old and owned two or more cars in two different states.
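The roommate condition adds one more hop: start at an address, walk back to the people who lived there, filter them, then walk forward to their other addresses. The following is a rough sketch of that traversal in plain Python, not the syntax of any specific graph database. All data, edge labels, and function names are hypothetical, and it assumes that "former roommates" means anyone recorded as having lived at the input address, and that "two different states" refers to the states of a person's addresses.

```python
from collections import defaultdict

# Hypothetical sample data, invented for illustration only.
people = {"ana": 19, "ben": 20, "cy": 34}  # person -> age
address_state = {"a1": "NC", "a2": "VA", "a3": "NC", "a4": "GA"}

edges = [
    ("ana", "OWNED", "car1"), ("ana", "OWNED", "car2"),
    ("ana", "LIVED_AT", "a1"), ("ana", "LIVED_AT", "a2"),
    ("ben", "OWNED", "car3"), ("ben", "OWNED", "car4"),
    ("ben", "LIVED_AT", "a1"), ("ben", "LIVED_AT", "a3"),
    ("cy", "OWNED", "car5"), ("cy", "OWNED", "car6"),
    ("cy", "LIVED_AT", "a1"), ("cy", "LIVED_AT", "a4"),
]

# Index every edge in both directions so traversal can start from either end.
outgoing = defaultdict(set)  # (source, label) -> targets
incoming = defaultdict(set)  # (target, label) -> sources
for source, label, target in edges:
    outgoing[(source, label)].add(target)
    incoming[(target, label)].add(source)


def qualifies(person):
    """Under 21, at least two cars, addresses in at least two states."""
    cars = outgoing.get((person, "OWNED"), set())
    homes = outgoing.get((person, "LIVED_AT"), set())
    states = {address_state[a] for a in homes}
    return people[person] < 21 and len(cars) >= 2 and len(states) >= 2


def roommate_addresses(input_address):
    """Other addresses of qualifying people who lived at input_address."""
    found = set()
    for person in incoming.get((input_address, "LIVED_AT"), set()):
        if qualifies(person):
            homes = outgoing.get((person, "LIVED_AT"), set())
            found |= homes - {input_address}
    return sorted(found)


print(roommate_addresses("a1"))  # prints ['a2']
```

In this sketch the extra condition costs one reverse lookup and one set difference. The filter from the original query is reused unchanged, which is the kind of composability the answer above attributes to graph traversal.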
How can an organization determine if a graph database is the right solution for its needs?
Shaw: While the majority of businesses rely on relational database solutions, graph databases are the next generation of BI/decision support systems. They provide a real-world data model and are capable of nuanced variable extraction. Due to this model, query syntax is simplified, moving logic creation away from software development and closer to product managers.
Scott Etkin is the editor of Data Informed. Email him at Scott.Etkin@wispubs.com. Follow him on Twitter: @Scott_WIS.