Popular consumer web properties such as LinkedIn, Facebook, and Google all use proprietary versions of graph technology. They pioneered its use to help continuously deliver relevant information through easy-to-use interfaces, while continuing to astound and amaze with new features and functionality at a rate unmatched by traditional enterprise-class applications. So it comes as no surprise to see the increased use of graphs in a new generation of enterprise data-driven applications.
A Graph Database Primer
Examples of popular graph databases include Titan and Neo4j. They belong to the NoSQL class of databases, which as a group is gaining in popularity.
Allied Market Research predicts that the global NoSQL market will reach $4.2 billion by the end of 2020, growing at a compound annual growth rate (CAGR) of 35.1 percent from 2014 to 2020.
In addition to graphs, the NoSQL family of databases includes key/value databases (e.g., Amazon DynamoDB, Riak), document stores (Couchbase, MongoDB), and columnar databases (Cassandra, HBase). One factor that distinguishes NoSQL databases from traditional relational databases (Oracle, IBM DB2) is the ability to eschew the definition of a schema prior to storing data. Relational databases require a data model to be designed before data can be loaded. This usually is called schema-on-write. The creation of a data model usually is the responsibility of a database architect, who takes great pains to design header/detail tables with foreign keys and associated reference tables in entity relationship diagrams.
In contrast, NoSQL databases are schema-on-read, where the data can be loaded without the need to define a data model.
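The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard-library `sqlite3` module to stand in for a relational database and plain JSON strings to stand in for a document store, and the `physician` table, the `specialty` and `hospital` fields, and their values are hypothetical.

```python
import json
import sqlite3

# Schema-on-write: the table layout must exist before any row is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE physician (name TEXT, specialty TEXT)")  # hypothetical table
conn.execute("INSERT INTO physician VALUES (?, ?)", ("Dr. John Smith", "Cardiology"))


def insert_with_unknown_column():
    """Try to store a field the schema does not know about."""
    try:
        conn.execute(
            "INSERT INTO physician (name, hospital) VALUES (?, ?)",
            ("Dr. Jane Jones", "General Hospital"),
        )
        return True
    except sqlite3.OperationalError:
        # Rejected: the table would have to be altered first.
        return False


# Schema-on-read: records are stored as-is and interpreted only when queried.
raw_records = [
    '{"name": "Dr. John Smith", "specialty": "Cardiology"}',
    '{"name": "Dr. Jane Jones", "hospital": "General Hospital"}',
]
documents = [json.loads(record) for record in raw_records]


def hospitals(docs):
    """Read-time projection that tolerates records lacking the field."""
    return [doc["hospital"] for doc in docs if "hospital" in doc]


schema_on_write_accepted = insert_with_unknown_column()
found_hospitals = hospitals(documents)
```

In this sketch the relational insert is refused because `hospital` is not part of the declared table, while the two differently shaped JSON records load without complaint and the missing field is handled at query time.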
Not having to spend time defining a rigid data model or make changes to an existing structure allows NoSQL database users to be agile and responsive to business needs. When new functionality is required of applications based on traditional relational databases, over-worked IT teams can take days or weeks to fulfill such requests.
Another benefit of NoSQL databases is their ability to store any type of data, whereas relational databases mainly store structured data.
If you have never seen or used a graph database, you may find that its simplicity belies its usefulness and power. Information can be captured in the form below:
| Entity (aka subject) | Relationship (aka predicate) | Entity (aka object) |
| Dr. John Smith | Is friends with | Dr. Jane Jones |
Entities form the nodes of a graph, and new nodes can be added continuously, with relationships connecting them to other nodes. As the example shows, graphs model real-world relationships in a manner that is much more efficient and easier to understand than relational databases, which require schemas that conform to how columns and tables are stored. You can describe an unlimited number of entities and associated relationships, connecting people to people, people to products, people to organizations, and more. Graphs typically are the go-to database for social network analysis (SNA), making it simple to discover how a person is connected to another within a group or organization.
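To make the subject/predicate/object form concrete, here is a minimal Python sketch of how such statements might be held in memory and queried. It is not how Titan or Neo4j store data internally. Only the first triple comes from the example above; the other two triples and the helper names `build_graph` and `related` are hypothetical.

```python
from collections import defaultdict

# (subject, predicate, object) triples. Only the first is from the article.
triples = [
    ("Dr. John Smith", "is friends with", "Dr. Jane Jones"),
    ("Dr. Jane Jones", "is affiliated with", "General Hospital"),  # hypothetical
    ("Dr. John Smith", "prescribes", "Drug A"),  # hypothetical
]


def build_graph(statements):
    """Index outgoing edges by subject: node -> list of (predicate, object)."""
    graph = defaultdict(list)
    for subject, predicate, obj in statements:
        graph[subject].append((predicate, obj))
    return graph


def related(graph, node, predicate):
    """Return every object linked to `node` by `predicate`."""
    return [obj for pred, obj in graph.get(node, []) if pred == predicate]


graph = build_graph(triples)
friends_of_john = related(graph, "Dr. John Smith", "is friends with")
```

Adding a new kind of entity or relationship is just another tuple in the list. No table has to be redesigned, which is the flexibility the paragraph above describes.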
Life Sciences Leads the Way in Graph Database Use
In the life-sciences industry, the use of graphs allows sales and marketing teams to gauge the ultimate influence of physicians among their peers and their affiliations with hospitals and other institutions. With this insight, teams can identify the physicians who play an important role in determining which drug or device receives approval or preferential placement among competitors in a healthcare organization.
Another life-sciences scenario involves the use of graph technology in an application that supports the allocation of training to outward-facing sales and account teams so they comply with federal health care programs and FDA requirements. Such an application not only yielded significant savings and improved productivity by proactively identifying which employees did not have to take training, but also boosted morale while producing the right reports for government agencies to meet regulatory requirements.
In fact, the life-sciences industry has been quick to adopt graph technology and data-driven applications for a variety of scenarios, such as sharing data stored in graphs seamlessly across groups and uncovering and storing relationships across a wide variety of data sets.
In other industries, such as oil and gas, the relationships between an oil well, its components, and land and lease rights can be used to help optimize management and, ultimately, operational efficiency.
Graphs lend themselves to fast and efficient analytics using technologies such as Apache Spark. Key influencers often can be identified by the shortest paths that connect them to other nodes, as well as by the number of connections they have (their degree). In addition, more sophisticated analysis can be derived from the weightings of specific types of connections.
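The three measures mentioned here can be sketched in plain Python. This is an illustration on a tiny, hypothetical network, not an Apache Spark job. The node names, the edge weights, and the convention that a lower weight means a stronger tie are all assumptions made for the example.

```python
import heapq
from collections import deque

# Hypothetical undirected network: node -> {neighbor: weight}.
# Assumption for this sketch: a lower weight means a stronger tie.
network = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 1},
    "C": {"A": 4, "B": 1, "D": 1, "E": 3},
    "D": {"C": 1, "E": 1},
    "E": {"C": 3, "D": 1},
}


def degree(net):
    """Number of direct connections per node."""
    return {node: len(neighbors) for node, neighbors in net.items()}


def fewest_hops(net, start, goal):
    """Breadth-first search: path with the fewest edges, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in net[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


def lowest_cost(net, start, goal):
    """Dijkstra's algorithm: smallest total edge weight, or None."""
    best = {start: 0}
    heap = [(0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in net[node].items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return None


degrees = degree(network)
most_connected = max(degrees, key=degrees.get)
hop_path = fewest_hops(network, "A", "E")
weighted_cost = lowest_cost(network, "A", "E")
```

On this sample data, `most_connected` is `"C"` and `hop_path` is `["A", "C", "E"]`, while `weighted_cost` is 4, reached along the longer route A, B, C, D, E. Once weights are considered, the strongest chain of connections is not necessarily the one with the fewest hops.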
Why the Use of Graphs Has Been CAP-ped
Critics of graphs and other NoSQL stores point out that relational databases are always consistent and offer atomicity, consistency, isolation, and durability (ACID) properties, whereas NoSQL databases are mostly eventually consistent, or are constrained by Eric Brewer’s theorem, which states that a distributed system can guarantee only two of three properties: consistency, availability, and partition tolerance (CAP). Groups that believe integrity must be enforced at the database level have raised these concerns and remain fervent in their loyalty to relational databases.
There also is the little matter of using actual structured query language (SQL) syntax for queries, the popular mode of data access for all relational databases. NoSQL actually doesn’t mean SQL syntax cannot be used. In fact, there is a whole movement around providing SQL access on top of NoSQL databases (Impala, Hive).
Another common complaint is that graphs often become visually unwieldy as the number of nodes and relationships (edges) grows. This is where data-driven applications come in, delivering both mobile- and browser-based interfaces that allow frontline business users to understand the data better and to collaborate with their colleagues in real time.
Many off-the-shelf graph databases are also not horizontally scalable, limiting options when used for storing and serving up big data volumes. This is leading vendors to offer modern data management platforms built on columnar-graph hybrid stores, and other forms of polyglot persistence. And this trend continued at the recent DataStax Cassandra Summit.
Reliable Data Sets the Foundation for Data-driven Apps
These platforms also include built-in master data management (MDM) to ensure the reliability, integrity, security, and governance of the data, so that the right entities and objects are matched across hundreds of systems and devices. In turn, graph technology can supercharge master data management.
Once scalable, reliable data management is in place, data-driven applications can provide an interface to frontline-business users hungry for agile functionality that focuses on their day-to-day goals. These applications offer relevant insights and recommended actions. Much in the way LinkedIn can identify the best path to connect to individuals of interest through first-level connections and suggest relevant jobs based on the user’s title and work history, enterprise data-driven applications can help business users accomplish their goals faster.
Graphs and NoSQL databases are here to stay as part of a modern blend of capabilities using a polyglot persistence strategy. Companies know that traditional or legacy databases and methods of access aren’t going away soon, and this combination provides the flexibility they need to stay competitive in a dynamic market.
Anastasia Zamyshlyaeva is VP of Platform, Product Management at Reltio, where she is responsible for the strategy and roadmap of the Reltio modern data management platform. She has extensive experience in big data and enterprise software architecture, including the design of core components of the Informatica Master Data Management tool. Anastasia holds an MS in Computer Science from Chelyabinsk State University, where she conducted research in the area of distributed graph computing.