Big data is growing fast, and it isn’t going to slow down. Companies across a broad spectrum of industries are using data to govern decision making at every turn, from personnel decisions to marketing and information management. As the caches of data continue to grow, the need for agile management systems grows alongside.
We are collecting data from a wide range of sources, including social networks, mobile devices, system data, and, increasingly, the burgeoning Internet of Things. Current database management systems are struggling to keep up with the growing volume of data and are sorely in need of re-evaluation. To achieve the next level of database management, we need to ask ourselves, “What if we’ve been looking at databases wrong this entire time?” What if there is an alternative, agile perspective that would open up a wealth of new ideas about how we develop and use databases?
Traditionally, we’ve looked at databases as static repositories: collect data, input data, retrieve data, by any means necessary. Developers and architects start with a modeling process to meet the early requirements, and queries are straightforward as the schema addresses the first features of the application. In the early phase, these features are easy to implement.
Inevitably, however, new features are required over time and the data structure becomes more complex, which adds to the big data challenge. As the application is relied on to perform more, and increasingly complex, tasks, the database grows in complexity as well. This makes the database far more difficult to work with and often results in severe performance degradation.
To avoid a decline in performance, developers often resort to using multiple database engines for a single application, adding one or more new engines that match the performance and data requirements more closely. This adds even more complexity to the environment and places an additional burden on the application developer, because multiple DBMS engines must now be kept in sync from the application code. While this can be an effective workaround, it remains just that: a workaround. As more and more engines are added, things become more complex, forcing developers to interact with multiple data structures via multiple languages.
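The burden of keeping several engines consistent from application code can be illustrated with a minimal sketch. Everything here is hypothetical: `InMemoryEngine` is an in-memory stand-in for a real database engine, and `save_everywhere` is an illustrative helper, not part of any product. The sketch only shows how a partial failure during an application-level multi-engine write leaves the engines disagreeing.

```python
class InMemoryEngine:
    """Hypothetical stand-in for a real database engine."""

    def __init__(self, name, fail_on=None):
        self.name = name
        self.rows = {}
        self.fail_on = fail_on  # key that triggers a simulated write failure

    def write(self, key, value):
        if key == self.fail_on:
            raise RuntimeError(f"{self.name}: write failed for {key!r}")
        self.rows[key] = value


def save_everywhere(engines, key, value):
    """Application-level write: the caller must update every engine itself."""
    written = []
    for engine in engines:
        try:
            engine.write(key, value)
            written.append(engine.name)
        except RuntimeError:
            # A partial failure leaves the engines out of step; the
            # application now has to detect and repair the difference.
            break
    return written


primary = InMemoryEngine("primary")
search = InMemoryEngine("search", fail_on="order-2")

first = save_everywhere([primary, search], "order-1", {"amount": 10})
second = save_everywhere([primary, search], "order-2", {"amount": 25})
in_sync = primary.rows == search.rows
```

After the second call, only the primary engine holds `order-2`, so `in_sync` is false. Each additional engine adds another place where this kind of divergence has to be handled in application code.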
Sharding, which partitions an engine’s data into shards by using a key, is a common technique to improve database performance. It can be effective, but when data is partitioned by one key, many operations cannot be performed on a single shard. Those operations must touch multiple shards, which again increases both complexity and latency.
These issues are commonly understood, and while partitioning can indeed help solve the problems temporarily, it has some effects on the database structure that aren’t as obvious. For instance, a given shard key can support a certain subset of application requirements well, but it makes other requirements cumbersome to implement and reduces their performance through increased latency.
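The trade-off of key-based sharding can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any particular sharding product: the shard count, the hash-based placement, the `customer_id` shard key, and the sample orders are all made up. A lookup by the shard key is served by one shard, while a query on any other attribute has to visit every shard.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # assumed shard count, for illustration only


def shard_for(key: str) -> int:
    """Map a shard key to a shard number using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


# Each shard is modeled as an in-memory list of order rows.
shards = defaultdict(list)

orders = [
    {"customer_id": "c1", "product": "widget", "amount": 10},
    {"customer_id": "c2", "product": "widget", "amount": 25},
    {"customer_id": "c1", "product": "gadget", "amount": 40},
    {"customer_id": "c3", "product": "widget", "amount": 5},
]
for row in orders:
    shards[shard_for(row["customer_id"])].append(row)


def orders_for_customer(customer_id):
    """Served by a single shard, because customer_id is the shard key."""
    shard = shards[shard_for(customer_id)]
    return [r for r in shard if r["customer_id"] == customer_id]


def total_for_product(product):
    """Not keyed by customer, so every shard must be scanned."""
    total = 0
    shards_touched = 0
    for shard_id in range(NUM_SHARDS):
        shards_touched += 1
        total += sum(
            r["amount"] for r in shards[shard_id] if r["product"] == product
        )
    return total, shards_touched
```

Here `orders_for_customer("c1")` reads one shard, while `total_for_product("widget")` touches all four. In a real deployment each extra shard visit is a network round trip, which is where the added latency comes from.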
With this in mind, let’s take a step back. If we consider the possibility that we have been looking at databases incorrectly, we can begin to consider alternatives that can result in meaningful improvements. One such alternative is the concept of an agile big data approach. An agile big data approach is a completely new perspective on database infrastructures. It replaces the traditional view of databases as static repositories with one of real-time views and dynamic streams. Agile big data allows developers and managers to view data as it comes in rather than waiting for complete compilation before beginning the modeling process.
Views are built and maintained by the agile infrastructure itself, on an incremental basis in real time. In other words, views are a picture of the data as it happens. Views can be structured as indexes on other data, data in specific formats for easy application access, aggregate data, and external data engines such as data warehouses. The key is that all types of views are built and maintained from the dynamic, real-time stream of data. You can then build dynamic views with that data, views that exactly match given application requirements.
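As a rough sketch of what incrementally maintained views could look like, the example below updates an aggregate view and an index view each time an event arrives, instead of rebuilding them from a static repository. The `StreamViews` class, the event fields, and the sample stream are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class StreamViews:
    """Maintains two views incrementally as events arrive (hypothetical)."""

    def __init__(self):
        self.total_by_customer = defaultdict(int)   # aggregate view
        self.orders_by_product = defaultdict(list)  # index view

    def apply(self, event):
        """Update every view from one event; nothing is recomputed in batch."""
        self.total_by_customer[event["customer_id"]] += event["amount"]
        self.orders_by_product[event["product"]].append(event["order_id"])


# A made-up stream of order events, processed one at a time.
stream = [
    {"order_id": 1, "customer_id": "c1", "product": "widget", "amount": 10},
    {"order_id": 2, "customer_id": "c2", "product": "widget", "amount": 25},
    {"order_id": 3, "customer_id": "c1", "product": "gadget", "amount": 40},
]

views = StreamViews()
for event in stream:
    views.apply(event)
```

Each view answers one application question directly: `views.total_by_customer["c1"]` is a single lookup, and `views.orders_by_product["widget"]` returns the matching order IDs without scanning the underlying data.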
Anyone using data to guide decision making, analytics, or organization understands the growing need for real-time streaming and for specific views that mirror requirements. And because these views can be manipulated in a step-wise fashion, queries become simple, and answers and meaning can be obtained now, in real time. Turning a static repository into a dynamic stream allows us to take virtually any data source (applications, files, existing databases) and quickly solve many problems, such as the aforementioned high latency or the rapidly growing complexity caused by too much partitioning. This, in turn, increases the capabilities of the infrastructure.
Rather than relying solely on separate engines or shards, the agile approach relies on real-time streams and views that exactly match application requirements. These views can be structured as indexes on other data, data in specific formats for easy application access, or aggregate data, and they can even exist in external data engines such as data warehouses.
Further, an agile big data infrastructure can be implemented in existing infrastructures without changing existing database systems. For example, it is possible to connect to existing database technologies (e.g., MySQL, Oracle, MongoDB) and “tap into” their transaction logs or supported Change Data Capture capabilities. These streams can then be used just like any other source (files, direct API access, network streams, etc.), pushing data through the infrastructure as a real-time flow. The streams can then be used to shape data into views that can be queried, exposing the exact elements needed by given features of an application. An agile view of big data provides an adaptable, versatile infrastructure that can evolve rapidly along with application requirements, delivering on the promise of true business agility in the enterprise.
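To make the idea of tapping a change stream concrete, here is a minimal sketch of applying change events to a queryable view. The event shape (`op`, `key`, `row`) is an assumption made up for this example; it is not the actual format of a MySQL, Oracle, or MongoDB log, and a real Change Data Capture feed would need an adapter to produce something like it. The view keeps only the fields a hypothetical application feature needs.

```python
# Fields the hypothetical application feature needs from each user row.
VIEW_FIELDS = ("email", "plan")


def apply_change(view, change):
    """Apply one change event to the view, keeping it current incrementally."""
    op = change["op"]
    if op in ("insert", "update"):
        # Project the row down to the fields this view exposes.
        view[change["key"]] = {f: change["row"][f] for f in VIEW_FIELDS}
    elif op == "delete":
        view.pop(change["key"], None)
    else:
        raise ValueError(f"unknown operation: {op!r}")
    return view


# A made-up change log, in the order the source database committed it.
change_log = [
    {"op": "insert", "key": 1,
     "row": {"email": "a@example.com", "plan": "free", "password_hash": "x"}},
    {"op": "insert", "key": 2,
     "row": {"email": "b@example.com", "plan": "free", "password_hash": "y"}},
    {"op": "update", "key": 1,
     "row": {"email": "a@example.com", "plan": "pro", "password_hash": "x"}},
    {"op": "delete", "key": 2},
]

user_view = {}
for change in change_log:
    apply_change(user_view, change)
```

After the log is replayed, `user_view` holds one entry, for key `1`, with the upgraded plan and without the `password_hash` column. The source database is never modified; the view is derived entirely from the stream of changes.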
Cory Isaacson is the CEO/CTO of CodeFutures Corporation. Cory has more than 20 years’ experience with advanced software architectures and has worked with many of the world’s brightest innovators in the field of high-performance computing. Cory has spoken at hundreds of public events and seminars and helped numerous organizations address the real-world challenges of application performance and scalability. In his prior position as president of Rogue Wave Software, he actively led the company back to a position of profitable growth, culminating in a successful acquisition by a leading private equity firm. Cory can be reached at email@example.com.