For technology developers, administrators, and users, profound change is in the air. There is a strong need for new languages and tools that can support advanced data analytics and process streaming data as it arrives.
In the past, batch processing sufficed for working with large data sets, but today, organizations are working with unbounded data sets and cannot tolerate the latency and lack of real-time functionality that come along with old, batch-centric tools. Today’s IT administrator juggles a kaleidoscope of software applications, devices, operating systems, and tools. The administrator deals with staggeringly larger data stores than ever before and, increasingly, streaming data flows.
Traditionally, administrators worked with simple tools and languages that allowed them to create workable scripts. As the data stores and platforms at which they aimed these scripts proliferated, imaginative tools like MapReduce were created.
Now, though, in the age of big data and streaming data, a new generation of tools is needed, new languages are required, and new user interfaces that can put streaming data analytics within reach of non-technical users are still missing.
Businesses Need Better Tools
Organizations everywhere are now working with large data sets in real time and need better tools for drawing insights from their data. Businesses are pressured to deliver and deploy software faster, in more agile ways, to more environments. Additionally, they are pressured to quantify, in real time, the impact that new software deployments have on their technology infrastructure.
New tools must be able to find simple and complex patterns within data and instantly reveal how all parts of software operations are working together. For example, simple instructions should reveal how a code deployment is affecting an organization’s system performance and its log error codes. In many cases, organizations will look for “another application” that may give them a taste of these expressive analytics, but they may be selling themselves short. Most of these applications have their own data models, schemas, and requirements that might not fit the way a particular business works. Using a language that simplifies setting up expressive analytics – operations such as windowing and aggregation – can help organizations get more meaningful views of their data, tailored to their own environment. This allows data-savvy team members to create analytics that serve the business better and enable the rest of the team to see the data that matters to them.
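To make the idea of windowing and aggregation concrete, here is a minimal sketch in plain Python. It is a hypothetical illustration, not tied to any particular product or streaming engine: it groups timestamped samples into fixed (tumbling) time windows and averages each one.

```python
from collections import defaultdict

def windowed_average(events, window_seconds=60):
    """Group (timestamp, value) events into fixed time windows and
    average each window. A bare-bones sketch of tumbling-window
    aggregation; real streaming engines also handle out-of-order
    data, incremental state, and unbounded input.
    """
    windows = defaultdict(list)
    for ts, value in events:
        windows[int(ts // window_seconds)].append(value)
    return {w * window_seconds: sum(vals) / len(vals)
            for w, vals in sorted(windows.items())}

# Hypothetical response-time samples: (seconds since start, milliseconds)
events = [(0, 120), (30, 80), (61, 200), (90, 100)]
print(windowed_average(events))  # {0: 100.0, 60: 150.0}
```

A query language built around these operations lets an analyst express "average response time per minute" in a line or two instead of writing this plumbing by hand.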
Beyond these needs looms the enormous specter of streaming data. The streaming model allows organizations to work with unbounded data sets, which can produce enormous advantages. Among these advantages, streaming enables organizations to work with more timely data and avoid unnecessary latency. And processing data as it arrives distributes workloads more evenly, producing more predictable resource consumption. But working with streaming data is a challenge today and, potentially, a staggering problem tomorrow.
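The resource-consumption point can be illustrated with a tiny Python sketch (an assumption-level example, not any vendor’s API): a generator that consumes a stream one item at a time uses constant memory no matter how long the stream runs, where a batch approach would have to buffer everything first.

```python
def running_mean(stream):
    """Consume a (possibly unbounded) stream one item at a time,
    yielding the mean so far. Only two numbers of state are kept,
    so memory stays constant regardless of stream length -- the
    point of processing data as it arrives rather than batching.
    """
    count, total = 0, 0.0
    for x in stream:
        count += 1
        total += x
        yield total / count

# Works the same on a finite list or an endless source
print(list(running_mean([4, 8, 6])))  # [4.0, 6.0, 6.0]
```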
The growing interest in streaming data is reflected in a recent Spark User Survey, which revealed that there were 56 percent more Spark Streaming users in 2015 than in 2014. Querying and optimizing streaming data also calls for advanced analytics and graph processing tools, and the survey showed that production use of advanced analytics, such as MLlib for machine learning and GraphX for graph processing, increased from 11 percent in 2014 to 15 percent in 2015. Additionally, the survey results – as well as recent market research from firms such as Gartner – show that organizations are wrestling with the complexity of Spark and related tools.
Next-Gen Tools Arriving
Big data needs a new language and new tools that handle streaming data intelligently. Many of the new tools focused on streaming work with graphs and visual data flows to tackle these problems. In a recent interview, Rich Wolski, inventor of the Eucalyptus cloud computing platform and a U.C. Santa Barbara computer science professor, discussed the promise of streaming and graphical analytics of data flows.
“I think there are good questions to ask about MapReduce, and there are good questions to ask about what is referred to as ‘batch’ processing,” Wolski said. “There are ways to do the computation and analytics that are graphical, that use a different kind of internal representation that basically leverages a mathematical graph. Some people in the analytics community may get a lot out of these. Lots of people don’t really know yet exactly what they can do with a streaming, distributed analytics model.”
The data analytics and developer communities already are using tools ranging from SQL to Hadoop to Spark to arrive at better insights from their data stores. But streaming has its own unique model of computation that calls for new systems and tools that leverage flow graphs for expressing processes. These graphical models, along with new APIs and languages, can help developers and administrators better manage data and processes.
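The flow-graph idea can be sketched in a few lines of Python. This is a hypothetical toy, not a real engine: each node applies a function to incoming items and pushes results downstream, so a pipeline like source → filter → sink reads the way the graph is drawn.

```python
class Node:
    """One processing step in a toy flow graph. push() applies this
    node's function to an item and forwards the result downstream;
    returning None drops the item (a simple filter)."""
    def __init__(self, fn):
        self.fn = fn
        self.downstream = []

    def __or__(self, other):          # pipe syntax: a | b wires a into b
        self.downstream.append(other)
        return other

    def push(self, item):
        result = self.fn(item)
        if result is not None:
            for node in self.downstream:
                node.push(result)

results = []
source = Node(lambda e: e)
errors = Node(lambda e: e if e["status"] >= 500 else None)  # keep server errors
sink   = Node(lambda e: results.append(e["path"]) or e)     # collect paths

source | errors | sink  # the flow graph: source -> errors -> sink

# Hypothetical web-log events flowing through the graph
for event in [{"path": "/a", "status": 200}, {"path": "/b", "status": 503}]:
    source.push(event)

print(results)  # ['/b']
```

Expressing the process as a graph makes the pipeline’s shape explicit, which is what lets a visual UI render and edit it directly.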
New languages and platforms can also integrate with new user interfaces so that even non-technical users can query and yield insights from data stores without building complex queries.
Despite much new interest in streaming from businesses, most streaming systems that do exist remain immature compared with batch-centric tools. This is giving rise to brand new streaming platforms. The current state of tools for data analytics will continue to change, especially as the streaming model becomes more entrenched. Organizations are now struggling to work with streams of data that will quickly become torrents. New languages and platforms capable of true real-time analytics and imbued with advanced querying capabilities will become the new normal.
Apurva Dave is the VP of marketing for Jut, the operations data hub for DevOps. An experienced marketing leader, Apurva combines his technical background with revenue marketing strategies to create, deliver, and scale out messages; create new product categories; and guide organizations to bring their products to market successfully via digital and offline marketing, cloud partners, channels, analysts, and press, and by partnering with sales.
Subscribe to Data Informed for the latest information and news on big data and analytics for the enterprise.