Hadoop’s potential for cost savings, scale, and speed has been touted by industry analysts for some time, but getting Hadoop to the point of payoff isn’t easy. As Gartner’s 2015 Hadoop Adoption Study points out, the top two roadblocks to adoption are obtaining the right skills and determining how to get value from the platform.
Even with these challenges, Hadoop adoption in the enterprise is hitting its stride. Hortonworks recently announced that its average annual net expansion rate was more than 100 percent, meaning many Hadoop enterprise customers are more than doubling their clusters on an annual basis. Cloudera reported that its base of enterprise subscription software customers grew by more than 85 percent in fiscal year 2015.
Clearly, thousands of companies have made their first bets on Hadoop pay off. They have gone from figuring out how to run Hadoop to figuring out how to profit from it. What does it take to make that success more ubiquitous?
With Strata + Hadoop World upon us in New York this week, we anticipate a slew of new product announcements in the Hadoop ecosystem. That makes it the perfect time to assess the maturity of Hadoop, to delve into the pros and cons of various applications and BI architectural approaches, and to evaluate smart strategies for maximizing the ROI of your big data initiatives, and of the data itself.
With that in mind, here are three questions you should have for every Hadoop tool and application vendor you talk to at Strata, and that they should be able to answer:
How Does Your Solution Increase the Value of Hadoop Data?
The answers to this question should address scale and access, essentially making Hadoop more than just a storage system. Business users and analysts are moving from “give me aggregates and samples of the data” to “give me access to all the data.” Analytic technologies that can scale from millions of data points to billions of data points can offer a huge competitive advantage. After all, billions of rows of data sitting in Hadoop mean little if only a handful of data scientists can extract insights from that data.
Any solution targeting Hadoop should have two basic attributes:
First, it should run natively on the Hadoop nodes and work directly with data in Hadoop. There is no longer any reason to move data outside of Hadoop for processing or analysis. Having a vendor stack run natively on the Hadoop nodes lets you leverage the scale, data, and security infrastructure of your Hadoop cluster.
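To make the cost of data movement concrete, here is a back-of-the-envelope sketch comparing exporting a data set over the network to an external analytics server against scanning it in place across the cluster. Every figure below (cluster size, link speed, per-node scan rate) is an illustrative assumption for the sketch, not a benchmark from any vendor:

```python
# Back-of-the-envelope comparison: export data through one network pipe
# vs. scan it in place with every Hadoop node reading its local share.
# All figures are illustrative assumptions, not measured benchmarks.

DATA_TB = 100                # data set size, terabytes (assumed)
NODES = 50                   # Hadoop cluster size (assumed)
NET_GBPS = 10 / 8            # 10 Gb/s export link, expressed in GB/s
DISK_GBPS_PER_NODE = 0.5     # sequential scan rate per node, GB/s (assumed)

data_gb = DATA_TB * 1000

# Export everything through a single network pipe, then analyze elsewhere.
export_hours = data_gb / NET_GBPS / 3600

# Scan in place, with all nodes reading their local data in parallel.
in_place_hours = data_gb / (DISK_GBPS_PER_NODE * NODES) / 3600

print(f"export: {export_hours:.1f} h, in place: {in_place_hours:.1f} h")
```

Even with a generous network assumption, the single export pipe cannot keep up with dozens of disks scanning in parallel, which is the core argument for processing where the data lives.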
Second, any solution you consider should be able to leverage all of the data you have on your Hadoop cluster. With the processing power of modern platforms, there’s no need to settle for old-school technologies that limit you to aggregating and sampling data. Tools built for small data but used on big data limit users to no more than a surface layer view of the new information streams coming into their business. Most businesses are missing out on access to raw data that can reveal many layers of net-new insights. Your next Hadoop investment should focus on enabling more users to analyze more of the raw data as it streams into Hadoop.
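As a toy illustration of what sampling leaves on the table, consider a hypothetical event stream in which a rare signal occurs only 50 times in a million records. The data and numbers here are invented for the sketch and do not describe any real product:

```python
import random

random.seed(42)

# Hypothetical event stream: 1,000,000 records, of which only 50 carry
# a rare fraud flag -- the kind of needle big data is supposed to find.
N = 1_000_000
fraud_ids = set(random.sample(range(N), 50))

# Old-school approach: aggregate over a 1% sample (every 100th record).
fraud_in_sample = sum(1 for i in range(0, N, 100) if i in fraud_ids)

# Full-scan approach: analyze every raw record.
fraud_in_full = sum(1 for i in range(N) if i in fraud_ids)

print(f"1% sample caught {fraud_in_sample} of {fraud_in_full} fraud events")
```

On average, a 1 percent sample sees only a handful of those 50 events, and often none at all; scanning all of the raw data is the only way to guarantee the needle shows up.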
Does Your Solution Make It Easier for Non-Technical Business Users to Access Data in Hadoop Directly?
For big data to pay off, it should be put to use across a range of users, from data analysts and data scientists to frontline business users. Challenge vendors to explain how their solution will appeal to functions across your business team, from the CMO to developers to individual product managers.
Non-technical business users are looking for intuitive tools that help them visualize, analyze, and consume data-driven applications built on the data in the Hadoop store. Ask Hadoop vendors for applications and visualization tools that run natively on Hadoop and optimize their interfaces to Hadoop data for better data discovery and consumption.
Inquire about user-experience attributes such as drag-and-drop interfaces that will encourage business users to experiment and explore. The tools should be simple enough to use that non-technical users can feel comfortable exploring data and zeroing in on precise and granular data. Tomorrow’s Hadoop systems won’t just help users pump data from the lake; they will let users jump in and swim.
How Can Hadoop Help to Simplify the Complex Data Warehouse and BI Server Stacks in My Organization?
Every vendor should be able to give you a pitch about how their solution can democratize granular data access and cut costs by simplifying and converging the data stack.
In the past, most of the data that an organization tracked came from a small set of well-defined business processes. This transactional data was stored in proprietary data warehouses. The only option for BI tools in this scenario was to connect to these proprietary back ends via a thin SQL pipe, get results back, and plot them. Data consumers have become increasingly frustrated with legacy-era architectures and are coming to view Open Database Connectivity (ODBC) and data warehouses as chokepoints for access to big data.
As plunging costs of data acquisition and retention multiply the available data sources, Hadoop has become the de facto platform of choice for storing all of this data in a data lake. In the era of big, constantly flowing data, scaling traditional systems at the pace of digital information is too expensive and cumbersome; Hadoop essentially collapses the stack.
Monetizing this growing data lake should be one of the first things you hear about when vendors talk about Hadoop solutions. A vendor should help you take cost and complexity out of analytics by building and executing end-to-end analytics exclusively within Hadoop, so the data never needs to be moved. Your vendors should be able to address this problem and deliver powerful analytics to a broader population of end users.
Like many disruptive technologies, Hadoop has long been the subject of “what if?” conversations. This year at Strata, the challenge is to ask “How?” Creating scale and access to Hadoop data, offering more user-friendly tools, and beginning to move away from legacy SQL stacks are very much a reality for smart businesses.
David M. Fishman is Vice President of Marketing at Arcadia Data. He is responsible for Arcadia Data’s overall go-to-market strategy and execution.