Open-source platforms have long been magnets for innovation and experimentation. From the early days of Unix through Mozilla, Linux, and the most recent Apache Software Foundation Hadoop distributions, open source has been the place to be for eclectic and sometimes groundbreaking development.
Despite this rich environment, even the most die-hard open source enthusiast has to admit that enterprise IT has been guided and largely shaped by innovations coming from the vendor community. From the mainframe and the personal computer to modern virtual and software-defined architectures, the major gains in IT development have come from the labs of companies such as IBM, Apple, Microsoft, and VMware.
These vendors' platforms have helped companies streamline footprints, control costs, and improve data productivity. But in case you have missed it, there is a sea change underway, in which data platforms and services are now created, deployed, and consumed via open source. The many open-source communities in existence today are already making major contributions to the worldwide data management ecosystem.
This is a “community-driven” technology revolution because, for the first time, users of data platforms are setting the terms of software development. And it happens to coincide with the age of big data, the Internet of Things, collaborative workflows, and other initiatives that are forcing enterprises to think outside the box when it comes to redesigning and re-imagining their data stacks.
Examples of this dynamic at work include Yahoo’s Hadoop cluster, currently the largest in the world at an estimated 450 petabytes. There is also Facebook’s Hive project, which translated SQL-like queries into MapReduce jobs so that data analysts could use familiar commands to manage and scale out Hadoop workloads, along with its successor, Facebook’s Presto system, which provides interactive SQL on Hadoop. And then there is LinkedIn’s Kafka, a distributed publish-subscribe messaging system supporting functions such as event flagging and data partitioning for real-time data feeds.
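To make Hive’s contribution concrete, the sketch below shows the core idea behind translating a SQL-like aggregation into map and reduce phases. This is a toy, in-memory illustration only: Hive itself compiles queries into real MapReduce jobs that run across a Hadoop cluster, and the function names and sample data here are invented for the example.

```python
from collections import defaultdict

# Toy sketch of the idea behind Hive: a SQL-like aggregation such as
#   SELECT page, COUNT(*) FROM visits GROUP BY page
# can be rewritten as a map phase (emit key/value pairs), a shuffle
# (group values by key), and a reduce phase (combine values per key).

def map_phase(records):
    # Emit a (page, 1) pair for every visit record.
    for record in records:
        yield record["page"], 1

def shuffle(pairs):
    # Group values by key, as the MapReduce framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # COUNT(*) per key: sum the emitted ones.
    return {key: sum(values) for key, values in groups.items()}

visits = [
    {"page": "/home"}, {"page": "/docs"},
    {"page": "/home"}, {"page": "/home"},
]
counts = reduce_phase(shuffle(map_phase(visits)))
print(counts)  # {'/home': 3, '/docs': 1}
```

The appeal of Hive was precisely that analysts could write the one-line SQL statement and let the engine generate and distribute the equivalent of these three phases for them.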
Once projects such as Hadoop, Hive, Presto, and Kafka were open sourced, it was the strength of their respective communities that both led to and continues to fuel innovation. For example, when the team at Yahoo opened up Hadoop as an independent, open-source project, it kicked off an era of big data management and allowed the broader open-source community to begin producing myriad solutions that were vital to enterprise productivity. Big data is now a gold rush as organizations of all kinds seek to mine the valuable nuggets that lie within their data stores.
And there is every reason to think this trend will only accelerate as the data ecosystem evolves along increasingly virtual, scale-out lines. Unlike proprietary vendors, hyperscale entities like Facebook and Google are not in the business of creating and selling software. The solutions they develop and release freely raise the community’s collective knowledge, which in turn pays the originators back in the form of further development and increased patronage of their cloud and social/collaborative products and services. It’s a symbiotic relationship that ultimately benefits the entire IT industry.
Vendor-driven innovation provides a steady and somewhat straightforward development path, while open-source initiatives tend to be a bit more chaotic, with new products coming at a rapid pace.
This is why knowledgeable systems consultants are crucial to the development of enterprise-class open-source architectures. No single platform or development community has all the pieces of an end-to-end data architecture, and it is very unlikely that all the right components will come together by chance, so a solid integration partner is a critical asset as open platforms evolve.
This is an exciting time for open source. With data architectures rising and falling at a rapid pace on newly abstracted, scale-out infrastructure, the need for broad interoperability and cross-vendor compatibility is paramount. Community-driven initiatives have shown they can step up to the plate for big data and other critical needs, but ultimately it is the enterprise that needs to reach out and take advantage of the wealth of new tools emerging from these vibrant and energetic development efforts.
Ron Bodkin is the founder and CEO of Think Big Analytics. Ron founded Think Big to help companies realize measurable value from big data. Think Big is the leading provider of independent consulting and integration services specifically focused on big data solutions. Our expertise spans all facets of data science and data engineering and helps our customers to drive maximum value from their big data initiatives.
Subscribe to Data Informed for the latest information and news on big data and analytics for the enterprise.