You can’t track the evolution of big data analytics without paying attention to the rise and spread of Hadoop. An open source project of the Apache Software Foundation, Hadoop was developed as a framework for performing computations on large datasets using the MapReduce model for parallel computing. The combination makes it possible to process large data volumes using open source software and low-cost commodity hardware.
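To make the MapReduce idea concrete, here is a minimal single-process sketch in Python. It does not use Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the word-count task is assumed purely for illustration. On a real cluster, the map and reduce steps would run in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files spread across a cluster.
    sample = ["big data on Hadoop", "Hadoop stores big data"]
    print(word_count(sample))
```

Because each call to `map_phase` depends only on its own line, and each call to `reduce_phase` only on its own key, the work can be split across many inexpensive machines, which is the property the paragraph above describes.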
Since its debut in 2006, Hadoop has moved from Internet companies like Yahoo to projects and experiments at enterprises looking to collect data from new sources like social media and machine sensors. And an ecosystem has developed with both startups and established vendors providing support and advice for starting new projects and working to marry new technologies to established enterprise systems.
“Hadoop is not a niche technology. It’s the general purpose platform for big data,” said Doug Cutting, a co-founder of the Hadoop project who is chief architect at Cloudera, the first to bring Hadoop to enterprise users.
Below, find highlights from Data Informed’s coverage of Hadoop, including technology trends, use cases and interviews with technology experts.
Companies like IBM, Oracle and Teradata have had to move past their very successful model of consolidated data storage. They’ve been pushed by customers who need the flexibility to deal with unstructured data and a fast iterative process for quicker results. And Hadoop is at the center of the trend. Read more.
Hadoop is a collection of open source projects, combined to enable a software-based big data appliance. This article introduces a core aspect of Hadoop’s utilities, the Hadoop Distributed File System. Read more.
This article introduces the concepts behind the growing popularity of NoSQL in the development of analytics applications. Read more.
This article examines the prototypical big data platform using Hadoop, and how Pig, Hive, HBase, Zookeeper and Mahout address these pieces of the puzzle. Read more.
Shaun Connolly, vice president of corporate strategy at Hortonworks, talks about the development of Hadoop and what it means for enterprises using it, or considering using it for data-intensive projects. Read more and listen.
Related Podcast: A Look Inside the Hadoop Ecosystem with Hortonworks
Jack Norris, the chief marketing officer of MapR Technologies, discusses Hadoop security and the kinds of use cases that open up when it is improved. Read more and listen.
A conversation with MapR CEO John Schroeder, who works to make Hadoop plus its own technologies palatable to the most demanding commercial enterprises. Read more.
Cloudera, one of the veterans in the Hadoop field, introduced an open-source, real-time query engine for the Hadoop Distributed File System called Impala. Read more.
Hadoop pioneer Raymie Stata, the founder and CEO of Altiscale, talks about his work building a cloud-based Hadoop service. A former chief technology officer at Yahoo, he worked with colleagues there on the earliest versions of Hadoop to support Web search, advertising and email services. Earlier in his career, he was part of the AltaVista team at Digital Equipment Corp. Read more and listen.
Ankur Gupta, general manager of MetaScale, discusses the lessons—and empathy—that he has developed for IT shops in his four years of working with Hadoop and other technologies in a 1,000-node cluster that works in conjunction with an enterprise data warehouse, mainframes and other longstanding IT assets. Read more.
Syncsort, a software company founded in 1968 to help mainframe systems administrators sort data for transactions processing, has found new life at the center of ETL processes that enterprises with legacy systems require to adopt Hadoop and big data analytics systems. Read more.
With so many developers and data professionals familiar with SQL, and business intelligence tools and other applications written for SQL, the interactive query language isn’t going anywhere. Many new database creators are looking for ways to connect SQL to highly scalable distributed file systems such as Hadoop. Read more.
As tech giants continue to develop their in-memory offerings, smaller players like ScaleOut and GridGain are also helping to bring greater attention to these innovative alternatives to traditional data processing technologies. All of which could bolster Hadoop as a mainstream option for the enterprise. Read more.
Among the highlights of IBM’s announcements at its 2013 Information on Demand conference were technologies designed to boost performance of its data platforms and the security of systems using the open source Hadoop Distributed File System. Read more.
SAP is making a strong push into the big data marketplace with a flurry of new offerings, including an analytics software bundle that yokes its BusinessObjects business intelligence solutions with Sybase IQ, its high-speed analytics database. The company also announced it has built bridges between its database-bound systems and the popular Hadoop open-source distributed file system. Read more.
Driven by demand from customers wanting to sift unstructured textual data from social networks, machine-to-machine data from server logs and other non-standard data sources, Microsoft has spent more than two years trying to add big-data storage and analytic capabilities to the company’s existing database products. Read more.
The two requirements of big data projects, real-time information from a continuous flow of data and the analysis of immense data volumes supported by Hadoop, are generally at odds, writes Rajat Jain of Qubole. In this opinion article, he asserts that the Lambda Architecture represents a two-layered processing approach to address this problem. Read more.
In September 2013, Teradata announced the latest version of its core offering, Teradata Database 14.10. It promises performance improvements in terms of speed, analytical power and smoother access to data stored in Hadoop. Teradata’s Chris Twogood discusses the issues involved. Read more and listen.
Hadoop may be built for big data but it’s not known for supporting the slicing and dicing of real-time information. Now a number of companies are working to change that by connecting in-memory technologies to the popular open source distributed file system. Read more.
Intel, a company famous for its computer processing hardware, has jumped into the Hadoop software business, asserting that it wants to spur growth of big data analytics deployments in large data centers. Read more.
Pivotal Analytics Workbench is an example of a sandbox technology that allows users to test and tweak analytics algorithms. Read more.
Faced with rising demand for applications to analyze data for clients, Lotame, a company that serves publishers and marketers, found a set of development tools that helped the company get past a shortage of developers with Hadoop skills. Read more.
A data lifecycle management framework for Hadoop, Falcon simplifies data management by allowing users to easily configure and manage data migration, disaster recovery and data retention workflows. It is a project developed by Hortonworks and mobile ad network InMobi. Read more.
Pattern is an open source scoring engine designed to ease and expedite Hadoop deployments for analytics projects. Read more.
Big data vendors such as DataStax are stepping up on Hadoop security, offering sophisticated built-in features. Read more.
Professionals attending the Strata Hadoop World conference in February 2013 noted that discussions about Hadoop had matured beyond the potential of the technology to focus on business use cases. Read more and listen.
The technology at the heart of Hadoop, a distributed file system scanned by the batch computing process MapReduce, can’t deliver insights on data while it’s being collected. It’s clear there is plenty of brain power pointed at that problem. Read more.
Lingual is an open source SQL engine that runs on top of Cascading, a framework for executing data processing on a Hadoop cluster. It is designed to let data scientists and developers with basic SQL skills build applications on Hadoop without any training in MapReduce. Read more.
Opinion: Why Big Data Projects Fail
When approaching big data, not nearly enough emphasis is being placed on value. For this reason, too many big data projects are being undertaken using technologies like Hadoop without delivering the kind of results that are possible, writes Stephen Brobst of Teradata in this opinion article. Read more.
A growing number of vendors are offering tools to address Hadoop’s batch-by-batch approach to data processing, including Cloudera’s Impala, the open source project Storm and Metamarkets’ Druid-based data engine. Read more.
WANdisco, a U.K.-based software provider, believes that its data replication technology can address a single point of failure in the Hadoop Distributed File System. Read more.
Riot Games, developer of the popular League of Legends online multiplayer video game, uses the Platfora BI suite to query data stored in Hadoop. Read more.
Chris Poulin and his company, Patterns & Predictions, developed a commercial Bayesian analytics tool for predicting events, most notably financial events, based on historical analysis. He’s been turning that capability to the issue of veterans at risk of suicide with an application that is supported by Hadoop. Read more.
More data and simple algorithms work because having more data allows the “data to speak for itself,” instead of relying on unproven assumptions and weak correlations, writes Garrett Wu of WibiData in this opinion piece. Hadoop is one of the technologies that makes this approach possible. Read more.
Far from the big industry convention halls, grassroots Hadoop user group meetings are now a place where developers can openly swap trade secrets with competitors, land a job without a resume and listen to top companies confess their open source failures. Read more.
A May 2013 review at Dice.com found Hadoop skills in hot demand. Read more.
In March 2013, it was clear that demand for IT professionals with Hadoop and NoSQL skills was the fastest-growing segment for job-seekers, though still a small one. Read more.