Focus On: Hadoop

February 3, 2014 4:46 pm

You can’t track the evolution of big data analytics without paying attention to the rise and spread of Hadoop. An open source project of the Apache Software Foundation, Hadoop was developed to perform computations on large datasets using the MapReduce model for parallel computing. The combination makes it possible to process large data volumes with open source software and low-cost commodity hardware.
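To make the MapReduce idea concrete, here is a minimal single-process sketch in Python of the classic word-count example. It is not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, chosen for illustration, and a real cluster would run the map and reduce steps in parallel across many machines rather than in one loop.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key (a framework does this between phases)."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence on an in-memory dataset."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Calling word_count(["big data", "Big Hadoop"]) maps each line to word pairs, groups the pairs by word, and sums each group. In a distributed setting the same two user-supplied functions would be applied to blocks of data spread across the cluster.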

Since its debut in 2006, Hadoop has moved from Internet companies like Yahoo to projects and experiments at enterprises looking to collect data from new sources like social media and machine sensors. And an ecosystem has developed with both startups and established vendors providing support and advice for starting new projects and working to marry new technologies to established enterprise systems.

“Hadoop is not a niche technology. It’s the general purpose platform for big data,” said Doug Cutting, a co-founder of the Hadoop project who is chief architect at Cloudera, the first company to bring Hadoop to enterprise users.

Below, find highlights from Data Informed’s coverage of Hadoop, including technology trends, use cases and interviews with technology experts.

The Evolution of the Enterprise Data Warehouse, Starring Hadoop

Companies like IBM, Oracle and Teradata have had to move past their very successful model of consolidated data storage. They’ve been pushed by customers who need the flexibility to deal with unstructured data and a fast iterative process for quicker results. And Hadoop is at the center of the trend. Read more.

Understanding the Big Data Stack: Hadoop’s Distributed File System

Hadoop is a collection of open source projects, combined to enable a software-based big data appliance. This article introduces a core aspect of Hadoop’s utilities, the Hadoop Distributed File System. Read more.

An Introduction to NoSQL Data Management for Big Data

This article introduces the concepts behind the growing popularity of NoSQL in the development of analytics applications. Read more.

How Pig, Hive and Zookeeper Build Apps on Hadoop and MapReduce

This article examines the prototypical big data platform using Hadoop, and how Pig, Hive, HBase, Zookeeper and Mahout address these pieces of the puzzle. Read more.

Podcast: What Hadoop Version 2 Means for Business

Shaun Connolly, vice president of corporate strategy at Hortonworks, talks about the development of Hadoop and what it means for enterprises using it, or considering using it for data-intensive projects. Read more and listen.

Related Podcast: A Look Inside the Hadoop Ecosystem with Hortonworks

Podcast: Hadoop Use Cases Grow with Added Security for Open Source System

Jack Norris, the chief marketing officer of MapR Technologies, discusses Hadoop security and the kinds of use cases that open up when it is improved. Read more and listen.

Related: A Look at Hadoop’s Future with MapR’s John Schroeder

A conversation with MapR CEO John Schroeder, who works to make Hadoop plus its own technologies palatable to the most demanding commercial enterprises. Read more.

Related Podcast: Bringing Interactive Queries to Hadoop and the Skills to Go With Them

Cloudera’s Impala Offers First Step in Real-Time Analytics for Hadoop

Cloudera, one of the veterans in the Hadoop field, introduced an open-source, real-time query engine for the Hadoop Distributed File System called Impala. Read more.

Podcast: Raymie Stata’s Voyage from AltaVista to Yahoo, and Now a Hadoop Service

Hadoop pioneer Raymie Stata, the founder and CEO of Altiscale, talks about his work building a cloud-based Hadoop service. A former chief technology officer at Yahoo, he and his colleagues worked with the earliest versions of Hadoop to support Web search, advertising and email services. And earlier in his career, he was part of the AltaVista team at Digital Equipment Corp. Read more and listen.

At MetaScale, Counseling Enterprise IT on Adopting Hadoop, Big Data

Ankur Gupta, general manager of MetaScale, discusses the lessons—and empathy—that he has developed for IT shops in his four years of working with Hadoop and other technologies in a 1,000-node cluster that works in conjunction with an enterprise data warehouse, mainframes and other longstanding IT assets. Read more.

Traditional ETL Player Syncsort Retools for Hadoop, Big Data Analytics

Syncsort, a software company founded in 1968 to help mainframe systems administrators sort data for transactions processing, has found new life at the center of ETL processes that enterprises with legacy systems require to adopt Hadoop and big data analytics systems. Read more.

Innovative Relational Databases Create New Analytics Opportunities for SQL Programmers

With so many developers and data professionals familiar with SQL, and business intelligence tools and other applications written for SQL, the interactive query language isn’t going anywhere. Many new database creators are looking for ways to connect SQL to highly scalable distributed file systems such as Hadoop. Read more.

GridGain, ScaleOut Develop In-Memory Accelerators for Hadoop

As tech giants continue to develop their in-memory offerings, smaller players like ScaleOut and GridGain are also helping to bring greater attention to these innovative alternatives to traditional data processing technologies. All of which could bolster Hadoop as a mainstream option for the enterprise. Read more.

IBM Updates Big Data Analytics Offers with Skills and Trust in Mind

Among the highlights of IBM’s announcements at its 2013 Information on Demand conference were technologies designed to boost performance of its data platforms and the security of systems using the open source Hadoop Distributed File System. Read more.

SAP Expands Its Analytics Offerings, Builds Bridges to Hadoop

SAP is making a strong push into the big data marketplace with a flurry of new offerings, including an analytics software bundle that yokes its BusinessObjects business intelligence solutions with Sybase IQ, its high-speed analytics database. The company also announced it has built bridges between its database-bound systems and the popular Hadoop open-source distributed file system. Read more.

Microsoft’s Big Data Plans: Hadoop Links via SQL Server to SharePoint, Excel and Other Applications

Driven by demand from customers wanting to sift unstructured textual data from social networks, machine-to-machine data from server logs and other non-standard data sources, Microsoft has spent more than two years trying to add big-data storage and analytic capabilities to the company’s existing database products. Read more.

Related: Microsoft Commits to Hadoop, Supercharges SQL Server

Opinion: How Lambda Architecture Can Analyze Big Data Batches in Near Real-Time

The two requirements of big data projects (real-time information from a continuous flow of data, and the analysis of immense data volumes supported by Hadoop) are generally at odds, writes Rajat Jain of Qubole. In this opinion article, he asserts that the Lambda Architecture offers a two-layered processing approach to address this problem. Read more.

Podcast: Teradata Deepens Ties to Hadoop, Unveils In-Database R Analytics

In September 2013, Teradata announced the latest version of its core offering, Teradata Database 14.10. It promises performance improvements in terms of speed, analytical power and smoother access to data stored in Hadoop. Teradata’s Chris Twogood discusses the issues involved. Read more and listen.

SAP, ScaleOut Software Bring In-Memory Connectors to Hadoop

Hadoop may be built for big data but it’s not known for supporting the slicing and dicing of real-time information. Now a number of companies are working to change that by connecting in-memory technologies to the popular open source distributed file system. Read more.

Intel Jumps into Big Data Pool with Hadoop Distribution

Intel, a company famous for its computer processing hardware, has jumped into the Hadoop software business, asserting that it wants to spur growth of big data analytics deployments in large data centers. Read more.

Hadoop Sandboxes Offer Experimental Spaces for Analytics Modelers

Pivotal Analytics Workbench is an example of a sandbox technology that allows users to test and tweak analytics algorithms. Read more.

Related: Hadoop Sandboxes Provide Low-Risk Entry for New Programmers

Data Firm Uses Development Tools To Bypass Shortage of Hadoop Skills

Faced with rising demand for applications to analyze data for clients, Lotame, a company that serves publishers and marketers, found a set of development tools that helped it get past a shortage of developers with Hadoop skills. Read more.

Hadoop Project Falcon: Data Lifecycle Management for App Developers

A data lifecycle management framework for Hadoop, Falcon simplifies data management by allowing users to easily configure and manage data migration, disaster recovery and data retention workflows. It is a project developed by Hortonworks and mobile ad network InMobi. Read more.

Pattern, Open Source Framework, Aims to Accelerate Analytics on Hadoop

Pattern is an open source scoring engine designed to ease and expedite Hadoop deployments for analytics projects. Read more.

Hadoop Security Tools Start to Step Up

Big data vendors such as DataStax are stepping up on Hadoop security, offering sophisticated built-in features. Read more.

Podcast: User Discussions About Hadoop Deepen as Technology Matures

Professionals attending the Strata Hadoop World conference in February 2013 noted that discussions about Hadoop had matured beyond the potential of the technology to focus on business use cases. Read more and listen.

Hadoop System Developers Carry on Quest for Real-Time Queries

The technology at the heart of Hadoop, a distributed file system that’s scanned in the batch computing process MapReduce, can’t deliver insights on data while it’s being collected. It’s clear there is plenty of brain power pointed at that problem. Read more.

Concurrent’s Lingual Designed to Let SQL Developers Run Big Data Applications on Hadoop

Lingual is an open source SQL engine that runs on top of Cascading, a framework for executing data processing on a Hadoop cluster. It is designed to let data scientists and developers with basic SQL skills build applications on Hadoop without any training in MapReduce. Read more.

Opinion: Why Big Data Projects Fail

When approaching big data, not nearly enough emphasis is being placed on value. For this reason, too many big data projects are being undertaken using technologies like Hadoop without the kind of results that are possible, writes Stephen Brobst of Teradata in this opinion article. Read more.

Developers Target Hadoop Performance Lags in Quest for Real-Time Analytics

A growing number of vendors are offering tools to address Hadoop’s batch-by-batch approach to data processing, including Cloudera’s Impala, the open source project Storm and Metamarkets’ Druid-based data engine. Read more.

WANdisco’s Distributed Data Replication Approach to Hadoop’s Single Point of Failure

WANdisco, a U.K.-based software provider, believes that its data replication technology can address a single point of failure in the Hadoop Distributed File System. Read more.

Image: League of Legends battle arena.

League of Legends Powers Up Hadoop-Based BI Queries with Platfora

Riot Games, developer of the popular League of Legends online multiplayer video game, uses the Platfora BI suite to query data stored in Hadoop. Read more.

Sentiment Analysis Tool Designed to Predict Veterans’ Suicide Risk

Chris Poulin and his company, Patterns & Predictions, had developed a commercial Bayesian analytics tool for predicting events, most notably financial events, based on historical analysis. He has been turning that capability to the issue of veterans at risk of suicide with an application that is supported by Hadoop. Read more.

Opinion: Why More Data and Simple Algorithms Beat Complex Analytics Models

More data and simple algorithms work because having more data allows the “data to speak for itself,” instead of relying on unproven assumptions and weak correlations, writes Garrett Wu of WibiData in this opinion piece. Hadoop is one of the technologies that makes this approach possible. Read more.

Hadoop Meetups a Prime Spot for Developers to Recruit, Trade Technical Tips with Peers

Far from the big industry convention halls, grassroots Hadoop user group meetings are now a place where developers can openly swap trade secrets with competitors, land a job without a resume and listen to top companies confess their open source failures. Read more.

Enterprises Value Data Pros With Multiple Skills, Dice Survey Shows

A May 2013 review found Hadoop skills in hot demand. Read more.

Demand Strong for Hadoop, NoSQL Skills

In March 2013, it was clear that demand for IT professionals with Hadoop and NoSQL skills was the fastest-growing segment for job-seekers, though still a small one. Read more.

