The Role of Hadoop in Digital Transformations and Managing the IoT

by Ken Tsai | August 25, 2016

Ken Tsai, VP, Head of Cloud Platform and Data Management, Product Marketing, SAP


The digital transformation underway at Under Armour is erasing the stale stereotype that athletes and techies don't mix. Hardcore runners sporting the company's latest microthread singlet can't see Hadoop, Apache Hive, Apache Spark, or Presto, but these technologies are teaming up to track some serious mileage.

Under Armour is working on a “connected fitness” vision that connects body, apparel, activity level, and health. By combining the data from all these sources into an app, consumers will gain a better understanding of their health and fitness, and Under Armour will be able to identify and respond to customer needs more quickly with personalized services and products. The company stores and analyzes data about food and nutrition, recipes, workout activities, music, sleep patterns, purchase histories, and more.

Compiling, storing, and analyzing these types of structured and unstructured data at this scale would have been nearly impossible a decade ago. Today, companies can use Hadoop to merge their data from business applications, business analytics, web logs, the Internet of Things (IoT), and many other sources to deliver context-relevant insights. When companies collect data from all sources to augment the core of their business, they often realize real-time business insights that give them a competitive edge. 

Over the years, companies have invested significant time and money in untangling data schemas and making data consistent. The goal was always greater visibility and business insight, along with access to a larger share of their own business data as well as customer and partner data. Yet within enterprises, 60 to 73 percent of data is never used for business intelligence, analytics, or applications. A Hadoop-based data lake integrated with business systems has the potential to reduce that share significantly and give companies access to valuable big data signals.

Hadoop Unfazed by Size or Schema

Hadoop does not need to enforce a schema to store data, and it can store and process very large sets of structured and unstructured data. For enterprises, the unstructured data is especially intriguing: images, video, audio, and social media are taking over the digital universe and greatly outpacing the growth of structured data. When business systems and business data are integrated with Hadoop data lakes, companies gain a 360-degree view of what is happening in the business.
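The idea of storing data without enforcing a schema is often called "schema-on-read." Raw records are written as they arrive, and a schema is applied only when a particular question is asked. The sketch below illustrates this in plain Python. It is not the Hive or Spark API. The fitness-style records and field names are hypothetical.

```python
import json

# Raw records are stored as-is: nothing validates them at write time.
RAW_LINES = [
    '{"user": "a1", "activity": "run", "km": 5.2}',
    '{"user": "b2", "activity": "sleep", "hours": 7.5}',
    '{"user": "a1", "activity": "run", "km": "3.1"}',
    'not valid json at all',
]

def read_with_schema(lines, schema):
    """Apply a schema only when the data is read (schema-on-read).

    `schema` maps field names to converter functions. Lines that are not
    JSON objects, or that lack a field or hold an unconvertible value, are
    left out of this particular view instead of being rejected on write.
    """
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed or unstructured line: skip for this view
        try:
            rows.append({field: convert(record[field]) for field, convert in schema.items()})
        except (KeyError, TypeError, ValueError):
            continue  # record does not fit this schema
    return rows

# Two different views over the same raw data.
runs = read_with_schema(RAW_LINES, {"user": str, "km": float})
sleep = read_with_schema(RAW_LINES, {"user": str, "hours": float})
print(runs)  # [{'user': 'a1', 'km': 5.2}, {'user': 'a1', 'km': 3.1}]
```

Because nothing is discarded at write time, a later analysis can define yet another schema over the same lines without reloading the data.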

Companies using Hadoop are storing data across thousands of nodes, and they can process that data efficiently with SQL-based engines, MapReduce, and other distributed compute frameworks. The open-source projects of the Apache Software Foundation have also opened the door to integrating multiple emerging data-processing frameworks, so that all types of data can be analyzed and mined for business insights.
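MapReduce splits a job into three steps. A map step emits key/value pairs from each record. A shuffle step groups those pairs by key. A reduce step combines the values for each key. The sketch below runs all three steps in a single Python process to show the data flow. It is a local illustration, not Hadoop's Java MapReduce API. The web-log lines are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    """Emit (key, 1) pairs: here, one pair per token in a log line."""
    return [(token.lower(), 1) for token in record.split()]

def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine every value emitted for one key."""
    return key, sum(values)

def map_reduce(records):
    # On a cluster, map tasks run in parallel on the nodes that hold each
    # data block; this sketch runs every step sequentially in one process.
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

logs = ["GET /shop", "GET /cart", "POST /cart"]
print(map_reduce(logs))  # {'get': 2, '/shop': 1, '/cart': 2, 'post': 1}
```

The map and reduce functions see only one record or one key at a time. That independence is what lets a framework spread the work across many nodes.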

At Under Armour, an analytics data warehouse, SQL-based big data processing engine, and machine-learning engine work together to provide business and user insights, personalized recommendations, search enhancements, and data access, but the data innovations won’t end there.

Merging Contextual and Business Data

Several Apache projects in the Hadoop ecosystem provide a flexible framework that can be integrated with machine-learning tools or deep-learning libraries to enhance digital image detection and recognition. For example, a huge library of product images can be processed quickly, or individuals in crowds can be identified automatically.

Retailers can boost sales by relating pictures of their products to a customer's past shopping preferences. They can also merge in sentiment analytics and track every step in a customer journey, including competitive offerings and prospect behavior. By tracking social media, website activity, and call-center data after a product or service launch, companies can more easily understand which products are succeeding, based on consumer posts and customer feedback. Insight into why people make purchases, and why they don't, has always drained out of the customer journey like water through a sieve. The ability to iteratively discover relevant big data signals lets organizations capture the contextual information around consumer behavior.

Hadoop is also helping manufacturers improve real-time quality assurance on the production line. Manufacturers can photograph goods as they are assembled and use image recognition to inspect each product automatically and check whether it meets factory standards. In many cases, automated image recognition is more accurate than human review.

Companies are also relying on a new breed of distributed computing technologies to merge business data from ERP, HR, finance, sales, and inventory systems with operational information such as equipment and installation maintenance. Turkish Airlines, for example, has started a program that links its flight operations with equipment maintenance, procurement, and parts purchasing, as well as scheduling crews to make repairs. Automating these processes helps ensure that repairs happen before customers experience a service interruption.
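At its core, merging operational data with business data is a join on a shared key. The sketch below is a minimal illustration under these assumptions: the `MaintenanceAlert` and `InventoryItem` record types, the part numbers, and the aircraft identifiers are all hypothetical. It is not a description of Turkish Airlines' system. The sketch joins maintenance alerts with inventory on part number and splits the alerts into repairs that can proceed now and parts that procurement must order.

```python
from dataclasses import dataclass

@dataclass
class MaintenanceAlert:   # hypothetical operational record
    aircraft: str
    part_no: str

@dataclass
class InventoryItem:      # hypothetical ERP inventory record
    part_no: str
    on_hand: int

def plan_repairs(alerts, inventory):
    """Join maintenance alerts with inventory on part number.

    Returns (ready, to_order): alerts that can be repaired from current
    stock, and part numbers that procurement must order first.
    """
    stock = {item.part_no: item.on_hand for item in inventory}  # build side of a hash join
    ready, to_order = [], []
    for alert in alerts:                                         # probe side
        if stock.get(alert.part_no, 0) > 0:
            stock[alert.part_no] -= 1  # reserve one unit for this repair
            ready.append(alert)
        else:
            to_order.append(alert.part_no)
    return ready, to_order

alerts = [
    MaintenanceAlert("AC-1", "P-100"),
    MaintenanceAlert("AC-2", "P-100"),
    MaintenanceAlert("AC-3", "P-200"),
]
inventory = [InventoryItem("P-100", 1), InventoryItem("P-200", 3)]
ready, to_order = plan_repairs(alerts, inventory)
print([a.aircraft for a in ready], to_order)  # ['AC-1', 'AC-3'] ['P-100']
```

A distributed engine performs the same kind of hash join, but it partitions both sides by key across nodes rather than holding one dictionary in memory.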

Handling the IoT with Hadoop

Maintenance and quality assurance are among the many areas where organizations will want to apply the IoT, and Hadoop is a real advantage there. CenterPoint Energy in Houston relies on Hadoop to reduce the cost of storing the data it collects from more than 2.3 million customers. CenterPoint collects energy-usage readings from smart meters every 15 minutes, which means the company is processing more than 5 billion records. As the IoT becomes more prevalent, storage and computing requirements will only increase.
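The article does not say over what period the 5 billion records accumulate, but the figures it gives imply a daily volume. The arithmetic below assumes one meter per customer and exactly 2.3 million meters. It shows how quickly 15-minute readings add up.

```python
METERS = 2_300_000        # assumption: one smart meter per customer
INTERVAL_MINUTES = 15     # one reading per meter every 15 minutes

readings_per_meter_per_day = 24 * 60 // INTERVAL_MINUTES  # 96
readings_per_day = METERS * readings_per_meter_per_day     # 220,800,000

# Whole days of collection before the running total exceeds 5 billion records
# (integer ceiling division).
days_to_5_billion = -(-5_000_000_000 // readings_per_day)

print(f"{readings_per_day:,} readings/day; >5 billion after {days_to_5_billion} days")
# 220,800,000 readings/day; >5 billion after 23 days
```

Under these assumptions, the meters alone pass 5 billion records in a little over three weeks. This is why per-record storage cost matters so much at IoT scale.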

Looking ahead, the IoT will produce vast amounts of data as billions of sensors transmit information multiple times a day. In many IoT scenarios, much of that data will repeat the same information. Say a sensor in a washing machine tracks vibration, temperature, and time of use. For months, the data it transmits may not change at all. But when readings move beyond acceptable thresholds, big data solutions must efficiently store and process the time-series data and deliver those abnormal signals in real time. That way, the appropriate chain of responses can be initiated to prevent equipment failure and service interruptions.
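The washing-machine scenario combines two ideas: storing repetitive sensor data compactly and flagging out-of-range readings immediately. The sketch below shows one simple way to do both, assuming a fixed acceptable range per measurement. The `LIMITS` values, units, and change tolerance are hypothetical, not manufacturer specifications. Each reading is stored only if it differs noticeably from the last stored one, and every reading is checked against its limits, so compression never hides an alert.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable

@dataclass(frozen=True)
class Reading:
    ts: int             # seconds since monitoring started (illustrative)
    vibration: float    # hypothetical units
    temperature: float  # degrees Celsius

# Hypothetical acceptable ranges; real limits would come from the manufacturer.
LIMITS = {"vibration": (0.0, 2.5), "temperature": (5.0, 60.0)}

def out_of_range(r: Reading) -> list[str]:
    """Names of the fields that fall outside their acceptable range."""
    return [f for f, (lo, hi) in LIMITS.items() if not lo <= getattr(r, f) <= hi]

def process(stream: Iterable[Reading], tolerance: float = 0.05):
    """Split a sensor stream into readings worth storing and alerts to raise.

    A reading is stored if it is abnormal or differs from the last stored
    reading by more than `tolerance` (simple change-based compression).
    Every reading is checked against LIMITS, so alerts are never suppressed.
    """
    stored, alerts = [], []
    last = None
    for r in stream:
        bad = out_of_range(r)
        if bad:
            alerts.append((r.ts, bad))
        changed = last is None or (
            abs(r.vibration - last.vibration) > tolerance
            or abs(r.temperature - last.temperature) > tolerance
        )
        if bad or changed:
            stored.append(r)
            last = r
    return stored, alerts

readings = [
    Reading(0, 1.0, 40.0),
    Reading(900, 1.01, 40.0),     # near-duplicate: not stored
    Reading(1800, 1.02, 40.02),   # near-duplicate: not stored
    Reading(2700, 3.1, 41.0),     # vibration above limit: stored and alerted
    Reading(3600, 1.0, 40.0),     # back to normal: stored because it changed
]
stored, alerts = process(readings)
print([r.ts for r in stored], alerts)
```

A production pipeline would replace the fixed thresholds with per-device baselines or a statistical model, but the separation of compact storage from immediate alerting carries over.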

Processing, computing, and storing all of the collected IoT data would have been too costly in the past, especially in the context of business operations. Business solutions built on distributed computing platforms and the Hadoop Distributed File System (HDFS) can make processing, analyzing, and managing large amounts of data in distributed environments economically feasible. Businesses can store and process more data and much larger data sets. They can also avoid expensive capital and operational investments by storing and managing their data in the cloud.
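HDFS scales storage by splitting each file into fixed-size blocks and replicating every block across several DataNodes. The estimate below assumes the Hadoop 2.x and later defaults of a 128 MiB block size (`dfs.blocksize`) and three replicas (`dfs.replication`), which clusters often override. It ignores compression, erasure coding, and filesystem overhead.

```python
# Defaults in Hadoop 2.x and later; clusters often override them in hdfs-site.xml.
DEFAULT_BLOCK_SIZE = 128 * 1024**2  # dfs.blocksize: 128 MiB
DEFAULT_REPLICATION = 3             # dfs.replication

def hdfs_footprint(file_bytes, block_size=DEFAULT_BLOCK_SIZE, replication=DEFAULT_REPLICATION):
    """Estimate how HDFS lays out one file under plain replication.

    Returns (blocks, block_replicas, raw_bytes). The file is split into
    fixed-size blocks (the last one may be partial and uses only its actual
    size), and each block is copied to `replication` different DataNodes.
    """
    blocks = -(-file_bytes // block_size)  # ceiling division on integers
    return blocks, blocks * replication, file_bytes * replication

# One tebibyte of meter readings, for illustration.
blocks, replicas, raw = hdfs_footprint(1024**4)
print(blocks, replicas, raw // 1024**4)  # 8192 24576 3
```

Raw capacity grows linearly with the replication factor. Replication is what buys fault tolerance on commodity hardware, and it is a cost to weigh when comparing on-premises and cloud storage.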

Business Data for Everyone

Big data technologies like Hadoop will deliver the next big wave of business value by using merged data sources to determine why an event happened, as well as the context and impact of the event. To reach this next step, Hadoop's open-source framework needs to expand to support distributed computing across enterprise apps, data marts, warehouses, and Hadoop data lakes. Once this can be done cost-effectively, with high performance and strong traceability, the value of big data in Hadoop will become available to a much wider group of decision makers.

Sales and marketing teams, for example, will be able to see in real time which products and services are performance leaders and which are lagging, while discovering the causes of that performance from big data signals. Price points, inventory, and production will be adjusted based on a 360-degree view of the business and big data. HR and supply chain managers, operations managers, manufacturing supervisors, and others responsible for business performance will be able to make data-driven decisions that put the company ahead of its competition. Instead of waiting days or weeks to run business reports, business analysts and data scientists will query the data directly and gain the insight they need to update and enhance a product launch, make changes to an assembly line, or repair a part before it breaks.

Big data technologies like Hadoop have been an on-ramp to the fast-moving digital economy. The next wave of big data innovation will focus on making these technologies more accessible to enterprise applications and analytics needs, and on democratizing big data so that it is easily available to everyone who needs it.

Ken Tsai is the VP and Head of Data Management and PaaS, and leads product marketing efforts for SAP's in-memory computing platform SAP HANA, HANA Cloud Platform, and the portfolio of SAP data management solutions such as HANA Vora, ASE, IQ, SQL Anywhere, and Event Stream Processing. Ken has more than 20 years of experience in the IT industry, spanning development, implementation, pre-sales, business development, and product marketing. Ken is a graduate of the University of California, Berkeley.
