How the Internet of Things Changes Big Data Analytics

by Dan Graham | August 9, 2016 5:30 am | 3 Comments

Dan Graham, General Manager, Enterprise Systems, Teradata

Yes, there’s plenty of hype surrounding the Internet of Things (IoT). But this is one time the hype underestimates what comes next. IDC says there will be 28 billion sensors in use by 2020, with $1.7 trillion in economic value. The scale, breadth, and business value will exceed anything seen in the past. Ray Kurzweil’s Singularity is here. IoT will be an order of magnitude bigger than big data in scale and value.

Imagine a few billion sensors sending messages 20 times a second or even once a minute. The scale of the data is astonishing. Even Facebook addicts can’t talk that much. For many, IoT data volume will be in the petabyte range.
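To put “a few billion sensors” into numbers, here is a back-of-envelope sketch. The fleet size (2 billion) and the 100-byte message size are assumptions for illustration. They are not figures from IDC or from this article.

```python
# Back-of-envelope IoT data volume.
# ASSUMED inputs (illustrative only): 2 billion sensors, 100 bytes per message.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
PETABYTE = 10**15               # decimal petabyte, in bytes


def daily_volume_bytes(sensors: int, msgs_per_sec: float, bytes_per_msg: int) -> float:
    """Bytes produced per day by a fleet of identical sensors."""
    return sensors * msgs_per_sec * SECONDS_PER_DAY * bytes_per_msg


SENSORS = 2_000_000_000  # assumption
MSG_BYTES = 100          # assumption: ID + timestamp + measurement + framing

fast = daily_volume_bytes(SENSORS, 20, MSG_BYTES)      # 20 messages per second
slow = daily_volume_bytes(SENSORS, 1 / 60, MSG_BYTES)  # one message per minute

print(f"20 msg/s : {fast / PETABYTE:,.1f} PB per day")
print(f"1 msg/min: {slow / PETABYTE:,.3f} PB per day")
```

Under these assumptions, the 20-per-second case produces about 345.6 PB a day. Even the once-a-minute case passes a petabyte in about three and a half days.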

Fortunately, the cost of disk storage continues its free fall. Every person on the planet will be touched by sensors by 2025, or sooner: people in the Outback, people in the Sahara, and especially grandma. Get ready for digital rivers of data driving new growth use cases in every industry.

Sensors and things operate at the edge (Figure 1). “Things” are any item we can attach a sensor to – including you. The edge is where we find Operational Technology (OT): manufacturing plants, cars, electrical grids, and train tracks. OT engineers and operators have been using sensor data for decades. But now Information Technology (IT) is pitching in to help out. Gateways are the routers and servers that connect OT to IT systems.

Figure 1. Internet of Things – the basics.

A majority of the ROI comes from analyzing sensor data. Note that analytics are spread throughout IoT systems like chocolate baked into a cake. IoT analytics are collectively called the Analytics of Things (AoT).

Where the Wild Things Are

Data now comes from devices with attached sensors. Some things are stationary (wind turbines); others are mobile (cars). While 70 percent of sensors are inside the intranet, 30 percent are “in the wild.” In many implementations, sensor data will be massively dispersed around the planet. That’s vastly different from getting ERP or CRM data extracts.

Consider Monsanto’s precision agriculture business, which is trying to collect data from farmers around the world. Monsanto must build on regional clouds, but even the cloud doesn’t reach most farms. Then imagine negotiating for data with farmers in Mexico, Nigeria, China, and Brazil.

Mobile sensor data arrives from thousands of airplanes, cars, patients, tools, and inventory pallets. These sensors drop off the network when they are in a tunnel or 10 kilometers up in the sky. As a result, data is sometimes lost, and developers must plan for a “data catch-up mode” for when the device comes back online.
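One way to implement “data catch-up mode” is a store-and-forward buffer on the device. The sketch below is hypothetical, and the class and field names are illustrative. Readings are queued while the link is down and flushed oldest-first on reconnect. The queue is bounded by device memory, so a long outage evicts the oldest readings, which is exactly how data gets lost.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass(frozen=True)
class Reading:
    sensor_id: str
    timestamp: float  # seconds since some epoch
    value: float


class StoreAndForwardBuffer:
    """Hypothetical device-side buffer implementing a "data catch-up mode"."""

    def __init__(self, capacity: int, send: Callable[[Reading], None]) -> None:
        self._backlog: Deque[Reading] = deque(maxlen=capacity)  # bounded by device memory
        self._send = send
        self.online = True
        self.dropped = 0  # readings lost because the backlog overflowed

    def record(self, reading: Reading) -> None:
        if self.online:
            self._send(reading)
            return
        if len(self._backlog) == self._backlog.maxlen:
            self.dropped += 1  # append() below evicts the oldest reading
        self._backlog.append(reading)

    def set_online(self, online: bool) -> None:
        self.online = online
        if online:
            # Catch-up mode: flush the backlog oldest-first so readings arrive in time order.
            while self._backlog:
                self._send(self._backlog.popleft())


# Usage: a truck's telemetry loses coverage in a tunnel.
sent: List[Reading] = []
buf = StoreAndForwardBuffer(capacity=3, send=sent.append)
buf.record(Reading("truck-42", 0.0, 101.2))  # online: sent immediately
buf.set_online(False)                         # enters the tunnel
for t in range(1, 6):
    buf.record(Reading("truck-42", float(t), 100.0 + t))
buf.set_online(True)                          # back online: catch-up
print([r.timestamp for r in sent], buf.dropped)  # [0.0, 3.0, 4.0, 5.0] 2
```

With room for only three readings, the five taken in the tunnel overflow the backlog and the two oldest are lost. The other three arrive in order once the truck reconnects. Sizing that backlog for the longest expected outage is part of the plan.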

Data gathering is different from what it was in the past.

Get over it, but have a plan.

Ch-ch-ch-ch-Changes; Turn and Face the ETL

Data integration changes enormously with IoT. With digital rivers of data flowing in from the edge, how can we manage the size of these real-time streams?

First, let’s be clear: never, ever lose data. My CTO tattooed that on my brain. But digital rivers can burn out network budgets. There are a few solutions:

    • Filter and send the data only when needed.


    • Send data only when it crosses a threshold.


    • Compress the data using lossless algorithms.
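The threshold option is often implemented as “report by exception,” where the device transmits a reading only when the signal crosses a limit. Here is a minimal sketch, assuming a single fixed threshold per sensor. A production version might add hysteresis so that a noisy signal does not chatter around the limit.

```python
from typing import Iterable, Iterator, Optional, Tuple


def threshold_crossings(
    readings: Iterable[Tuple[float, float]], threshold: float
) -> Iterator[Tuple[float, float]]:
    """Yield only the (timestamp, value) readings at which the signal crosses
    `threshold` in either direction. The first reading is always sent so the
    receiver knows the starting state."""
    was_above: Optional[bool] = None
    for ts, value in readings:
        is_above = value >= threshold
        if was_above is None or is_above != was_above:
            yield ts, value
        was_above = is_above


temps = [(0, 70.0), (1, 71.5), (2, 80.2), (3, 81.0), (4, 79.9), (5, 75.0)]
print(list(threshold_crossings(temps, threshold=80.0)))
# [(0, 70.0), (2, 80.2), (4, 79.9)]
```

Here only 3 of 6 readings are sent. Everything between crossings never leaves the device, and that is the trade-off the next paragraph warns about.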


One solution that keeps costs down without losing anything is lossless compression done at the edge. Most people learned about lossy algorithms from music downloads: MP3 files sound tinny and flat because resolution is thrown away to save disk space. Do you want analytics driven by low-resolution data? Imagine discarding all helicopter sensor data below safety thresholds. Now suppose the raw data showed 10 critical sensors running a smidgen below their safety thresholds at the same instant. A data scientist can predict catastrophic failures only if she has all the data. Discarded data inevitably contains the outliers that business success depends on.
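Python's standard zlib module (DEFLATE) is one readily available lossless codec. The sketch below uses it only to show the property that matters: decompression returns every reading bit for bit. The float64 packing and the sample readings are assumptions for illustration. Purpose-built time-series codecs often compress better, but they offer the same guarantee.

```python
import struct
import zlib


def compress_readings(values: list[float]) -> bytes:
    """Pack readings as little-endian float64 and deflate them (lossless)."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return zlib.compress(raw, 9)


def decompress_readings(blob: bytes) -> list[float]:
    """Inverse of compress_readings: returns the exact original floats."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


# Hypothetical readings hovering a smidgen below a 25.0 safety threshold.
values = [24.9 + (i % 10) * 0.009 for i in range(1000)]

blob = compress_readings(values)
restored = decompress_readings(blob)
lossy = [round(v) for v in values]  # a lossy shortcut, for contrast

print(len(values) * 8, "bytes raw ->", len(blob), "bytes compressed")
print("lossless:", restored == values)           # True: every bit preserved
print("lossy   :", lossy == values, set(lossy))  # False {25}: the smidgen is gone
```

The rounded copy reports every sensor at exactly 25, which erases the “just below threshold” signal the helicopter example depends on.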

The Land Before Time

A majority of sensor data is time-series data. It arrives as a sensor ID, a date-time stamp, and a measurement, typically as a continuous stream per sensor. Often, sensor data is more granular than we need. Imagine 1,000 sensor measures per minute when we need only 20. The answer is to apply a sliding window of intervals to the data so that only 20 measurements are output. But these interval results can be much more complex than averages: sliding interval windows lead to curve-fitting techniques (Figure 2). Are you still with me? We just crossed over into data science algorithms like Fast Fourier Transforms and SAX (Symbolic Aggregate approXimation). Sensor data integration requires advanced analytics algorithms to simplify the data, so mathematicians and algorithm suppliers are now needed for data preparation. Add deep math skills to the data integration team.
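Here is a minimal sketch of that reduction. It assumes fixed, non-overlapping (tumbling) windows and uses a straight-line fit as the simplest curve fit. Each window of 50 readings becomes a slope and an intercept, which preserves a trend that an average would hide. The names (SensorReading, window_fits) are hypothetical, and an FFT- or SAX-based summary could take the place of fit_line.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class SensorReading:
    sensor_id: str
    timestamp: datetime
    value: float


def fit_line(ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept, x = 0..n-1."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def window_fits(readings: List[SensorReading], outputs: int) -> List[Tuple[float, float]]:
    """Cut one sensor's stream into `outputs` equal, non-overlapping windows
    and summarize each window by its fitted (slope, intercept).
    Readings left over after the last full window are ignored."""
    size = len(readings) // outputs
    if size < 2:
        raise ValueError("need at least two readings per window")
    values = [r.value for r in readings]
    return [fit_line(values[i * size:(i + 1) * size]) for i in range(outputs)]


# One minute of hypothetical data: 1,000 readings, one every 60 ms.
start = datetime(2016, 8, 9, 5, 30)
readings = [
    SensorReading("turbine-3", start + timedelta(milliseconds=60 * i), 0.01 * i)
    for i in range(1000)
]

fits = window_fits(readings, outputs=20)  # 1,000 measures in, 20 summaries out
print(len(fits), fits[1])
```

Each intercept is the fitted value at the start of its window, and each slope is the rate of change within it, so downstream analytics receive 20 small models instead of 1,000 raw points.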

Figure 2. Sliding interval windows lead to curve-fitting techniques.

A River Runs Through It

Digital rivers are best stored on low-cost, scale-out systems called data lakes. A data lake is a good place to keep all the raw data. Raw data normally isn’t loaded into the data warehouse because its value is unproven. Data lakes, of course, need security, retention policies, and governance.
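Retention policies are one of those data lake necessities. Here is a minimal sketch, assuming raw readings are stored in one directory per sensor type per day (a Hive-style layout) and that retention is set per sensor type. The paths, sensor types, and day counts are all hypothetical. Expired partitions are only candidates for archiving or deletion, and that decision belongs to governance rather than code.

```python
from datetime import date, timedelta
from typing import Dict, List


def partition_path(root: str, sensor_type: str, day: date) -> str:
    """Hive-style path for one day of raw readings (hypothetical layout)."""
    return f"{root}/raw/sensor_type={sensor_type}/dt={day.isoformat()}"


def expired_partitions(
    root: str,
    days_by_type: Dict[str, List[date]],
    retention_days: Dict[str, int],
    today: date,
) -> List[str]:
    """Return partitions older than their sensor type's retention period.
    Types with no policy are kept, never silently dropped."""
    expired: List[str] = []
    for sensor_type, days in days_by_type.items():
        keep_for = retention_days.get(sensor_type)
        if keep_for is None:
            continue
        cutoff = today - timedelta(days=keep_for)
        expired.extend(
            partition_path(root, sensor_type, d) for d in sorted(days) if d < cutoff
        )
    return expired


policy = {"vibration": 365, "gps": 90}  # hypothetical retention, in days
on_disk = {
    "vibration": [date(2016, 1, 1), date(2016, 8, 1)],
    "gps": [date(2016, 1, 1), date(2016, 8, 1)],
    "pressure": [date(2015, 1, 1)],  # no policy yet, so it is kept
}
print(expired_partitions("/lake", on_disk, policy, today=date(2016, 8, 9)))
# ['/lake/raw/sensor_type=gps/dt=2016-01-01']
```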

Engineers derive insights by exploring the loosely coupled raw data. They discover how their designs really work in the wild. They can trace low-performing parts, by version, back to the supplier. They can correlate multiple sensor streams to see how one part causes others to fail. And they can predict future failure dates by machine or device ID.
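Predicting failure dates by device ID can start as simply as extrapolating a trend. Here is a minimal sketch, assuming each device reports a hypothetical wear index that reaches 1.0 at failure and that wear grows roughly linearly. Real predictive-maintenance models are usually richer, so treat this only as an illustration of the idea.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple


def predict_failure_date(
    history: List[Tuple[date, float]], failure_level: float
) -> Optional[date]:
    """Fit a least-squares line to (day, wear) points and return the day the
    trend reaches `failure_level`, or None if wear is not rising."""
    t0 = history[0][0]
    xs = [(d - t0).days for d, _ in history]
    ys = [w for _, w in history]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None  # all points on the same day: no trend to fit
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    return t0 + timedelta(days=round((failure_level - intercept) / slope))


# Hypothetical wear index per device ID (1.0 = failure).
wear: Dict[str, List[Tuple[date, float]]] = {
    "pump-17": [(date(2016, 7, 1), 0.20), (date(2016, 7, 11), 0.30), (date(2016, 7, 21), 0.40)],
    "pump-22": [(date(2016, 7, 1), 0.50), (date(2016, 7, 11), 0.50)],
}
predictions = {device: predict_failure_date(h, failure_level=1.0) for device, h in wear.items()}
print(predictions)
```

With these made-up numbers, pump-17's wear rises 0.01 per day, so the sketch projects failure on 2016-09-19. Pump-22 shows no upward trend and gets None.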

If a few dozen streams of sensor data can do all that, what happens when we add sensor data to supply chain data, customer data, inventory, and pricing data?

One airplane manufacturer did exactly that. The engineers (OT) explored sensor data from airplane engines. They applied predictive analytics to detect when certain parts would start failing. The IT team saw an opportunity to combine the OT and IT sides of the business. They found a tool to query sensor data in the data lake via the data warehouse. The business people then made sure the parts needed were available to match the maintenance schedules. Then they matched failure predictions to labor schedules. The business people went on to discover 10 more ROI use cases from putting sensor data in context. Exponential value came from putting data lake information into the data warehouse.
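At its core, the airplane example is a join between OT output (failure predictions) and IT data (parts inventory). Here is a minimal sketch, assuming predictions arrive as (tail number, part number, predicted date) records and stock is a simple map from part number to quantity. All identifiers and numbers are hypothetical, and the article does not describe the manufacturer's actual tooling.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class FailurePrediction:  # produced by sensor analytics (OT side, data lake)
    tail_number: str
    part_number: str
    predicted_failure: date


def parts_shortfall(
    predictions: List[FailurePrediction],
    stock: Dict[str, int],  # part_number -> units on hand (IT side, warehouse)
    horizon_end: date,
) -> Dict[str, int]:
    """Count parts needed for failures predicted on or before `horizon_end`
    and return how many units of each part are missing from stock."""
    needed: Dict[str, int] = {}
    for p in predictions:
        if p.predicted_failure <= horizon_end:
            needed[p.part_number] = needed.get(p.part_number, 0) + 1
    return {
        part: count - stock.get(part, 0)
        for part, count in needed.items()
        if count > stock.get(part, 0)
    }


preds = [
    FailurePrediction("N101", "FP-200", date(2016, 9, 1)),
    FailurePrediction("N102", "FP-200", date(2016, 9, 15)),
    FailurePrediction("N103", "VB-310", date(2016, 9, 20)),
    FailurePrediction("N104", "FP-200", date(2016, 12, 1)),  # beyond the horizon
]
stock = {"FP-200": 1, "VB-310": 2}
shortfall = parts_shortfall(preds, stock, horizon_end=date(2016, 9, 30))
print(shortfall)  # {'FP-200': 1}
```

The same joined view could then be lined up against labor schedules, which is the next step the example describes.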

“This new product data is valuable by itself,” Michael Porter and James Heppelmann wrote in the Harvard Business Review, “yet its value increases exponentially when it is integrated with other data, such as service histories, inventory locations, commodity prices, and traffic patterns.”

Plan on having a data lake for low-cost storage. Deliver refined data to the data warehouse.

Oh, the Places You’ll Go!

Sensor data is just like any other data. We clean it, secure it, govern it, and analyze it. But sensor data also is not like any other data. Therefore, it’s imperative to do the following:

    • Design an architecture for massively dispersed or disconnected sensors.


    • Never use lossy algorithms to compress sensor data.


    • Add strong math skills to the data-integration team.


    • Keep raw sensor data in a data lake for cost savings and archival purposes.


    • Exploit sensor data in the data warehouse for exponential value creation.


As the General Manager of Enterprise Systems for Teradata Corporation, Dan Graham is responsible for strategy, go-to-market success, and competitive differentiation for the Active Data Warehouse platform and Extreme Performance Appliance. He has nearly 40 years’ experience in the industry and joined Teradata in 1989.

Subscribe to Data Informed for the latest information and news on big data and analytics for the enterprise, plus get instant access to more than 20 eBooks.

  1. Posted August 12, 2016 at 7:23 am

    Excellent article; the airplane manufacturer’s example is really valid in this case and clearly demonstrates how sensors and analytics can help enterprises improve their ROI.

  2. Anjolaiya Oladipo
    Posted August 18, 2016 at 2:41 am

    Absolutely lovely article retaining some very intriguing facts about the adoption and the whole new difference the world is experiencing exponentially by the emergence of data integration and analytics as far as the interactivity and synchronization of Operational Technology (OT) and Information Technology is concerned. Value extraction from sensor data and broad data analytics has certainly come into play with lifelong business improvement policies and an adequate maintenance drive in making the world a better abode!

  3. Posted August 30, 2016 at 1:39 pm

    It strikes me that the full scope of what’s eloquently discussed in this article is not yet known, but what is presented is fascinating.

    How will this technology improve the lives of people around the planet? There are perhaps thousands of use cases to address that one question.
