Yes, there’s plenty of hype surrounding the Internet of Things (IoT). But this is one time the hype underestimates what comes next. IDC says there will be 28 billion sensors in use by 2020, with $1.7 trillion in economic value. The scale, breadth, and business value will exceed anything seen in the past. Ray Kurzweil’s Singularity is here. IoT will be an order of magnitude bigger than big data in scale and value.
Imagine a few billion sensors sending messages 20 times a second or even once a minute. The scale of the data is astonishing. Even Facebook addicts can’t talk that much. For many, IoT data volume will be in the petabyte range.
Fortunately, the cost of disk storage continues its free fall. Every person on the planet will be touched by sensors by 2025, or sooner. Even people in the Outback, the Sahara, and especially grandma. Get ready for digital rivers of data driving new growth use cases in every industry.
Sensors and things operate at the edge (Figure 1). “Things” are any item we can attach a sensor to – including you. The edge is where we find Operational Technology (OT). It includes manufacturing plants, cars, electrical grids, and train tracks. OT engineers and operators have been using sensor data for decades. But Information Technology (IT) is now pitching in to help out. Gateways are routers and servers that connect the OT to IT systems.
A majority of the ROI comes from analyzing sensor data. Note that analytics are spread throughout IoT systems like chocolate baked into a cake. IoT analytics are collectively called the Analytics of Things (AoT).
Where the Wild Things Are
Data now comes from devices with attached sensors. Some things are stationary (wind turbines), others are mobile (cars). While 70 percent of sensors are inside the intranet, 30 percent are “in the wild.” In many implementations, sensor data will be massively dispersed around the planet. That’s vastly different from getting ERP or CRM data extracts.
Consider Monsanto’s precision agriculture trying to collect data from farmers around the world. Monsanto must build on regional clouds. But even the cloud doesn’t touch most farms. Then imagine negotiating for data with farmers in Mexico, Nigeria, China, and Brazil.
Mobile sensor data arrives from thousands of airplanes, cars, patients, tools, and inventory pallets. These sensors drop off the network when in a tunnel or 10 kilometers up in the sky. This means that data is sometimes lost, and also that developers must plan for a “data catch-up mode” when the device is back online.
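A catch-up mode can be as simple as a queue that holds readings while the device is offline and drains them, oldest first, on reconnect. A minimal sketch in Python (class, field, and sensor names are illustrative, not any specific product’s API):

```python
import time
from collections import deque

class SensorBuffer:
    """Sketch of a "data catch-up mode": readings queue up while the
    device is offline and are flushed, oldest first, on reconnect."""

    def __init__(self):
        self.pending = deque()
        self.online = False

    def record(self, sensor_id, value, timestamp=None):
        # Keep the reading's original timestamp, not its arrival time,
        # so the server can rebuild the time series across the gap.
        ts = timestamp if timestamp is not None else time.time()
        self.pending.append((sensor_id, ts, value))

    def reconnect(self):
        """Device is back online: drain the backlog in order."""
        self.online = True
        backlog = list(self.pending)
        self.pending.clear()
        return backlog  # hand these to the uplink, oldest first
```

The key design choice is that nothing is dropped: the buffer preserves both the readings and their original timestamps, so the downstream time series has a gap in arrival, not in content.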
Data gathering is different than in the past.
Get over it, but have a plan.
Ch-ch-ch-ch-Changes; Turn and Face the ETL
Data integration changes enormously with IoT. Considering digital rivers of data from the edge, how can we manage the size of these real-time streams?
First, let’s be clear: never, ever lose data. My CTO tattooed that on my brain. But digital rivers can burn out network budgets. There are a few solutions:
- Filter and send the data only when needed.
- Send data only when it crosses a threshold.
- Compress the data using lossless algorithms.
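The threshold option is often implemented as a deadband filter: report a reading only when it moves far enough from the last reported value. A minimal sketch, where the band width `delta` and the sample stream are illustrative:

```python
def deadband_filter(readings, delta):
    """Report a reading only when it moves more than `delta` from the
    last reported value. Slowly varying sensors then send almost nothing."""
    reported, last = [], None
    for value in readings:
        if last is None or abs(value - last) > delta:
            reported.append(value)
            last = value
    return reported

# A temperature sensor hovering around 20 degrees, with one real excursion:
stream = [20.0, 20.1, 20.05, 23.7, 23.8, 20.1]
print(deadband_filter(stream, delta=1.0))  # only the changes worth sending
```

Note the trade-off: filtering at the edge saves network budget, but the discarded samples are gone for good, which is exactly the risk the next paragraph describes.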
One solution is lossless compression done at the edge to keep costs down. Most people learned about lossy algorithms from music downloads: MP3 files sound tinny and flat because resolution is thrown away to save disk space. Do you want poor-resolution data driving critical analytics? Imagine discarding all helicopter sensor data below safety thresholds. But the raw data shows 10 critical sensors running a smidgen below safety thresholds at the same instant. A data scientist can only predict catastrophic failures if she has all the data. Data that’s discarded inevitably contains the outliers that business success depends on.
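Lossless compression means the round trip is exact, so no outlier is ever thrown away. A sketch using Python’s standard-library zlib (the reading format here is made up for illustration):

```python
import json
import zlib

# Illustrative batch of time-series readings: sensor ID, timestamp, value.
readings = [{"id": "vib-07", "t": 1000 + i, "v": 0.498 + i * 1e-4}
            for i in range(1000)]

raw = json.dumps(readings).encode("utf-8")
packed = zlib.compress(raw, level=9)  # lossless DEFLATE compression

# Lossless means the decompressed bytes match the original exactly.
assert zlib.decompress(packed) == raw
print(f"{len(raw)} bytes -> {len(packed)} bytes")
```

Repetitive sensor payloads like this compress well, so the network bill shrinks while those 10 just-under-threshold readings survive intact.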
The Land Before Time
A majority of sensor data is time-series data. It arrives as a sensor ID, a date-time stamp, and a measurement. Typically it’s a continuous stream of data per sensor. Often, the data is more granular than needed: imagine 1,000 sensor measurements per minute when we need only 20. This leads to a sliding window of intervals applied to the data, with only 20 measurements output. But these interval results are much more complex than averages. Sliding interval windows lead to curve-fitting techniques (Figure 2). Are you still with me? We just crossed over into data science algorithms like Fast Fourier Transforms and SAX. Sensor data integration requires advanced analytics algorithms to simplify the data. Now mathematicians and algorithm suppliers are needed for data preparation. Add deep math skills to the data integration team.
A River Runs Through It
Digital rivers are best stored on low-cost, scale-out systems called data lakes. A data lake is a good place to keep all the raw data, which normally doesn’t go into the data warehouse because its value is unproven. Data lakes, of course, need security, retention policies, and governance.
Engineers derive insights by exploring the loosely coupled raw data. They discover how their designs really work in the wild. They can track low-performing parts by version back to the supplier. They can correlate multiple sensor streams to see how parts cause other parts to fail. And they can predict future failure dates by machine or device ID.
If a few dozen streams of sensor data can do all that, what happens when we add sensor data to supply chain data, customer data, inventory, and pricing data?
One airplane manufacturer did exactly that. The engineers (OT) explored sensor data from airplane engines. They applied predictive analytics to detect when certain parts would start failing. The IT team saw an opportunity to combine the OT and IT side of the business. They found a tool to query sensor data from the data lake via the data warehouse. The business people then made sure the parts needed were available to match the maintenance schedules. Then they matched failure predictions to labor schedules. The business people went on to discover 10 more ROI use cases from putting sensor data in context. Exponential value came from putting data lake information into the data warehouse.
“This new product data is valuable by itself,” Michael Porter and James Heppelmann wrote in the Harvard Business Review, “yet its value increases exponentially when it is integrated with other data, such as service histories, inventory locations, commodity prices, and traffic patterns.”
Plan on having a data lake for low-cost storage. Deliver refined data to the data warehouse.
Oh, the Places You’ll Go!
Sensor data is just like any other data. We clean it, secure it, govern it, and analyze it. Yet sensor data is also unlike any other data. Therefore, it’s imperative to do the following:
- Design an architecture for massively dispersed or disconnected sensors.
- Never use lossy algorithms to compress sensor data.
- Add strong math skills to the data-integration team.
- Keep raw sensor data in a data lake for cost savings and archival purposes.
- Exploit sensor data in the data warehouse for exponential value creation.
As the General Manager of Enterprise Systems for Teradata Corporation, Dan Graham is responsible for strategy, go-to-market success, and competitive differentiation for the Active Data Warehouse platform and Extreme Performance Appliance. He has nearly 40 years’ experience in the industry and joined Teradata in 1989.