The Impact of the Internet of Things on Big Data

September 10, 2015 5:30 am

The Internet of Things (IoT) is on its way to becoming the next technological revolution. According to Gartner, revenue generated from IoT products and services will exceed $300 billion in 2020, and that is probably just the tip of the iceberg. Given the massive amounts of revenue and data that the IoT will generate, its impact will be felt across the entire big data universe. Companies will have to upgrade their current tools and processes, and technology will have to evolve, both to accommodate the additional data volume and to take advantage of the insights that this new data will deliver.

Let’s take a closer look at the various ways in which the IoT will impact big data.

Data Storage

When we talk about the IoT, one of the first things that comes to mind is a huge, continuous stream of data hitting companies’ data storage. Data centers must be equipped to handle this additional load of heterogeneous data.

In response to this direct impact on big data storage infrastructure, many organizations are moving toward the Platform as a Service (PaaS) model instead of keeping their own storage infrastructure, which would require continuous expansion to handle the load of big data. PaaS is a cloud-based, managed solution that provides scalability, flexibility, compliance, and a sophisticated architecture to store valuable IoT data.

Cloud storage options include private, public, and hybrid models. If companies have sensitive data or data subject to regulatory compliance requirements that call for heightened security, a private cloud model might be the best fit. Otherwise, a public or hybrid model can serve as storage for IoT data.

Big Data Technologies

When selecting the technology stack for big data processing, organizations must keep in mind the tremendous influx of data that the IoT will deliver and adapt their technologies to fit it. Network bandwidth, disk capacity, and compute power will all be affected and should be sized for this new type of data.
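
As a rough illustration of that sizing exercise, the sketch below estimates message rate, network throughput, and daily storage for a fleet of devices. The function name and the fleet figures (10,000 devices, one 200-byte message every 10 seconds) are assumptions made up for this example, not figures from a real deployment, and the estimate ignores protocol overhead, replication, and compression.

```python
def estimate_load(devices, seconds_between_msgs, bytes_per_msg):
    """Return (messages per second, megabits per second, gigabytes per day)."""
    msgs_per_sec = devices / seconds_between_msgs
    bytes_per_sec = msgs_per_sec * bytes_per_msg
    mbit_per_sec = bytes_per_sec * 8 / 1_000_000          # network planning
    gb_per_day = bytes_per_sec * 86_400 / 1_000_000_000   # disk planning
    return msgs_per_sec, mbit_per_sec, gb_per_day


# Hypothetical fleet: 10,000 devices, one 200-byte message every 10 seconds.
msgs_per_sec, mbit_per_sec, gb_per_day = estimate_load(10_000, 10, 200)
print(f"{msgs_per_sec:.0f} msg/s, {mbit_per_sec:.2f} Mbit/s, {gb_per_day:.2f} GB/day")
```

Even this modest hypothetical fleet produces a steady write load, which is why the three resources are best planned together rather than one at a time.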

From a technology perspective, the first requirement is to receive events from IoT-connected devices. The devices can connect to the network over Wi-Fi, Bluetooth, or another technology, but they must be able to send messages to a broker using a well-defined protocol. One of the most widely used protocols is MQTT (originally MQ Telemetry Transport), and Mosquitto is a popular open-source MQTT broker.
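
To make the idea of a well-defined protocol concrete, here is a minimal sketch of how a single sensor event could be packed into an MQTT 3.1.1 PUBLISH packet at QoS 0. It uses only the Python standard library. The topic name, device ID, and event fields are assumptions invented for the example, and the sketch only builds the bytes: a real device would first open a connection and complete the CONNECT handshake with the broker, usually through an MQTT client library rather than hand-built packets.

```python
import json


def encode_remaining_length(n):
    """Encode a length with MQTT's variable-length scheme (7 bits per byte)."""
    out = bytearray()
    while True:
        n, digit = divmod(n, 128)
        if n > 0:
            digit |= 0x80  # continuation bit: more length bytes follow
        out.append(digit)
        if n == 0:
            return bytes(out)


def encode_publish(topic, payload):
    """Build an MQTT 3.1.1 PUBLISH packet at QoS 0 (no packet identifier)."""
    topic_bytes = topic.encode("utf-8")
    # Variable header: 2-byte big-endian topic length, then the topic itself.
    variable_header = len(topic_bytes).to_bytes(2, "big") + topic_bytes
    body = variable_header + payload
    # Fixed header: 0x30 = packet type 3 (PUBLISH), DUP=0, QoS=0, RETAIN=0.
    return b"\x30" + encode_remaining_length(len(body)) + body


# Hypothetical event and topic, for illustration only.
event = {"device_id": "sensor-42", "temperature_c": 21.5}
packet = encode_publish(
    "sensors/sensor-42/temperature",
    json.dumps(event).encode("utf-8"),
)
print(len(packet), "bytes ready to send")
```

The small fixed overhead per message is one reason MQTT suits constrained devices.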

Once the data is received, the next consideration is the technology platform for storing it. Many companies use Hadoop and Hive to store big data. For IoT data, however, NoSQL document databases such as Apache CouchDB are often a better fit because they can offer high throughput and very low latency. These databases are schema-less, which makes it easy to add new event types. Other popular tools for IoT workloads are Apache Kafka for intermediate message brokering and Apache Storm for real-time stream processing.
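
The benefit of a schema-less store can be sketched with a small in-memory stand-in. The class below is not the CouchDB API. It is a hypothetical illustration, with invented event fields, of how documents of different shapes can sit side by side and still be queried by whichever fields they happen to have.

```python
import json


class DocumentStore:
    """In-memory stand-in for a document database (not the CouchDB API)."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, doc):
        # Round-trip through JSON to mimic storing a serialized document.
        self._docs[doc_id] = json.loads(json.dumps(doc))

    def find(self, **criteria):
        """Return the documents whose fields equal all the given criteria."""
        return [
            doc for doc in self._docs.values()
            if all(doc.get(key) == value for key, value in criteria.items())
        ]


store = DocumentStore()
# Two event types with different fields; no schema change is needed
# to start storing the second one.
store.put("evt-1", {"type": "temperature", "device_id": "sensor-42", "celsius": 21.5})
store.put("evt-2", {"type": "door", "device_id": "lock-7", "state": "open", "battery_pct": 88})

door_events = store.find(type="door")
print(door_events)
```

In a relational table, the second event type would typically require new columns or a new table before the first record could be written.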

Data Security

The devices that make up the IoT, the data they generate, and the communication protocols they use vary widely, and this variety carries inherent data security risks. This heterogeneous IoT world is new to security professionals, and that lack of experience increases the risk. An attack could threaten more than just the data: it could also damage the connected devices themselves.

IoT data will require organizations to make some fundamental changes to their security landscape. As the IoT evolves, a large and often unmanaged population of IoT devices will be connected to the network. These devices will come in different shapes and sizes and will sit outside the corporate network while still being able to communicate with corporate applications. Each device therefore should have a non-repudiable identity for authentication purposes. Enterprises should be able to retrieve the details of every connected device and store them for audit purposes. All internal and external core routers and switches should be provisioned with X.509 certificates to create trusted connectivity between public and private networks.
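
One common way to tie a device's identity to an X.509 certificate is mutual TLS, in which the server refuses any client that cannot present a certificate signed by a trusted authority. The sketch below shows how that could look with Python's standard `ssl` module. The function names and the file-path parameters are assumptions for the example, and issuing and rotating the certificates themselves is out of scope here.

```python
import ssl


def make_server_context(ca_file=None, cert_file=None, key_file=None):
    """Build a server-side TLS context that requires a client certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # reject devices without a valid cert
    if ca_file:
        # CA that signed the device certificates (hypothetical path).
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        # The server's own certificate and private key (hypothetical paths).
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def device_identity(peer_cert):
    """Return the commonName from a dict shaped like SSLSocket.getpeercert()."""
    for rdn in peer_cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


ctx = make_server_context()
# Example of the structure getpeercert() returns after a handshake.
sample_cert = {"subject": ((("commonName", "sensor-42"),),)}
print(device_identity(sample_cert))
```

The identity extracted after the handshake can then be written to a device inventory, which covers the audit requirement described above.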

A multi-layered security system and proper network segmentation will help prevent attacks and keep them from spreading to other parts of the network. A properly configured IoT system should enforce fine-grained network access control policies that determine which IoT devices are allowed to connect. Software-defined networking (SDN) technologies, in combination with network identity and access policies, should be used to create dynamic network segmentation. SDN-based segmentation also should be used for point-to-point and point-to-multipoint encryption, based on a combination of SDN and public key infrastructure (PKI).
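
A fine-grained access policy of this kind can be pictured as a default-deny lookup. The sketch below is a simplified, hypothetical model rather than the configuration of any SDN product: the device classes, address ranges, and function name are invented, and a real controller would enforce the decision in the network itself instead of in application code.

```python
import ipaddress

# Hypothetical policy: each device class is pinned to one network segment
# and may only reach the listed destination networks.
POLICY = {
    "thermostat": {
        "segment": ipaddress.ip_network("10.20.0.0/24"),
        "may_reach": [ipaddress.ip_network("10.50.1.0/28")],
    },
    "camera": {
        "segment": ipaddress.ip_network("10.30.0.0/24"),
        "may_reach": [ipaddress.ip_network("10.50.2.0/28")],
    },
}


def is_allowed(device_class, source_ip, dest_ip):
    """Return True only if the policy explicitly permits this connection."""
    rule = POLICY.get(device_class)
    if rule is None:
        return False  # unknown device classes are denied by default
    src = ipaddress.ip_address(source_ip)
    dst = ipaddress.ip_address(dest_ip)
    if src not in rule["segment"]:
        return False  # device is talking from outside its assigned segment
    return any(dst in network for network in rule["may_reach"])


print(is_allowed("thermostat", "10.20.0.15", "10.50.1.5"))
print(is_allowed("thermostat", "10.20.0.15", "10.50.2.5"))
```

Because each class can reach only its own destinations, a compromised thermostat in this model has no route to the camera back end.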

Big Data Analytics

IoT and big data are essentially two sides of the same coin. Managing and extracting value from IoT data is the biggest challenge that companies face. Organizations should set up a proper analytics platform and infrastructure to analyze IoT data, and they should remember that not all IoT data is important.

A proper analytics platform should be chosen on three criteria: performance, right-sized infrastructure, and room for future growth. For performance, a bare-metal server (a single-tenant physical server dedicated to one customer) is the best fit. For infrastructure and future growth, a hybrid approach is best. Hybrid deployments, which can combine cloud, managed hosting, colocation, and dedicated hosting, bring the best features of multiple platforms into a single environment. Managed service providers (MSPs) are also adapting their platforms to handle IoT data, typically working on infrastructure, performance, and tooling to cover the entire IoT domain.

IoT devices generate continuous streams of data at scale, and companies must handle this high volume of stream data and act on it. The actions can include event correlation, metric calculation, statistics preparation, and analytics. In a typical big data scenario, the data is not always stream data, and the actions are different. An analytics solution that must manage the scale of IoT data should be built with these differences in mind.
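
Metric calculation on a stream differs from batch processing mainly in that each result must be updated as events arrive. The sketch below shows one simple form of this, a time-based sliding-window average. The class name and the sample readings are assumptions for illustration, and the code assumes events arrive in timestamp order. A production system would more likely delegate this work to a stream processor.

```python
from collections import deque


class SlidingWindowAverage:
    """Average of the readings seen in the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._readings = deque()  # (timestamp, value), oldest first
        self._total = 0.0

    def add(self, timestamp, value):
        """Record a reading and return the updated windowed average."""
        self._readings.append((timestamp, value))
        self._total += value
        # Evict readings that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._readings and self._readings[0][0] <= cutoff:
            _, old_value = self._readings.popleft()
            self._total -= old_value
        return self._total / len(self._readings)


# Hypothetical temperature stream: (seconds since start, degrees Celsius).
stream = [(0, 20.0), (5, 22.0), (12, 24.0)]
window = SlidingWindowAverage(window_seconds=10)
averages = [window.add(ts, value) for ts, value in stream]
print(averages)
```

Keeping a running total means each new event costs only a small, bounded amount of work, which matters when events never stop arriving.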

The growth of the IoT heralds a new age of technology, and organizations that wish to participate in this new era will have to change the way they do things to accommodate new data types and data sources. And these changes likely are just the beginning. As the IoT grows and businesses grow with IoT, they will have many more challenges to solve.

Kaushik Pal has more than 16 years of experience as a technical architect and software consultant in enterprise application and product development. He is interested in new technology and innovation areas, as well as technical writing. His main focus areas are web architecture, web technologies, Java/J2EE, open source, big data, cloud, and mobile technologies.
