A jet airliner generates 20 terabytes of diagnostic data per hour of flight. The average oil platform has 40,000 sensors, generating data 24/7. In accordance with European Union guidelines, 80 percent of all households in Germany (32 million) will need to be equipped with smart meters by 2020.
Machine-to-machine (M2M) sensors, monitors and meters like these will fuel the Internet of Things. M2M is now generating enormous volumes of data and is testing the capabilities of traditional database technologies. In many industries, the data load predictions of just 12 to 24 months ago have long been surpassed. This is putting tremendous strain on infrastructures that were not designed for the dramatic increase in the amount of data coming in, the way the data would need to be queried, or the changing ways business users would want to analyze data.
To extract rich, real-time insight from the vast amounts of machine-generated data, companies will have to build a technology foundation designed for speed and scale. Raw data, whatever the source, is only useful after it has been transformed into knowledge through analysis. For example, a mobile carrier may want to automate location-based smartphone offers based on incoming GPS data, or a utility may need smart meter feeds that show spikes in energy usage to trigger demand response pricing. If it takes too long to process and analyze this kind of data, or if applications are confined to predefined queries and canned reports, the intelligence will arrive too late or be too narrow to act on, and revenue may be lost.
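As a rough illustration of the smart meter case, the sketch below flags a reading as a spike when it exceeds a multiple of the rolling average of the readings just before it. The function name `detect_spikes`, the four-reading window, the 1.5x threshold, and the sample readings are all assumptions made for this example, not part of any utility's actual demand response logic.

```python
from collections import deque
from statistics import mean


def detect_spikes(readings_kw, window=4, threshold=1.5):
    """Return the indices of readings that exceed `threshold` times
    the mean of the previous `window` readings (hypothetical rule)."""
    recent = deque(maxlen=window)  # rolling baseline of prior readings
    spikes = []
    for i, kw in enumerate(readings_kw):
        # Only judge a reading once a full baseline window is available.
        if len(recent) == window and kw > threshold * mean(recent):
            spikes.append(i)
        recent.append(kw)
    return spikes


# Made-up meter feed in kilowatts; the sixth reading jumps sharply.
readings = [1.0, 1.1, 0.9, 1.0, 1.0, 2.4, 1.1, 1.0]
spike_indices = detect_spikes(readings)
print(spike_indices)
```

In a real deployment the flagged indices would feed a pricing or alerting step. The point here is only that the rule has to be evaluated as readings arrive, not in a report run hours later.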
Investigative analytics tools enable interactive, ad-hoc querying on complex big data sets to identify patterns and insights, and they can perform analysis with precision even as machine-generated data grows beyond the petabyte scale. With investigative analytics, companies can respond to events in real time and identify patterns that help them either capitalize on a future event or prevent it. This is especially important because most failures result from a confluence of multiple factors, not just a single red flag.
However, to run investigative analytics effectively, the underlying infrastructure must be up to the task. We are already seeing traditional, hardware-based infrastructures run out of storage and processing headroom, and adding more data centers, servers and disk storage subsystems is expensive. Column-based technologies offer another route. They are generally associated with data warehousing and provide excellent query performance over large volumes of data. Columnar stores are not designed to be transactional, but they provide much better performance for analytic applications than row-based databases designed to support transactional systems.
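The difference between the two layouts can be sketched in a few lines. The example below holds the same made-up sensor records in a row-oriented form (one complete record per entry) and a column-oriented form (one list per field), then computes the same average from each. The field names and values are invented for illustration, and plain Python lists only show the shape of the data, not the performance of a real columnar engine.

```python
# Row-oriented layout: every field of a record is stored together,
# which suits transactional reads and writes of whole records.
rows = [
    {"sensor_id": 1, "ts": 0, "temp_c": 20.0, "pressure_kpa": 101.2},
    {"sensor_id": 2, "ts": 0, "temp_c": 22.0, "pressure_kpa": 100.9},
    {"sensor_id": 3, "ts": 0, "temp_c": 24.0, "pressure_kpa": 101.5},
]

# Column-oriented layout: all values of one field are stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def avg_temp_row_store(rows):
    # Walks every complete record even though only one field is needed.
    return sum(row["temp_c"] for row in rows) / len(rows)


def avg_temp_column_store(columns):
    # Reads only the temp_c column; the other fields are never touched.
    temps = columns["temp_c"]
    return sum(temps) / len(temps)


print(avg_temp_row_store(rows), avg_temp_column_store(columns))
```

An analytic query that aggregates one or two fields over millions of records benefits from the second layout because the fields it does not need never have to be read.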
Hadoop has captured people’s imaginations as a cost-effective and highly scalable way to store and manage big data. Data typically stored with Hadoop is complex, from multiple data sources, and includes structured and unstructured data. However, companies are realizing that they may not be harnessing the full value of their data with Hadoop due to a lack of high-performance ad-hoc query capabilities.
To fully address the influx of M2M data generated by the increasingly connected Internet of Things landscape, companies can deploy a range of technologies that build on distributed processing frameworks like Hadoop and NoSQL and improve the performance of their analytics, including enterprise data warehouses, analytic databases, data visualization, and business intelligence tools. These can be deployed as on-premises software, as an appliance, in the cloud, or in any combination of the three. The reality is that there is no single silver bullet to address the entire analytics infrastructure stack. Your business requirements will determine where each of these elements plays its role. The key is to think about how business requirements are changing. Move the conversation from questions like, “How did my network perform?” to time-critical, high-value-add questions such as, “How can I improve my network’s performance?”
To find the right analytics database technology to capture, connect, and drive meaning from data, companies should consider the following requirements:
Real-time analysis. Businesses can’t afford for data to get stale. Data solutions need to load quickly and easily, and must dynamically query, analyze, and communicate M2M information in real time, without huge investments in IT administration, support, and tuning.
Flexible querying and ad-hoc reporting. When intelligence needs change quickly, analytic tools can’t be constrained by data schemas that limit the number and type of queries that can be performed. This type of deeper analysis also cannot be held back by tinkering or time-consuming manual configuration (such as indexing and managing data partitions) every time an analytic query is created or changed.
Efficient compression. Efficient data compression is key to enabling M2M data management within a network node, smart device, or massive data center cluster. Better compression allows for less storage capacity overall, as well as tighter data sampling and longer historical data sets, increasing the accuracy of query results.
Ease of use and cost. Data analysis must be affordable, easy-to-use, and simple to implement in order to justify the investment. This demands low-touch solutions that are optimized to deliver fast analysis of large volumes of data, with minimal hardware, administrative effort, and customization needed to set up or change query and reporting parameters.
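To make the compression requirement above concrete, here is a small sketch using Python's standard `zlib` and `struct` modules. It compresses a made-up series of steadily rising meter readings twice: once as raw values and once after delta encoding (storing each reading as the difference from the previous one). The series, the helper names, and the choice of delta encoding are assumptions for illustration; analytic databases use their own, more elaborate schemes.

```python
import struct
import zlib


def pack(values):
    """Serialize integers as little-endian 32-bit values."""
    return struct.pack(f"<{len(values)}i", *values)


def delta_encode(values):
    """Keep the first value, then store each difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original series by running a cumulative sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


# Hypothetical cumulative meter readings in watt-hours, rising steadily.
meter_wh = [100_000 + 3 * i for i in range(1000)]

raw_size = len(pack(meter_wh))
plain_size = len(zlib.compress(pack(meter_wh)))
delta_size = len(zlib.compress(pack(delta_encode(meter_wh))))
print(raw_size, plain_size, delta_size)
```

Because consecutive machine readings tend to differ by small, repetitive amounts, the delta-encoded series compresses far better than the raw one here. That is the property that lets a store keep finer-grained samples and longer histories in the same space.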
Companies that continue with the status quo will find themselves spending more and more money on servers, storage, and DBAs. That approach is difficult to sustain and still leaves them at risk of serious performance degradation. By maximizing insight into the data, companies can make better decisions at the speed of business, thereby reducing costs, identifying new revenue streams, and gaining a competitive edge.
Don DeLoach is CEO and president of Infobright. Don has more than 25 years of software industry experience, with demonstrated success building software companies and extensive sales, marketing, and international experience. Don joined Infobright after serving as CEO of Aleri, the complex event processing company, which was acquired by Sybase in February 2010. Prior to Aleri, Don served as President and CEO of YOUcentric, a CRM software company, where he led the growth of the company’s revenue from $2.8M to $25M in three years before it was acquired by JD Edwards. Don also spent five years in senior sales management, culminating in the role of Vice President of North American Geographic Sales, Telesales, Channels, and Field Marketing. He has also served as a Director at Broadbeam Corporation and Apropos Inc.