The shrewd user of the Internet of Things will stay composed in the face of the data volumes an interconnected world produces. Precision is not the best first mechanism for managing all the information rising around us. Rather, approximation can distill seemingly chaotic data into an ideal starting point for analysis.
Infobright President and CEO Don DeLoach sat down with Data Informed to talk about the swelling data realities of the IoT. DeLoach could speak for days about interconnected technologies, and he explained, among other things, how approximation can make the data trawl more efficient.
Data Informed: For companies considering an IoT strategy, the volume of data can be intimidating. What should companies focus on when formulating that strategy?
Don DeLoach: I do think there are certain basic elements that can be known and applied right now as people begin to think about forming these systems. And those really come down to contemplation of what you are trying to accomplish in the first place. For example, what type of queries am I going to be running? What is my goal for whatever I am establishing as the system in place, and what do I contemplate the evolution of that system to look like? To the extent that I can keep from hard-wiring too much, that’s probably good. But then the other considerations come down to resources ranging from compute power to network capacity, availability, and cost; and the cost of the people and the administrative resources required to run the infrastructure to facilitate the type of analytics and the type of results that you are after.
In this regard, I think one thing we can do is learn a great deal from watching mobile network operators struggle with operational support systems. If you looked at a network monitoring, network optimization, or network troubleshooting system that was in place five years ago, and at the underlying data architecture and the capabilities it brought to bear, what you would find is that most of them were designed in the early to mid ’90s, and they mostly used a traditional relational database for both transactional and reporting workloads. And as the proliferation of mobile devices began to create a heavier and heavier load on the network, the OSS (Operational Support System) providers, the solution providers, kept having to find ways to keep performance adequate for their users. These large service providers, in order to keep these solutions running, would have to deploy more and more hardware. They would keep having to index the database. They would have to apply more and more database administrators to keep it running. And all the while, asymptotically, you are approaching the wall when it comes to performance. And what you found was that the Verizons and AT&Ts of the world started to have a very limited sense of humor about how much they were willing to put up with.
And slowly but surely, most of the OSS solutions have adopted disruptive technologies to create a different set of characteristics in how these solutions are deployed, where they can support much more data with less hardware and far fewer people to achieve their objectives. And so that type of disruption was an accommodation of the characteristics of the market and the trends in the market that were otherwise creating these impediments. And so when you look at the Internet of Things and the type of solutions that we can contemplate being a part of the Internet of Things and the loads on the networks, many of the same lessons that have now been learned by the OSS providers are illustrative of the type of accommodations that will need to be made for these Internet of Things solutions.
Are there considerations that are specific to smaller companies or particular verticals?
DeLoach: There definitely are. Let’s take the example of the telecoms and I will extend it. You have an environment in which you are offering solutions and the load on the solution itself, the amount of data being contemplated for the solution, reaches a point where you have to make these accommodations. It may be possible to string together 500 mainframe-class compute facilities, but it’s just not practical to do that. No one wants to have to have a Cray (supercomputer) in order to do basic processing. But the fact of the matter is there are a number of industries where the growth rates of the data are creating this type of demand. Aside from the telecommunications space, another example would be energy distribution and smart grids. When people think of smart grids, they think of smart power meters, which are collecting data. And if you have millions of customers with smart power meters, you are collecting a lot of data, but it’s all relative. Think about things like the PMUs (Phasor Measurement Units) that are collecting exponentially larger amounts of data because they are doing voltage monitoring of the distribution lines and may be taking 30 readings a second, whereas your smart meter might be taking one reading every 15 minutes. You get an idea of the type of scale that’s being put in play and the amount of data that needs to be harvested and analyzed.
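The scale gap DeLoach describes is easy to make concrete. The following back-of-the-envelope arithmetic (the figures come straight from his example: 30 readings per second for a PMU, one reading every 15 minutes for a smart meter) shows how far apart the two devices sit:

```python
# Back-of-the-envelope comparison of the data rates in DeLoach's example:
# a PMU sampling 30 times per second vs. a smart meter reading every 15 minutes.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

pmu_readings_per_day = 30 * SECONDS_PER_DAY            # 30 readings/sec
meter_readings_per_day = SECONDS_PER_DAY // (15 * 60)  # one reading per 15 min

ratio = pmu_readings_per_day / meter_readings_per_day
print(f"PMU:         {pmu_readings_per_day:,} readings/day")
print(f"Smart meter: {meter_readings_per_day} readings/day")
print(f"One PMU produces {ratio:,.0f}x the readings of one smart meter")
```

A single PMU emits roughly 2.6 million readings a day against a smart meter's 96, a factor of 27,000, before you even multiply by the number of monitored lines.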
And that brings you to another consideration: The network itself becomes interesting in that the architectural considerations around where you actually process the data have to come into play. The notion that one central processing point is going to make sense when you have a million times the data out there becomes impractical, this whole notion of pumping everything into one singular cloud. It may be, as Cisco talks about, an intercloud strategy or a cloud of clouds. So I might still want to do my computing near where I ingest the data and, in essence, if not completely, do sort of a pre-compute on certain data before I provide that to a more holistic environment. And again, there’s all kinds of architectural contemplation around how this will be done and how it should be done, and I think that level of introspection as these new solutions roll out is completely appropriate.
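The "pre-compute near where I ingest the data" idea can be sketched in a few lines. This is a minimal illustration, not any vendor's API: a hypothetical edge node rolls raw per-second sensor readings up into per-minute summaries, so only the compact summaries cross the network to the central cloud.

```python
# A minimal sketch (invented names, not a real product API) of edge
# pre-compute: collapse raw (sensor_id, timestamp_sec, value) readings
# into one per-sensor, per-minute summary before forwarding upstream.

from collections import defaultdict
from statistics import mean

def edge_precompute(readings):
    """Bucket raw readings by (sensor, minute) and summarize each bucket."""
    buckets = defaultdict(list)
    for sensor_id, ts, value in readings:
        buckets[(sensor_id, ts // 60)].append(value)
    # Only these compact summaries are sent to the central environment.
    return [
        {"sensor": s, "minute": m, "min": min(v), "max": max(v),
         "avg": mean(v), "n": len(v)}
        for (s, m), v in sorted(buckets.items())
    ]

# Two minutes of one-second voltage-like readings from one sensor.
raw = [("pmu-1", t, 229.0 + (t % 3)) for t in range(120)]
summaries = edge_precompute(raw)
print(len(raw), "raw readings ->", len(summaries), "summary records upstream")
```

Here 120 raw readings shrink to 2 records, a 60x reduction in network traffic; what you can safely discard at the edge depends entirely on the queries the central environment will need to answer.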
How can a concept like approximate query help companies get answers from the massive amount of data the Internet of Things will deliver?
DeLoach: The aim is to accomplish as much as possible with very limited time and resources. The utilization of something like an approximate query is really going to depend on the use case. For example, if I am looking for something like a top-ten query: What are the top 10 devices on my network that saw failures in the last 24 hours? Or who were the top 10 revenue contributors to my advertising website? Those tend to be very, very accurate. The accuracy tends to diminish a little bit when there’s an incredibly even distribution, but in most cases, that’s not the case. Using something like an approximate query, with the way we prepare the data in the first place, tends to be very effective. It depends on the type of queries you are running. For example, if I am trying to do a distinct count, where I am looking for something specific, almost by definition that doesn’t really lend itself to approximate queries. However, if you step back and ask, When is this the most relevant and the most helpful? Generally speaking, something like approximate queries would be the most helpful when I am doing investigative analytics. And this is really the realm of the data scientist, and there’s a whole class of computing emerging around data science and investigative analytics. This technology happens to be very well suited for that. Think of almost a forensic search through data, where I am going to ask a question of the data and, depending on the answer I get back, I am going to ask another question. And I may chain together 19 of these before I arrive at my answer. Generally speaking, what that suggests is that you don’t know much about what you are looking for until you get the result set of the prior query back, and that informs the next query. But if I am asking very complex queries, and I am asking them against 100 TB of data, each query may take an hour. And if I am chaining 19 of those together, that’s hardly a quick or cost-effective process.
But if I could cut down by 95 percent the time and the resources I am using to get to each answer, and the answer is good enough to get me from question to question, then I can take that 19-hour process and cut it down to maybe 30 minutes and get to the same answer. And so it’s all about time and money. And, fundamentally, what it’s really about is taking what is technologically possible and moving it into the realm of what is practical.
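The top-ten case DeLoach describes can be demonstrated with a toy example. This is a deliberately simple sketch (invented device names, a systematic every-20th-row sample standing in for the far more sophisticated statistics a real engine such as Infobright's would use): on a skewed failure distribution, ranking from a 5 percent sample recovers the same top 10 as scanning every event.

```python
# Toy demonstration of an approximate top-N query: answer "top 10 failing
# devices" from a 5% sample of the event log instead of a full scan.
# Device names and failure counts are invented; real approximate-query
# engines use more sophisticated sampling and data preparation than this.

from collections import Counter

# Synthetic failure log: device i fails 1000 // (i + 1) times, giving the
# kind of skewed (not "incredibly even") distribution top-N handles well.
events = [f"dev{i}" for i in range(50) for _ in range(1000 // (i + 1))]

def top_n_exact(log, n=10):
    """Full scan: count every event, then rank."""
    return [dev for dev, _ in Counter(log).most_common(n)]

def top_n_approx(log, n=10, stride=20):
    """Approximate: count only every 20th event (a 5% sample), then rank."""
    return [dev for dev, _ in Counter(log[::stride]).most_common(n)]

exact = top_n_exact(events)
approx = top_n_approx(events)
print("exact :", exact)
print("approx:", approx)
print("same top-10 from 5% of the data:", set(exact) == set(approx))
```

Because the ranking, not the exact counts, is what the next investigative question depends on, touching one row in twenty is "good enough to get me from question to question" at a twentieth of the scan cost. A distinct count, by contrast, has no ranking to preserve, which is why it resists this treatment.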
So an organization is going to need to learn how to form these questions.
DeLoach: Sure, and that requirement is only going to increase as the scale of the data they are using increases and as the nature of the data changes. With the advent of the Internet of Things, I might be combining social data with traditional data with machine data at a scale that I had never contemplated before. So my ability to maneuver through that data is going to be an entirely different proposition than what I might have done five years ago when trying to figure out, for example, whether my inventory levels are acceptable at my hardware distribution facility.
What do you see as the biggest challenge slowing the development of the Internet of Things?
DeLoach: If I had to pare it down to one or two things, the obvious, biggest challenge is security. There is example after example of people hacking into connected cars, hijacking a smart TV and taking over its camera so that you basically have someone monitoring you in your living room, launching a denial-of-service attack from the IP address associated with your refrigerator. These are very real examples that are going to have to be addressed. There’s certainly a body of evidence that would say that, at the device level, the operating system that supports the device is going to have to be incredibly secure, and there’s all kinds of discussion around how that’s going to take place. So, to me, that would be the most obvious issue.
The more subtle nuance is around governance and how the governance of the Internet of Things will play out. And that’s not just elements of privacy and ownership, but things like naming conventions. With the Internet, you have a DNS structure for how you navigate through the Internet, and the whole ONS (Object Naming Service) derivative of DNS has yet to completely roll out. And there are all kinds of considerations around how that will be administered, what it will look like, what the implications will be in terms of how you deploy devices, and how failsafe they will be as a function of that structure. There’s a lot of complexity underneath the surface. And one of the problems is that you don’t get rewarded monetarily for having the best thought-out device in the context of everything I just said. Monetarily, a lot of people get rewarded for having the first cool thing to market. And the cool thing in the market may not be as safe or as secure or as well thought out as it needs to be. But the person who gets first-mover advantage by putting out the smart watch or the Fitbit or whatever is rewarded for doing that. Unfortunately, the implications of the Internet of Things and the unbelievable number of devices that we are talking about should at some point cause people to step back and say, ‘It is really important to get this right.’ And I know there are a lot of people who are saying that. And it fundamentally ties most immediately back to the security of the environment, which I believe is the most critical impediment that we need to address.
Joshua Whitney Allen has been writing for fifteen years. He has contributed articles on technology, human rights, politics, environmental affairs, and society to several publications throughout the United States.