Several years into the big data trend, it’s clear, says Thomas H. Davenport, that the vast new datasets, and the technologies available to analyze them, are not just for Internet startups. A recognized global expert on business analytics and knowledge management, Davenport, a Babson College professor and MIT research fellow, has co-authored and edited 18 books about business process innovation, including “Competing on Analytics” (with Jeanne Harris) in 2007.
In his latest book due out in February, “Big Data @ Work: Dispelling the Myths, Uncovering the Opportunities,” Davenport gives managers a guide to big data technologies and how to create business opportunities from them.
Davenport talked with Data Informed about his new book and how companies are using big data in what he calls the Analytics 3.0 era.
Data Informed: Why did you write this book?
Thomas Davenport: I’ve written a lot of books around the topic of analytics, but I did some studies on big data and found there were some substantial differences from traditional analytics, and then I did a couple of other studies. And that’s my usual pattern; I do some research and write a book about it. But I do think there’s a huge opportunity for businesses to take advantage of big data.
What’s the distinction between analytics and big data?
Davenport: You can do analytics on small data — no one wants to call it small data, but structured, relatively small-volume data — or you can do analytics on big data. The two are somewhat different, I would say. To do analytics on big data, you’ve got to turn it into structured data first, and it takes a lot of energy from analysts to get it into the kind of shape where you can do some analysis on it, turning it into rows and columns of numbers. A lot of big data doesn’t come that way out of the box; it’s not in rows and columns, and it’s not numbers, initially.
There’s a lot more unstructured data now.
Davenport: Yes, text, video, all that sort of stuff. In order to make any sense of it, you have to put it in structured form.
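A minimal sketch of the structuring step Davenport describes, assuming a simple text case (the field names and features below are illustrative, not from the book): free-text records have no rows and columns, so an analyst derives the same set of numeric columns from each record before conventional analytics can run.

```python
def structure_text(records):
    """Turn free-text records into rows sharing the same numeric columns."""
    rows = []
    for i, text in enumerate(records):
        words = text.lower().split()
        rows.append({
            "record_id": i,                           # row key
            "word_count": len(words),                 # simple numeric feature
            "mentions_price": int("price" in words),  # keyword flag as 0/1
        })
    return rows

# Hypothetical customer-feedback snippets standing in for unstructured data.
feedback = [
    "Great service but the price was too high",
    "Fast shipping, will order again",
]
table = structure_text(feedback)
# Each row now has identical columns of numbers, ready for analysis.
```

The point of the sketch is only the shape of the work: the raw text is discarded in favor of rows and columns, which is the conversion Davenport says consumes so much analyst energy.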
How would you explain to an enterprise why big data is important for its business?
Davenport: It potentially is able to transform almost every major activity in business. If your company has customers or moves things or has employees you’d like to know more about, any potential activity can be done with greater precision and optimization and tracking and so on with big data. It presents quite an opportunity for organizations to change what they do.
What industries currently are best leveraging big data?
Davenport: One of the reasons for writing the book is that most of the writing about big data thus far has been about the online industry. I think it’s fair to say that is still the earliest adopter and perhaps the most aggressive user of big data.
But there are a lot of others coming on strong. Transportation firms such as UPS and Schneider National are doing some interesting things with big data now. Health care, of course, is just moving into that space with the sensor revolution; those companies have not used the data all that well yet, but they’re starting to, and some of them have hired data scientists to make sense of big data so that it’s of value to users. Banks, of course, have had big data for a long time but haven’t really used it effectively, though Wells Fargo and Discover are starting to. Telecom companies, particularly mobile telecom, have a fantastic amount of data they haven’t used particularly well. So there have been a lot of underachievers in various industries that have had a lot of data and haven’t used it well.
What are the key elements to developing a big data strategy?
Davenport: The first issue is deciding how transformative big data is going to be in your industry, and whether you should pursue it aggressively or more conservatively. You also have to decide how and where you want to use big data. I’m working with a semiconductor company in the San Jose area. Do they want to use it in product development? Do they want to use it in customer relationships? There are a lot of possibilities for how you apply it. Decide your initial focus; eventually you’ll get around to doing a lot of things with big data, but you can’t do it all at once. So that’s one key issue.
And then there’s the question of what’s at the discovery stage, where we’re just trying to figure out what’s going on in the data, and what has moved into the production phase. Those are some of the strategic decisions you have to make.
What are the key technologies you need for big data?
Davenport: One of the definitions of big data is data that’s too big to fit on a single server, so you are faced with the need to figure out how to split data across a variety of servers. And fortunately there’s this whole move toward big farms of commodity servers that are all very cheap and relatively easy to manage. You need software to do that split and reassemble the results when the job is done, and that’s where MapReduce and Hadoop come in.
While there are other alternatives, none of them are nearly as popular, and it’s just a revolution in what you can do with big data in terms of relatively inexpensive technology.
You have a chapter in the book titled “The Human Side of Big Data.” What sort of skills do enterprises need for big data, and where do they find people with those skills?
Davenport: I think the classic data scientist profile is fairly well understood: the person who has computational capabilities, analytical capabilities, the ability to understand the business and communicate what’s going on to managers. Those were the sort of people that were sought by the first wave of big data users, online firms in particular. And they’re still seeking them and there’s still a shortage of them.
But when I talked to big companies, it appeared they were not in a hurry to hire tons of data scientists. They thought maybe they could get by with some of their existing capabilities, relying more on teaming rather than thinking you can find all of these things in one person. Maybe hire one Hadoop specialist, maybe hire one person who knows more about data visualization than any other topic. But not this sort of all-seeing, all-knowing data scientist. They don’t seem to be as worried about it as the dot-com firms; they seem to think they’ll be able to find what they need. And there’s some logic to that, because the number of university programs, while slow to develop, is really mushrooming now. Analytics and data science programs are just sprouting up all over the place.
Some enterprises look at current employees as potential data analysts or scientists. What sort of innate skills would an internal candidate need?
Davenport: You have to have some familiarity with math and statistics. You can train people on that to some degree, but someone who is mathematically challenged probably isn’t going to work out. The good news is there’s plenty of training available online and offline.
Then there’s computational ability, the ability to create creative computing solutions. I call it data fluffing. While it’s creative, it strikes me as being kind of dirty work: extracting data from devices that weren’t designed to give it up, then manipulating it in various ways to make it analyzable, is quite difficult and challenging.
The necessary business and communications skills are fairly widely available. The thing there’s some uncertainty about is whether your big data people need the kind of experimental attitude that the early data scientists had. A lot of them were Ph.D.s in scientific fields, with physics the most common, particularly experimental physics. But do you really need that sort of thing as big data matures? I don’t think we really know that well enough at this point.
Is big data leading to a new approach to managing analytics?
Davenport: Yes. That’s addressed in the last chapter of the book. I call it Analytics 3.0. Analytics 1.0 is the traditional back-office work: analytics on data in the data warehouse. And 2.0 is the hardcore big data stuff, the classic data scientist profile, mostly in online firms.
When I talk to big companies about how they’re using big data, they have a couple of goals, one of which was derived from Analytics 2.0, the big data era: developing big data products. So you have companies such as GE putting all these sensors in their industrial devices and developing products and services based on that data. Insurance companies like Progressive, which are quite good at analytics, are now generating all this big data from the Snapshot boxes they put in cars, capturing all the details about how you drive and so on.
The other goal is decision making at scale. Analytics has always been about improving decision making, but in the 3.0 era it’s about doing it on a very large scale. Maybe it’s doing it on the front lines of an organization such as UPS, with its real-time routing based on sensor data and package data, anything they can pull together to develop a perfect route for a particular driver on a particular day.