4 Rules for Planning Big Data Projects

August 16, 2012 5:53 pm

Glatfelter Insurance Group is growing, and so is its data. “As we acquire or write new business, that’s more customers, more policies, more claims,” says Dave Zapcic, director of enterprise data management. But “in order to gain a competitive edge, we need to look outside of the core data and marry it with what’s being labeled big data.”

With 6 or 7 terabytes of data already in hand, Zapcic is crafting a roadmap for going big—with social media, industry benchmark information, even financial market data.

There are many organizations like Glatfelter, and many executives like Zapcic, who anticipate deriving business value from new data sources. Unlike those for traditional business intelligence (BI) investments, the best practices for planning big data projects are still evolving, along with new tools and analytics techniques. Here are four ways such projects differ from traditional BI.

1. Many business functions are involved

BI projects are often designed for a limited set of users who have a defined set of questions in mind. Big data, whether it involves mining doctors’ notes, weather data or Twitter chatter, potentially benefits multiple business areas, says Douglas Laney, vice president of research for business analytics and information management with Gartner.

“If you’re going to continue to throw your basic BI tool at a big data source, the value you’re going to get from that analysis is not generally going to outweigh the cost of acquiring and administering that data,” says Laney. Big data opens the door to more open-ended questions, but the return on asking them—and which area of the business the answers impact—may not be completely apparent until you see what the data reveals. When planning big data projects, it pays to involve as many business leaders as possible.

Glatfelter, which sells business and personal insurance in the United States and Canada, is getting ready to roll out a tool that integrates current weather data with information about its policyholders. Zapcic and his team developed the tool to help risk managers see, in real time, which policyholders might be hit by a hurricane. Then they could contact local brokers who in turn could advise customers how to protect their property. The project potentially helps underwriters, too, enabling them to assess risk more accurately.
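To make the idea of combining weather data with policyholder records concrete, here is a minimal sketch in Python. It is not Glatfelter's tool: the record layout, the `policyholders_at_risk` function, the sample coordinates and the fixed-radius test are all assumptions made for illustration. It simply flags policies whose location falls within a chosen distance of a storm's reported center.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def policyholders_at_risk(policyholders, storm_lat, storm_lon, radius_km):
    """Return the policy IDs located within radius_km of the storm center."""
    return [
        p["policy_id"]
        for p in policyholders
        if haversine_km(p["lat"], p["lon"], storm_lat, storm_lon) <= radius_km
    ]


# Hypothetical policyholder records; a real system would read these from
# the policy database and take the storm position from a weather feed.
policyholders = [
    {"policy_id": "P-001", "lat": 25.76, "lon": -80.19},  # near Miami
    {"policy_id": "P-002", "lat": 39.96, "lon": -76.73},  # southern Pennsylvania
]

at_risk = policyholders_at_risk(policyholders, storm_lat=25.0, storm_lon=-80.0, radius_km=200)
print(at_risk)
```

A production version would more likely use a forecast track or wind-field polygon than a single radius, but the shape of the join, policy locations matched against a live weather position, is the same.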

When a user requests a report, Zapcic has a process to ensure that his team considers whether other business functions or divisions might be able to benefit from it. Such collaboration will be critical as the company explores new data sources, he says.

2. You’ll do many more prototypes

Most companies pilot new technology before putting it into production. But big data demands continuous prototyping. For analysis to be relevant, it has to get done quickly. “You can’t go off and come back six months later with a product,” says Mike Cocchi, managing director, enterprise information management with Blue Cross and Blue Shield of Rhode Island. “It’s almost like iterative development, where the business is actively engaged.”

“A lot of people in analytics are starting to develop agile methods,” says Thomas Davenport, a professor of management and IT at Babson College and a visiting professor at Harvard Business School. If analysts can’t fully anticipate where their queries will lead them, they’ll do a prototype and refine it based on what they find.

A “sandbox” for prototyping serves three purposes, says Gartner’s Laney: determining whether it makes sense to integrate disparate datasets, defining new analytic models, and answering one-off questions.

At Glatfelter, Zapcic is building an “innovation lab” with a fourth aim in mind: to educate users about what new tools—and new data—can tell them that’s different from what they’re used to getting with existing data and reports.

Rather than developing requirements first, as you would for a traditional BI project, then building a business case for the tools and data needed to fulfill them, it makes more sense to let users experiment. Open source tools make this cost-effective, says Boris Evelson, vice president and principal analyst with Forrester Research. “You can’t build requirements and a business case until you know what’s out there. Once you explore and discover something useful, then you can build your use case and business case.”

3. It’ll change your technology stack

Chances are, with big data will come new investments—in data management platforms, analytics tools or, at the very least, more storage—some of it, possibly, in the cloud. “It’s too much to manage here,” says Zapcic.

Right now, says Davenport, “it appears to me most organizations doing big data work are doing it in parallel rather than an integrated architecture.” But unless you think you’re going to ditch your legacy data warehouse (unlikely), it’s important to think about how those technologies will impact your current environment.

“Traditional warehousing is alive and well” for managing structured data used in business operations, says Cocchi. When you want to bring that structured data together with new, unstructured data for deeper analysis, today’s boutique big data tools might end up creating an expensive mess.

“The paradigm shifts every 4 to 8 years, and you can’t shift your technology stack every time that happens,” says Cocchi. “You have to decide whether to use the best-in-class tool at that moment, or use a technology stack that integrates well, even if it’s not the best tool on the market.”

4. Don’t forget the programmers

BI is a mature technology, with plenty of automated tools to help developers and users analyze data and build reports. Not so for big data analytics. It’s too new.

That makes using big data labor intensive, for now. “Experimentation always requires hands-on work,” says Evelson.

Cocchi observes that graphical user interfaces to Hadoop-based systems “aren’t as sophisticated as they’ll be in three years, so there’s a big amount of heavy lifting in IT to create an experience for the end user to leverage.” You might have to dedicate some programmers to writing queries.

Up to now, Blue Cross and Blue Shield of Rhode Island focused on analyzing data from some 1 million claims per month. Now, however, Cocchi and his team are working on mining data from health care providers’ electronic health records to uncover emerging problems, so doctors can treat them before they result in expensive claims. A lot of that data is unstructured. “If you want to search for broken arms, and you search ‘broken arms,’ you may find only a fifth of the results because the doctor wrote ‘fractured arm,’ or ‘disarticulated arm.’”
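The broken-arm problem can be illustrated with a short Python sketch that expands a search term into its known variants before scanning free-text notes. This is an assumed, simplified approach and not the insurer's method: the `SYNONYMS` table, both function names and the sample notes are hypothetical, and the variant list is taken only from the quote above. A real system would draw on a maintained clinical vocabulary.

```python
import re

# Hypothetical synonym table built from the terms in the quote above.
SYNONYMS = {
    "broken arm": ["broken arm", "fractured arm", "disarticulated arm"],
}


def expand_query(query):
    """Return every known variant of the query, or the query itself if none are listed."""
    key = query.lower()
    return SYNONYMS.get(key, [key])


def search_notes(notes, query):
    """Return indexes of notes that mention the query or any of its variants."""
    patterns = [
        # Allow an optional plural "s" and require whole-word matches.
        re.compile(r"\b" + re.escape(term) + r"s?\b", re.IGNORECASE)
        for term in expand_query(query)
    ]
    return [i for i, note in enumerate(notes) if any(p.search(note) for p in patterns)]


# Made-up notes for illustration only.
notes = [
    "Patient presents with fractured arm after fall.",
    "Follow-up for broken arm; cast removed.",
    "Seasonal allergies, no injuries.",
]

literal_hits = [i for i, note in enumerate(notes) if "broken arm" in note.lower()]
expanded_hits = search_notes(notes, "broken arm")
print(literal_hits, expanded_hits)
```

The literal search finds only the note that uses the exact phrase, while the expanded search also catches the note written with a different term. Deciding which terms belong in the table is the part that requires knowing the business problem.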

Programmers have to understand the underlying business problem they’re trying to solve in order to deliver the right information. “You can’t take for granted that the same people who are good at building a traditional warehouse are good at mining unstructured data,” Cocchi says.

Elana Varon is contributing editor with Data Informed. Tell her your stories about leading and managing data-driven organizations at elana.varon@cochituatemedia.com. Follow her on Twitter @elanavaron.
