The Data Science Hierarchy of Needs

October 5, 2016 5:30 am

Steven Hillion, Co-Founder and Chief Product Officer, Alpine Data; and Kaushik Das, Head of Data Science, Pivotal

Enterprises have had it drilled into them that they need a “big data strategy.” But the vast majority are unsure of the appropriate steps they need to take and many haven’t even seen a significant impact from big data investments. According to a recent McKinsey Global Survey of executives from a range of industries regarding their analytics activities, 86 percent of them say their organizations have been “at best only somewhat effective in meeting the primary objective of their data and analytics programs.” Moreover, one-quarter say they’ve been “ineffective.” Forrester has reported that “while 74 percent of enterprise architects aspire to be data-driven, only 29 percent say their firms are good at translating the resulting analytics into measurable business outcomes.”

We believe a key part of the problem stems from misplaced big data priorities. There’s a logical hierarchy of needs for developing a big data strategy, and it is one we would prescribe for virtually any organization. First, define the business problem you are trying to solve. Second, find the data you need. Third, assess or build the infrastructure the project requires.

The critical point here is that defining the infrastructure comes last, not first; yet too many organizations we have seen behave as if the order should be inverted. They’re focused on data science wizardry and technical innovation when they should be focused on the basics first. Too often, we see the technical cart ahead of the business horse. Analytics teams are frozen, obsessing about which algorithm to use. IT teams are unwilling to green-light projects until security, data models, data storage and long-term architecture are agreed to. Less often do we see these groups working closely with the business asking, “What’s the purpose? Why are we doing this? How can we get a good-enough solution to production in the next few weeks?”

Don’t let the infrastructure drive the data science project

The urgency to establish a big data strategy propels companies straight to the third step: diving into infrastructure without a clear directive. You start by installing Hadoop, maybe some Kerberos and SSO for security, even a NoSQL database just for fun. However, if you start with the business problem, you may find that you don’t need any new infrastructure or complex machine learning at all. A good rule of thumb is to employ the YAGNI (You Ain’t Gonna Need It) principle from agile programming. Start by doing the simplest thing that could possibly work. A lot of problems can be solved by smart people armed with modest data and tools designed with the business in mind, before ever requiring a big data infrastructure.

For example, the wealth management business at a large bank started with Hadoop infrastructure. But after identifying a high-priority use case, the team realized all they needed was a traditional database and a standard set of advanced analytics tools to get up and running. In the end, the most effective thing they did was to 1) identify an important business problem; 2) iterate rapidly on an analytics solution using the most basic algorithms; and then 3) focus on how to integrate that directly into the wealth managers’ daily workflow.

With data science, complexity does not equal better

Sometimes even when you start with a business problem, you can get mired in the science, and get too excited about innovation in machine learning. This can lead to complex evaluations of specialized algorithms and technologies that will not get you any closer to a business solution. For organizations looking to quickly turn data into action, the saying “don’t let the perfect be the enemy of the good” is an apt one. Frequently the most traditional machine learning techniques like regression and clustering are enough to solve a business problem, while highly complex algorithms only add to analytic cycle time and expense — without adding value. This is not to say that there isn’t a place for more sophisticated analytic models, but sometimes organizations waste a lot of time and money building complex models when a simpler approach would do.

For example, the Wall Street financial “stress test” can seem like a daunting requirement. In one case, a bank wanted to use overly complicated machine learning techniques on a very small dataset. But after exploring the bank’s critical needs, it became clear that a more straightforward approach with more comprehensive data would have been much more appropriate.

Focus relentlessly on business outcomes

It’s important to remember that big data and analytics, in and of themselves, are not strategies. Rather, they are enablers that empower the organization to achieve business objectives by implementing enhanced or innovative strategies. Be thoughtful and be systematic about how you incorporate data science into your business.

 

Steven Hillion is co-founder and Chief Product Officer at Alpine Data. Steven has been leading large engineering and analytics projects for 15 years. Before joining Alpine Data, he founded the analytics group at Greenplum, leading a team of data scientists and also designing and developing new open-source and enterprise analytics software. Before that, he was Vice President of Engineering at M-Factor, Inc. (acquired by DemandTec), where he built analytical applications that became a global standard for demand modeling. Earlier, at Kana Communications, Steven led the engineering group during the two largest releases of its flagship product. At Scopus Technology (later Siebel Systems), he co-founded development groups for finance, telecom, and other verticals. He received his Ph.D. in mathematics from the University of California, Berkeley, and was a King Charles I Scholar at Oxford University.

 

Kaushik Kunal Das is the Senior Principal Data Scientist with Pivotal. His job is to formulate data science problems and solve them using the Pivotal Big Data Platform. He leads a team of highly accomplished data scientists working in energy, telecommunications, retail and digital media. Kaushik has an engineering background focused on solving mathematical problems requiring large datasets. He is interested in questions such as, “How much can a company know their customer and customize their actions in a context-sensitive fashion?” and, “How can our living and working environments get smarter and how can we get there?” Kaushik studied engineering at the Institute of Technology of the Banaras Hindu University and the University of California at Berkeley.

 

2 Comments

  1. Bill Duncan
    Posted October 6, 2016 at 12:37 am

    Wonderful! You capture my personal sentiment precisely when you say: “Focus relentlessly on business outcomes”

  2. Dan Reznik
    Posted October 10, 2016 at 6:36 am

    Excellent article restating the mantra that business value comes first.
