By now, everyone knows that the “Sexiest Job of the 21st Century” is the data scientist. The Harvard Business Review made that declaration in 2012, and the race to become (or rebrand oneself as) a data scientist was on at a feverish pitch. Bringing in data scientists with a strong educational and professional background is important, of course, but training the business in how to properly use the data scientist is critical as well. Otherwise, it would be like having a fancy new Tesla but not taking the time to learn how to charge it up.
So how does an organization get the most out of its data scientists? How do we alleviate the shortage of data scientists that threatens to stymie the business and societal benefits that these unique folks can bring forth? How do we ensure that data scientists are focused on helping the organization drive financial or business value (as opposed to publishing articles or speaking at conferences, a complaint that I have received more than once)?
Let’s start that discussion with some basic but important definitions.
What Is Data Science?
To get more value from our data scientists, we first must understand “What is data science?” The best definition of data science comes from the book Moneyball: “Data science is about identifying variables and metrics that might be better predictors of business performance.”
That’s a very simple description, but let’s deconstruct it anyway.
- Identifying variables and metrics. The data science process must be driven by a creative and curious mind in order to identify and brainstorm the variables and metrics (data sources) upon which to focus the data alignment, transformation, enrichment, and visualization efforts.
- Better predictors. The focus of the data scientist is on predicting what is likely to happen and prescribing what actions to take (versus reporting on what has already happened). This requires a thorough understanding of the decisions that must be made in support of the organization’s business initiatives.
- Business performance. The key deliverable must improve business or financial performance in order for the data science work to be relevant and meaningful to the organization.
As detailed in Moneyball, the Oakland A’s discovered several variables that were better predictors of the value of a baseball player – for example, that on-base percentage was a better predictor of a hitter’s value than batting average.
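To make the contrast concrete, here is a minimal sketch of the two statistics using their standard baseball definitions. The two player stat lines are invented for illustration only and do not come from Moneyball or from any real season.

```python
def batting_average(hits, at_bats):
    """AVG = H / AB. Walks and hit-by-pitches are ignored entirely."""
    return hits / at_bats if at_bats else 0.0


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    plate_chances = at_bats + walks + hit_by_pitch + sac_flies
    if plate_chances == 0:
        return 0.0
    return (hits + walks + hit_by_pitch) / plate_chances


def summarize(line):
    """Return (AVG, OBP) for a dict holding one player's season totals."""
    avg = batting_average(line["hits"], line["at_bats"])
    obp = on_base_percentage(
        line["hits"], line["walks"], line["hit_by_pitch"],
        line["at_bats"], line["sac_flies"],
    )
    return avg, obp


# Hypothetical season totals, made up for this example.
patient_hitter = dict(hits=130, walks=90, hit_by_pitch=5, at_bats=500, sac_flies=5)
free_swinger = dict(hits=150, walks=20, hit_by_pitch=2, at_bats=500, sac_flies=5)

for name, line in [("patient hitter", patient_hitter), ("free swinger", free_swinger)]:
    avg, obp = summarize(line)
    print(f"{name}: AVG={avg:.3f} OBP={obp:.3f}")
```

In this made-up case the patient hitter has the lower batting average but reaches base more often, which is the kind of difference a batting-average-only view would miss.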
Finding The Next On-base Percentage
Organizations must help their data science teams find the next variable that, like on-base percentage, turns out to be more predictive than the metric everyone already uses. And the key to doing this actually lies with the business users, not the data scientists.
Table 1 highlights the roles that business users and data scientists play in collaborating to fully exploit the power and potential of data science.
| Data Science Objectives | Business User Responsibilities | Data Scientist Responsibilities |
| --- | --- | --- |
| Identifying Variables and Metrics | Business users are best positioned to brainstorm variables and metrics (data sources) that might yield better predictors of business performance because they live the job every day and probably have a good idea of the different variables that they would like to test. | Data scientists are responsible for gathering and ingesting the data from wherever it may be located, using the most effective data-acquisition techniques and then applying different data transformation and enrichment techniques to prepare the data for analysis. |
| Better Predictors | Business users are responsible for determining which of the analytic results coming from the analytic modeling processes pass the Strategic-Actionable-Material (SAM) test. | Data scientists use data visualization techniques to better understand the interplay of the data, such as identifying variables that tend to move together under certain situations, or outliers in the data that may be indicative of something useful. |
| Business Performance | Business users own the identification of the decisions that the business is trying to make with the data science or analytic results in support of the organization’s business initiatives. Ultimately, the business users will tell you what is working and what is not working. | Data scientists build the analytic models that quantify cause-and-effect using a wide variety of analytic modeling algorithms. They then determine the quality of fit for those models. This process typically requires many iterations and provides many opportunities to learn from failure. |
Table 1. Roles of business users and data scientists.
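As a rough illustration of that division of labor, the sketch below ranks a few candidate variables by how well each one, on its own, fits an outcome. It relies on the fact that for a single-variable least-squares line, R² equals the squared Pearson correlation. The variable names (`store_visits`, `email_opens`) and all of the numbers are hypothetical, and a high R² shows association rather than cause and effect, so this is only a first screening step and not a full modeling process.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series cannot explain any variation
    return cov / sqrt(var_x * var_y)


def rank_predictors(candidates, outcome):
    """Return (name, r_squared) pairs sorted from best to worst fit."""
    scored = [
        (name, pearson_r(values, outcome) ** 2)
        for name, values in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: variables suggested by business users, and the
# business outcome they hope to predict (for example, weekly sales).
outcome = [10, 20, 30, 40, 50]
candidates = {
    "store_visits": [1, 2, 3, 4, 5],
    "email_opens": [3, 1, 4, 1, 5],
}

ranking = rank_predictors(candidates, outcome)
for name, r_squared in ranking:
    print(f"{name}: R^2 = {r_squared:.3f}")
```

The business users would then judge which of the higher-ranked variables are actually worth acting on, and the data scientists would iterate with richer models from there.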
The Power of Business Decisions
To ensure that the organization is getting the most value out of its data science operation, focus on the business decisions. These decisions are the linkage points between business users and data scientists, ensuring that everyone is focused on the same objectives. It is around these decisions that the collaboration between business users and data scientists can deliver the most business value (Figure 1).
The decisions are key because:
- From a top-down perspective, decisions provide the framework around brainstorming the necessary variables and metrics (data sources), and also dictate your architecture and technology requirements.
- From a bottom-up perspective, the analytic models built from the different variables and metrics (data) create the analytic results (e.g., scores, recommendations, business rules) that will be applied to optimize the decisions that support the organization’s business initiatives.
No data science initiative should exist in a vacuum. The collaboration between business users and data scientists is central to optimizing business processes, uncovering new monetization opportunities, and realizing the most value from your big data analytics investment.
Bill Schmarzo is responsible for setting the strategy and defining the service line offerings and capabilities for the EMC Consulting Enterprise Information Management and Analytics service line. He’s written several white papers and is a frequent speaker on the use of big data and advanced analytics to power organizations’ key business initiatives.
Bill has more than two decades of experience in data warehousing, BI, and analytic applications. Bill authored the Business Benefits Analysis methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements, and co-authored with Ralph Kimball a series of articles on analytic applications. Bill has served on The Data Warehouse Institute’s faculty as the head of the analytic applications curriculum.
Previously, Bill was the vice president of Analytics at Yahoo, where he was responsible for the development of Yahoo’s Advertiser and Website analytics products, including the delivery of actionable insights through a holistic user experience. Before that, Bill oversaw the Analytic Applications business unit at Business Objects, including the development, marketing and sales of their industry-leading analytic applications.
Bill holds a master’s degree in Business Administration from the University of Iowa and a bachelor of science degree in Mathematics, Computer Science, and Business Administration from Coe College.