What Is the Most Underutilized Trend in Data Analytics? Q&A with Ken Kelley

April 12, 2017 5:30 am

Q&A with Ken Kelley, Associate Dean for Faculty and Research and Professor of Information Technology, Analytics, and Operations at Notre Dame’s Mendoza School of Business.

University of Notre Dame’s Mendoza College of Business (Chicago campus)


Statistical methods have always been an important matter for businesses. But in today’s world of big data and real-time analysis, certain areas of statistics are taking on a new importance – and other areas have been overlooked. This is something that Ken Kelley, Associate Dean for Faculty and Research and Professor of Information Technology, Analytics, and Operations at Notre Dame’s Mendoza School of Business, understands well.

Kelley teaches in several programs at the Mendoza School, including Notre Dame’s unique Master of Science in Business Analytics (MSBA) program, and has been named to Poets&Quants’ exclusive list of 40 outstanding MBA professors under 40. His expertise includes methodological issues and applied statistics, which he shares with students of the Mendoza School’s MSBA program in its two-part Statistical Methods for Managers course. Data Informed recently had the opportunity to sit down with Kelley and discuss new trends in statistics and old ones that deserve to come back. Read this insightful discussion of the state of statistics in today’s big data world and discover the benefits of the MSBA program available at the Mendoza School.

Data Informed: What areas of business statistics and analytics are you most interested in?

Ken Kelley (KK): In the world of big data, what we are often interested in is a behavioral outcome, whether it’s if someone purchases a product, how long they stay in a store, or how employee personality characteristics map onto employee effectiveness and other outcomes. I’m interested mostly in the area of analytics that measures and models behavior, such as predictors of why a person does something or thinks in a certain way. Many important measures are psychological: motivation, engagement, effectiveness at teamwork, et cetera can’t be measured directly. Rather, only manifestations of such latent constructs can be obtained, and working with such variables is different because they contain measurement error. I like to focus on these sorts of issues and thus bring psychology and the methods used therein into the business analytics space.

I realize I approach analytics differently than some, such as those who work mostly with directly measurable variables via computer systems and online activity. I’m not doing that as much as looking at indirectly measurable behavioral factors. Nevertheless, much online activity is behavioral in nature. So to me, the things that influence a behavioral outcome, such as a psychological or personality attribute, are fundamental to many aspects of analytics.

Q: What underutilized trends in the analytics space could have the biggest impact on businesses?

KK: I think psychometric testing and measurement is currently underutilized by many. There’s certainly a movement of people using it, but I think more widespread use could make a big impact. For instance, trying to hire the right people for a job is something that could potentially be better performed by using data than by using an interview. An interview is so limited, and certain factors that don’t determine job performance are probably often given more weight than they should have.

There’s a 1954 book by the famous psychologist Paul Meehl called Clinical vs. Statistical Prediction that discussed how using statistical methods to predict recidivism rates of prisoners was more accurate than using the clinical, subjective method of a parole board evaluation. That idea of making decisions with behavioral statistics, which was first discussed in the fifties, is now something people are talking about in the analytics space – there’s a growing trend of using data instead of situational knowledge to make decisions. The book Moneyball by Michael Lewis presents a great example of this concept. It tells how the Oakland Athletics started using data rather than scouts to figure out which baseball players they should draft or trade for, with great results. These kinds of stories have been very enlightening for many people, but at the end of the day, it’s the same set of underutilized ideas that have been talked about since at least the fifties in psychometrics and behavioral statistics. For example, Michael Lewis also talks about the same sort of issues for basketball players in The Undoing Project, which itself speaks to the issue of behavioral outcomes.

And it doesn’t just apply to sports or prisoners. This concept can be used in a variety of situations, such as companies that are trying to hire a lot of employees. Data could be (and in some cases are being) used to automate the process of deciding who should be considered for an interview, or whether interviews should even be used at all, and to find people who are not being hired but who might have been hired based on their prior experience or their results on a personality or other psychological test.

Q: Are there any interesting statistical tools or approaches that you think are going to be making a big difference in the enterprise?

KK: I think there are a lot of methods that get overlooked and that are not being used now, but that have the potential to make a difference. One of these, for example, might be item response theory, which is a better way of measuring latent behavioral characteristics such as motivation than simply adding up the scores of many items. We can easily know objective measurements, such as the exact time it takes for a truck to get from point A to point B, but when you’re talking about personality characteristics – conscientiousness, or neuroticism, or openness to experience, which are important variables in the context of who is or isn’t hired and later work performance – they’re not easy to measure and are outside the domain of many. But statistical methods have been developed to measure these underlying, latent attributes that we really care about. So I think incorporating more item response theory approaches to measurement and also latent variable modeling methods into personnel selection is something that could be very useful. And it’s not something that needs to be invented; it already exists.
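To illustrate the contrast between a sum score and an item response theory estimate, here is a minimal sketch that is not from the interview. It assumes a two-parameter logistic (2PL) model, one common IRT model, and uses made-up item parameters; a real application would estimate those parameters from response data. The names `ITEMS`, `p_endorse`, and `estimate_theta` are hypothetical.

```python
import math

# Hypothetical item parameters (discrimination a, difficulty b) for a
# five-item scale. These values are invented for illustration only.
ITEMS = [(0.5, -1.0), (0.8, -0.5), (1.0, 0.0), (1.5, 0.5), (2.0, 1.0)]


def p_endorse(theta, a, b):
    """2PL probability that a person with latent trait theta endorses an item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, responses):
    """Log-likelihood of a 0/1 response pattern at a given theta."""
    total = 0.0
    for (a, b), x in zip(ITEMS, responses):
        p = p_endorse(theta, a, b)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total


def estimate_theta(responses, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum-likelihood estimate of theta on [lo, hi]."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses))


# Two people with the same sum score but different response patterns.
person_a = [1, 1, 0, 0, 0]  # endorses the two least discriminating items
person_b = [0, 0, 0, 1, 1]  # endorses the two most discriminating items

theta_a = estimate_theta(person_a)
theta_b = estimate_theta(person_b)
print("sum scores:", sum(person_a), sum(person_b))
print("theta estimates:", theta_a, theta_b)
```

Both people have a sum score of 2, yet under these assumed parameters the model assigns them different trait estimates, because items differ in how much information they carry about the latent attribute.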

Q: With such big hype around big data, is there still a place for small data in today’s business?

KK: Yes! I think small data has been lost in the shuffle and excitement of big data. Some data are not necessarily easy to collect and may require a lot of resources. Big data is not, in fact, always desirable. Embarking on a task that will collect huge amounts of data is not necessary for some questions. For example, small-scale studies of changes, treatments, or approaches for improved performance can all be extremely useful but are not in the realm of big data. It may not be ideal, for example, for a snack food company to change the way it combines its ingredients in a popular product, roll the new product out to lots of customers, and then collect big data to see how the perceptions changed (or did not change). Rather, in such situations it would often make sense to do a smaller-scale study to see what people prefer. And when a car company is doing crash tests, you don’t necessarily want to collect a large sample size, for obvious reasons. The same goes for testing a new pharmaceutical product – you don’t want to give it out widely and then gather a large sample to see its effects, which could be negative. So if you think about data that’s hard to obtain, or situations when you want to evaluate a specific manipulation or treatment or approach, the data often come from a smaller sample size.

Now, there can be a combination of big data and small data where you have very rich data (e.g., many observations over some timeframe) but from a small sample size (i.e., relatively few individuals). If you have, for example, someone’s online behavior, then the sample size is just one person, but it could still be very, very rich data. So you could think of it as a small sample size in a big data context. And that concept is not talked about as much as it could be, I think.

People are so infatuated with huge sample sizes and automation – an instrumented world in which we’re collecting all kinds of variables from sensors, devices, or activities. But that can lead to collecting a lot of data that do not necessarily lead to value. Having a lot of observational data, for example, can’t reveal what’s causing this or that behavior, but with a standard clinical trial or randomized research experiment, we can discover causal outcomes without huge sample sizes. You don’t have to have big data or big sample sizes to learn something from data. In fact, in many situations a common question for statisticians is something along the lines of “how small of a sample do I need in order to address my question?” There are obvious resource implications when, for example, a sample of size N will be highly likely to address the questions of interest and thus collecting a sample of, say, size 10*N or 25*N is not necessary.
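The "how small of a sample do I need" question can be sketched with a textbook power calculation. This is an illustration, not necessarily the approach Kelley has in mind: it assumes a two-group comparison of means, a standardized effect size, and the normal approximation to the test statistic. The function names `n_per_group` and `approx_power` are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-group
    comparison of means (normal approximation, equal group sizes)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def approx_power(effect_size, n, alpha=0.05):
    """Approximate power with n observations per group.
    Ignores the negligible probability of rejecting in the wrong tail."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(effect_size * math.sqrt(n / 2) - z_alpha)


# Assumed medium standardized effect of 0.5 (an illustrative value).
n = n_per_group(0.5)
print("needed per group:", n)
print("power at N:", round(approx_power(0.5, n), 3))
print("power at 10*N:", round(approx_power(0.5, 10 * n), 3))
```

Under these assumptions, the sample that already gives about 80% power gains little from being made ten times larger, which is the resource point made above.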

Q: What should a company focus on if it wants to combine the use of small data and big data?

KK: I think the idea of a randomized experiment has been overlooked in recent times because collecting data on the world as it is happening, as events are unfolding, has become a lot easier to do. So many organizations aren’t necessarily applying manipulation to try to quantify the effect of this or that, and a lot of companies, in my opinion, have lost sight of the value of what we learn from a randomized experiment. Gathering data observationally – just turning on a switch and then watching what a group of people do – is very different from watching what two groups, specifically with random assignment to group, do when only one group has had that switch turned on.

And companies can certainly do this with online data; some do. They can evaluate the effectiveness of a change in a website, for example, by randomly giving different IP addresses different views of the website or different offers. But if you have a change without a control group, or a study design without randomization, you don’t know if the effect is really because of the change. So I think the old-fashioned idea of a randomized trial or randomized experiment can be incorporated into the big data world more than it currently is. The bulk of the hype that I hear about big data is more about collecting data that we weren’t able to collect before, coming from, for example, online activities or from instruments that easily collect information. But collecting observational data doesn’t mean the same thing as quantifying an effect from a specific variable.
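A randomized website experiment of the kind described can be sketched as follows. This is a minimal illustration with simulated data, not a production design: the visitor IDs, the conversion rates in `TRUE_RATE`, and the function names are all invented, and the analysis uses a standard two-proportion z-test as one reasonable choice.

```python
import math
import random
from statistics import NormalDist


def assign_groups(visitor_ids, seed=0):
    """Randomly assign each visitor to 'control' or 'treatment'."""
    rng = random.Random(seed)
    return {v: rng.choice(["control", "treatment"]) for v in visitor_ids}


def two_proportion_z_test(conv_t, n_t, conv_c, n_c):
    """Return (difference in conversion rates, two-sided p-value).
    Assumes both groups are non-empty and the pooled rate is not 0 or 1."""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    pooled = (conv_t + conv_c) / (n_t + n_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    z = (p_t - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_t - p_c, p_value


# Simulated experiment: only the treatment group sees the changed page.
visitors = list(range(2000))
groups = assign_groups(visitors, seed=42)

TRUE_RATE = {"control": 0.10, "treatment": 0.13}  # hypothetical rates
outcome_rng = random.Random(7)
converted = {v: outcome_rng.random() < TRUE_RATE[groups[v]] for v in visitors}

n = {g: sum(1 for v in visitors if groups[v] == g) for g in TRUE_RATE}
conv = {g: sum(converted[v] for v in visitors if groups[v] == g)
        for g in TRUE_RATE}

diff, p_value = two_proportion_z_test(
    conv["treatment"], n["treatment"], conv["control"], n["control"]
)
print("group sizes:", n)
print("estimated lift:", diff, "p-value:", p_value)
```

Because assignment is random, the two groups differ systematically only in which page they saw, so the estimated lift can be read as an effect of the change rather than of who happened to visit.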

Q: How do the courses you teach in Notre Dame’s MSBA program address the topics of behavioral data, small data, or just taking a more strategic approach to using data?

KK: The program considers learning from data and analytics from many angles. In the statistics courses that I teach, which are in the very first part of the program, we cover research design and analyzing the data obtained that way, and we do that as a precursor to, for example, multiple regression and other methods that are not necessarily used only for analyzing data from experimental designs. We view the statistics courses as foundational. They speak to the underlying theory of learning from data and the different ways in which we can learn from data.

Now, it doesn’t cover everything that I talked about earlier. We don’t discuss item response theory, for example. We don’t talk much about psychometrics. We do talk about measurement, however, because if you’re not measuring something very well, you probably won’t get good value from it. It is a bit ironic that many treatments of statistics do not consider much about measurement, yet it is the measures that are being analyzed with the statistics. The statistics courses essentially help students keep in mind that when you want to make causal claims, you need to consider how the data was obtained. But there’s a lot more to the program. The statistics courses that I teach are really only a small part of a very rich program.

Q: What do you think sets this program apart from other MSBA programs?

KK: I think that our huge advantage is that it is an MSBA program for working professionals that is taught in person. To my knowledge, we’re the only program that does it that way. Many programs for working professionals are online, and you really lose out on the in-person instruction. The in-person nature of our program means that I can spend a lot of time outside lecture hours chatting with the students, which I believe the students value. I have lunch and breakfast with the students and spend breaks with them, and we talk about all kinds of issues they have at work and how they can use the analytics that they’re learning in class to address those issues. So that is a big benefit of the in-person nature of our program. It isn’t just me; the other professors are engaged as well.

But it’s for working professionals, so this is not a residential program. Rather, it occurs over the weekend, all day Friday and all day Saturday, when classes and analytics projects are the sole focus. It’s just one class for four or five hours, some breaks, and then another class for four or five hours; the students are totally immersed in what they are learning. And these classes are for professionals who actually have real problems that they’re trying to solve, so it’s a very different classroom experience than many other programs. The students are learning alongside people who are actively working to solve business problems, as opposed to students in a residential program who usually don’t have much experience dealing with data in a company. These people are already “in the trenches” dealing with data issues, and after being fully immersed in the program over the weekend, they take what they have learned back to the company and start implementing it right away. So we have a different audience and we have a different structure. In fact, a surprising number of students travel from around the country to our Chicago location. For example, students have traveled from LA, Los Alamos, Baltimore, Cincinnati, Atlanta, DC, and a variety of other cities. It is surprising to me, but also humbling.

 

 
