There has been a massive amount of writing (including one piece from me) about the failure of U.S. political polling and pollsters to predict the election of Donald Trump as U.S. president. Those prognosticators’ predictions in 2008 and 2012 were quite accurate, but something broke down this time. To me this is a really interesting situation, with potential applications far beyond this election (though the election is certainly one of its important instantiations).
The general problem is this: Let’s say that you have been successfully predicting something (winners of an election, your customers’ wants and needs, your financial performance, or your daily weight loss from 10,000 steps) for a good while. All of a sudden, your data and models no longer seem able to predict the phenomenon in question. Weird outcomes happen. You and everyone else are shocked. What can you do to avoid this situation?
The simplest answer to this question is to learn from the bad experience. The best way to know that a model needs to be changed is to find out that it no longer predicts well. Of course, this approach is less than appealing when the event you are predicting happens only every four years—as in certain elections—and when your reputation depends on a successful prediction of that event. In business analytics, you may want to arrange regular tests of your models—say, on predictions of customer responses to promotions—to make sure they still work. In political analytics, you might want to test your models in primaries, off-year congressional and gubernatorial elections, and so forth.
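As a rough sketch of what such a regular test might look like, the snippet below scores one batch of predictions against what actually happened and flags the model when its accuracy slips too far below its past level. The function name, the baseline accuracy, the tolerance, and the data are all hypothetical choices for illustration, not an established method.

```python
def check_model_accuracy(predictions, outcomes, baseline_accuracy, tolerance=0.05):
    """Return (accuracy, needs_review) for one batch of scored predictions.

    All names and thresholds here are hypothetical, chosen for illustration.
    """
    if not predictions or len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must be non-empty and the same length")
    hits = sum(1 for p, o in zip(predictions, outcomes) if p == o)
    accuracy = hits / len(predictions)
    # Flag the model when accuracy falls more than `tolerance` below its baseline.
    needs_review = accuracy < baseline_accuracy - tolerance
    return accuracy, needs_review


# Made-up data: did each customer respond to the latest promotion?
predicted = [True, False, True, True, False, True, False, True, False, True]
actual = [True, False, False, True, True, True, False, False, False, True]

accuracy, needs_review = check_model_accuracy(predicted, actual, baseline_accuracy=0.85)
print(f"accuracy={accuracy:.2f}, needs_review={needs_review}")
```

Running a check like this on every promotion, or every primary, gives you many small warnings instead of one large surprise.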
Another important approach is avoiding over-reliance on one type of model or data. The world provides a lot of different signals about what is likely to happen, and you don’t want to focus on only one type. If, for example, you have primarily employed customer loyalty program data to understand your customers, you might also assess social media sentiment, analyze comments left on your website, or even turn call center speech into text and analyze that. You might even, God forbid, organize a focus group. None of these methods are highly accurate means of assessing what people think, but they can provide insights that other types of data miss.
If political polling analysts, for example, had employed some of these alternative analyses, they might have detected that polls were not reliable guides to action in the most recent election. Certainly there was plenty of social media to analyze. The Trump campaign, for example, hired UK-based Cambridge Analytica to analyze social media and email and phone surveys to understand the psychological profiles of potential Trump voters. The company identified a group it called “disenfranchised new Republicans,” who, according to an article on Bloomberg Business, “are younger, more populist and rural—and also angry, active, and fiercely loyal to Trump.” The campaign analysts agreed with most analysts that polls showed Trump losing to Hillary Clinton, but they held out hope that this group would turn out in large numbers, and it did.
Constant checks on your data quality are another means of ensuring that your current understanding of the world is accurate. In election polling, the widespread assumption is that while individual polls may be problematic, aggregating them is a viable means of addressing quality problems. This is a valid assumption unless there are systematic bias factors in many polls—as there seemed to be in the 2016 presidential polls. Since polls reflect reported future behavior rather than actual behavior, they are highly subject to quality problems.
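To see why aggregation helps with random error but not with shared bias, consider the small simulation below. It is only a sketch under assumed numbers: the true vote share, the noise level, and the size of the shared bias are invented for illustration and are not estimates from any actual 2016 poll.

```python
import random
import statistics


def simulate_poll_average(true_share, n_polls, noise_sd, shared_bias, seed=0):
    """Average n_polls simulated polls of one candidate's vote share.

    Each poll = true_share + shared_bias + independent random noise.
    All parameters are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)
    polls = [true_share + shared_bias + rng.gauss(0.0, noise_sd) for _ in range(n_polls)]
    return statistics.mean(polls)


TRUE_SHARE = 0.50  # assumed true vote share

# Independent noise only: averaging many polls lands close to the truth.
unbiased_avg = simulate_poll_average(TRUE_SHARE, n_polls=200, noise_sd=0.03, shared_bias=0.0)

# Same noise, but every poll understates the candidate by two points.
biased_avg = simulate_poll_average(TRUE_SHARE, n_polls=200, noise_sd=0.03, shared_bias=-0.02)

print(f"average without shared bias: {unbiased_avg:.3f}")
print(f"average with shared bias:    {biased_avg:.3f}")
```

The random noise largely cancels out in the average, but the two-point shared bias passes straight through, no matter how many polls are added.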
In business-oriented data, quality problems are usually somewhat easier to address. This typically involves examining a sample of data, comparing it to acceptable ranges, and correcting factors in the business process that might be causing problems.
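A minimal sketch of that kind of range check follows. The field names, the acceptable ranges, and the sample records are all hypothetical; in practice they would come from your own business rules.

```python
def find_out_of_range(records, acceptable_ranges):
    """Return (record_index, field, value) for each missing or out-of-range value."""
    problems = []
    for index, record in enumerate(records):
        for field, (low, high) in acceptable_ranges.items():
            value = record.get(field)
            # A missing value is treated as a quality problem as well.
            if value is None or not (low <= value <= high):
                problems.append((index, field, value))
    return problems


# Hypothetical acceptable ranges for two fields.
ACCEPTABLE_RANGES = {
    "order_total": (0.0, 10_000.0),
    "customer_age": (18, 110),
}

# A made-up sample of records pulled for inspection.
sample = [
    {"order_total": 42.5, "customer_age": 34},
    {"order_total": -10.0, "customer_age": 51},
    {"order_total": 18.0, "customer_age": 212},
    {"order_total": 75.0},
]

problems = find_out_of_range(sample, ACCEPTABLE_RANGES)
for index, field, value in problems:
    print(f"record {index}: {field}={value!r} is missing or out of range")
```

The flagged records are only the starting point; the real work is tracing them back to the step in the business process that produced them.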
It’s also important to record and regularly revisit the assumptions behind your models. Every model has assumptions. In politics, polling-based models assume certain levels of turnout for particular groups, and assume voters know for whom they are going to vote and that they are truthful about their intentions. In business, the assumptions may involve ideas about which customers are the most desirable, who is most likely to pay back a loan, or the stability of your supply chain for inventory optimization purposes. It’s easy to forget these underlying assumptions over time, especially when models are working. So revisiting and examining them needs to be done on a regular basis, perhaps a couple of times per year. If the assumptions are no longer valid, the models that depend on them will have to be redeveloped.
At some point you may need to observe the objects in question at close range. We have great systems for automated telephone polling and web surveys, but they may not always elicit a voter’s true feelings. It may require deeper study. For example, Diane Hessan, a former Boston-area technology executive who volunteered for the Clinton organization, interviewed 300 undecided voters and persuaded most of them to keep diaries about their feelings. She wrote in a blog post after speaking with…
scores of undecided voters in swing states. They didn’t like either candidate. They just wanted to be understood. At the end of the day, they cared less about Trump’s temperament and more about whether he “got” them. They were smart, they knew the cheers, Trump gave them a voice, and he certainly didn’t think they were deplorable. I didn’t hear this from everyone, but it was striking to read the comments of voters who were struggling to make a decision, and who went with the candidate who made them feel important. It might have been enough to make 70 electoral votes’ worth of difference.
In business, actually observing customer or employee behavior might involve visiting retail stores and talking with either group, holding focus groups (though, as I suggested above, they are often not reliable), or observing people as they interact with your products or processes. The longtime leader of General Motors, Alfred Sloan, used to take “field trips” to cities around the US, during which he would meet with between five and ten GM dealers a day. He found it a useful means to check on product quality, company strategy and communications, and relationships with dealers and customers.
If you don’t want to be blindsided the way the Clinton campaign (and much of the media) were, you might want to undertake some of these steps. Analytics are an important way of finding out what’s going on in the world, but they are only effective under certain circumstances. It’s pretty important that you figure out whether those circumstances are present or not.
Tom Davenport, the author of several best-selling management books on analytics and big data, is the President’s Distinguished Professor of Information Technology and Management at Babson College, a Fellow of the MIT Initiative on the Digital Economy, co-founder of the International Institute for Analytics, and an independent senior adviser to Deloitte Analytics. He also is a member of the Data Informed Board of Advisers.