Evolutionary pressure has made us visual beings. I’m alive today because my ancestors saw things to eat and saw things that might eat them. Because we respond so strongly to visual cues, charts and graphs have the power to move us in a way that other ways of presenting data can’t match. Astute researchers have known this for a long time. Florence Nightingale visually presented her analysis of British deaths in the Crimean War (1853-1856) because she knew politicians and civil servants wouldn’t read or understand written statistical analysis. Her approach worked – she got the money for proper field hospitals.
But there’s a dark side to the power of charts. Charts can mislead us into believing things that aren’t true. Sometimes this is accidental, but other times we are being deliberately manipulated. Sometimes it’s easy to spot what’s wrong, but other times the sleight of hand is very subtle.
I’m going to show you some of the tricks of the trade. The selection is based on my own experience and it’s idiosyncratic, so please feel free to comment.
The most notorious of the data visualization deceiver’s tricks is to use chart axes that don’t start at zero. We’re very good at comparing the lengths of objects, so choosing a non-zero axis can greatly magnify small or meaningless differences. The chart on the left is very commonly used in the UK to show the change in house prices – I won’t name the guilty publications.
By not starting the axis at zero, I’ve greatly magnified what’s actually a small change in average house prices (a 2 percent change from 2012 to 2013), as you can see from the chart on the right, where I’ve started the axis from zero. If you want to make a small difference look like a big one, this is the method for you.
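To put a number on the effect, here is a minimal Python sketch. The prices and the axis baseline are made-up illustrative values (chosen to give a 2 percent rise), not the figures behind the charts, and `apparent_ratio` is a hypothetical helper name.

```python
def apparent_ratio(old, new, baseline=0.0):
    """Ratio of the two bar heights as drawn when the axis starts at `baseline`."""
    if baseline >= min(old, new):
        raise ValueError("baseline must be below both values")
    return (new - baseline) / (old - baseline)


# Hypothetical average prices: a 2 percent rise.
old_price, new_price = 100.0, 102.0

honest = apparent_ratio(old_price, new_price)           # axis starts at zero
truncated = apparent_ratio(old_price, new_price, 95.0)  # axis starts at 95

print(f"zero-based axis: second bar is {honest:.2f}x the first")
print(f"axis from 95:    second bar is {truncated:.2f}x the first")
```

With these assumed numbers, the truncated axis draws the second bar 40 percent taller than the first, although the underlying value rose by only 2 percent.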
Occasionally, starting from zero can itself be misleading. The chart below left shows a patient’s body temperature over the course of a day. Because its axis starts at zero, it disguises or minimizes the patient’s temperature change (imagine how the chart would look if I’d shown it in Kelvin). Yet a change of a few degrees can be very medically significant – in fact, a change of 2.5 degrees C can lead to death – and such a change would be hard to spot in the chart on the left. In cases like this, it’s better to use a non-zero axis, as I’ve shown on the right.
A slightly less heavy-handed manipulation is to play around with the axis scale. We’re conditioned to think that an axis scale is uniform. Using a non-uniform scale can mislead people by creating an impression of sudden change when none exists. The example shown below was actually used on TV (again, no names to protect the guilty) to imply that gas prices were undergoing a significant change. The sleight of hand here is to use an unusual, and non-uniform, X axis. The underlying data series of U.S. gas prices shows considerable variation over time and tells a story very different from the one implied by this chart. This chart would be slightly less evil as a bar chart – the line chart strongly indicates a continuous series; the bar chart better indicates categorical difference. But even as a bar chart, this chart would be highly misleading and dishonest.
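One way to catch this trick is to put the axis labels back on a real time line and look at the gaps between them. The sketch below is only an illustration: the three dates are hypothetical, not the ones from the TV chart, and `gaps` and `is_uniform` are helper names I've made up.

```python
import math


def gaps(positions):
    """Distances between consecutive x-axis positions."""
    return [b - a for a, b in zip(positions, positions[1:])]


def is_uniform(positions, rel_tol=1e-9):
    """True if every consecutive pair of positions is the same distance apart."""
    g = gaps(positions)
    return all(math.isclose(x, g[0], rel_tol=rel_tol) for x in g)


# Hypothetical x-axis: three points drawn evenly spaced on the chart,
# but measured 365, 7 and 0 days before the broadcast.
days_before = [365, 7, 0]
true_positions = [-d for d in days_before]  # place them on a real time line

print(gaps(true_positions))        # actual spacing between the points, in days
print(is_uniform(true_positions))  # evenly spaced on screen, but not in time
```

If the gaps differ this much, the points should be plotted at their true positions, or the chart should make it obvious that the categories are not a continuous series.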
A variation of this technique can achieve the exact opposite effect, hiding variation that really exists. Years ago, one of my physics lecturers told me that using a log-log graph would straighten any line and hide dodgy experimental results.
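"Any line" is an exaggeration, but the lecturer had a point: any power law y = a·x^k becomes a straight line of slope k on log-log axes, because log y = log a + k·log x. Here is a small Python sketch using an assumed power law (y = 3x², chosen purely for illustration) that compares the segment slopes on linear and log-log axes.

```python
import math

# Assumed power law y = 3 * x**2, sampled at a few x values.
xs = [1, 2, 4, 8, 16]
ys = [3 * x**2 for x in xs]


def slopes(x_vals, y_vals):
    """Slope of each segment between consecutive points."""
    pts = list(zip(x_vals, y_vals))
    return [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(pts, pts[1:])]


linear_slopes = slopes(xs, ys)
loglog_slopes = slopes([math.log10(x) for x in xs],
                       [math.log10(y) for y in ys])

print("linear axes: ", linear_slopes)                         # keeps growing: a curve
print("log-log axes:", [round(s, 6) for s in loglog_slopes])  # constant: a straight line
```

On linear axes the slope doubles from one segment to the next; on log-log axes every segment has the same slope, so the curve looks like a ruler-straight line.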
A more subtle problem is misleading correlations. No matter how many times our statistics professor told us, “Correlation is not causation,” we are still tempted to make the wrong mental leap. Imagine you saw the chart below – which is real data. The correlation coefficient here is 0.97. What would you think? Would you think there’s some relationship here? What if I told you Line B was sales revenue and Line A was marketing expenditure? Would you conclude that to increase sales you should increase marketing? What if this chart were shown on TV or in a newspaper? What would the audience conclude?
Actually, I lied to you. The chart is really “Number of people who died by becoming tangled in their bed sheets” (line A) and “Total revenue generated by skiing facilities” (line B). I took the data from the wonderful website Spurious Correlations. This website shows correlations between the strangest sets of data – it’s well worth wasting an hour of your employer’s time to look at it. The website hammers home the point that correlation does not equal causation. Would you really conclude from this chart that you should increase deaths by bed sheet to increase skiing revenue?
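It takes very little to produce a correlation this strong: any two series that both drift upward over the same period will do. The Python sketch below uses two invented series (they are not the Spurious Correlations data, and the names are placeholders) and computes the Pearson correlation coefficient by hand.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Two made-up yearly series that have nothing to do with each other
# except that both happen to rise over the period.
series_a = [10, 12, 13, 15, 18, 19, 22, 24]
series_b = [200, 215, 240, 250, 270, 300, 310, 335]

r = pearson(series_a, series_b)
print(f"correlation coefficient: {r:.2f}")
```

The coefficient comes out close to 1 simply because both series trend in the same direction. Nothing in the calculation knows, or cares, whether one quantity influences the other.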
I love maps, so it pains me to say that geographical charts can very strongly mislead. Here’s a map of the Scottish independence referendum results by electoral district. Green is against independence and red is in favor (by the way, blue is the national color of Scotland, which is why I chose green and red for the colors – even color choice can mislead). Looking at this chart, what do you think the result was: 90 percent against, 80 percent against, or 70 percent against? Actually, the result was 55 percent against. The chart is misleading because the Scottish population is highly concentrated in geographically compact urban areas. We seem to be conditioned to think that the greater the area something occupies on a chart, the more important it is – but this just isn’t the case for a geographical representation.
There’s a further problem with this presentation: I’ve chosen two colors with no gradation. In reality, the result varied from district to district. I could have chosen a gradient scale, but the chart would still be highly misleading. This is a perennial problem with visualizations of U.S. Presidential election results – a glance at any election results map always seems to indicate a Republican win because rural states with small populations tend to vote Republican (Montana, for example, has 0.3 percent of the U.S. population and 0.5 percent of the electoral college votes, but 3.8 percent of the area). To me, maps are the most powerfully misleading of all charts, so I don’t use them.
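The Montana figures quoted above make the distortion easy to quantify. This is a rough Python sketch; `overstatement` is just an illustrative name for "share of the map divided by share of what actually counts".

```python
def overstatement(share_on_map, share_that_counts):
    """How many times larger a region looks on the map than its real weight."""
    if share_that_counts <= 0:
        raise ValueError("share_that_counts must be positive")
    return share_on_map / share_that_counts


# Percentages for Montana as quoted in the text.
montana = {"population": 0.3, "electoral_votes": 0.5, "area": 3.8}

vs_population = overstatement(montana["area"], montana["population"])
vs_electoral = overstatement(montana["area"], montana["electoral_votes"])

print(f"area vs. population:      {vs_population:.1f}x")
print(f"area vs. electoral votes: {vs_electoral:.1f}x")
```

On these figures, Montana takes up roughly 13 times more of the map than its share of the population would justify, and between seven and eight times more than its share of the electoral college.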
Of course, there are many more examples of misleading visualization techniques. These are just a few of the ones I’ve come across. I’d love to hear your experiences – have you come across other misleading techniques?
Here are some simple rules I use to keep me virtuous.
- Always start your plots from zero, unless doing so would be misleading.
- Use a linear axis scale – avoid different-sized categories and log plots unless there are good reasons to do otherwise.
- Never, ever forget that correlation is not causation. No matter how tempting it is, don’t present a correlation as if it were a cause. Bear in mind that your audience will almost certainly see correlation as equaling causation, so be careful.
- Maps are beautiful, but they can be powerfully misleading. Never use them alone and always consider the unintended message you might be transmitting.
Visualizations are extremely powerful. They can mobilize governments to save lives or mislead us to vote for the wrong government. You and I are the people who produce visualizations for others to use. To quote Spider-Man, “With great power comes great responsibility.”