How to Tell a Story with Millions of Rows of Data

March 4, 2015 5:30 am

Ellie Fields, Vice President of Product Marketing, Tableau

How many stories does New York City hold? Eight million, says the famous film “The Naked City.”

“That’s a lot of data,” say people in our world.

People who work with data are talking a lot about data storytelling. The success of news outlets focused on data journalism (FiveThirtyEight, Vox, and The New York Times’ Upshot, for example) is proof of this.

The idea is that just as you can tell a story through prose, or images, or video, you can tell a story through data.

“That’s great,” some say, “but that’s for small data sets. I’ve got to deal with big data.”

This, however, is a fundamental misunderstanding of the concept. In fact, stories become an even more important medium for data as the size of the data increases.

You see, telling a story with data isn’t like creating an infographic. Where an infographic typically uses a few data points as part of a static image, woven together with some pictures, a great data story is much more. A data story is a way to express a narrative built on data while allowing readers to test assumptions and explore their own threads. And the larger the data set, the greater the value of a narrative that helps us distill what’s important in all that information.

Data storytelling has been used effectively on a range of subjects, from the urbanization of China to political ad analysis to manufacturers’ recalls.

Let’s take an example of how to tell a story with millions of rows of data from New York City itself. New York City has made its taxi data available. For a single year, there are more than 173 million rows, where each row represents a customer’s trip in a taxi.

How do we tell a story from that mass of data?

Explore the Data

The first thing to do is to explore the data. Ideally, you can use a fast, interactive tool so that you can test out different hypotheses about the data.

In the taxi data, a simple time trend lets us see that there’s a dip in rides in the summer months, presumably the time when New Yorkers flee the heat of the city. We also see a dip around Christmas.

Number of cab rides in New York City, by date.
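
If you want to reproduce a time trend like this without a visual tool, a minimal Python sketch might look like the following. It streams rows one at a time, so it does not need to hold 173 million rows in memory. The column name `pickup_datetime`, the timestamp format, and the sample rows are assumptions made for illustration; they are not taken from the city's published files.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def rides_per_day(rows):
    """Count trips per calendar date from an iterable of dict-like rows."""
    counts = Counter()
    for row in rows:
        # Assumed column name and timestamp format: "YYYY-MM-DD HH:MM:SS".
        pickup = datetime.strptime(row["pickup_datetime"], "%Y-%m-%d %H:%M:%S")
        counts[pickup.date()] += 1
    return counts


# Tiny made-up sample standing in for the real trip file.
sample = io.StringIO(
    "pickup_datetime\n"
    "2013-07-04 09:15:00\n"
    "2013-07-04 18:40:00\n"
    "2013-12-25 11:05:00\n"
)

daily = rides_per_day(csv.DictReader(sample))
for day in sorted(daily):
    print(day, daily[day])
```

With the real data, you would pass `csv.DictReader(open(path, newline=""))` instead of the sample and plot the resulting counts by date to look for the summer and Christmas dips.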


There’s also a clear weekday trend. Let’s look at that next. It seems that New Yorkers like to cab it more later in the week. Summer is the only exception to the pattern of fewer rides on Mondays.

Number of cab rides in New York City, by day of the week.
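
One way to check the weekday pattern, and whether summer really is the exception, is to count rides by season and day of the week together. The sketch below does that under a few assumptions that are mine, not the data set's: "summer" means June through August, the timestamp column is called `pickup_datetime`, and the sample rows are invented.

```python
import csv
import io
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
SUMMER_MONTHS = {6, 7, 8}  # assumption: "summer" means June through August


def rides_by_weekday(rows):
    """Count trips keyed by (season, weekday name)."""
    counts = Counter()
    for row in rows:
        pickup = datetime.strptime(row["pickup_datetime"], "%Y-%m-%d %H:%M:%S")
        season = "summer" if pickup.month in SUMMER_MONTHS else "rest of year"
        # datetime.weekday() returns 0 for Monday through 6 for Sunday.
        counts[(season, WEEKDAYS[pickup.weekday()])] += 1
    return counts


# Made-up sample rows for illustration only.
sample = io.StringIO(
    "pickup_datetime\n"
    "2013-07-01 08:00:00\n"
    "2013-07-05 22:30:00\n"
    "2013-03-04 08:10:00\n"
    "2013-03-08 19:45:00\n"
    "2013-03-08 23:05:00\n"
)

weekday_counts = rides_by_weekday(csv.DictReader(sample))
for season in ("summer", "rest of year"):
    for day in WEEKDAYS:
        print(season, day, weekday_counts[(season, day)])
```

Comparing the Monday count with the later-week counts within each season is what would confirm or contradict the pattern described above.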


Next, let’s look at tipping. Here’s a real trend: Tipping is highest at the end of the year. Is it due to bonus time?

Cab driver tipping percentage in New York City, by day of the week.
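
To test the end-of-year tipping idea, you could compute a tipping percentage for each month. The sketch below is one possible definition, not necessarily the one behind the chart: total tips divided by total fares for the month, which keeps a few tiny fares from skewing the result. The column names `fare_amount` and `tip_amount` and the sample rows are assumptions for illustration.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def tip_percentage_by_month(rows):
    """Return {month number: total tips as a percentage of total fares}."""
    tips = defaultdict(float)
    fares = defaultdict(float)
    for row in rows:
        fare = float(row["fare_amount"])
        if fare <= 0:
            continue  # skip rows where a percentage would be meaningless
        month = datetime.strptime(
            row["pickup_datetime"], "%Y-%m-%d %H:%M:%S"
        ).month
        tips[month] += float(row["tip_amount"])
        fares[month] += fare
    return {m: 100.0 * tips[m] / fares[m] for m in sorted(fares)}


# Made-up sample rows for illustration only.
sample = io.StringIO(
    "pickup_datetime,fare_amount,tip_amount\n"
    "2013-06-10 12:00:00,10.00,1.50\n"
    "2013-06-11 13:00:00,20.00,3.00\n"
    "2013-12-20 21:00:00,10.00,2.00\n"
    "2013-12-21 22:00:00,0.00,0.00\n"
)

pct = tip_percentage_by_month(csv.DictReader(sample))
for month, value in pct.items():
    print(month, round(value, 1))
```

Averaging each trip's own tip percentage is another reasonable choice and can give a different answer, so it is worth stating which definition a chart uses.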


Create a Narrative

By now, a story is beginning to form in your mind. Just as your sixth-grade teacher taught you, you begin to create a narrative that tells the story of New York in summer – the city clears out, and those left are going out more.

Now that you have a narrative of your data, you can add interactivity to it. Your readers can find other stories in the data. Perhaps they want to dig into what’s happening in the winter in the city, according to the taxi trends.

Interactivity is a critical element in big data. It lets people follow new threads and it lets them test your assumptions – lending credibility to the story and to the data itself. Put together your narrative and the ability to explore, and you give your readers a data story and a way to learn more.

Just as there may be 8 million stories in The Naked City, there may be more stories in the data about its 173 million taxi rides.

Storytelling with data is how we get meaning from big data and, as data become a more and more important part of our lives and our work, data storytelling might be the best way to engage people with data. Without a story to tell, test, and discuss, big data is just an overwhelming mass of numbers.

Ellie Fields is the Vice President of Product Marketing at Tableau. Ellie joined Tableau in the early stages of the company in 2008 and is part of the core team that has fueled the growth and success of Tableau’s products. In particular, she launched and oversees the development and growth of Tableau Public, which has served more than 200 million impressions. Tableau Public is a unique product, popular among journalists, developed for anyone who wants to easily publish interactive data on the web. Ellie also helped launch the company’s new cloud analytics product, Tableau Online. Prior to Tableau, Ellie worked in product management at Microsoft and as an associate in late-stage venture capital. Ellie holds B.S. and B.A. degrees in Engineering from Rice University and an MBA from the Stanford Graduate School of Business.

