Open Source Data Visualization Tools Still Require Specialized Skills

March 12, 2013, 5:01 pm

An interactive visualization by Lynn Cherny using NodeBox. Cherny mapped every noun in Stephenie Meyer's novel "Twilight" by its gender. Because English nouns carry no grammatical gender markers, she used a library by Google's Shane Bergsma that scraped news stories for male, female and neutral pronouns associated with nearby nouns. Here she is mousing over "receptionist," which is most often associated with females, according to Google.

Lynn Cherny created an interactive visualization using NodeBox to assess the gender orientation of nouns in the novel “Twilight” using data compiled by Google. Above, the distribution of data points shows that more nouns have a male orientation than a female one. Mousing over “receptionist” shows that the word is associated with females.
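The captions describe the basic idea: label each noun by which pronouns tend to appear near it. A minimal Python sketch of that scoring step might look like the following, assuming the per-noun pronoun counts have already been gathered. The function names and the counts are hypothetical, invented for illustration; this is not the interface of Bergsma's library, not Google's data, and not Cherny's actual NodeBox code.

```python
from collections import Counter

# Hypothetical toy counts: how often each noun was seen near a masculine,
# feminine, or neutral pronoun. These numbers are invented for illustration
# and are NOT taken from any real gender dataset or from Cherny's project.
PRONOUN_COUNTS = {
    "receptionist": {"male": 12, "female": 88, "neutral": 5},
    "captain": {"male": 70, "female": 10, "neutral": 20},
    "truck": {"male": 3, "female": 2, "neutral": 95},
}


def gender_orientation(counts):
    """Return (label, female_share) for one noun's pronoun counts.

    female_share is the female count divided by the combined male and
    female counts (0.0 if both are zero). The label is whichever of
    male/female/neutral has the largest count.
    """
    gendered = counts["male"] + counts["female"]
    female_share = counts["female"] / gendered if gendered else 0.0
    label = max(counts, key=counts.get)
    return label, female_share


def tally(nouns, table):
    """Count how many nouns fall under each majority label.

    Nouns missing from the table are skipped rather than guessed.
    """
    return Counter(gender_orientation(table[n])[0] for n in nouns if n in table)


if __name__ == "__main__":
    label, share = gender_orientation(PRONOUN_COUNTS["receptionist"])
    print(label, round(share, 2))
    print(tally(["receptionist", "captain", "truck", "unknown"], PRONOUN_COUNTS))
```

Counting majority labels with `tally` is one simple way to arrive at a statement like "more nouns have a male orientation than a female one"; a visualization would then place each noun by its `female_share` so a reader can hover over individual words.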

Creating a data visualization that reveals interesting relationships in data isn’t easy. Even if the data is clean and of high quality, the visualizer must mine it for correlations and find the best way to represent them to his or her audience.

Several enterprise-level tools, such as Tableau and QlikView, specialize in dashboards and reporting but can also handle more advanced work. Those tools, however, come with expensive enterprise licenses.

Lynn Cherny


Lynn Cherny, a data consultant based in Massachusetts, said she often does her more advanced data exploration and visualization with open source tools such as NodeBox for Python, the Java-based program Processing, or the JavaScript visualization library d3.js. Cherny is giving a presentation on NodeBox at PyData 2013 in Santa Clara on March 19.

Cherny said open source tools like NodeBox still have some maturing to do, but because they’re community-driven, advances can happen more quickly than with enterprise tools. She said she prefers NodeBox to Processing because she strongly dislikes coding in Java and finds Python much easier.

But, she said, coding is still an important skill for exploring data visually, and a lack of it can be a barrier to creating robust visualizations.

In this interview with Data Informed staff writer Ian B. Murphy, Cherny discusses the gap between enterprise and open source data visualization tools, the growing community around Python as a data processing and visualization tool, and the process of creating a good data visualization from start to finish. (Podcast running time: 19:22.)


One Comment

    Posted March 21, 2013 at 3:04 pm

    Very interesting podcast on developments and opportunities for open source visualization tools. We’re also finding that demand for simple, yet effective ways of visualizing large amounts of data is on the rise within organizations of any size. Aided by rapid development and no barriers to adoption, open source is excellent at meeting this need.

    Many of our enterprise customers are struggling with how to ingest, manipulate and visualize large structured and unstructured data sets easily, economically and quickly. That’s why we recently unveiled 12 useful visualizations, open source plug-ins available for download and experimentation. From Sunbursts and Tree Maps to Funnels and Packed Circles, this series of plug-ins is ideal for developers testing how open source can help them adopt analytic and visualization tools in their big data platform easily and cost-effectively.
