The open source R programming language is the most popular statistical software in use today. It’s used by more than 2 million data scientists and statisticians worldwide, and usage continues to grow rapidly. Given that R programmers command premium salaries (according to surveys by Dice.com and O’Reilly), it’s clear that much of that growth is coming from adoption of R for business applications.
Social media companies were among the first to recognize the value of mining their rich user behavior databases to understand the needs of their users and enhance their platforms with new data-driven features. Facebook, which processes more than 500 terabytes of data a day, uses R to understand how its users interact with the service. Exploratory data analysis helps Facebook understand what its users are doing throughout the day and how viral memes propagate through the social network. Data visualization is a big part of this work, and Facebook has shared its best practices in an online Udacity course, and even used a chart created with R in its IPO prospectus.
Data analysis has also become increasingly important in media, where the availability of public data sources has given rise to the practice of data journalism. The New York Times has been a pioneer in this area, using R as the basis for interactive data analysis features that forecast upcoming elections and that can even identify your birthplace based on your dialect. The Times regularly uses R to enhance its traditional reporting as well, in articles ranging from wealth distribution in the United States to baseball’s greatest pitchers. R’s rapid prototyping capabilities mean that data journalists can go from a concept, to a graphic, to a complete illustration in just hours — essential for rapid analysis of breaking news.
A fast-growing industry for R is marketing analytics. As retailers collect more detailed data about customer buying habits, preferences, and backgrounds, marketing analytics companies have sprung up to help companies make sense of these rich new sources of information. DataSong uses a statistical technique called time-to-event analysis to help retailers like Williams-Sonoma understand the marketing events (like advertisements, catalogs, or emails) that led a customer to make a purchase. Similarly, X+1 analyzes terabytes of data to give companies like JPMorgan Chase and Verizon real-time analysis of customer behavior to optimize marketing efforts.
The finance and insurance industries have always been leading users of advanced statistical analysis, so it’s no surprise that R is in widespread use to develop new trading, pricing, and optimization strategies to increase returns and minimize risk. American Century Investments uses R to analyze a “social network” of companies, in which financial relationships are used in place of friendships. (Understanding how the performance of suppliers ultimately affects that of downstream manufacturers allows them to optimize their financial investment portfolios.) On the retail banking side, ANZ Bank uses R to estimate the risk associated with home mortgages. Estimating risk is of critical importance in the insurance industry as well, and Lloyd’s of London uses R to model the potential costs associated with catastrophes like hurricanes and earthquakes.
It’s not only big businesses that are using R. The programming language is also used to improve the lives of vulnerable people and to serve the general public good. The National Weather Service uses R to predict river levels and issue flood alerts, and Realclimate.org uses R to visualize the effects of climate change, such as the recent declines in Arctic sea ice. And in volatile regions, like Syria, the Human Rights Data Analysis Group uses R to get better estimates of war casualties from incomplete information.
These are just a few examples of the organizations that use R every day, and the number keeps growing.
One consequence of the big data revolution is that companies in every industry now recognize that the key to success is being able to collect, analyze, and act on data better and faster than their competitors. This is now a strategic initiative within competitive organizations, and companies are rapidly hiring new data scientists. R enables these data scientists to analyze data more quickly and more powerfully than other software, which explains its rapid growth across industries.
David Smith is Chief Community Officer at Revolution Analytics, the leading commercial provider of software and services based on the open source R project for statistical computing. With a background in data science, he writes daily about applications of R and predictive analytics at the Revolutions blog (blog.revolutionanalytics.com), and was named a top 10 influencer on the topic of Big Data by Forbes. Follow David on Twitter as @revodavid.