The Obama for America campaign was about facing off against Mitt Romney for the White House. It was about the U.S. economy and jobs, taxes and the national debt, America’s standing in the world and immigration. But behind the scenes, the Obama campaign was about creating an analytics culture so that everyone—from tens of thousands of field workers to more than 100 data analytics experts—collected data, measured outcomes and refined marketing, communications and fundraising programs to achieve results.
The demand to measure everything in the $1 billion campaign—“to put an analytics team inside [headquarters] to study us the entire time to make sure we were being smart about things”—came from campaign manager Jim Messina when he began to build the team in April 2011, said Chris Wegrzyn, a technology leader for the Obama for America analytics team. Wegrzyn said it fell to him and his colleagues to figure out how to meet this demand and to build the team and processes to do it.
“The most fundamental thing we did was to create an analysts organization within the campaign,” said Wegrzyn, who is now director of data architecture at the Democratic National Committee. He was speaking during a January 24 webcast sponsored by HP Vertica.
While a lot has been written about the role data and analytics played in the 2012 race, it will still take time for political scientists, historians and business school case study authors to tease out all the takeaways. And for his part, Wegrzyn was not forthcoming about every detail. For example, he declined to cite specifics about algorithms his team used to target specific voter groups, signaling that the tools represent an advantage for his team in his new job.
What Wegrzyn did share was an archetypical story of an analytics-driven organization that aligned people, business processes and technologies around a clear mission.
Advancing Analytics Ideas Developed in 2008
A presidential campaign has the luxury of focusing on one goal: getting a majority of votes on Election Day. Wegrzyn said the campaign set up five groups—the field organization along with teams for digital, communications, media relations and finance—to focus on three activities: registration to increase the pool of eligible voters, persuasion to win support, and voter turnout.
Over the past decade, Wegrzyn said, analytics has influenced each of these activities in different ways. The 2008 campaign took a fragmented approach, with separate databases for voter contact lists, field volunteers, email campaigns and fundraising. A field organization might report data on voters contacted, and analysts could then use metrics to track progress and some modeling to predict voter turnout. Digital activities, like Web promotions or email campaigns, could deploy A/B testing to experiment with different approaches.
These were independent silos of activities, but the campaign still was able to assemble data from the field and use it to create predictive models to target voters based on two aspects of their anticipated behavior: whether they were likely to vote and whether they supported Barack Obama or Sen. John McCain for president.
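As a rough illustration of targeting on those two modeled behaviors, the Python sketch below sorts voters into outreach groups from a turnout score and a support score. It is not the campaign's method: the field names, group labels and threshold values are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class VoterScore:
    voter_id: str
    turnout: float  # modeled probability of voting, 0..1 (hypothetical field)
    support: float  # modeled probability of supporting the candidate, 0..1 (hypothetical field)


def segment(v: VoterScore, low: float = 0.35, high: float = 0.65) -> str:
    """Assign an outreach group from the two scores.

    The thresholds and group names are illustrative assumptions.
    """
    if v.support >= high:
        # Likely supporter: only needs a turnout push if unlikely to vote.
        return "get_out_the_vote" if v.turnout < high else "base"
    if v.support > low:
        # Support is uncertain, so the voter is a persuasion target.
        return "persuasion"
    return "skip"
```

Under these assumed thresholds, a voter scored 0.9 on support and 0.3 on turnout lands in the "get_out_the_vote" group, while one scored 0.5 on support is treated as a persuasion target.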
At the start of 2011, Wegrzyn said, his team had to figure out how to inject this kind of analysis into all activities of the campaign, not just getting out the vote, and how to create systems that enabled “voter-to-voter” interactions. The challenges included delivering results on an ongoing basis for an expanded set of analytics programs, using data that was widely dispersed and of varying quality. The field organization had a voter contact database. There were separate systems for donors, for online fundraising and email campaigns. The Democratic National Committee had its own database. And the campaign had questions about whether media vendors could provide data about their audiences that the campaign sought.
“We wanted to do social media and other things better” than in 2008, Wegrzyn said. “We were doing some holistic modeling, and experimentation, and trying to figure out the right way [of doing things] in terms of real, hard results. We hadn’t done it before, no one had in the campaign world. But we had a little bit of experience.”
Creating a Work Environment for Big Data
Wegrzyn and the Obama for America data science team had two significant assets before they started on the 2012 campaign: the trust of leadership, who saw results in 2008 and embraced analytics as central to the effort, and access to analytics talent.
Even though the campaign “paid well below market rates,” Wegrzyn said, analytics professionals from a range of backgrounds, from industry and academia, applied for jobs. The goal, he said, was to “create an environment for smart people to freely pursue their ideas.” Some team members were engineers, and not everyone knew how to write SQL queries, but everyone selected was a problem-solver and fast learner. (Everyone who did well in interviews also took a test to show how they approached problem-solving.)
Creating a work environment for the analysts required adopting big data techniques, Wegrzyn said. While the campaign had less than 10 terabytes of raw data, the analysts would end up generating many times that amount in their experiments. There were many datasets to manage, including new sources, and the pace of the work left no time for ETL (extract, transform, load) processes. And the campaign needed to keep pace with the analysts’ ideas.
After six months of evaluations, Wegrzyn said, his team chose HP Vertica as its analytics platform, rejecting Hadoop-based systems for most use cases (steep learning curve and long development lead time) and appliances (the team needed performance before it needed massive storage). Vertica provided a massively parallel processing database with familiar SQL queries and a path to scalability as analysts continued to pound on the system, he said.
The resulting environment allowed analysts and engineers to use a common platform. Users manipulated and analyzed data using the R and Stata statistical programming languages, as well as SQL queries and third-party analytics programs, Wegrzyn said. (Analysts also used data mining tools from KXEN.)
Two Innovative Analytics Apps
The common platform led to what he called an unexpected innovation, a project called AirWolf, which helped the campaign connect volunteer field workers with digital marketing efforts. By correlating email addresses with voter information, the campaign could connect volunteers in the field with voters. The application used SQL queries against the Vertica database and Stork, a data reporting tool the campaign developed, to send custom email via Amazon’s Simple Email Service. Analysts could also analyze the results.
The effect was a more personalized politics.
“When a volunteer knocked on the door, and [the voter] said, ‘I am not sure I support the president. I would like to hear about health care law.’ They would enter that into contact database,” Wegrzyn said. The campaign created a means for the volunteer to follow up by email with the undecided voter, to send a personalized message with information about health care policies, and an invitation to discuss the issue further. “Those were brand new for the campaign and exciting for us,” he said.
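The follow-up flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the `follow_ups` function and the sample data are hypothetical, and the sketch stops at building the messages instead of calling any email service or reproducing how AirWolf or Stork actually worked.

```python
from typing import Iterator


def follow_ups(
    contacts: list[dict],
    emails: dict[str, str],
    content: dict[str, str],
) -> Iterator[dict]:
    """Build one follow-up message per undecided voter with a known email.

    contacts: door-knock records entered by volunteers (hypothetical layout)
    emails:   voter_id -> email address
    content:  issue -> message text prepared by the campaign
    """
    for c in contacts:
        address = emails.get(c["voter_id"])
        body = content.get(c["issue"])
        # Skip decided voters and anyone lacking an address or matching content.
        if c["undecided"] and address and body:
            yield {
                "to": address,
                "from_volunteer": c["volunteer"],
                "issue": c["issue"],
                "body": body,
            }


# Made-up sample data for illustration only.
contacts = [
    {"voter_id": "v1", "volunteer": "Dana", "undecided": True, "issue": "health care"},
    {"voter_id": "v2", "volunteer": "Dana", "undecided": False, "issue": "health care"},
    {"voter_id": "v3", "volunteer": "Lee", "undecided": True, "issue": "health care"},
]
emails = {"v1": "v1@example.org", "v2": "v2@example.org"}
content = {"health care": "Here is more information about the health care law."}

messages = list(follow_ups(contacts, emails, content))
```

With this sample data only voter v1 receives a message: v2 is not undecided, and v3 has no email address on file.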
Another success came with an application that optimized advertising purchases. Wegrzyn said his team employed predictive models to identify target voters. They received anonymized data from media ratings companies. Combining the two datasets and then adding pricing data for various media outlets allowed the campaign to pick the best programs during which to advertise targeted messages. Analysts ran queries to explore the data to confirm the recommendations made sense.
“We were able from experiments to get a sense at the individual level, of what we termed persuadability” of certain precise voting groups, he said. Correlating these results with ratings data on those groups as television audiences showed the team which media channels to prioritize for advertising.
The result was a precision-targeted advertising onslaught. The campaign made twice as many cable TV advertising purchases as the Romney campaign, Wegrzyn said. The choices sought to maximize coverage of targeted voters, rather than broad demographic groups.
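One way to picture the combination of model scores, ratings and pricing is a simple ranking of programs by expected persuadable viewers per advertising dollar. The Python sketch below assumes that approach. The campaign's actual optimization was not disclosed, so the `rank_programs` function, the voter segments and every number here are hypothetical.

```python
def rank_programs(
    audience: dict[str, dict[str, int]],
    persuadability: dict[str, float],
    prices: dict[str, float],
) -> list[tuple[str, float]]:
    """Rank programs by expected persuadable viewers per dollar.

    audience:       program -> {voter segment -> viewers}, from anonymized ratings
    persuadability: voter segment -> modeled persuadability score, 0..1
    prices:         program -> cost of one ad slot
    """
    ranked = []
    for program, by_segment in audience.items():
        # Weight each segment's audience by how persuadable that segment is.
        expected = sum(
            viewers * persuadability.get(segment, 0.0)
            for segment, viewers in by_segment.items()
        )
        ranked.append((program, expected / prices[program]))
    # Best value first.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Made-up numbers for illustration only.
audience = {
    "late_news": {"s1": 1000, "s2": 500},
    "cable_rerun": {"s1": 200, "s2": 900},
}
persuadability = {"s1": 0.1, "s2": 0.4}
prices = {"late_news": 600.0, "cable_rerun": 190.0}

ranking = rank_programs(audience, persuadability, prices)
```

In this made-up example the cheaper cable program ranks first even though its total audience is smaller, because more of its viewers fall in the more persuadable segment.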
“White women, 20 to 29, is a diverse group. It’s hard to talk about that group in general. We needed to talk about individuals,” Wegrzyn said, describing the campaign’s approach to personalization. “We were about taking that data and looking at it more holistically. There was a lot of experimentation and really pulling all that [analytics] together.”
Michael Goldberg is editor of Data Informed. Email him at firstname.lastname@example.org.