The growing importance of big data in presidential elections is no secret, as it jumped into the spotlight in 2008, when then-Senator Barack Obama invested significant resources into analyzing big data, which helped propel him to the White House.
At the same time, Nate Silver used big data to project Obama’s presidential victory with astounding accuracy, correctly predicting the outcomes in 49 of the 50 states in the 2008 presidential election.
Silver did even better in the 2012 United States presidential election, successfully predicting the winner in all 50 states and the District of Columbia.
Given the growing importance of big data in presidential elections, I thought it only natural that I should consider throwing my hat into the ring and running for president.
I wanted to validate the market sentiment before jumping into the race, so I tapped into a fun R script that pulls Twitter data to run a sentiment analysis of the key presidential candidates. The boxplot in Figure 1 shows the relative sentiment on each key candidate by comparing positive and negative tweets on Twitter.
From this analysis, we can see that Donald Trump has more positive tweets than negative tweets, but not enough to scare me off.
So without any real forethought, I thought I’d throw my hat into the ring. Now, I have no money and I don’t want to fly to any of these state primaries, but ignoring those limitations right now, I was trying to figure out how I would leverage big data to win the presidential election.
Using Big Data to Get Elected President
Step 1: Determine and Prioritize Use Cases. My use cases are my policies and platforms, such as pro renewable energy, pro education, pro Chipotle, and pro Chicago Cubs. That seems like a winning combination.
I want to research what my competitors are saying and their positions, so I downloaded some data from Political TV Ad Archive that I used to understand their key positions and determine the frequency and recency of their messaging (Figure 2).
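As a rough illustration of the frequency and recency calculation, the sketch below counts how often each candidate aired ads on a topic and how many days have passed since the last airing. The record layout (candidate, topic, air date), the sample rows, and the function name are all made up for illustration; they are not the Political TV Ad Archive's actual export format.

```python
from collections import defaultdict
from datetime import date

# Hypothetical ad airings: (candidate, topic, air_date). Illustrative only.
AIRINGS = [
    ("Candidate A", "economy", date(2016, 1, 5)),
    ("Candidate A", "economy", date(2016, 1, 20)),
    ("Candidate A", "energy", date(2016, 1, 12)),
    ("Candidate B", "education", date(2016, 1, 18)),
    ("Candidate B", "economy", date(2016, 1, 9)),
]

def frequency_and_recency(airings, as_of):
    """Return {(candidate, topic): (airing_count, days_since_last_airing)}."""
    counts = defaultdict(int)
    latest = {}
    for candidate, topic, aired in airings:
        key = (candidate, topic)
        counts[key] += 1
        # Track the most recent airing date seen for this candidate and topic.
        if key not in latest or aired > latest[key]:
            latest[key] = aired
    return {key: (counts[key], (as_of - latest[key]).days) for key in counts}

summary = frequency_and_recency(AIRINGS, as_of=date(2016, 2, 1))
for (candidate, topic), (count, days_ago) in sorted(summary.items()):
    print(f"{candidate:12} {topic:10} aired {count}x, last {days_ago} days ago")
```

A topic with a high count and a low days-since-last-airing value is one a competitor is pushing hard right now.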
I see lots of money being spent on campaign advertising, but I suspect that much of that money is being wasted. So again, I am not scared off.
Step 2: Prioritize Areas Of Focus. Some states always seem to vote Democratic or Republican, so I should focus my limited data science resources on profiling, segmenting and targeting the key “swing” states. Those “swing” states are highlighted in Figure 3.
So it looks like my priorities will be Florida (I vacation there every summer), Ohio (I used to live in Cincinnati), Wisconsin (I have lots of friends there), Pennsylvania (I like Rolling Rock beer), Nevada (I give them money every time I visit), and Iowa (my home state).
I would then run sentiment analysis on each of my key policy and platform positions within each swing state to determine how those positions resonate with voters at the state, county, and ZIP-code level. That identifies my key battlefields, so I can further prioritize my limited election resources.
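A minimal sketch of that roll-up, assuming a simple word-list approach: each post is scored as positive words minus negative words, and the scores are averaged per state and per ZIP code. The tiny lexicon, the sample posts, and the function names are hypothetical; a real analysis would use a far larger lexicon or a trained model.

```python
from collections import defaultdict
import re

# Tiny illustrative lexicon; a real analysis would use a much larger one.
POSITIVE = {"love", "great", "support", "yes"}
NEGATIVE = {"hate", "bad", "oppose", "no"}

# Hypothetical geotagged posts: (state, zip_code, text).
POSTS = [
    ("OH", "45202", "Love the renewable energy plan, great idea"),
    ("OH", "45202", "I oppose more spending on education"),
    ("FL", "33602", "Bad plan, I hate it"),
    ("FL", "33602", "Yes to renewable energy"),
    ("FL", "33606", "Great support for education here"),
]

def score(text):
    """Positive-word count minus negative-word count for one post."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def mean_sentiment_by(posts, key_fn):
    """Average post score for each group produced by key_fn."""
    totals = defaultdict(lambda: [0, 0])  # group -> [score_sum, post_count]
    for post in posts:
        bucket = totals[key_fn(post)]
        bucket[0] += score(post[2])
        bucket[1] += 1
    return {group: total / n for group, (total, n) in totals.items()}

by_state = mean_sentiment_by(POSTS, lambda p: p[0])
by_zip = mean_sentiment_by(POSTS, lambda p: (p[0], p[1]))
print(by_state)
print(by_zip)
```

Grouping by a key function keeps the same scoring logic reusable at any level of geography; a county roll-up would only need a county field on each post.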
Step 3: Build Voter Profiles. For each registered voter in each targeted state, I would gather data about that voter from a multitude of public data sources (voter registration, Zillow, LinkedIn, Twitter, Facebook, property information, property taxes, etc.) and integrate it into a data lake. From these data sources, I would build a voter profile that includes:
- Basic demographics (age, gender, income, education, number and age of dependents) along with psycho-demographic, life stage, and lifestyle data
- Historical propensity to vote Republican, Democratic, or Independent
- Family-member influence on propensity to vote Republican, Democratic, or Independent
- A “platform propensity score” for each voter’s likely support for each of my key platform positions
- Individual voter interests, passions, associations, and affiliations, mined from social media data
- A “Likelihood to Recommend” (LTR) score based upon social media interactions for each voter, to see which voters I could approach about campaigning and canvassing for me
- Association rules that link my key platform positions to other platform positions that I may need to consider (e.g., people who like Chipotle also like Starbucks)
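To make the association-rule idea concrete, here is a small sketch that finds pairwise rules of the form "voters who like A also like B" and reports support, confidence, and lift for each. The interest sets, thresholds, and function name are invented for illustration, and the brute-force pair scan stands in for a proper algorithm such as Apriori, which a real data set would need.

```python
from itertools import permutations

# Hypothetical interest sets, one per voter.
VOTER_INTERESTS = [
    {"Chipotle", "Starbucks", "Cubs"},
    {"Chipotle", "Starbucks"},
    {"Chipotle", "Starbucks", "education"},
    {"education", "renewable energy"},
    {"Cubs", "renewable energy"},
    {"Chipotle", "renewable energy"},
]

def pair_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Return rules as (antecedent, consequent, support, confidence, lift)."""
    n = len(baskets)
    items = sorted(set().union(*baskets))
    count = {i: sum(i in b for b in baskets) for i in items}
    rules = []
    for a, c in permutations(items, 2):
        both = sum(a in b and c in b for b in baskets)
        support = both / n              # share of voters with both interests
        if support < min_support:
            continue
        confidence = both / count[a]    # share of A-fans who also like C
        if confidence >= min_confidence:
            lift = confidence / (count[c] / n)  # >1 means a positive association
            rules.append((a, c, support, confidence, lift))
    return rules

rules = pair_rules(VOTER_INTERESTS)
for a, c, support, confidence, lift in rules:
    print(f"{a} -> {c}: support={support:.2f}, "
          f"confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 is what separates a genuine link from two interests that are simply both popular.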
Finally, I would aggregate all of this data and the different scores to create an over-arching “Vote For Schmarzo” (VFS) propensity score that measures that voter’s likelihood to vote for me.
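One simple way to roll the component scores into a single VFS score is a weighted average, sketched below. This is an assumption about how the aggregation might work, not a prescribed formula: the component names, the weights, and the `VoterProfile` class are all hypothetical, and in practice the weights would be fit to past voting outcomes rather than picked by hand.

```python
from dataclasses import dataclass

# Hypothetical weights; in practice they would be fit to past voting outcomes.
WEIGHTS = {
    "party_propensity": 0.3,
    "family_influence": 0.1,
    "platform_propensity": 0.4,
    "likelihood_to_recommend": 0.2,
}

@dataclass
class VoterProfile:
    voter_id: str
    scores: dict  # component name -> score in [0, 1]

def vfs_score(profile, weights=WEIGHTS):
    """Weighted average of component scores, skipping missing components."""
    available = {k: w for k, w in weights.items() if k in profile.scores}
    if not available:
        raise ValueError(f"no scorable components for {profile.voter_id}")
    # Renormalize so a partial profile is still scored on a 0-1 scale.
    total_weight = sum(available.values())
    weighted = sum(profile.scores[k] * w for k, w in available.items())
    return weighted / total_weight

full = VoterProfile("v1", {
    "party_propensity": 0.5,
    "family_influence": 0.5,
    "platform_propensity": 0.9,
    "likelihood_to_recommend": 0.7,
})
partial = VoterProfile("v2", {"party_propensity": 0.2, "platform_propensity": 0.8})

print(round(vfs_score(full), 3))
print(round(vfs_score(partial), 3))
```

Renormalizing over the available components lets voters with thin data still get a score, at the cost of that score resting on less evidence.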
Step 4: Prioritize Targeted “Swing” Voters. Leveraging the VFS propensity scores and social media data, I want to prioritize where and how best to reach my targeted voters. A company like IdealSpot can create the kind of targeting analysis that I would leverage to focus my communications channels and messages for key battlegrounds like Cincinnati (Figure 4) and Tampa.
I would leverage the VFS score to prioritize my marketing, messaging, and canvassing efforts on those voters whom I need to keep (those at risk of attrition to another candidate) and those whom I can swing over to my side (voter cross-sell).
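A sketch of how that split might be made, assuming each voter has a previous and a current VFS score on a 0-1 scale. The thresholds, labels, and sample voters are hypothetical: supporters whose score has dropped sharply are tagged for retention, voters in the middle band are tagged as swing targets, and everyone else is skipped.

```python
# Hypothetical thresholds on a 0-1 VFS scale.
SUPPORTER = 0.6    # at or above: counted as a supporter
SWING_FLOOR = 0.4  # between SWING_FLOOR and SUPPORTER: persuadable
DROP_ALERT = 0.1   # a supporter whose score fell this much is at risk

def segment(previous_vfs, current_vfs):
    """Label a voter as 'retain', 'swing', or 'skip'."""
    if previous_vfs >= SUPPORTER and previous_vfs - current_vfs >= DROP_ALERT:
        return "retain"  # was a supporter, now drifting toward another candidate
    if SWING_FLOOR <= current_vfs < SUPPORTER:
        return "swing"   # undecided enough to be worth persuading
    return "skip"        # solid supporter or unlikely convert

# Hypothetical voters: id -> (previous VFS, current VFS).
VOTERS = {
    "v1": (0.80, 0.65),
    "v2": (0.50, 0.55),
    "v3": (0.20, 0.15),
    "v4": (0.90, 0.88),
}

outreach = {"retain": [], "swing": [], "skip": []}
for voter_id, (previous, current) in VOTERS.items():
    outreach[segment(previous, current)].append(voter_id)

print(outreach)
```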
Step 5: Persuading My Swing Voters. Next, I need to develop plans (advertising, speeches, meet-and-greets, messaging, positioning, channels, outreach, canvassing, etc.) to move swing voters into my camp. I would leverage analysis like the Between Cluster Analysis to identify those voters who I think I can swing over to my campaign, and the optimal approach for moving those voters based upon my platform and policies (Figure 5).
Step 6: Listening to and Monitoring My Swing Voters. Finally, I want to monitor my target cities and counties constantly to flag voter changes that may require marketing, communication, and canvassing actions. I could monitor voter sentiment data to fine-tune my messaging and platform to make sure that my campaign is as relevant as possible (while still staying true to my platforms…sorry, I am not selling out on my Cubbies!). I would integrate big data (social media data, voting records) with small data (door-to-door canvassing notes) to ensure that I had a current view of each of my key cities and counties.
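The flagging step could look something like the sketch below, which compares each county's latest sentiment reading with the average of its preceding readings and flags large moves. The window size, threshold, function name, and sample readings are assumptions chosen for illustration, not tuned values.

```python
from statistics import mean

def flag_shifts(daily_sentiment, window=3, threshold=0.2):
    """Return areas whose latest reading moved more than `threshold`
    away from the mean of the preceding `window` readings."""
    flagged = {}
    for area, readings in daily_sentiment.items():
        if len(readings) <= window:
            continue  # not enough history to compare against
        baseline = mean(readings[-window - 1:-1])
        change = readings[-1] - baseline
        if abs(change) > threshold:
            flagged[area] = round(change, 3)
    return flagged

# Hypothetical daily sentiment readings per county, oldest first.
DAILY = {
    "Hamilton County, OH": [0.30, 0.32, 0.28, 0.05],
    "Hillsborough County, FL": [0.10, 0.12, 0.11, 0.13],
    "Polk County, IA": [0.40, 0.41],
}

flagged = flag_shifts(DAILY)
print(flagged)
```

A flagged county is a prompt to look at the small data, such as canvassing notes, to learn what changed before adjusting the message.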
With any luck, I will use big data not only to identify, score, and persuade my key constituents, but also (along with small data) to listen to and monitor voters regarding their sentiments on my platforms and positions. And if all works as planned, I’ll be living in a new house come next January! Plus, it’ll be nice to have someone else do the laundry.
Bill Schmarzo is responsible for setting the strategy and defining the service line offerings and capabilities for the EMC Consulting Enterprise Information Management and Analytics service line. He’s written several white papers and is a frequent speaker on the use of big data and advanced analytics to power organizations’ key business initiatives.
Bill has more than two decades of experience in data warehousing, BI, and analytic applications. Bill authored the Business Benefits Analysis methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements, and co-authored with Ralph Kimball a series of articles on analytic applications. Bill has served on The Data Warehouse Institute’s faculty as the head of the analytic applications curriculum.
Previously, Bill was the vice president of Analytics at Yahoo, where he was responsible for the development of Yahoo’s Advertiser and Website analytics products, including the delivery of actionable insights through a holistic user experience. Before that, Bill oversaw the Analytic Applications business unit at Business Objects, including the development, marketing and sales of their industry-leading analytic applications.
Bill holds a master’s degree in Business Administration from the University of Iowa and a bachelor of science degree in Mathematics, Computer Science, and Business Administration from Coe College.