Follow a political analyst as she uses GIS to explore the interplay between geography and demographics that led to unexpected election results in 2016. Download her data and a step-by-step tutorial to get hands-on practice using regression analysis to uncover spatial relationships.
An analyst at a small political consulting firm has been poring through data and analyses from the 2016 presidential election. She knows her company will need to come up with a powerful new angle if they are going to stay in business. While the demand for political consulting services has certainly exploded, so has the competition.
The analyst is well aware that a lot has been written on who, exactly, voted for Donald Trump. Conclusions, however, are conflicting. There is a general consensus that his support came mainly from an older, whiter, and more working-class population. An article in The Atlantic, early in the primaries, for example, focused on blue collar support for Trump. But there are also articles, including one in The Washington Post written after the general election, that make a counterargument. Why the contradiction?
One election analysis addresses something the analyst has been wondering about herself — did voting patterns vary across the country? Was Trump's support among blue collar workers, for example, strong in some regions and weak in others? What about his lack of support among people with a college education?
If she can show where these geographic variations occur, future campaign strategists will be able to more effectively tailor their messages region by region. With the right messaging they can influence how people vote.
A vast amount of money, in fact, is spent on broad messaging platforms, including television, radio, and print. Micro-targeting voters by geographic location via social media is also on the rise. Developing the right message for specific demographics and locations is definitely a game changer.
The analyst sees a lot of potential for providing analysis and consulting services to upcoming local and statewide campaigns. If she can develop a solid methodology for the 2018 elections, she should be able to attract new clients at both the local and national level once the 2020 elections start to heat up.
She will begin with an exploratory analysis to see which variables are most consistently correlated with a vote for or against Trump. Her ultimate goal, however, is to not only look at correlations (like everyone else has done), but to also look at spatial variations in those correlations. In addition, she wants to do her analysis at the county level; most analyses she's seen only look at state level data.
She will use Regression Analysis. Her dependent variable (the variable she wants to understand or predict) is the percentage of people in each county who voted for Trump.
This map shows the percentage of votes for Trump, by county, in the continental US.
Click on the map to see other county variables.
She also needs a set of explanatory variables to correlate with the percent Trump vote. She starts with the most obvious variables including income, party affiliation, race/ethnicity, employment, age, population density (to get at urban vs rural populations), gun ownership, educational attainment, and marital status.
She uses a tool called Exploratory Regression. It tries all possible combinations of up to five explanatory variables, looking for properly specified models. Even when it doesn't find a properly specified model (as in this case), the report created by this tool provides valuable information. It includes, for example, a table showing how often each explanatory variable was statistically significant. The key explanatory variables will be statistically significant in close to 100 percent of the models tried.
The table, shown below, also indicates if a variable has a consistently positive or consistently negative relationship to the dependent variable. Notice, for example, that the % Blue Collar variable is significant 100 percent of the time and always has a positive relationship with the percentage of people voting for Trump (100.00). With a positive relationship, both variables go up together, so the larger the percentage of blue collar workers in a county, the larger the percentage of people voting for Trump. The % Asian variable has a consistently negative relationship (100.00). With a negative relationship, as one variable goes up, the other goes down, so the larger the percentage of Asian people, the smaller the percentage of people voting for Trump. This strong relationship is interesting since not much has been written about it. Other interesting relationships are % White with a strong positive relationship and % Gun Owners, also positive. The Population Density variable has a consistently negative relationship suggesting urban areas were less likely to vote for Trump (as density increases, Trump vote percentages decrease).
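To get a feel for the size of the search Exploratory Regression performs, the enumeration step can be sketched in a few lines of Python. This is only an illustration of "all combinations of up to five explanatory variables", not the tool's implementation, and the variable names are placeholders for the nine categories listed earlier rather than actual field names from the analyst's data.

```python
from itertools import combinations
from math import comb

# Placeholder names for the candidate explanatory variables listed in the
# text; the real field names in the analyst's data are not known here.
candidates = [
    "income", "party_affiliation", "race_ethnicity", "employment", "age",
    "population_density", "gun_ownership", "education", "marital_status",
]
MAX_VARIABLES = 5  # the text says models use up to five explanatory variables


def candidate_models(variables, max_size):
    """Yield every combination of 1..max_size explanatory variables."""
    for size in range(1, max_size + 1):
        yield from combinations(variables, size)


models = list(candidate_models(candidates, MAX_VARIABLES))

# Cross-check the enumeration against the closed-form count.
expected = sum(comb(len(candidates), k) for k in range(1, MAX_VARIABLES + 1))
print(len(models), expected)  # 381 381
```

Even with only nine candidates there are 381 models to fit and check. Because a category such as race/ethnicity is really several variables (% Asian and % White, for example), the actual search is larger still, which is why a summary of how often each variable is significant, and with which sign, is so useful.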
The analyst is especially interested to see how much the relationship between these variables and the Trump vote changes across the country. She begins by running Geographically Weighted Regression (GWR) with the percentage of votes for Trump as the dependent variable and the percentage of blue collar workers as the explanatory variable. A map of the resultant coefficients shows her the spatial pattern of this correlation. Click on the map below to learn more.
The correlation between Trump votes and blue collar workers was strongest in the dark areas of the map and weakest in the light areas of the map.
The counties highlighted with orange have the largest percentages of blue collar workers.
Messaging designed to resonate with blue collar workers (messages promoting investment in US jobs, for example) will have the biggest impact in counties with high percentages of blue collar workers.
Republicans will connect most strongly in counties where the relationship between blue collar workers and Trump support was strong. Democrats might be more successful in counties where that relationship was weak.
The coefficient values, mapped above, range from -1 to 3. This means in the lightest areas of the map, there was a weak or even negative (-1) correlation between the percentage of blue collar workers and the percentage of votes for Trump. In the darkest areas of the map, there was a strong and positive relationship. A coefficient of 3, for example, indicates that a one percent increase in blue collar workers translated to about a three percent increase in Trump votes.
The variation in the coefficient values across the country helps to explain the contradictory conclusions about blue collar support for Trump. It also provides important clues for where to target messaging. Republicans will find their strongest connections if they focus on the counties with the darkest shading that are also outlined in orange on the map above. Blue collar support for Trump was strong there and the percentage of blue collar workers is large. Democrats will likely make their biggest impacts in the counties with the lightest shading, also outlined in orange. In these counties, blue collar support for Trump was weak but the percentage of blue collar workers is large.
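The idea behind a map of local coefficients can be sketched in plain Python. The example below is a minimal illustration of a geographically weighted fit with a single explanatory variable, assuming a Gaussian distance kernel and a fixed bandwidth; it is not the GWR tool's implementation, and the coordinates and percentages are synthetic values invented for the demonstration.

```python
import math


def local_slope(target, points, bandwidth):
    """Weighted least-squares slope of y on x around `target`.

    `points` is a list of (x_coord, y_coord, x_value, y_value) tuples.
    Nearby points get more weight through a Gaussian distance kernel.
    """
    tx, ty = target
    weights = []
    for px, py, _, _ in points:
        d2 = (px - tx) ** 2 + (py - ty) ** 2
        weights.append(math.exp(-0.5 * d2 / bandwidth ** 2))

    total = sum(weights)
    x_mean = sum(w * p[2] for w, p in zip(weights, points)) / total
    y_mean = sum(w * p[3] for w, p in zip(weights, points)) / total
    cov = sum(w * (p[2] - x_mean) * (p[3] - y_mean)
              for w, p in zip(weights, points))
    var = sum(w * (p[2] - x_mean) ** 2 for w, p in zip(weights, points))
    return cov / var


# Synthetic "counties": a western group where the vote share rises with the
# blue collar percentage, and an eastern group where it falls.
blue_collar = [10.0, 15.0, 20.0, 25.0, 30.0]
west = [(float(i), 0.0, x, 10.0 + 2.0 * x) for i, x in enumerate(blue_collar)]
east = [(100.0 + i, 0.0, x, 60.0 - 1.0 * x) for i, x in enumerate(blue_collar)]
counties = west + east

west_coef = local_slope((2.0, 0.0), counties, bandwidth=5.0)
east_coef = local_slope((102.0, 0.0), counties, bandwidth=5.0)
print(round(west_coef, 3), round(east_coef, 3))
```

A single national regression on these points would blend the two regions into one slope; fitting locally recovers a positive coefficient in one region and a negative one in the other, which is the kind of variation the coefficient map reveals.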
The analyst is ready to look at other variables. At this stage, however, she needs to limit her scope to proof of concept work only. She will examine broad relationships for one additional explanatory variable and then move on to more focused analyses. She decides to look at the relationship between the percentage of people with a college or professional degree and the percentage of people voting for Trump. Again, she runs Geographically Weighted Regression (GWR) and maps the coefficients.
The correlation between the percentage of college graduates and the percentage of people voting for Trump was primarily negative (the darkest colors), but about 8% of the counties (the lightest colors) have a positive coefficient.
The counties highlighted with orange have the largest percentages of people with college or professional degrees. Messaging will be most effective in these counties, and for Democrats, most effective where the relationship was also strongly negative. Constructing messages promoting affordable education, for example, will likely resonate in these counties.
For the darkest areas of the map above, as the percentage of college graduates increases, the percentage of Trump votes decreases. The smallest coefficient (the strongest negative relationship) is -1.7, indicating a one percent increase in college graduates corresponded to a 1.7 percent decrease in votes for Trump. For a few counties (about eight percent of the total) there is a positive relationship (positive coefficient). The largest positive coefficient is 1.8, indicating a one percent increase in people with a college or professional degree translated to about a 1.8 percent increase in the vote for Trump in those counties.
By looking at one key variable at a time, these national level coefficient maps provide opportunities for broad level messaging, and this is helpful. Next, she will focus in on clusters of battleground counties, starting broadly with those where the difference between Trump and Clinton was less than 20 percentage points (that is, where a candidate won with less than 60 percent of the vote). She will also attempt to find a properly specified regression model. With a properly specified model, she will get a better understanding of which specific variables were important in each region, and this will allow more refined campaign messaging.
The race was closer in the counties shown in red and blue than those shown in white.
Click on the map and compare the vote percentages for some of the red, blue, and white counties. In Nebraska, for example, Trump won some counties with as much as 90% of the vote (white counties).
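The battleground rule described above, a margin of less than 20 percentage points between the two candidates, can be written as a simple filter. This is a minimal sketch; the function name is hypothetical and the county names and vote shares are made up for illustration, not real results.

```python
def is_battleground(trump_pct, clinton_pct, max_margin=20.0):
    """True when the gap between the two candidates is under max_margin points."""
    return abs(trump_pct - clinton_pct) < max_margin


# Made-up vote shares (Trump %, Clinton %) for illustration only.
sample = {
    "County A": (52.0, 44.0),
    "County B": (90.0, 7.0),
    "County C": (38.0, 57.0),
    "County D": (61.0, 36.0),
}

battlegrounds = [
    name for name, (trump, clinton) in sample.items()
    if is_battleground(trump, clinton)
]
print(battlegrounds)  # ['County A', 'County C']
```

Note that the "less than 60 percent of the vote" shorthand matches the 20-point margin exactly only when the two shares sum to 100. Where third-party candidates took a share of the vote, the two rules can disagree slightly, so the sketch tests the margin directly.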
She notices there is a distinct cluster of battleground counties in Colorado, extending south into New Mexico. This makes sense — Colorado of late has been considered a swing state in national elections, and New Mexico, while generally a blue state, is always a tighter race than the Democrats would like. The race was close for several counties in Arizona as well. This seems like a good region to tackle first.
She will use Exploratory Regression again. This time she will expand her list of explanatory variables, though. She adds Tapestry variables describing different characteristics of the population — not just income and education, but also consumer preferences, hobbies, family structure, urban/rural context, and much more. She knows spatial variables are also important for finding a properly specified model so she creates one reflecting the distance to Los Angeles. Attitudes toward immigration were also important in this election so, in addition, she calculates the distance from each county to the Mexican border.
What is she missing? She reviews the literature again and discovers an article suggesting Trump support was strong in locations with large increases in the Latino population. She creates a variable reflecting this increase or decrease. Finally, looking at a map of the area, she notices there are a large number of Indian Reservations in these states. She adds a variable for the percentage of the population that is Native American.
With the smaller study area and an expanded list of candidate explanatory variables, her chances for finding a properly specified model are much better.
She crosses her fingers and fires up the Exploratory Regression tool. Sure enough, with six explanatory variables, the tool finds three properly specified models! She selects the best model by comparing the adjusted R2 and AICc values. Adjusted R2 is a measure of how much of the variation in the dependent variable (percent Trump vote) is explained by the model (the six explanatory variables). The best model explains 84 percent (so it is telling roughly 84 percent of the Trump vote story). She also looks at AICc. AICc is an indicator of how much of the dependent variable information is lost by the six variable model. You can't interpret the AICc values directly, but you can use them to compare different models as long as they have the same dependent variable. The model with the smallest AICc value is the better model (less information is lost).
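For readers who want to see how these two diagnostics are typically computed, here is a minimal sketch using common textbook forms for least-squares models. These formulas are an assumption for illustration; the Exploratory Regression tool may compute AICc with a different constant or parameter count, and the sample numbers below are hypothetical, not values from the analyst's models.

```python
import math


def adjusted_r2(r2, n_obs, n_explanatory):
    """Adjusted R-squared: R-squared penalized for the number of explanatory variables."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_explanatory - 1)


def aicc(rss, n_obs, n_params):
    """Corrected Akaike Information Criterion for a least-squares fit.

    Uses AIC = n * ln(RSS / n) + 2p (valid up to an additive constant),
    plus the small-sample correction 2p(p + 1) / (n - p - 1).
    """
    aic = n_obs * math.log(rss / n_obs) + 2 * n_params
    return aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)


# Hypothetical comparison: model B fits slightly better (smaller residual
# sum of squares) but spends two extra parameters to do so.
n = 100
model_a = aicc(rss=400.0, n_obs=n, n_params=7)
model_b = aicc(rss=395.0, n_obs=n, n_params=9)
better = "A" if model_a < model_b else "B"
print(better, round(adjusted_r2(0.85, n, 6), 3))
```

In this hypothetical comparison the simpler model has the smaller AICc, so it would be preferred even though its raw fit is marginally worse. As the text notes, the comparison is only meaningful between models that share the same dependent variable.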
Her best model of the percent of votes for Trump in Arizona, Colorado, and New Mexico includes the variables shown in the chart below. She runs this model using Ordinary Least Squares Regression (OLS) in order to see the coefficients. She will create potential messaging for the variables with the largest coefficients (either negative or positive) because they are the strongest predictors.
She will recommend targeted messaging relating to the % Change in Hispanic Population and % College Degree variables.
At the county and state level, targeted messaging can be especially effective.
For the counties experiencing a rapid Hispanic population increase, a Republican campaign might focus on border security to motivate Trump voters who bought into his rhetoric about building a wall between the United States and Mexico.
Democrats might find more fertile ground in counties with a high number of people with college degrees. A message based on relief for student debt and the importance of education, for example, might be effective.
This map shows which counties had the largest increases in the Hispanic population.
To motivate Trump supporters, Republicans may want to target the dark orange counties with messages about increased border security.
This map shows the percentage of people with college or professional degrees.
In the dark orange counties, Democrats will likely be effective connecting with voters on issues relating to affordable education.
Since she is familiar with Virginia (having done an internship in Washington, D.C.), she decides to focus on that state next.
She uses the same workflow she used for the states in the Southwest, identifying all possible explanatory variables and running the Exploratory Regression tool. The tool finds one passing model with six variables, and five with seven variables. All of the passing models include the same six explanatory variables: percentage of people voting in one or more elections in the past 12 months, Asian population, white population, the Savvy Suburbanite tapestry variable, the Southern Satellites tapestry variable, and seniors. She runs OLS and creates a chart showing the coefficients. The larger the coefficient (either positive or negative), the stronger the relationship with the percent Trump vote.
This time, the variables with the strongest relationship are both negatively correlated with the Trump vote. The strongest variable is the percent of the population in each county that voted within the past twelve months — that is, engaged and reliable voters.
The analyst is a bit perplexed by this relationship. This may be one for the political scientists to sort out. As for the practical implications for campaign strategists, however, messaging might not be tied so much to a policy issue, as to building up the association between a candidate and Trump in an attempt to drive higher voter turnout.
The second strongest relationship is with the percentage of people in each county who identify as Asian. A little research uncovers some issues that may be important to this demographic. She learns that immigration from Asian countries to the United States has grown rapidly in recent years.
Perhaps a Democratic message based on more liberal travel and immigration policies would sway at least some of these voters, especially in light of the Trump Administration's plan to cut legal immigration.
A Republican campaign, by contrast, might target this demographic using messages about rolling back affirmative action, which some claim discriminates against Asian students.
This map shows the percentage of people who voted within the past 12 months, reflecting engaged and reliable voters.
In the lightest orange counties, both parties may want to target people who haven't voted regularly. For Republicans, comparing the current candidate to Trump may motivate his supporters. For Democrats, highlighting their candidate's record of opposing Trump policies will likely be effective.
This map shows the percentage of the population identifying as Asian.
While this represents a small portion of the population, in Fairfax County alone, there are over 220,000 Asian people. In the 2013 gubernatorial race, Governor Terry McAuliffe won by just 56,000 votes statewide.
Both parties will want to appeal to this demographic in Virginia and particularly in Fairfax and Loudoun counties.
The next step is to let candidates and campaign organizers know about the new analytic services her firm offers. Spending on campaigns grows dramatically each election cycle. With these new analytical tools, the firm will be well positioned to get its portion of the profits. At least that's how the firm's management views things. As for the analyst, she hopes the methods she's developed will contribute to the art and science of election analysis, and, just maybe, might help candidates and elected officials better understand and address the needs of their constituents.
For more details and additional analyses, see the case study overview. To try your hand using the methods described here, download the data and work through a step-by-step tutorial. Or, use the tutorial as a guide for applying these workflows to your own data.
While the data for the analyses outlined here is real, the context and workflow described here have purposely been selected to demonstrate statistical techniques useful for mapping data relationships. These techniques include ordinary least-squares regression and spatial regression (Geographically Weighted Regression).