After extensively blogging about the 2012 Presidential election and the analytical models used to forecast it (go here for links to some of these old posts), I decided to create a case study on Presidential election forecasting using polling data. This blog post is about that case study. I originally developed the case study for an undergraduate course on math modeling that used Palisade Decision Tools like @RISK. I retooled the spreadsheet for my undergraduate course in simulation in Spring 2014 so that it does not rely on @RISK. All materials are available in the Files tab.
The basic idea is that there are a number of mathematical models for predicting who will win the Presidential Election. The most accurate (and the most popular) use simulation to forecast the state-level outcomes based on state polls. The most sophisticated models like Nate Silver’s 538 model incorporate things such as poll biases, economic data, and momentum. I wanted to incorporate poll biases.
For this case study, we will look at state-level poll data from the 2012 Presidential election. The spreadsheet contains realistic polling data from before the election. Simulation is a useful tool for translating the uncertainty in the polls to potential election outcomes. There are 538 electoral votes: whoever gets 270 or more votes wins. The model makes four simplifying assumptions:
- Everyone votes for one of two candidates (i.e., no third party candidates – every vote that is not for Obama is for Romney).
- The proportion of votes that go to a candidate is normally distributed according to a known mean and standard deviation in every state. We will track Obama’s proportion of the votes since he was the incumbent in 2012.
- Whoever gets more than 50% of the votes in a state wins all of the state’s electoral votes. [Note: most but not all states do this].
- The votes cast in each state are independent, i.e., the outcome in one state does not affect the outcomes in another.
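The modeling rules listed above translate fairly directly into code. Here is a minimal Python sketch of a single replication, not the spreadsheet itself. The electoral vote counts are the 2012 allocations and Wisconsin's 52% mean comes from this post, but every other mean and every standard deviation is a made-up placeholder, and `STATES` and `simulate_once` are names chosen just for this illustration.

```python
import random

# Placeholder inputs: name -> (electoral votes, poll mean for Obama, poll std dev).
# Means and standard deviations are illustrative only (except Wisconsin's 52% mean);
# they are NOT the values in the case study spreadsheet.
STATES = {
    "Florida": (29, 0.500, 0.03),
    "Pennsylvania": (20, 0.525, 0.03),
    "Virginia": (13, 0.510, 0.03),
    "Wisconsin": (10, 0.520, 0.03),
}


def simulate_once(states, rng):
    """Return Obama's electoral votes from one replication."""
    total = 0
    for votes, mean, sd in states.values():
        # Two-candidate race: Obama's share is an independent normal draw per state.
        share = rng.gauss(mean, sd)
        # Winner-take-all: more than 50% of the vote wins every electoral vote.
        if share > 0.5:
            total += votes
    return total


rng = random.Random(2012)  # fixed seed so the run is repeatable
obama_ev = simulate_once(STATES, rng)
print(obama_ev)
```

A full model would list all 50 states plus DC so that the totals add up to 538; only four states appear here to keep the sketch short.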
There is some concern that the polls are biased in four of the key swing states (Florida, Pennsylvania, Virginia, Wisconsin). A bias here means that the poll average for Obama is too high. Let’s consider biases of 0%, 0.5%, 1%, 1.5%, and 2%, with all four states affected by the same bias level at the same time. For example, the poll mean for Wisconsin is 52%, so the adjusted mean would range from 50% to 52% depending on the amount of bias. Side note: Obama was such an overwhelming favorite that it only makes sense to look at biases that work in his favor.
It is very difficult to find polls that are unbiased. Nate Silver of FiveThirtyEight wrote about this issue in “Registered voter polls will (usually) overrate Democrats”: http://fivethirtyeight.com/features/registered-voter-polls-will-usually-overrate-democrats/
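To see what the bias levels do to a single state, here is a small Python sketch for Wisconsin. The 52% mean is from the example above; the 3% standard deviation is an assumed placeholder (the real value is in the spreadsheet), and `win_probability` is a helper name made up for this illustration.

```python
from statistics import NormalDist

WI_POLL_MEAN = 0.52  # from the Wisconsin example in the text
WI_POLL_SD = 0.03    # hypothetical placeholder, not the spreadsheet value
BIASES = [0.0, 0.005, 0.01, 0.015, 0.02]


def win_probability(poll_mean, sd, bias):
    """P(vote share > 50%) after subtracting the poll bias from the mean."""
    adjusted = NormalDist(mu=poll_mean - bias, sigma=sd)
    return 1.0 - adjusted.cdf(0.5)


# One row per scenario: (bias, adjusted mean, probability Obama carries the state)
wi_table = [
    (b, WI_POLL_MEAN - b, win_probability(WI_POLL_MEAN, WI_POLL_SD, b))
    for b in BIASES
]
for bias, mean, p in wi_table:
    print(f"bias={bias:.1%}  adjusted mean={mean:.1%}  P(Obama wins WI)={p:.3f}")
```

At a 2% bias the adjusted mean sits at exactly 50%, so under the normal assumption the state becomes a coin flip no matter what standard deviation is used.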
Inputs:
- The poll statistics (mean and standard deviation) for each state.
- The number of electoral votes for each state.
Outputs:
- The total number of electoral votes for Obama.
- An indicator variable to capture whether Obama won the election.
(1) Using the spreadsheet, simulate the proportion of votes in each state that are for Obama for each of the 5 bias scenarios. Run 200 replications for each scenario. For each replication, determine the number of electoral votes in each state that go to Obama and Romney and who won.
(2) Paste the model outputs (the average and standard deviation of the number of electoral votes for Obama and the probability that Obama wins) for each of the five bias scenarios into a table.
(3) What is the probability of a tie (exactly 269 votes)?
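For readers who would rather script the three tasks than use a spreadsheet, here is a rough Python sketch of the experiment. It is not the course solution. To keep it short it assumes that only five states are uncertain and lumps every other state into a fixed `BASE_EV` for Obama; that constant, the poll means (other than Wisconsin's 52%), and the standard deviations are all hypothetical placeholders.

```python
import random
import statistics

# Placeholder inputs: name -> (electoral votes, poll mean for Obama, poll std dev).
# NOT the case study data; only the electoral vote counts are real 2012 values.
SWING = {
    "Florida": (29, 0.500, 0.03),
    "Pennsylvania": (20, 0.525, 0.03),
    "Virginia": (13, 0.510, 0.03),
    "Wisconsin": (10, 0.520, 0.03),
    "Ohio": (18, 0.515, 0.03),
}
BASE_EV = 242  # hypothetical: all other states lumped together and fixed for Obama
BIASED = {"Florida", "Pennsylvania", "Virginia", "Wisconsin"}


def run_scenario(bias, n_reps, rng):
    """Simulate n_reps elections with the given bias in the four biased states."""
    totals = []
    for _ in range(n_reps):
        ev = BASE_EV
        for name, (votes, mean, sd) in SWING.items():
            shifted = mean - bias if name in BIASED else mean
            if rng.gauss(shifted, sd) > 0.5:  # winner-take-all
                ev += votes
        totals.append(ev)
    wins = sum(1 for ev in totals if ev >= 270)  # indicator: Obama wins
    ties = sum(1 for ev in totals if ev == 269)  # indicator: 269-269 tie
    return {
        "mean_ev": statistics.mean(totals),
        "sd_ev": statistics.stdev(totals),
        "p_win": wins / n_reps,
        "p_tie": ties / n_reps,
    }


rng = random.Random(2012)  # fixed seed so the table is repeatable
results = {b: run_scenario(b, 200, rng) for b in (0.0, 0.005, 0.01, 0.015, 0.02)}
for bias, r in results.items():
    print(f"{bias:.1%}  mean={r['mean_ev']:.1f}  sd={r['sd_ev']:.1f}  "
          f"P(win)={r['p_win']:.3f}  P(tie)={r['p_tie']:.3f}")
```

With only 200 replications the estimates are noisy: a win probability near 0.98 has a standard error of roughly 0.01, so small differences between scenarios should be read with that in mind.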
Modeling questions to think about:
- Obama took 332 electoral votes compared to Romney’s 206. Do you think that this outcome was well-characterized in the model or was it an unexpected outcome?
- Look at the frequency plot of the number of electoral votes for Obama (choose any of the simulations). Why do some electoral vote totals like 307, 313, and 332 occur more frequently than the others?
- Why do you think a tiny bias in 4 states would disproportionately affect the election outcomes?
- How do you think the simplifying assumptions affected the model outputs?
- No model is perfect, but an imperfect model can still be useful. Do you think this simulation model was useful?
I don’t give the results to my students ahead of time, but here is a figure of the results using @RISK. The students can see how small changes in poll bias can drastically affect the outcomes. With no bias, Obama has a 98.3% chance of winning; with a 2% bias in a mere four swing states, his chances drop to 79.3%.
- Forecasting the Presidential election using regression, simulation, or dynamic programming
- Small changes in the polls can translate to large changes in election day outcomes
- A comparison of the types of data used by Presidential election forecasting models