I am sharing several of the case studies I developed for my courses. This example is a spreadsheet model that forecasts outcomes of an election using data from the 2012 Presidential election.

**Presidential Election Forecasting**

There are a number of mathematical models for predicting who will win the Presidential election. Many popular forecasting models use simulation to forecast the state-level outcomes based on state polls. The most sophisticated models (like 538) incorporate phenomena such as poll biases, economic data, and momentum. However, even sophisticated forecasting models are often built in spreadsheets.

For this case study, we will look at state-level poll data from the **2012 Presidential election**, when Barack Obama ran against Mitt Romney. The spreadsheet contains realistic polling numbers from before the election. Simulation is a useful tool for translating the uncertainty in the polls to potential election outcomes. There are 538 electoral votes: whoever gets 270 or more votes wins.

Assumptions:

- Everyone votes for one of two candidates (i.e., no third party candidates – every vote that is not for Obama is for Romney).
- The proportion of votes that go to a candidate is __normally distributed__ according to a known mean and standard deviation in every state. We will track Obama’s proportion of the votes since he was the incumbent in 2012.
- Whoever gets more than 50% of the votes in a state wins all of the state’s electoral votes. [Note: most but not all states do this].
- The votes cast in each state are independent, i.e., the outcome in one state does not affect the outcomes in another.
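
To make these assumptions concrete, here is a minimal Python sketch of a single simulation iteration. It is not the spreadsheet itself. The function name `simulate_once` is made up for illustration, the four-state table stands in for the full list of states, and every mean and standard deviation in it is a placeholder except the 52% Wisconsin mean mentioned in this case study.

```python
import random

# Toy input: state -> (mean Obama share, standard deviation, electoral votes).
# The electoral vote counts are the 2012 allocations. The means and standard
# deviations are PLACEHOLDERS (only Wisconsin's 0.52 mean comes from the case
# study); use the values in the spreadsheet for real work.
SWING_STATES = {
    "Florida": (0.50, 0.03, 29),
    "Pennsylvania": (0.52, 0.03, 20),
    "Virginia": (0.51, 0.03, 13),
    "Wisconsin": (0.52, 0.03, 10),
}


def simulate_once(states, rng):
    """Return Obama's electoral votes for one simulated election."""
    obama_votes = 0
    for mean, sd, electoral_votes in states.values():
        # Independent, normally distributed vote share in each state
        share = rng.gauss(mean, sd)
        # Winner takes all: more than 50% wins every electoral vote
        if share > 0.5:
            obama_votes += electoral_votes
    return obama_votes


rng = random.Random(2012)  # fixed seed so the run is repeatable
obama_votes = simulate_once(SWING_STATES, rng)
print(obama_votes)
```

Because only two candidates are modeled, Romney's electoral votes in an iteration are simply the total available minus Obama's.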

It is well known that the polls are biased, and that these biases are correlated. This means that there is dependence between state outcomes, which relaxes the fourth assumption above (independence). Let’s assume four of the key swing states have polling bias (Florida, Pennsylvania, Virginia, Wisconsin). A *bias* here means that the poll average for Obama is *too high*. Let’s consider biases of 0%, 0.5%, 1%, 1.5%, and 2%. For example, the polled mean fraction of votes for Obama in Wisconsin is 52%. The mean used in the simulation would then range from 52% (no bias) down to 50% (a 2% bias), depending on the amount of bias.
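
As a rough sketch of how the bias scenarios change the inputs, the snippet below subtracts the same bias from the poll mean of each of the four swing states and leaves any other state alone. The helper name `apply_bias` is hypothetical, and only the Wisconsin mean of 52% is taken from the text above.

```python
# The five bias scenarios, written as fractions of the vote
BIASES = [0.0, 0.005, 0.01, 0.015, 0.02]
BIASED_STATES = {"Florida", "Pennsylvania", "Virginia", "Wisconsin"}


def apply_bias(poll_means, bias, biased_states=BIASED_STATES):
    """Lower Obama's poll mean by `bias` in the biased states only."""
    return {
        state: (mean - bias if state in biased_states else mean)
        for state, mean in poll_means.items()
    }


poll_means = {"Wisconsin": 0.52}  # from the example in the text
for bias in BIASES:
    adjusted = apply_bias(poll_means, bias)
    print(f"bias {bias:.1%}: Wisconsin mean {adjusted['Wisconsin']:.1%}")
```

Applying one shared shift to all four states is what introduces the dependence: within a scenario, the four means move together rather than independently.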

Using the spreadsheet, simulate the proportion of votes in each state that are for Obama for these 5 scenarios. Run 200 iterations for each simulation. For each iteration, determine the number of electoral votes in each state that go to Obama and Romney and who won.

Outputs:

- The total number of electoral votes for Obama
- An indicator variable to capture whether Obama won the election.
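
One way to picture the two outputs outside the spreadsheet is the sketch below, which runs 200 iterations and records both quantities for each one. The function name `simulate_election` and the five "blocks" are hypothetical: the blocks are not real states, just made-up inputs whose electoral votes sum to 538 so that the 270-vote threshold is meaningful.

```python
import random


def simulate_election(means, sds, votes, n_iter=200, votes_to_win=270, seed=None):
    """Return (totals, wins): Obama's electoral votes and a 0/1 win indicator
    for each iteration."""
    rng = random.Random(seed)
    totals, wins = [], []
    for _ in range(n_iter):
        # A state's electoral votes go to Obama if his simulated share exceeds 50%
        total = sum(
            v for m, s, v in zip(means, sds, votes) if rng.gauss(m, s) > 0.5
        )
        totals.append(total)
        wins.append(1 if total >= votes_to_win else 0)
    return totals, wins


# HYPOTHETICAL inputs: five made-up blocks of states, not real polling data
means = [0.56, 0.52, 0.50, 0.48, 0.44]
sds = [0.03, 0.03, 0.03, 0.03, 0.03]
votes = [200, 100, 38, 100, 100]  # sums to 538

totals, wins = simulate_election(means, sds, votes, n_iter=200, seed=1)
print(sum(wins) / len(wins))  # estimated probability that Obama wins
```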

Tasks:

(1) Create a figure showing the distribution of the total number of electoral votes that go to Obama. Report the probability that he gets 270 or more electoral votes.

(2) Paste the model outputs (the electoral vote average, min, max) and the probability that Obama wins for each of the five bias scenarios.

(3) What is the probability of a tie (exactly 269 votes)?
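
If you want to check the spreadsheet summaries another way, the three tasks reduce to a few counts over the list of simulated totals. The sketch below uses hypothetical helper names (`summarize`, `frequency_table`) and a short made-up list of ten totals in place of the 200 simulated values, so the numbers it prints are not results of the model.

```python
from collections import Counter
from statistics import mean


def summarize(totals, votes_to_win=270, tie_votes=269):
    """Average, min, max, and the estimated win and tie probabilities."""
    n = len(totals)
    return {
        "average": mean(totals),
        "min": min(totals),
        "max": max(totals),
        "p_win": sum(t >= votes_to_win for t in totals) / n,
        "p_tie": sum(t == tie_votes for t in totals) / n,
    }


def frequency_table(totals):
    """Sorted (electoral votes, count) pairs: the data behind a histogram."""
    return sorted(Counter(totals).items())


# MADE-UP totals for illustration only; replace with the simulated values
example_totals = [332, 303, 269, 253, 281, 332, 347, 269, 290, 265]

summary = summarize(example_totals)
print(summary)
print(frequency_table(example_totals))
```

With only 200 iterations per scenario, the estimated tie probability can easily come out as zero, so it is worth treating these estimates as noisy.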

Modeling questions to think about:

- Obama took 332 electoral votes compared to Romney’s 206. Do you think that this outcome was well-characterized in the model or was it an unexpected outcome?
- Look at the frequency plot of the number of electoral votes for Obama (choose any of the simulations). Why do some electoral vote totals like 307, 313, and 332 occur more frequently than the others?
- Why do you think a small bias in 4 states would disproportionately affect the election outcomes?
- How do you think the simplifying assumptions affected the model outputs?
- No model is perfect, but an imperfect model can still be useful. Do you think this simulation model was useful?

More reading from Punk Rock Operations Research:

- Forecasting the Presidential election using regression, simulation, or dynamic programming
- Small changes in the polls can translate to large changes in election day outcomes
- A comparison of the types of data used by Presidential election forecasting models

How FiveThirtyEight’s forecasting model works: https://fivethirtyeight.com/features/how-fivethirtyeights-2020-presidential-forecast-works-and-whats-different-because-of-covid-19/

## Files

- The assignment
- A shell spreadsheet with basic data to share with students
- A spreadsheet with the solutions