# Tag Archives: elections

## final Presidential election forecast predictions

The Presidential election forecasting models I’ve been following this election cycle are all pointing toward a Clinton victory. Now we have to wait and see.

~

Election Analytics @ Illinois

Princeton Election Consortium (Sam Wang)

FiveThirtyEight (Nate Silver)

New York Times Upshot forecast

Daily Kos (Drew Linzer)

David Rothschild’s prediction market forecasting model

Huffington Post Election Forecast

Sabato’s Crystal Ball

13 Keys to the White House

Why don’t all of these models agree? A few articles I’ve read lately about forecasting models and polling:

## 13 reasons why Hillary Clinton will (probably) win the Presidential Election

There are 96 days until Election Day, but I’m already pretty sure Hillary Clinton will win the election. The Keys to the White House by Allan Lichtman and Vladimir Keilis-Borok is a simple mathematical model that predicts who will win a Presidential election, often months or even years before the election. You can read the writeup in OR/MS Today here. Let’s look at why Hillary will likely win in 96 days.

The model works by considering 13 factors that are equally weighted in the model. The reference point is the person running in the same party as the incumbent President, which is Hillary Clinton in 2016.

1. Party Mandate: After the midterm elections, the incumbent party holds more seats in the U.S. House of Representatives than after the previous midterm elections.
FALSE: 193 Democrats in 112th Congress but 188 in 114th Congress

2. Contest: There is no serious contest for the incumbent party nomination.
FALSE

3. Incumbency: The incumbent party candidate is the sitting president.
FALSE

4. Third party: There is no significant third party or independent campaign.
TRUE (so far!)

5. Short term economy: The economy is not in recession during the election campaign.
TRUE

6. Long term economy: Real per capita economic growth during the term equals or exceeds mean growth during the previous two terms.
TRUE: 1.6% vs. 1.5% and 1.4% Source: http://data.worldbank.org/indicator/NY.GDP.PCAP.KD.ZG

7. Policy change: The incumbent administration effects major changes in national policy.
TRUE

8. Social unrest: There is no sustained social unrest during the term.
TRUE

9. Scandal: The incumbent administration is untainted by major scandal.
TRUE

10. Foreign/military failure: The incumbent administration suffers no major failure in foreign or military affairs.
TRUE

11. Foreign/military success: The incumbent administration achieves a major success in foreign or military affairs.
FALSE

12. Incumbent charisma: The incumbent party candidate is charismatic or a national hero.
FALSE

13. Challenger charisma: The challenging party candidate is not charismatic or a national hero.
TRUE

There are five “Falses.” When five or fewer statements are false, the incumbent party wins. When six or more are false, the challenging party wins. Barring a third party surge to something like Ross Perot’s 1992 levels (see #4), it looks like five or fewer statements will continue to be false. I’m not sure if the model is flexible enough to account for a divisive figure like Donald Trump, but we will find out soon.
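The decision rule is simple enough to write down in a few lines. Here is a minimal sketch that tallies the thirteen keys exactly as scored above; the function name `predict` and the what-if scenario are illustrative and not part of Lichtman and Keilis-Borok's own materials.

```python
# Each key is scored True/False exactly as in the list above.
keys = {
    "Party mandate": False,
    "Contest": False,
    "Incumbency": False,
    "Third party": True,
    "Short term economy": True,
    "Long term economy": True,
    "Policy change": True,
    "Social unrest": True,
    "Scandal": True,
    "Foreign/military failure": True,
    "Foreign/military success": False,
    "Incumbent charisma": False,
    "Challenger charisma": True,
}

def predict(keys, max_false=5):
    """Apply the rule: five or fewer false keys means the incumbent party wins."""
    n_false = sum(1 for value in keys.values() if not value)
    winner = "incumbent party" if n_false <= max_false else "challenging party"
    return n_false, winner

n_false, winner = predict(keys)
print(f"{n_false} false keys -> {winner} wins")

# What-if: a Perot-style third party surge flips key #4 to False.
with_surge = {**keys, "Third party": False}
print(predict(with_surge))
```

Because every key carries the same weight, flipping any single true key is enough to change the call when the count sits right at five.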

What is interesting is that this model requires no polling information, which is a major input requirement to most other models (like the one at FiveThirtyEight). It instead looks at underlying causes for support for the political parties based on how satisfied we are with various things that have happened, hence the “keys” about social unrest, war, major policy change, major scandal, and the economy. I blogged before about the importance of the economy in making Presidential election forecasts (“It’s the economy, stupid”).

Do you think traditional ways to forecast the election will “work” this year?

## how to forecast an election using simulation: a case study for teaching operations research

After extensively blogging about the 2012 Presidential election and analytical models used to forecast the election (go here for links to some of these old posts), I decided to create a case study on Presidential election forecasting using polling data. This blog post is about this case study. I originally developed the case study for an undergraduate course on math modeling that used Palisade Decision Tools like @RISK. I retooled the spreadsheet for my undergraduate course in simulation in Spring 2014 so that it does not rely on @RISK. All materials are available in the Files tab.

The basic idea is that there are a number of mathematical models for predicting who will win the Presidential Election. The most accurate (and the most popular) use simulation to forecast the state-level outcomes based on state polls. The most sophisticated models like Nate Silver’s 538 model incorporate things such as poll biases, economic data, and momentum. I wanted to incorporate poll biases.

For this case study, we will look at state-level poll data from the 2012 Presidential election. The spreadsheet contains realistic polling data from before the election. Simulation is a useful tool for translating the uncertainty in the polls to potential election outcomes.  There are 538 electoral votes: whoever gets 270 or more votes wins.

Assumptions:

1. Everyone votes for one of two candidates (i.e., no third party candidates – every vote that is not for Obama is for Romney).
2. The proportion of votes that go to a candidate is normally distributed according to a known mean and standard deviation in every state. We will track Obama’s proportion of the votes since he was the incumbent in 2012.
3. Whoever gets more than 50% of the votes in a state wins all of the state’s electoral votes. [Note: most but not all states do this].
4. The votes cast in each state are independent, i.e., the outcome in one state does not affect the outcomes in another.

There is some concern that the polls are biased in four of the key swing states (Florida, Pennsylvania, Virginia, Wisconsin). A bias means that the poll average for Obama is too high. Let’s consider biases of 0%, 0.5%, 1%, 1.5%, and 2%, with all four states affected by the same bias level at the same time. For example, the poll mean for Wisconsin is 52%, so the mean used in the simulation would range from 50% to 52% depending on the amount of bias. Side note: Obama was such an overwhelming favorite that it only makes sense to look at poll biases that work in his favor.
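To see what a bias does to a single state, here is a small sketch for the Wisconsin example. It subtracts each bias level from the 52% poll mean and computes the chance that Obama's share exceeds 50% under the normality assumption. The 2-point standard deviation is a made-up placeholder, not a value from the spreadsheet.

```python
from statistics import NormalDist

WISCONSIN_MEAN = 52.0  # poll mean for Obama in percent, from the example above
ASSUMED_SD = 2.0       # hypothetical standard deviation in percentage points
BIASES = [0.0, 0.5, 1.0, 1.5, 2.0]

def win_probability(mean, sd, threshold=50.0):
    """P(vote share > threshold) when the share is Normal(mean, sd)."""
    return 1.0 - NormalDist(mu=mean, sigma=sd).cdf(threshold)

scenarios = []
for bias in BIASES:
    adjusted_mean = WISCONSIN_MEAN - bias  # a bias means the poll mean is too high
    scenarios.append((bias, adjusted_mean, win_probability(adjusted_mean, ASSUMED_SD)))

for bias, mean, p in scenarios:
    print(f"bias {bias:.1f}%: mean {mean:.1f}%, P(Obama wins state) = {p:.3f}")
```

With a 2% bias the adjusted mean sits exactly at 50%, so the state becomes a coin flip under these assumptions.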

It is very difficult to find polls that are unbiased. Nate Silver of FiveThirtyEight wrote about this issue in “Registered voter polls will (usually) overrate Democrats”: http://fivethirtyeight.com/features/registered-voter-polls-will-usually-overrate-democrats/

Inputs:

1. The poll statistics of the mean and standard deviation for each state.
2. The number of electoral votes for each state.

Outputs:

1. The total number of electoral votes for Obama
2. An indicator variable to capture whether Obama won the election.

(1) Using the spreadsheet, simulate the proportion of votes in each state that are for Obama for each of the 5 bias scenarios. Run 200 replications for each scenario. For each replication, determine the number of electoral votes in each state that go to Obama and Romney and who won.

(2) Paste the model outputs (the average and standard deviation of the number of electoral votes for Obama and the probability that Obama wins) for each of the five bias scenarios into a table.

(3) What is the probability of a tie (exactly 269 votes)?
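For readers who prefer code to spreadsheets, here is a rough Python sketch of the same exercise. It is not the spreadsheet model: apart from Wisconsin's 52% mean, every mean and standard deviation below is a placeholder, and it simplifies further by simulating only ten contested states and treating the remaining electoral votes as fixed (`FIXED_OBAMA_EV` is an assumption for illustration).

```python
import random
import statistics

# (mean Obama share in %, std dev in points, electoral votes).
# Hypothetical inputs: only Wisconsin's 52% mean comes from the post.
STATES = {
    "Florida": (50.5, 2.0, 29),
    "Pennsylvania": (52.5, 2.0, 20),
    "Virginia": (51.0, 2.0, 13),
    "Wisconsin": (52.0, 2.0, 10),
    "Ohio": (51.5, 2.0, 18),
    "Colorado": (51.0, 2.0, 9),
    "Iowa": (51.5, 2.0, 6),
    "Nevada": (52.0, 2.0, 6),
    "New Hampshire": (51.5, 2.0, 4),
    "North Carolina": (49.0, 2.0, 15),
}
BIASED = {"Florida", "Pennsylvania", "Virginia", "Wisconsin"}
FIXED_OBAMA_EV = 217  # assumption: all other states are treated as certain
TOTAL_EV, TO_WIN = 538, 270

def simulate(bias, reps=200, seed=0):
    """Run `reps` replications with the given bias applied to the four states."""
    rng = random.Random(seed)
    totals = []
    for _ in range(reps):
        ev = FIXED_OBAMA_EV
        for name, (mean, sd, votes) in STATES.items():
            mu = mean - bias if name in BIASED else mean
            if rng.gauss(mu, sd) > 50.0:  # winner takes all of the state's votes
                ev += votes
        totals.append(ev)
    return {
        "mean_ev": statistics.mean(totals),
        "sd_ev": statistics.stdev(totals),
        "p_win": sum(t >= TO_WIN for t in totals) / reps,
        "p_tie": sum(t == TOTAL_EV // 2 for t in totals) / reps,
    }

results = {bias: simulate(bias) for bias in (0.0, 0.5, 1.0, 1.5, 2.0)}
for bias, r in results.items():
    print(f"bias {bias:.1f}%: mean EV {r['mean_ev']:.1f}, sd {r['sd_ev']:.1f}, "
          f"P(win) {r['p_win']:.3f}, P(tie) {r['p_tie']:.3f}")
```

Using the same seed for every scenario means each bias level sees the same random draws, so differences between rows come from the bias rather than from sampling noise.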

1. Obama took 332 electoral votes compared to Romney’s 206. Do you think that this outcome was well-characterized in the model or was it an unexpected outcome?
2. Look at the frequency plot of the number of electoral votes for Obama (choose any of the simulations). Why do some electoral vote totals like 307, 313, and 332 occur more frequently than the others?
3. Why do you think a tiny bias in 4 states would disproportionately affect the election outcomes?
4. How do you think the simplifying assumptions affected the model outputs?
5. No model is perfect, but an imperfect model can still be useful. Do you think this simulation model was useful?

RESULTS

I don’t give the results to my students ahead of time, but here is a figure of the results using @RISK. The students can see how small changes in poll bias can drastically affect the outcomes. With no bias, Obama has a 98.3% chance of winning and with a 2% bias in a mere four swing states, Obama’s chances go down to 79.3%.

@RISK output for the election model. The histogram shows the distribution of electoral votes for the unbiased results. The table below tabulates the results for different levels of bias.

### Files.

Here are the instructions, the Excel spreadsheet for Monte Carlo simulation, and the Excel spreadsheet that can be used with @RISK.

## election analytics roundup

Here are a few election related links:

I’ve blogged about elections a lot before. Here are some of my favorites:

## why is it so easy to forecast the Presidential election and so hard to forecast the NCAA basketball tournament?

This blog post is inspired by my disappointing NCAA March Madness bracket. I used math modeling to fill my bracket, and I am currently in the 51st percentile on ESPN. On the upside, all of my Final Four picks are still active so I have a chance to win my pool. I am worried that my bracket has caused me to lose all credibility with those who are skeptical of the value of math modeling. After all, guessing can lead to a better bracket. Isn’t Nate Silver a wizard? How come his bracket isn’t crushing the competition? Here, I will make the case that a so-so bracket is not evidence that the math models are bad. To do so, I will discuss why it is so easy to forecast the Presidential election and so hard to forecast the NCAA basketball tournament.

Many models for the Presidential election and the basketball tournament are similar in that they use various inputs to predict the probability of an outcome. I have discussed several models for forecasting the Presidential election [Link] and the basketball tournament [Link].

All models that didn’t solely rely on economic indicators chose Obama to be the favorite, and nearly all predicted 48+ of the states correctly. In other words, even a somewhat simplistic model to forecast the Presidential election could predict the correct outcome 96% of the time. I’m not saying that the forecasting models out there were simplistic – but simply going with poll averages gave good estimates of the election outcomes.

The basketball tournament is another matter. Nate Silver has blogged about predicting tournament games using similar math models. Here, we can only predict the correct winner 71-73% of the time [Link]:

Since 2003, the team ranked higher in the A.P. preseason poll (excluding cases where neither team received at least 5 votes) has won 72 percent of tournament games. That’s exactly the same number, 72 percent, as the fraction of games won by the better seed. And it’s a little better than the 71 percent won by teams with the superior Ratings Percentage Index, the statistical formula that the seeding committee prefers. (More sophisticated statistical ratings, like Ken Pomeroy’s, do only a little better, with a 73 percent success rate.)

To do well in your bracket, you would need to make small marginal improvements over the naive model of always picking the better seed (72% success rate). Here, a 96% success rate would be unrealistic; an improved model that gets 75% of the games correct would give you a big advantage. The big advantage here means that if you used your improved method in 1000 tournaments, it would do better on average than the naive method. In any particular tournament, the improved method may still lead to a poor bracket. It’s a small sample.
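A quick simulation makes the "better on average, but not every time" point concrete. This sketch assumes 63 games per tournament and treats each pick as an independent coin flip with the stated accuracy, which ignores the way bracket errors carry forward into later rounds. The function names are illustrative.

```python
import random

def correct_picks(accuracy, games, rng):
    """Number of games picked correctly, each with probability `accuracy`."""
    return sum(rng.random() < accuracy for _ in range(games))

def compare_methods(naive=0.72, improved=0.75, games=63, tournaments=1000, seed=1):
    rng = random.Random(seed)
    improved_better = naive_better = 0
    total_gain = 0
    for _ in range(tournaments):
        n = correct_picks(naive, games, rng)
        m = correct_picks(improved, games, rng)
        total_gain += m - n
        improved_better += m > n
        naive_better += n > m
    return {
        "avg_gain": total_gain / tournaments,                # extra correct games per tournament
        "improved_better": improved_better / tournaments,    # share of tournaments won outright
        "naive_better": naive_better / tournaments,
    }

summary = compare_methods()
print(summary)
```

The average gain is positive, yet the naive method still comes out ahead in a sizable share of individual tournaments.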

The idea here is similar to batting averages in baseball. It is not really possible to notice the difference between a 0.250 batter and a 0.300 batter in a single game or even across the games in a single week. The 0.250 hitter may even have a better batting average in any given week of games. Over the course of the season of 162 games, the differences are quite noticeable when looking at the batters’ batting average. The NCAA does not have the advantage of averaging performance over a large number of games — we are asked to predict a small set of outcomes in a single tournament where things will not have a chance to average out (it’s The Law of Small Numbers).
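The batting comparison can be checked with exact binomial probabilities. The sketch below assumes each at-bat is an independent trial and that both hitters get the same number of at-bats: 25 in a week and 4 per game over 162 games. Those at-bat counts are assumptions for illustration.

```python
from math import comb

def binomial_pmf(n, p):
    """Probabilities of 0..n hits in n independent at-bats."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def prob_weaker_outhits(n, p_low=0.250, p_high=0.300):
    """P(the p_low hitter gets strictly more hits than the p_high hitter)."""
    low, high = binomial_pmf(n, p_low), binomial_pmf(n, p_high)
    total, cum_high = 0.0, 0.0
    for k in range(n + 1):
        total += low[k] * cum_high  # cum_high is P(high hitter has fewer than k hits)
        cum_high += high[k]
    return total

week = prob_weaker_outhits(25)         # assumed at-bats in one week
season = prob_weaker_outhits(162 * 4)  # assumed 4 at-bats per game
print(f"week: {week:.3f}, season: {season:.4f}")
```

The chance that the weaker hitter looks better shrinks sharply as the number of at-bats grows, which is the averaging effect a single tournament never gets.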

It’s worth noting that actual brackets get fewer than 72% of the games correct because errors are cumulative. If you put Gonzaga in the Elite Eight and they are defeated in the (now) third round and do not make it to the Sweet Sixteen, then one wrong game prediction leads to two wrong games in the bracket.

It’s also worth noting that some games are easier to predict than others. In the (now) second round (what most of us think of as the first round), no 1 seed has ever lost to a 16 seed, and 2 seeds have only rarely lost to 15 seeds (it’s happened 7 times). Likewise, some states are easy to predict in Presidential elections (e.g., California and Oklahoma). The difference is that there are few easy-to-predict games in the tournament whereas there are many easy-to-predict states in a Presidential election. Politico lists 9 swing states for the 2012 election. That is, one could predict the outcome in 82% of the states (41 of 50) with a high degree of confidence by using common sense. In contrast, one can confidently predict ~12% of tournament games in the round of 64 teams using common sense (the four games involving 1 seeds, out of 32). Therefore, I would argue that there is more parity in college basketball than there is in politics.
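The two percentages come from simple counts, spelled out here. The variable names are illustrative.

```python
states, swing_states = 50, 9
round_of_64_games, one_seed_games = 32, 4

easy_state_share = (states - swing_states) / states   # 41 of 50 states
easy_game_share = one_seed_games / round_of_64_games  # 4 of 32 games

print(f"easy states: {easy_state_share:.0%}, easy games: {easy_game_share:.1%}")
```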

## the exit polling supply chain

A WSJ Washington Wire blog post describes the Presidential election exit polling supply chain in New York in the immediate aftermath of Hurricane Sandy. The Washington Wire blog post highlights the polling firm Edison Research, based in New Jersey. Edison provided the questionnaires used by pollsters who would collect information about the ballots cast. As you might recall, New Jersey and New York were severely damaged by the hurricane.

Questionnaires

One of the logistical challenges was in printing and delivering the questionnaires used by pollsters around the country. The questionnaires need to be timely, so they are usually shipped one week before the election. Sandy was on track to strike 8 days before the election, so a rush order was placed with the printer. Two thirds of the questionnaires were mailed before Sandy struck and Edison’s election office lost power along with the rest of New Jersey. The rest of the questionnaires were stored for two days until they had to be shipped. Edison printed the mailing labels from their main office, and then UPS shipped the 400 packages to pollsters via Newark Airport. While Edison had redundancy in their system (e.g., the mailing labels could be printed in another facility and a redundant system alerted employees of the change), it only worked because not all of their offices lost power.

Mail Delivery

While Edison relied on UPS to deliver the mail, it is worth noting that USPS mail service continued as normal except for one day during Hurricane Sandy (HT to @EllieAsksWhy).

Gas

Edison relied on having employees implement Plan B. With the gas shortage, it was difficult for employees to get to work when they needed to save gas for other car trips. Organizing car pools was more difficult than normal, since employees could not rely on communicating by email or cell phone.

Hotels

As I mentioned in an earlier post, there were few/no vacancies at hotels that had power, which provided challenges for Edison employees who wanted to work out of a hotel (most offices and homes were without power) or pollsters who needed to travel to different cities to perform exit polling.  I’m not sure how these issues were resolved.

Local transportation to the polls

The NYC public transportation was up and running on election day, so the pollsters could make it there for the big day. The subway reopened with limited runs the Thursday before Election Day and was running as usual on Election Day.

What if Hurricane Sandy came later?

Edison Research managed, but having an 8 day head start was helpful for successfully completing a contingency plan. If the hurricane had hit 5 or fewer days before the election, the questionnaires would have already been printed and mailed. However, there may have been more challenges with getting pollsters to the polling locations in New York City and other locations (the subway may still have been closed on Election Day).


## queuing on election day

This is another blog post about voting. This one focuses on the actual act of voting in all its queuing glory.

Queue basics: the voters are customers who enter the system. The system here is a voting area for a precinct. The voters wait in a queue to cast their votes in voting booths (the servers). The customer arrival rates depend on the time of day, and hence, the system is not stationary.
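To make the non-stationary part concrete, here is a minimal simulation sketch of one precinct. The arrival rates, the six booths, and the 5-minute average voting time are all invented for illustration. Arrivals are generated by thinning a Poisson process at the peak rate, and voters are served first come, first served.

```python
import heapq
import random

PEAK_RATE = 2.0  # hypothetical voters per minute during the rushes

def arrival_rate(t):
    """Hypothetical arrival rate at minute t of a 13-hour day (0 = polls open)."""
    hour = t / 60
    if hour < 2 or hour >= 10:  # before-work and after-work rushes
        return PEAK_RATE
    return 0.8

def simulate_precinct(booths=6, mean_service=5.0, day=780, seed=0):
    """Return (arrival time, wait in line) for each voter, in minutes."""
    rng = random.Random(seed)
    # Thinning: propose arrivals at the peak rate, keep each with prob rate(t)/peak.
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(PEAK_RATE)
        if t >= day:
            break
        if rng.random() < arrival_rate(t) / PEAK_RATE:
            arrivals.append(t)
    free_at = [0.0] * booths  # min-heap of times at which each booth frees up
    heapq.heapify(free_at)
    waits = []
    for a in arrivals:
        start = max(a, heapq.heappop(free_at))  # next voter takes the earliest free booth
        waits.append((a, start - a))
        heapq.heappush(free_at, start + rng.expovariate(1 / mean_service))
    return waits

def mean_wait(waits, start, end):
    selected = [w for a, w in waits if start <= a < end]
    return sum(selected) / len(selected) if selected else 0.0

waits = simulate_precinct()
morning = mean_wait(waits, 0, 120)
midday = mean_wait(waits, 420, 600)
print(f"{len(waits)} voters, morning wait {morning:.1f} min, midday wait {midday:.1f} min")
```

With these numbers the booths can handle about 1.2 voters per minute, so the line grows during each rush and drains in between. A steady-state formula based on the daily average arrival rate would miss that entirely.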

Let’s look at different ways to look at voting from a queuing perspective.

Let’s start with this article in the Economist that argues that bad weather favors Romney. Here, they focus on how weather affects the voter arrival rates:

To be brutal, a certain amount of bad weather on election day helps conservatives in every democracy. In crude terms, car-driving conservative retirees still turn out in driving rain, when bus-taking lower-income workers just back from a night shift are more likely to give rain-soaked polls a miss. School closures are a particular problem for low-income families or single mothers scrambling to find childcare.

Thus, bad weather may decrease the arrival rate of liberal voters more than of conservative voters.

Ultimately, many people are going to vote. Long lines were a problem in 2004 and 2008, and a few balked at waiting in line. Many places (such as my state of Virginia) offered early voting via absentee ballots to voters in 2008, since the turnout was unprecedented. Waiting in line to vote leads to questions about voting machine allocations and the time it takes to vote.

Muer Yang, Ted Allen, and Michael Fry wrote a paper that focuses on the number of servers and the service times. [Link to press release] They examine how to assign voting machines to precincts to equalize waiting times across precincts, so that some precincts are not plagued with long waits while others have almost none. They do so by noting that voting is not stationary and by including other realistic voting complications:

“[The election board’s] assumptions of those problems are not even close to the real world,” he added, “because [the election board’s traditional] model assumes a stationary voter arrival — that voters arrive at the voting station at the same rate, which is not true. We use simulation models to consider realistic complications, including variables such as voter arrival time, voter turnout, length of time needed to finish a ballot, peak voting times and machine failures.”

The paper hasn’t been published yet, so I don’t know all of the details. To satiate your desire for mathematical details, you can read this paper from the Winter Simulation Conference by Muer Yang, Michael Fry, and David Kelton. They examine how to allocate voting machines (servers) to voting precincts in an equitable manner. There are many ways to evaluate equity. Here, the authors use the average absolute differences of expected waiting times among precincts as a proxy for “equity.” They provide a heuristic that uses a factorial experimental design and show that this heuristic outperforms the “utilization-equalization” method. The “utilization-equalization” method is another proxy for voter equity that “equalizes the utilization of voting machines rather than equalizing waiting times of voters. Moreover, the utilization rate is obtained by traditional queueing theory, which assumes stationary arrivals and steady-state operating conditions.”
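One plausible reading of that equity proxy is the mean of |W_i - W_j| over all pairs of precincts, where W_i is the expected wait at precinct i. The sketch below computes it for two made-up allocations; the paper's exact definition may differ, and the waiting times here are invented.

```python
from itertools import combinations

def avg_abs_wait_difference(expected_waits):
    """Mean absolute difference in expected wait over all pairs of precincts."""
    pairs = list(combinations(expected_waits, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

# Hypothetical expected waits (minutes) at three precincts under two allocations.
allocation_a = [5.0, 15.0, 40.0]
allocation_b = [18.0, 20.0, 22.0]

score_a = avg_abs_wait_difference(allocation_a)
score_b = avg_abs_wait_difference(allocation_b)
print(f"allocation A: {score_a:.2f}, allocation B: {score_b:.2f}")
```

A lower score means waits are more evenly spread, so allocation B would count as more equitable here even though both allocations have the same average wait.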

Initially, I thought that so many people voting early via absentee ballot or just early voting would mean fewer long lines in a queue. This is not necessarily the case. Early voting in South Florida and Ohio has been plagued with long lines (up to six hours). Hopefully, this means that fewer people will be in line on Election Day. I haven’t heard yet if state budget cuts will lead to poorly staffed voting precincts, which will in turn lead to long lines on Election Day even if the turnout isn’t record-setting.

All those voter ID laws were supposed to cut down on voter fraud. In a queuing context, that means that the new laws would slightly reduce the voter arrival rates. Carl Bialik wrote a nice article in the WSJ [Link] about voter fraud and whether voter ID laws would make much of a difference. The short answer is that they don’t. Fraud is hard to detect, and when it has been detected, it has most often occurred with absentee ballots (no ID needed to vote absentee) and during voter registration.

How long did you wait in line to vote?