Tag Archives: elections

November 11, 2022

Locating ballot drop boxes is NP-hard

The state of Michigan passed Proposition 2 on November 8, 2022, a ballot measure that introduces several voting rights, including access to drop boxes. Proposition 2 will lead to the widespread use of drop boxes in the state of Michigan, since it requires at least one ballot drop box per 15,000 registered voters with at least one drop box per municipality. There will be questions about where to locate the drop boxes, since “the boxes would have to be distributed in an equitable way.”
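To make the requirement concrete, here is a minimal sketch of the arithmetic. The function name, the municipality names, and the voter counts are hypothetical, and rounding up to the next whole box is my reading of "at least one per 15,000," not a quote from the legal text.

```python
import math

def min_drop_boxes(registered_voters: int, voters_per_box: int = 15_000) -> int:
    """Smallest number of drop boxes for one municipality under the rule above.

    Assumption: partial blocks of 15,000 voters round up, and every
    municipality gets at least one box regardless of size.
    """
    if registered_voters < 0:
        raise ValueError("registered_voters must be non-negative")
    return max(1, math.ceil(registered_voters / voters_per_box))

# Hypothetical municipalities, for illustration only.
for name, voters in {"Smallville": 3_200, "Midtown": 15_000, "Big City": 46_000}.items():
    print(f"{name}: {voters:,} voters -> at least {min_drop_boxes(voters)} drop box(es)")
```

The count is the easy part. Deciding where those boxes should go is the hard part.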

Operations research can help inform these important election decisions. Dr. Adam Schmidt and I studied issues surrounding the location of drop boxes in our recent paper entitled “Locating ballot drop boxes” (Read the preprint here). Our paper studies how to locate ballot drop boxes when considering multiple criteria such as cost, voter access, and equity. The paper abstract is as follows:

For decades, voting-by-mail and the use of ballot drop boxes has substantially grown, and in response, many election officials have added new drop boxes to their voting infrastructure. However, existing guidance for locating drop boxes is limited. In this paper, we introduce an integer programming model, the drop box location problem (DBLP), to locate drop boxes. The DBLP considers criteria of cost, voter access, and risk. The cost of the drop box system is determined by the fixed cost of adding drop boxes and the operational cost of a collection tour by a bipartisan team who regularly collects ballots from selected locations. The DBLP utilizes covering sets to ensure each voter is in close proximity to a drop box and incorporates a novel measure of access to measure the ability to use multiple voting pathways to vote. The DBLP is shown to be NP-Hard, and we introduce a heuristic to generate a large number of feasible solutions for policy makers to select from a posteriori. Using a real-world case study of Milwaukee, WI, we study the benefit of the DBLP. The results demonstrate that the proposed optimization model identifies drop box locations that perform well across multiple criteria. The results also demonstrate that the trade-off between cost, access, and risk is non-trivial, which supports the use of the proposed optimization-based approach to select drop box locations.

We published an op-ed in The Hill summarizing some of the key findings from this paper.

While I am thrilled to see Michigan introduce a legal requirement for ballot drop boxes in future elections, our research indicates that this requirement is not straightforward for election officials to implement, since decisions involving the location of drop boxes are hard from theoretical and computational perspectives. Tools such as our integer programming model can help election officials make informed decisions.
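To give a feel for why these decisions are computationally hard, here is a toy sketch of only the covering piece of the problem: choose the fewest sites so that every voter zone is near at least one drop box. This is not the DBLP from the paper, which also accounts for cost, collection tours, and risk. The site names, zones, and coverage sets are all hypothetical, and brute-force enumeration stands in for an integer programming solver.

```python
from itertools import combinations

# Hypothetical data: each candidate site "covers" the zones close to it.
COVERS = {
    "library": {"A", "B"},
    "city_hall": {"B", "C", "D"},
    "fire_station": {"D", "E"},
    "school": {"A", "E"},
    "clinic": {"C"},
}
ZONES = set().union(*COVERS.values())

def fewest_sites(covers, zones):
    """Return a smallest set of sites whose coverage includes every zone.

    Tries all subsets of size 1, then 2, and so on. With n candidate sites
    there are 2**n subsets, which is why enumeration stops being practical
    long before n reaches the size of a real city.
    """
    sites = sorted(covers)
    for k in range(1, len(sites) + 1):
        for chosen in combinations(sites, k):
            covered = set().union(*(covers[s] for s in chosen))
            if covered >= zones:
                return chosen
    return None  # no subset covers every zone

print(fewest_sites(COVERS, ZONES))
```

Five candidate sites means 32 subsets to check. Fifty candidate sites would mean more than a quadrillion, which is where optimization models and heuristics earn their keep.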

Dr. Adam Schmidt recently defended his dissertation entitled “Optimization and Simulation Models for the Design of Resilient Election Voting Systems” about election resilience, and his paper about drop boxes is part of his dissertation. He also studied the impact of the COVID-19 pandemic on in-person voting and how to locate and consolidate polling locations.

I am excited to see some states expanding the use of ballot drop boxes. Drop boxes have a place in our elections. The US states that are weighing legislation will define how and when drop boxes can be used. With research backed by proven scientific methods using operations research, we can truly make informed decisions about drop boxes and our voting systems.

Leave a comment | tags: elections | posted in Uncategorized

October 27, 2020

Presidential election forecasting: a case study

By Laura Albert

I am sharing several of the case studies I developed for my courses. This example is a spreadsheet model that forecasts outcomes of an election using data from the 2012 Presidential election.

Presidential Election Forecasting

There are a number of mathematical models for predicting who will win the Presidential Election. Many popular forecasting models use simulation to forecast the state-level outcomes based on state polls. The most sophisticated models (like 538) incorporate phenomena such as poll biases, economic data, and momentum. However, even the most sophisticated models are often implemented using spreadsheets.

For this case study, we will look at state-level poll data from the 2012 Presidential election when Barack Obama ran against Mitt Romney. The spreadsheet contains realistic polling numbers from before the election. Simulation is a useful tool for translating the uncertainty in the polls to potential election outcomes. There are 538 electoral votes: whoever gets 270 or more votes wins.

Assumptions:

1. Everyone votes for one of two candidates (i.e., no third party candidates – every vote that is not for Obama is for Romney).
2. The proportion of votes that go to a candidate is normally distributed according to a known mean and standard deviation in every state. We will track Obama’s proportion of the votes since he was the incumbent in 2012.
3. Whoever gets more than 50% of the votes in a state wins all of the state’s electoral votes. [Note: most but not all states do this].
4. The votes cast in each state are independent, i.e., the outcome in one state does not affect the outcomes in another.

It is well known that the polls are biased, and that these biases are correlated. This means that there is dependence between state outcomes (lifting assumption #4 above). Let’s assume four of the key swing states have polling bias (Florida, Pennsylvania, Virginia, Wisconsin). A bias here means that the poll average for Obama is too high. Let’s consider biases of 0%, 0.5%, 1%, 1.5%, and 2%. For example, the mean fraction of votes for Obama in Wisconsin is 52%. This mean would change to 50% – 52% depending on the amount of bias.

Using the spreadsheet, simulate the proportion of votes in each state that are for Obama for these 5 scenarios. Run 200 iterations for each simulation. For each iteration, determine the number of electoral votes in each state that go to Obama and Romney and who won.

Outputs:

The total number of electoral votes for Obama
An indicator variable to capture whether Obama won the election.
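For readers who prefer code to spreadsheets, here is a minimal sketch of the same simulation logic in Python. It is not the course spreadsheet. Only Wisconsin’s 52% mean comes from the text above; every other mean, every standard deviation, and the lumping of non-swing states into two blocs are hypothetical choices made to keep the example short. The function names are also my own.

```python
import random

# Hypothetical inputs: name -> (mean Obama share, standard deviation, electoral votes).
# Non-swing states are lumped into two blocs, so each bloc is a single draw.
STATES = {
    "Obama-leaning bloc": (0.56, 0.03, 242),
    "Romney-leaning bloc": (0.44, 0.03, 206),
    "Florida": (0.505, 0.03, 29),
    "Ohio": (0.515, 0.03, 18),
    "Pennsylvania": (0.53, 0.03, 20),
    "Virginia": (0.51, 0.03, 13),
    "Wisconsin": (0.52, 0.03, 10),
}
BIASED_STATES = {"Florida", "Pennsylvania", "Virginia", "Wisconsin"}

def simulate_once(states, rng, bias=0.0, biased=BIASED_STATES):
    """One iteration: draw each state's Obama share and total his electoral votes."""
    obama_votes = 0
    for name, (mean, sd, electoral_votes) in states.items():
        shift = bias if name in biased else 0.0   # poll average too high by `bias`
        share = rng.gauss(mean - shift, sd)       # assumption 2: normal share
        if share > 0.5:                           # assumption 3: winner takes all
            obama_votes += electoral_votes
    return obama_votes

def run_simulation(states, iterations=200, bias=0.0, seed=1):
    """Return the list of Obama's electoral vote totals and his win probability."""
    rng = random.Random(seed)
    totals = [simulate_once(states, rng, bias) for _ in range(iterations)]
    wins = [1 if total >= 270 else 0 for total in totals]  # indicator variable
    return totals, sum(wins) / iterations

totals, p_win = run_simulation(STATES, iterations=200, bias=0.01)
print(f"min={min(totals)}, max={max(totals)}, P(Obama wins)={p_win:.3f}")
```

Each call to `simulate_once` plays the role of one row of random draws in the spreadsheet, and `run_simulation` plays the role of the 200 iterations.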

Tasks:

(1) Create a figure showing the distribution of the total number of electoral votes that go to Obama. Report the probability that he gets 270 or more electoral votes.

(2) Paste the model outputs (the electoral vote average, min, max) and the probability that Obama wins for each of the five bias scenarios.

(3) What is the probability of a tie (exactly 269 votes)?

Modeling questions to think about:

Obama took 332 electoral votes compared to Romney’s 206. Do you think that this outcome was well-characterized in the model or was it an unexpected outcome?
Look at the frequency plot of the number of electoral votes for Obama (choose any of the simulations). Why do some electoral vote totals like 307, 313, and 332 occur more frequently than the others?
Why do you think a small bias in 4 states would disproportionately affect the election outcomes?
How do you think the simplifying assumptions affected the model outputs?
No model is perfect, but an imperfect model can still be useful. Do you think this simulation model was useful?

More reading from Punk Rock Operations Research:

How FiveThirtyEight’s forecasting model works: https://fivethirtyeight.com/features/how-fivethirtyeights-2020-presidential-forecast-works-and-whats-different-because-of-covid-19/

Files

More teaching case studies

Leave a comment | tags: elections, teaching | posted in Teaching materials

September 23, 2020

Resilient voting systems during the COVID-19 pandemic: A discrete event simulation approach

By Laura Albert

Holding a Presidential election during a pandemic is not simple, and election officials are considering new procedures to support elections and minimize COVID-19 transmission risks. I became aware of these issues earlier this summer, when I had a fascinating conversation with Professor Barry Burden about queueing, location analysis, and Presidential elections. Professor Burden is a professor of Political Science at the University of Wisconsin-Madison, a founding director of the Elections Research Center, and an election expert.

I was intrigued by the relevance of location analysis and queueing theory in this important and timely problem in public sector critical infrastructure (elections are critical infrastructure). I looked into the issue further with Adam Schmidt, a PhD student in my lab. We created a detailed discrete event simulation model of in-person voting, and we analyzed it using a detailed case study.

We present an executive summary of our paper below. Read the full paper here: https://doi.org/10.6084/m9.figshare.12985436.v1

Resilient voting systems during the COVID-19 pandemic:
A discrete event simulation approach

Adam Schmidt and Laura A. Albert
University of Wisconsin-Madison
Industrial and Systems Engineering
1513 University Avenue
Madison, Wisconsin 53706
laura@engr.wisc.edu
September 21, 2020

Executive Summary

The 2020 General Election will occur during a global outbreak of the COVID-19 virus. Planning for an election requires months of preparation to ensure that voting is effective, equitable, accessible, and that the risk from the COVID-19 virus to voters and poll workers is minimal. Preparing for the 2020 General Election is challenging given these multiple objectives and the time required to implement mitigating strategies.

The Spring 2020 Election and Presidential Preference Primary on April 7, 2020 in Wisconsin occurred during the statewide “Stay-at-home” order associated with the COVID-19 pandemic. This election was extraordinarily challenging for election officials, poll workers, and voters. The 2020 Wisconsin Spring Primary experienced a record-setting number of ballots cast by mail, and some polling locations experienced long waiting times caused by consolidated polling locations and longer-than-typical check-in and voting times due to increased social distancing and protective measures. A number of lawsuits followed the 2020 Wisconsin Spring Primary, highlighting the need for more robust planning for the 2020 General Election on November 3, 2020.

This paper studies how to design and operate in-person voting for the 2020 General Election. We consider and evaluate different design alternatives using discrete event simulation, since this methodology captures the key facets of how voters cast their votes and has been widely used in the scientific literature to model voting systems. Through a discrete event simulation analysis, we identify election design principles that are likely to have short wait times, have a low-risk of COVID-19 transmission for voters and poll workers, and can accommodate sanitation procedures and personal protective equipment (PPE).

We analyze a case study based on Milwaukee, Wisconsin data. The analysis considers different election conditions, including different levels of voter turnout, early voting participation, the number of check-in booths, and the polling location capacity to consider a range of operating conditions. Additionally, we evaluate the impact of COVID-19 protective measures on check-in and voting times. We consider several design choices for mitigating the risks of long wait times and the risks of the COVID-19 virus, including consolidating polling locations to a small number of locations, using a National Basketball Association (NBA) arena as an alternative polling location, and implementing a priority queue for voters who are at high-risk for severe illness from COVID-19.

As we look toward the General Election on November 3, 2020, we make the following observations based on the discrete event simulation results that consider a variety of voting conditions using the Milwaukee case study.

Many polling locations may experience unprecedented waiting times, which can be caused by at least one of three main factors: 1) a high turnout for in-person voting on Election Day, 2) not having enough poll workers to staff an adequate number of check-in booths, 3) an increased time spent checking in, marking a ballot, and submitting a ballot due to personal protective equipment (PPE) usage and other protective measures taken to reduce COVID-19 transmission. Any one of these factors is enough to result in long wait times, and as a result, election officials must implement strategies to mitigate all three of these factors.
The amount of time spent inside may be long enough for voters to acquire the COVID-19 virus. The risk to voters and poll workers from COVID-19 can be mitigated by adopting strategies to reduce voter wait times, especially for those who are at increased risk of severe illness from COVID-19, and encourage physical distancing through the placement and spacing of voting booths.
Consolidating polling locations into a few large polling locations offers the potential to use fewer poll workers and decrease average voter wait times. However, the consolidated polling locations likely cannot support the large number of check-in booths required to maintain low voter wait times without creating confusion for voters and interfering with the socially distant placement of check-in and voting booths. As a result, consolidated polling locations require high levels of staffing and could result in long voter wait times.
The NBA has offered the use of its basketball arenas as an alternative polling location for voters to use on Election Day as a resource to mitigate long voter wait times. An NBA arena introduces complexity into the voting process, since all voters have a choice between their standard polling location and the arena. This could create a mismatch between where voters choose to vote and where resources are allocated. As a result, some voters may face long wait times at both locations.
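The first observation above, that wait times are driven by turnout, the number of staffed check-in booths, and the time each voter needs, can be illustrated with a very small discrete event simulation. This sketch is far simpler than the model in the paper: it has a single first-come, first-served check-in stage, exponential interarrival and service times, and hypothetical parameter values. The function name is my own.

```python
import heapq
import random

def average_wait(arrival_rate, mean_service, booths, n_voters=5000, seed=0):
    """Average wait (minutes) before check-in starts, first come first served.

    arrival_rate: voters per minute (assumed exponential interarrival times)
    mean_service: mean minutes per check-in (assumed exponential)
    booths:       number of staffed check-in booths
    """
    rng = random.Random(seed)
    free_at = [0.0] * booths          # time at which each booth next becomes free
    heapq.heapify(free_at)
    clock = 0.0
    total_wait = 0.0
    for _ in range(n_voters):
        clock += rng.expovariate(arrival_rate)     # next voter arrives
        booth_free = heapq.heappop(free_at)        # earliest available booth
        start = max(clock, booth_free)             # wait if all booths are busy
        total_wait += start - clock
        service = rng.expovariate(1.0 / mean_service)
        heapq.heappush(free_at, start + service)   # booth is busy until then
    return total_wait / n_voters

# Hypothetical scenarios: 2 voters per minute arriving at one polling location.
print("baseline, 3 booths:       ", round(average_wait(2.0, 1.0, 3), 2))
print("20% slower check-in (PPE):", round(average_wait(2.0, 1.2, 3), 2))
print("one additional booth:     ", round(average_wait(2.0, 1.0, 4), 2))
```

Because the scenarios share a random seed, they see the same stream of arrivals, so the differences in average wait come from the service time and the number of booths rather than from luck.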

We recommend that entities overseeing elections make the following preparations for the 2020 General Election. Our recommendations have five main elements:

More poll workers are required for the 2020 General Election than for previous presidential elections. Protective measures such as sanitation of voting booths and PPE usage to reduce COVID-19 transmission will lead to slightly longer times for voters to check-in and to fill out ballots, possibly causing unprecedented waiting times at many polling locations if in-person voter turnout on Election Day is high. We recommend having enough poll workers to staff one additional check-in booth per polling location (based on prior presidential elections or based on what election management toolkits recommend), to sanitize voting areas and to manage lines outside of polling locations.
To reduce the transmission of COVID-19 to vulnerable populations during the voting process, election officials should consider the use of a priority queue, where voters who self-identify as being at high-risk for severe illness from COVID-19 (e.g., voters with compromised immune systems) can enter the front of the check-in queue.
In-person voting on Election Day should occur at the standard polling locations instead of at consolidated polling locations. Consolidated polling locations require many check-in booths to ensure short voting queues, and doing so requires high staffing levels. Election officials should ensure that an adequate number of voting booths (based on prior presidential elections or based on what election management toolkits recommend) can be safely located within the voting area at the standard polling locations, placing booths outside if necessary.
We do not recommend using sports arenas as supplementary polling locations for in-person voting on Election Day. Alternative polling locations introduce complexity and could create a mismatch between where voters choose to go and where resources are allocated, potentially leading to longer waiting times for many voters. This drawback can be avoided by instead allocating the would-be resources at the sports arena to the standard polling locations.
The results emphasize the importance of high levels of early voting for preventing long voter queues (i.e., one half to three quarters of all votes being cast early). This can be achieved by expanding in-person early voting, in terms of both the timeframe and locations for in-person early voting, adding new drop box locations for voters to deposit absentee ballots on or before Election Day, and educating voters on properly completing and submitting a mail-in absentee ballot.

The results are based on a detailed case study using data from Milwaukee, Wisconsin. It is worth noting that the discrete event simulation model reflects standard voting procedures used throughout the country and can be applied to other settings. Since the data from the Milwaukee case study are reflective of many other settings, the results, observations, and recommendations can be applied to voting precincts throughout Wisconsin and in other states that hold in-person voting on Election Day.

2 Comments | tags: elections, politics, queuing, simulation | posted in Uncategorized

November 1, 2018

vote early, vote often

By Laura Albert

Legally, you can only vote once. But if you vote early, you can enable more than one vote to be cast.

Voting on Election Day is an application of queueing theory. When voter turnout is high, as it is expected to be this year, the queues can become long. Sometimes very long. The lines in Ohio in the 2004 election are infamous. As a result, many voters balked* or reneged** before casting their votes. Alexander S. Belenky and Richard C. Larson ask: Did election queues decide the 2000 and 2004 U.S. presidential elections? Their analysis is summarized in their ORMS Today article.

Practically speaking, queues can be shortened in one of three ways:

1. Fewer voters enter the queue
2. There are more voting booths and people processing voters
3. Ballots are shorter

Voting early or voting absentee shortens the queues on election day by addressing issue #1. So while you can cast only one vote, casting your vote early means that you can keep the queues shorter on election day and possibly enable someone else to vote who otherwise would not be able to. This is meaningful in practice, since many voters cannot wait in line because of family responsibilities or shift work. So far, 2018 seems to be setting records for early voting. I voted absentee because I will be in Phoenix for the INFORMS Annual Meeting on Election Day.
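A rough way to see how much early voting can matter is the textbook Erlang C formula for the average wait in a queue with several identical servers. This is a sketch under strong assumptions (Poisson arrivals, exponential service times, a steady arrival rate all day), and the arrival rates, service rate, and booth count below are hypothetical.

```python
import math

def erlang_c_wait(arrival_rate, service_rate, servers):
    """Average time in queue for an M/M/c system (same time unit as the rates)."""
    offered_load = arrival_rate / service_rate        # a = lambda / mu
    utilization = offered_load / servers              # rho = a / c
    if utilization >= 1.0:
        return math.inf                               # the queue grows without bound
    tail = offered_load**servers / math.factorial(servers) / (1.0 - utilization)
    head = sum(offered_load**k / math.factorial(k) for k in range(servers))
    prob_wait = tail / (head + tail)                  # Erlang C probability of waiting
    return prob_wait / (servers * service_rate - arrival_rate)

# Hypothetical polling place: 10 booths, each serving 0.2 voters per minute.
for share_voting_early in (0.0, 0.1, 0.2):
    arrivals = 1.9 * (1.0 - share_voting_early)       # voters per minute on Election Day
    wait = erlang_c_wait(arrivals, 0.2, 10)
    print(f"{share_voting_early:.0%} vote early -> average wait {wait:.1f} minutes")
```

When a system runs close to capacity, the wait is very sensitive to the arrival rate, so a modest share of voters moving to early voting can shorten the line for everyone else by much more than that share.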

* Balking: The voter decides not to enter the waiting line.

** Reneging: The voter enters the line but decides to leave before voting.

Queueing on Election Day
In 2016, I wrote about suppressing the vote through bad resource allocation
Waiting is torture but it’s not so bad if there are mirrors or trees

1 Comment | tags: elections | posted in Uncategorized

November 7, 2016

final Presidential election forecast predictions

By Laura Albert

The Presidential election forecasting models I’ve been following this election cycle are all pointing toward a Clinton victory. Now we have to wait and see.

Election Analytics @ Illinois

Clinton 304.5, Trump 233.5 (Expected value) with a >99% win probability for Clinton

Princeton Election Consortium (Sam Wang)

Clinton 311, Trump 227 (Expected value) with a 99% win probability for Clinton

FiveThirtyEight (Nate Silver)

Clinton 296.7, Trump 240.5 (Expected Value) with a 68.5% win probability for Clinton

New York Times Upshot forecast

84% win probability for Clinton

Daily Kos (Drew Linzer)

Clinton 323 electoral votes, Trump 215 with an 88% win probability for Clinton

David Rothschild’s prediction market forecasting model

89-90% win probability for Clinton

Huffington Post Election Forecast

98.1% win probability for Clinton

Sabato’s Crystal Ball

Clinton 322, Trump 216

13 Keys to the White House

Trump to win the popular vote, no electoral college prediction
This model does not use state-level information or the electoral college (see my blog post here). It appears to be a better mid-term or long-term forecasting model.

All models are wrong, some are useful. "13 Keys to the White House" seems useful months (not days) before an election #ElectionFinalThoughts
— Laura Albert (@lauraalbertphd) November 7, 2016

Why don’t all of these models agree? A few articles I’ve read lately about forecasting models and polling:

Leave a comment | tags: elections | posted in Uncategorized

August 3, 2016

13 reasons why Hillary Clinton will (probably) win the Presidential Election

By Laura Albert

There are 96 days until Election Day, but I’m already pretty sure Hillary Clinton will win the election. The Keys to the White House by Allan Lichtman and Vladimir Keilis-Borok is a simple mathematical model that predicts who will win a Presidential election. This model predicts who will win months or even years before an election. You can read the writeup in OR/MS Today here. Let’s look at why Hillary will likely win in 96 days.

The model works by considering 13 factors that are equally weighted in the model. The reference point is the person running in the same party as the incumbent President, which is Hillary Clinton in 2016.

1. Party Mandate: After the midterm elections, the incumbent party holds more seats in the U.S. House of Representatives than after the previous midterm elections.
FALSE: 193 Democrats in 112th Congress but 188 in 114th Congress

2. Contest: There is no serious contest for the incumbent party nomination.
FALSE

3. Incumbency: The incumbent party candidate is the sitting president.
FALSE

4. Third party: There is no significant third party or independent campaign.
TRUE (so far!)

5. Short term economy: The economy is not in recession during the election campaign.
TRUE

6. Long term economy: Real per capita economic growth during the term equals or exceeds mean growth during the previous two terms.
TRUE: 1.6% vs. 1.5% and 1.4% Source: http://data.worldbank.org/indicator/NY.GDP.PCAP.KD.ZG

7. Policy change: The incumbent administration effects major changes in national policy.
TRUE

8. Social unrest: There is no sustained social unrest during the term.
TRUE

9. Scandal: The incumbent administration is untainted by major scandal.
TRUE

10. Foreign/military failure: The incumbent administration suffers no major failure in foreign or military affairs.
TRUE

11. Foreign/military success: The incumbent administration achieves a major success in foreign or military affairs.
FALSE

12. Incumbent charisma: The incumbent party candidate is charismatic or a national hero.
FALSE

13. Challenger charisma: The challenging party candidate is not charismatic or a national hero.
TRUE

There are five “Falses.” When five or fewer statements are false, the incumbent party wins. When six or more are false, the challenging party wins. Barring a surge by a third party candidate to something like Ross Perot’s 1992 levels (see #4), it looks like five or fewer statements will continue to be false. I’m not sure if the model is flexible enough to account for a divisive figure like Donald Trump, but we will find out soon.
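The decision rule is simple enough to write in a few lines. This sketch encodes the thirteen TRUE/FALSE assessments exactly as listed above; the dictionary and function names are my own.

```python
# The 13 keys as assessed in this post (True favors the incumbent party).
KEYS_2016 = {
    "Party mandate": False,
    "Contest": False,
    "Incumbency": False,
    "Third party": True,
    "Short term economy": True,
    "Long term economy": True,
    "Policy change": True,
    "Social unrest": True,
    "Scandal": True,
    "Foreign/military failure": True,
    "Foreign/military success": False,
    "Incumbent charisma": False,
    "Challenger charisma": True,
}

def predict(keys):
    """Count the false keys and apply the five-or-fewer rule."""
    falses = sum(1 for holds in keys.values() if not holds)
    winner = "incumbent party" if falses <= 5 else "challenging party"
    return falses, winner

print(predict(KEYS_2016))

# One more key turning false (for example, #4) would flip the prediction.
with_third_party_surge = dict(KEYS_2016, **{"Third party": False})
print(predict(with_third_party_surge))
```

With five false keys, the prediction sits right at the threshold, which is why a single key changing matters so much.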

What is interesting is that this model requires no polling information, which is a major input requirement to most other models (like the one at FiveThirtyEight). It instead looks at underlying causes for support for the political parties based on how satisfied we are with various things that have happened, hence the “keys” about social unrest, war, major policy change, major scandal, and the economy. I blogged before about the importance of the economy in making Presidential election forecasts (“It’s the economy stupid“).

Do you think traditional ways to forecast the election will “work” this year?

10 Comments | tags: elections | posted in Uncategorized

November 4, 2014

how to forecast an election using simulation: a case study for teaching operations research

By Laura Albert

After extensively blogging about the 2012 Presidential election and analytical models used to forecast the election (go here for links to some of these old posts), I decided to create a case study on Presidential election forecasting using polling data. This blog post is about this case study. I originally developed the case study for an undergraduate course on math modeling that used Palisade Decision Tools like @RISK. I retooled the spreadsheet for my undergraduate course in simulation in Spring 2014 to not rely on @RISK. All materials are available in the Files tab.

The basic idea is that there are a number of mathematical models for predicting who will win the Presidential Election. The most accurate (and the most popular) use simulation to forecast the state-level outcomes based on state polls. The most sophisticated models like Nate Silver’s 538 model incorporate things such as poll biases, economic data, and momentum. I wanted to incorporate poll biases.

For this case study, we will look at state-level poll data from the 2012 Presidential election. The spreadsheet contains realistic polling data from before the election. Simulation is a useful tool for translating the uncertainty in the polls to potential election outcomes. There are 538 electoral votes: whoever gets 270 or more votes wins.

Assumptions:

Everyone votes for one of two candidates (i.e., no third party candidates – every vote that is not for Obama is for Romney).
The proportion of votes that go to a candidate is normally distributed according to a known mean and standard deviation in every state. We will track Obama’s proportion of the votes since he was the incumbent in 2012.
Whoever gets more than 50% of the votes in a state wins all of the state’s electoral votes. [Note: most but not all states do this].
The votes cast in each state are independent, i.e., the outcome in one state does not affect the outcomes in another.

There is some concern that the polls are biased in four of the key swing states (Florida, Pennsylvania, Virginia, Wisconsin). A bias means that the poll average for Obama is too high. Let’s consider biases of 0%, 0.5%, 1%, 1.5%, and 2%, implemented so that all four states are affected by the same bias level at the same time. For example, the mean for Wisconsin is 52%. This mean would be 50% – 52% depending on the amount of bias. Side note: Obama was such an overwhelming favorite that it only makes sense to look at biases that work in his favor.

It is very difficult to find polls that are unbiased. Nate Silver of FiveThirtyEight wrote about this issue in “Registered voter polls will (usually) overrate Democrats”: http://fivethirtyeight.com/features/registered-voter-polls-will-usually-overrate-democrats/

Inputs:

The poll statistics of the mean and standard deviation for each state.
The number of electoral votes for each state.

Outputs:

The total number of electoral votes for Obama
An indicator variable to capture whether Obama won the election.

Tasks:

(1) Using the spreadsheet, simulate the proportion of votes in each state that are for Obama for each of the 5 scenarios. Run 200 replications for each simulation. For each replication, determine the number of electoral votes in each state that go to Obama and Romney and who won.

(2) Paste the model outputs (the average and standard deviation of the number of electoral votes for Obama and the probability that Obama wins) for each of the five bias scenarios into a table.

(3) What is the probability of a tie (exactly 269 votes)?
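The three tasks can also be sketched outside a spreadsheet. The code below sweeps the five bias levels and builds the summary table from task (2), plus the tie frequency from task (3). It is not the course spreadsheet or the @RISK model. Only Wisconsin’s 52% mean comes from the text; all other inputs, the two lumped blocs of non-swing states, and the function names are hypothetical.

```python
import random
import statistics

# Hypothetical inputs: name -> (mean Obama share, standard deviation, electoral votes).
STATES = {
    "Obama-leaning bloc": (0.56, 0.03, 242),
    "Romney-leaning bloc": (0.44, 0.03, 206),
    "Florida": (0.505, 0.03, 29),
    "Ohio": (0.515, 0.03, 18),
    "Pennsylvania": (0.53, 0.03, 20),
    "Virginia": (0.51, 0.03, 13),
    "Wisconsin": (0.52, 0.03, 10),
}
SWING = {"Florida", "Pennsylvania", "Virginia", "Wisconsin"}
BIASES = (0.0, 0.005, 0.01, 0.015, 0.02)

def obama_electoral_votes(rng, bias):
    """One replication: lower the four swing-state means by `bias`, then draw."""
    total = 0
    for name, (mean, sd, electoral_votes) in STATES.items():
        mu = mean - bias if name in SWING else mean
        if rng.gauss(mu, sd) > 0.5:
            total += electoral_votes
    return total

def summarize(bias, replications=200, seed=2012):
    """Summary statistics for one bias scenario."""
    rng = random.Random(seed)
    totals = [obama_electoral_votes(rng, bias) for _ in range(replications)]
    return {
        "bias": bias,
        "mean_ev": statistics.mean(totals),
        "sd_ev": statistics.stdev(totals),
        "p_win": sum(t >= 270 for t in totals) / replications,
        "p_tie": sum(t == 269 for t in totals) / replications,
    }

print("bias   mean EV   sd EV   P(win)   P(tie)")
for bias in BIASES:
    row = summarize(bias)
    print(f"{row['bias']:.1%}   {row['mean_ev']:7.1f}  {row['sd_ev']:6.1f}"
          f"   {row['p_win']:.3f}    {row['p_tie']:.3f}")
```

Reusing the same seed for every scenario means each bias level sees the same random draws, which makes the rows easier to compare. With only 200 replications, rare events such as an exact tie may never show up, so an estimated probability of zero should be read with caution.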

Modeling questions to think about:

Obama took 332 electoral votes compared to Romney’s 206. Do you think that this outcome was well-characterized in the model or was it an unexpected outcome?
Look at the frequency plot of the number of electoral votes for Obama (choose any of the simulations). Why do some electoral vote totals like 307, 313, and 332 occur more frequently than the others?
Why do you think a tiny bias in 4 states would disproportionately affect the election outcomes?
How do you think the simplifying assumptions affected the model outputs?
No model is perfect, but an imperfect model can still be useful. Do you think this simulation model was useful?

RESULTS

I don’t give the results to my students ahead of time, but here is a figure of the results using @RISK. The students can see how small changes in poll bias can drastically affect the outcomes. With no bias, Obama has a 98.3% chance of winning and with a 2% bias in a mere four swing states, Obama’s chances go down to 79.3%.

@RISK output for the election model. The histogram shows the distribution of electoral votes for the unbiased results. The table below tabulates the results for different levels of bias.

Files.

Here are the instructions, the Excel spreadsheet for Monte Carlo simulation, and the Excel spreadsheet that can be used with @RISK.

election analytics roundup

By Laura Albert

Here are a few election related links:

Eva Regnier at the Naval Postgraduate School has a bootstrapping forecast model for the US Senate election. Her model uses forecasts from Simon Jackson and Drew Linzer. Information about the upcoming election (usually polling results) becomes available over time, and this information produces a sequence of probability forecasts for each race. Eva writes, “I suspect that these probability sequences are not optimal, i.e. they could perform better with respect to single-period probabilistic scoring rules, using the information they have available. I also suspect — this is a stronger claim — that they can be bootstrapped, i.e. that the right function could take the forecast sequence itself and produce a forecast that outscores the original at each update.”
Sheldon Jacobson’s team at the University of Illinois has its Election Analytics site up and running again. The site predicts the probability that the Republicans will take the Senate. His method uses Bayesian estimators that use available state poll results.
Of course, Nate Silver has a Senate forecasting model on FiveThirtyEight and provides a nice discussion of how the polling data is used in the forecasting model.
Matt Yglesias: The real problem with Nate Silver’s model is the hazy metaphysics of probability.
Andrew Gelman has a post on poll sample sizes called “Was it really necessary to do a voting experiment on 300,000 people? Maybe 299,999 would’ve been enough? Or 299,998? Or maybe 2000?“
In 1936, the Literary Digest ran a huge and very expensive poll to forecast the Presidential election. They collected about 2.4 million responses in a totally biased sample and predicted that FDR would lose. At the same time, George Gallup accurately forecasted the election with a sample size of only 50,000.
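The sample size questions in the last two links can be put in perspective with the standard margin of error formula. This sketch assumes a simple random sample, which is exactly what the Literary Digest poll was not. The function name is my own, and 2,000 is taken from the title of Gelman’s post.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(proportion * (1.0 - proportion) / sample_size)

for n in (2_000, 50_000, 2_400_000):
    print(f"n = {n:>9,}: +/- {margin_of_error(n):.2%}")
```

Sampling error shrinks with the square root of the sample size, so going from 50,000 to 2.4 million responses buys very little. The formula says nothing about bias in who responds, and no sample size can fix that.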

I’ve blogged about elections a lot before. Here are some of my favorites:

Leave a comment | tags: elections | posted in Uncategorized

March 26, 2013

why is it so easy to forecast the Presidential election and so hard to forecast the NCAA basketball tournament?

By Laura Albert

This blog post is inspired by my disappointing NCAA March Madness bracket. I used math modeling to fill my bracket, and I am currently in the 51st percentile on ESPN. On the upside, all of my Final Four picks are still active so I have a chance to win my pool. I am worried that my bracket has caused me to lose all credibility with those who are skeptical of the value of math modeling. After all, guessing can lead to a better bracket. Isn’t Nate Silver a wizard? How come his bracket isn’t crushing the competition? Here, I will make the case that a so-so bracket is not evidence that the math models are bad. To do so, I will discuss why it is so easy to forecast the Presidential election and so hard to forecast the NCAA basketball tournament.

Many models for the Presidential election and the basketball tournament are similar in that they use various inputs to predict the probability of an outcome. I have discussed several models for forecasting the Presidential election [Link] and the basketball tournament [Link].

All models that didn’t rely solely on economic indicators chose Obama as the favorite, and nearly all predicted 48+ of the states correctly. In other words, even a somewhat simplistic model to forecast the Presidential election could predict the correct outcome in 96% of the states (48 of 50). I’m not saying that the forecasting models out there were simplistic, but simply going with poll averages gave good estimates of the election outcomes.

The basketball tournament is another matter. Nate Silver has blogged about models that predict tournament games using similar math. Here, we can only predict the correct winner 71-73% of the time [Link]:

Since 2003, the team ranked higher in the A.P. preseason poll (excluding cases where neither team received at least 5 votes) has won 72 percent of tournament games. That’s exactly the same number, 72 percent, as the fraction of games won by the better seed. And it’s a little better than the 71 percent won by teams with the superior Ratings Percentage Index, the statistical formula that the seeding committee prefers. (More sophisticated statistical ratings, like Ken Pomeroy’s, do only a little better, with a 73 percent success rate.)

To do well in your bracket, you would need to make small marginal improvements over the naive model of always picking the better seed (72% success rate). Here, a 96% success rate would be unrealistic; an improved model that gets 75% of the games correct would give you a big advantage. The big advantage here means that if you used your improved method in 1000 tournaments, it would do better on average than the naive method. In any particular tournament, the improved method may still lead to a poor bracket. It’s a small sample.
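To see how small a one-tournament sample is, here is a minimal sketch in Python. It assumes, purely for illustration, that all 63 games are independent and that each method picks every game correctly with a fixed probability (72% for the naive method, 75% for the improved one). Real brackets are not independent game by game, and the function names are hypothetical.

```python
from math import comb


def binomial_pmf(n, p):
    """Return [P(X = k) for k = 0..n] where X ~ Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def prob_naive_ties_or_beats(n_games, p_naive, p_improved):
    """Probability that the naive method gets at least as many games
    right as the improved method, assuming independent games."""
    naive = binomial_pmf(n_games, p_naive)
    improved = binomial_pmf(n_games, p_improved)
    total = 0.0
    improved_cdf = 0.0  # running P(improved <= k)
    for k in range(n_games + 1):
        improved_cdf += improved[k]
        total += naive[k] * improved_cdf  # naive == k and improved <= k
    return total


if __name__ == "__main__":
    # 63 games in a 64-team bracket (assumed independent for this sketch).
    print(prob_naive_ties_or_beats(63, 0.72, 0.75))
```

Under these assumptions the naive method ties or beats the improved one in more than a quarter of single tournaments, which is why one so-so bracket says little about the model behind it.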

The idea here is similar to batting averages in baseball. It is not really possible to notice the difference between a 0.250 batter and a 0.300 batter in a single game or even across the games in a single week. The 0.250 hitter may even have the better batting average in any given week of games. Over the course of a 162-game season, the difference between the batters’ averages becomes quite noticeable. The NCAA tournament does not have the advantage of averaging performance over a large number of games: we are asked to predict a small set of outcomes in a single tournament where things will not have a chance to average out (it’s The Law of Small Numbers).
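A rough way to quantify this is the standard error of a batting average, sqrt(p(1-p)/n) for n at-bats. The sketch below assumes about 25 at-bats in a week and about 550 in a season. Both numbers are illustrative assumptions, not figures from this post, and at-bats are treated as independent.

```python
from math import sqrt


def batting_average_std_error(true_average, at_bats):
    """Standard error of an observed batting average over `at_bats`
    independent at-bats for a hitter with the given true average."""
    return sqrt(true_average * (1 - true_average) / at_bats)


GAP = 0.300 - 0.250  # difference between the two hitters' true averages

# Assumed sample sizes (illustrative): at-bats in a week and in a season.
for label, at_bats in [("week", 25), ("season", 550)]:
    se = batting_average_std_error(0.300, at_bats)
    print(f"{label}: standard error {se:.3f} vs. gap {GAP:.3f}")
```

With these assumed sample sizes, the noise in one hitter's weekly average is larger than the 0.050 gap between the hitters, while over a season it is well below the gap.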

It’s worth noting that actual brackets get fewer than 72% of the games correct because errors are cumulative. If you put Gonzaga in the Elite Eight and they are defeated in the (now) third round and do not make it to the Sweet Sixteen, then one wrong game prediction leads to two wrong games in the bracket.
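Here is a back-of-the-envelope sketch of this compounding. It assumes, for illustration only, that every team you pick wins each of its games independently with the same probability p, so a pick in round r is right only if that team wins r games in a row. The function name is hypothetical.

```python
# Games per round in a 64-team bracket: round of 64 through the final.
GAMES_PER_ROUND = [32, 16, 8, 4, 2, 1]


def expected_bracket_accuracy(p):
    """Expected fraction of the 63 bracket slots picked correctly when each
    picked team wins any single game with probability p (an assumption)."""
    expected_correct = sum(
        games * p**round_number
        for round_number, games in enumerate(GAMES_PER_ROUND, start=1)
    )
    return expected_correct / sum(GAMES_PER_ROUND)


if __name__ == "__main__":
    print(expected_bracket_accuracy(0.72))
```

Under this simplified model, a 72% per-game success rate translates into a noticeably lower share of correct bracket slots, because later-round picks depend on earlier ones.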

It’s also worth noting that some games are easier to predict than others. In the (now) second round (what most of us think of as the first round), no 1 seed has ever lost to a 16 seed, and 2 seeds have only rarely lost to 15 seeds (it’s happened 7 times). Likewise, some states are easy to predict in Presidential elections (e.g., California and Oklahoma). The difference is that there are few easy-to-predict games in the tournament whereas there are many easy-to-predict states in a Presidential election. Politico lists 9 swing states for the 2012 election. That is, one could predict the outcome in 82% of the states (41 of 50) with a high degree of confidence by using common sense. In contrast, one can confidently predict only ~12% of the games in the round of 64 using common sense (the four games involving 1 seeds, out of 32). Therefore, I would argue that there is more parity in college basketball than there is in politics.
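The two percentages above come from simple counts, sketched here in Python. The variable names are illustrative; the counts are the ones cited in this post.

```python
def share(count, total):
    """Fraction of `total` represented by `count`."""
    return count / total


states_total = 50
swing_states = 9        # Politico's 2012 list, as cited above
round_of_64_games = 32
one_seed_games = 4      # the four 1-vs-16 matchups

predictable_states = share(states_total - swing_states, states_total)
predictable_games = share(one_seed_games, round_of_64_games)

# prints "states: 82%, games: 12.5%"
print(f"states: {predictable_states:.0%}, games: {predictable_games:.1%}")
```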

How is your bracket doing?

10 Comments | tags: elections, march madness, science literacy | posted in Uncategorized

December 18, 2012

the exit polling supply chain

By Laura Albert

A WSJ Washington Wire blog post describes the Presidential election exit polling supply chain in New York in the immediate aftermath of Hurricane Sandy. The Washington Wire blog post highlights the polling firm Edison Research, based in New Jersey. Edison provided the questionnaires used by pollsters who would collect information about the ballots cast. As you might recall, New Jersey and New York were severely damaged by the hurricane.

Questionnaires

One of the logistical challenges was in printing and delivering the questionnaires used by pollsters around the country. The questionnaires need to be timely, so they are usually shipped one week before the election. Sandy was on track to strike 8 days before the election, so a rush order was placed with the printer. Two thirds of the questionnaires were mailed before Sandy struck and Edison’s election office lost power along with the rest of New Jersey. The rest of the questionnaires were stored for two days until they had to be shipped. Edison printed the mailing labels from their main office, and then UPS shipped the 400 packages to pollsters via Newark Airport. While Edison had redundancy in their system (e.g., the mailing labels could be printed in another facility and a redundant system alerted employees of the change), it only worked because not all of their offices lost power.

Mail Delivery

While Edison relied on UPS to deliver the mail, it is worth noting that USPS mail service continued as normal except for one day during Hurricane Sandy (HT to @EllieAsksWhy).

Gas

Edison relied on having employees implement Plan B. With the gas shortage, it was difficult for employees to get to work when they needed to save gas for other car trips. Organizing car pools was more difficult than normal, since employees could not rely on communicating by email or cell phone.

Hotels

As I mentioned in an earlier post, there were few/no vacancies at hotels that had power, which provided challenges for Edison employees who wanted to work out of a hotel (most offices and homes were without power) or pollsters who needed to travel to different cities to perform exit polling. I’m not sure how these issues were resolved.

Local transportation to the polls

NYC public transportation was up and running on Election Day, so the pollsters could make it to the polls for the big day. The subway reopened with limited runs the Thursday before Election Day and was running as usual on Election Day.

What if Hurricane Sandy came later?

Edison Research managed, but having an 8 day head start was helpful for successfully completing a contingency plan. If the hurricane had hit 5 or fewer days before the election, the questionnaires would have already been printed and mailed. However, there may have been more challenges with getting pollsters to the polling locations in New York City and other locations (the subway may still have been closed on Election Day).

Leave a comment | tags: elections, public policy | posted in Uncategorized
