forecasting the Presidential election using regression, simulation, or dynamic programming

Almost a year ago, I wrote a post entitled “13 reasons why Obama will be reelected in one year.” That post used Lichtman’s model for predicting the Presidential election far ahead of time using 13 equally weighted “keys” (macro-level predictors). Now that we are closer to the election, Lichtman’s method offers less insight, since it ignores the specific candidates (well, except for their charisma), the polls, and the specific outcomes in each state. At this point in the election cycle, knowing which way Florida, for example, will fall is important for understanding who will win. Thus, we need to look at specific state outcomes, since the next President needs to be the one who gets at least 270 electoral votes, not the one who wins the popular vote.

With less than two months until the election, it’s worth discussing two models for forecasting the election:

  1. Nate Silver’s model on fivethirtyeight
  2. Sheldon Jacobson’s model (Election analytics)

In this post, I am going to compare the models and their insights.

Nate Silver [website link]:

Nate Silver’s model develops predictions for each state based on polling data. He adjusts the various state polls by applying a “regression analysis that compares the results of different polling firms’ surveys in the same state.” The model then adjusts for “universal factors” such as the economy and state-specific issues, although Silver’s discussion is a bit sketchy here; it appears that a constructed index is used as an input to a regression model. Based on some of his other models, Silver appears to be using logistic regression. Here is a brief description of what goes into his models:

The model creates an economic index by combining seven frequently updated economic indicators. These factors include the four major economic components that economists often use to date recessions: job growth (as measured by nonfarm payrolls), personal income, industrial production, and consumption. The fifth factor is inflation, as measured by changes in the Consumer Price Index. The sixth and seventh factors are more forward looking: the change in the S&P 500 stock market index, and the consensus forecast of gross domestic product growth over the next two economic quarters, as taken from the median of The Wall Street Journal’s monthly forecasting panel.
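Silver doesn’t publish code, but the basic idea of such an index is easy to sketch: standardize each indicator against its own recent history and average the standardized values. The indicator readings, the equal weights, and the six-reading window below are my own illustrative assumptions, not Silver’s methodology:

```python
# A toy economic index: standardize each indicator (z-score against its own
# recent history) and average the standardized values. The numbers and the
# equal weighting are illustrative assumptions, not Silver's actual model.
from statistics import mean, stdev

def z_score(series):
    """Standardize a series to mean 0, standard deviation 1."""
    mu, sigma = mean(series), stdev(series)
    return [(x - mu) / sigma for x in series]

# Hypothetical recent readings for the seven indicators (most recent last).
indicators = {
    "nonfarm_payrolls":      [96, 45, 141, 142, 181, 96],      # change, thousands
    "personal_income":       [0.1, 0.3, 0.2, 0.1, 0.3, 0.1],   # % change
    "industrial_production": [0.0, -0.2, 0.5, 0.1, -1.2, -0.5],
    "consumption":           [0.2, 0.1, -0.1, 0.4, 0.1, 0.3],
    "inflation_cpi":         [0.3, 0.0, -0.3, 0.0, 0.6, 0.4],
    "sp500_change":          [1.3, -0.8, 2.1, 0.5, 1.1, 2.4],
    "gdp_forecast":          [2.3, 2.2, 2.1, 2.0, 1.9, 1.8],
}

# Note: in a real index, some indicators (e.g., inflation) would presumably
# be sign-adjusted so that larger values always mean a stronger economy.
index_value = mean(z_score(series)[-1] for series in indicators.values())
print(f"economic index (latest reading): {index_value:+.2f}")
```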

Nate Silver’s methodology is here and here. It is worth noting that Silver’s forecasts are for election day.
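To make the poll-adjustment step concrete, here is a crude one-pass approximation of a house-effect correction: estimate each polling firm’s average lean relative to other polls of the same states, then subtract it. This is a sketch of the general idea rather than Silver’s actual regression, and all of the polls below are made up:

```python
# A toy "house effect" adjustment in the spirit of comparing different
# polling firms' surveys in the same state: estimate each firm's average
# lean and subtract it. The polls below are invented for illustration.
from collections import defaultdict
from statistics import mean

# (state, firm, Obama margin in points) -- illustrative numbers only.
polls = [
    ("FL", "FirmA", +1.0), ("FL", "FirmB", +4.0), ("FL", "FirmC", +3.0),
    ("OH", "FirmA", +3.0), ("OH", "FirmB", +6.0), ("OH", "FirmC", +5.0),
    ("VA", "FirmA", +2.0), ("VA", "FirmB", +4.0),
]

# Step 1: a naive state average pooling all firms.
by_state = defaultdict(list)
for state, firm, margin in polls:
    by_state[state].append(margin)
state_avg = {s: mean(ms) for s, ms in by_state.items()}

# Step 2: a firm's house effect = its mean deviation from the state averages.
by_firm = defaultdict(list)
for state, firm, margin in polls:
    by_firm[firm].append(margin - state_avg[state])
house_effect = {f: mean(ds) for f, ds in by_firm.items()}

# Step 3: re-average each state using house-effect-adjusted margins.
adjusted = defaultdict(list)
for state, firm, margin in polls:
    adjusted[state].append(margin - house_effect[firm])
for state, ms in adjusted.items():
    print(f"{state}: raw {state_avg[state]:+.1f}, adjusted {mean(ms):+.1f}")
```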

Sheldon Jacobson and co-authors [website link]:

This model also develops predictions for each state based on polling data. Here, Jacobson and his collaborators use Bayesian estimators to estimate the outcome in each state. A state’s voting history is used for its prior, and state polling data (from Real Clear Politics) is used to estimate the posterior. Every poll includes undecided voters, so five scenarios are used to allocate the undecided voters, ranging from a neutral split to strong Republican or Democratic showings. Dynamic programming is then used to compute the probability that each candidate would win under each of the five scenarios for allocating undecided votes. It is worth noting that Jacobson’s method indicates the outcome of the Presidential election if it were held now; it doesn’t make adjustments for forecasting into the future.
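Here is a minimal sketch of the Bayesian step under my own simplifying assumptions (the authors’ actual model is in the papers linked below): treat a state’s two-party vote as Bernoulli, encode the voting history as a Beta prior, update with poll counts, and allocate the undecided respondents under a range of scenarios. The poll counts, the prior weight, and the scenario splits are all illustrative:

```python
# Sketch: Beta-Bernoulli posterior for one state's Democratic vote share,
# with undecided voters allocated under five scenarios. All numbers are
# illustrative assumptions, not the authors' actual inputs.
import math

def prob_dem_wins(a, b, n=20000):
    """P(p > 0.5) for p ~ Beta(a, b), by midpoint-rule numerical integration."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    dx = 0.5 / n
    below = 0.0  # accumulates P(p < 0.5)
    for i in range(n):
        x = (i + 0.5) * dx
        below += math.exp(log_norm + (a - 1) * math.log(x)
                          + (b - 1) * math.log(1 - x)) * dx
    return 1.0 - below

# Prior from voting history: the state's past Democratic two-party share,
# weighted as if it were worth 50 "pseudo-respondents" (an assumed weight).
prior_share, prior_weight = 0.52, 50
a0, b0 = prior_share * prior_weight, (1 - prior_share) * prior_weight

# One poll of the state: 480 Dem, 450 Rep, 70 undecided (made-up numbers).
dem, rep, und = 480, 450, 70

# Allocate the undecideds under five scenarios, from a strong Republican
# showing to a strong Democratic one (the splits here are illustrative).
scenarios = [("strong R", 0.25), ("lean R", 0.40), ("neutral", 0.50),
             ("lean D", 0.60), ("strong D", 0.75)]
for label, dem_frac in scenarios:
    a = a0 + dem + und * dem_frac
    b = b0 + rep + und * (1 - dem_frac)
    print(f"{label:9s}: P(Dem wins state) = {prob_dem_wins(a, b):.3f}")
```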

The Jacobson et al. methodology is outlined here and the longer paper is here.

Comparison and contrast:

One of the main differences is that Silver relies on regression whereas Jacobson uses Bayesian estimators. Silver uses polling data as well as external factors (see above) as variables in his model, whereas Jacobson relies on polling data and the allocation of undecided voters.

Once models exist for the state results, they have to be combined to predict the overall election outcome. Here, Silver relies on simulation whereas Jacobson relies on dynamic programming. Silver’s simulations appear to sample from his regression models and, potentially, exogenous factors. Both the simulation and the dynamic programming approaches must aggregate state outcomes that do not appear to be independent of one another.
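The dynamic-programming step itself is easy to sketch: given a win probability for each state, build up the exact probability distribution of one candidate’s electoral-vote total, one state at a time. The standard form below treats states as independent given the scenario, and the three toy states and their probabilities are illustrative; with the 51 actual (electoral votes, probability) pairs, the same function yields P(at least 270) directly:

```python
# Sketch of the DP: exact distribution of a candidate's electoral-vote
# total, built one state at a time. Assumes state outcomes are independent
# given the scenario; the toy inputs below are illustrative.
def electoral_vote_distribution(states):
    """states: list of (electoral_votes, p_dem_wins). Returns dist where
    dist[k] = P(Democratic candidate wins exactly k electoral votes)."""
    total_ev = sum(ev for ev, _ in states)
    dist = [1.0] + [0.0] * total_ev
    for ev, p in states:
        new = [0.0] * (total_ev + 1)
        for k, mass in enumerate(dist):
            if mass == 0.0:
                continue
            new[k] += mass * (1 - p)   # Dem loses this state
            new[k + ev] += mass * p    # Dem wins this state's EVs
        dist = new
    return dist

# A toy election decided by 10 electoral votes, winning threshold 6.
toy_states = [(4, 0.9), (3, 0.6), (3, 0.4)]
dist = electoral_vote_distribution(toy_states)
p_win = sum(mass for k, mass in enumerate(dist) if k >= 6)
print(f"P(Dem wins >= 6 of 10 EVs) = {p_win:.3f}")
```

Simulation, by contrast, samples whole election outcomes rather than computing the distribution exactly, which makes it more natural to inject correlated, nationwide errors at the cost of exactness.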

Another difference is that Silver forecasts the vote on Election Day whereas Jacobson predicts the outcome if the race were held today (although Silver also provides a “now”-cast). To do so, Silver adjusts for post-convention bounces and for the conservative sway that occurs right before the election:

The model is designed such that this economic gravitational pull becomes less as the election draws nearer — until on Election Day itself, the forecast is based solely on the polls and no longer looks at economic factors.

This is interesting, because it implies that Silver double-counts the economy (the economy influences voters, who are captured by the polls). I’m not suggesting that this is a bad idea, since I blogged about how all forecasting models stress the importance of the economy in Presidential elections. It is worth noting that Silver’s “now”-cast is close to Jacobson’s prediction (98% vs. 100% as of 10/1).
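To illustrate the quoted “gravitational pull,” here is one way such a blend could work: weight a fundamentals-based margin against the poll average, with the fundamentals weight decaying to zero by Election Day. The linear decay, the weight cap, and the numbers are my assumptions for illustration; Silver does not publish his exact weighting schedule:

```python
# Sketch of the shrinking economic "gravitational pull": blend a
# fundamentals-based margin with the poll average using a weight that
# decays to zero by Election Day. The linear decay, the 0.5 cap, and the
# 150-day horizon are illustrative assumptions, not Silver's schedule.
def blended_forecast(poll_margin, fundamentals_margin, days_to_election,
                     horizon=150, max_weight=0.5):
    """Return a forecast margin; the fundamentals weight falls linearly
    from max_weight at `horizon` days out to 0 on Election Day."""
    w = max_weight * max(0.0, min(1.0, days_to_election / horizon))
    return w * fundamentals_margin + (1 - w) * poll_margin

# Polls show the incumbent +2 while the fundamentals say -1 (made-up values).
for days in (120, 60, 30, 0):
    m = blended_forecast(poll_margin=+2.0, fundamentals_margin=-1.0,
                         days_to_election=days)
    print(f"{days:3d} days out: blended margin {m:+.2f}")
```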

Silver makes several adjustments to his model rather than relying solely on poll data. The economic index mentioned earlier is one of these adjustments; the post-convention bounces are another (those have both washed out by now). While Silver appears to do this well, the underlying assumption is that what worked in the past is relevant for the election today. This is probably a good assumption as long as we don’t go too far into the past. This election seems to have a few “firsts,” which suggests that the distant past may not be the best guide. For example, the economy has been terrible, and this is the first time that an incumbent appears to be heading toward reelection under such conditions.

Both models rely on good polls for predicting voter turnout. The polls in recent months have been conducted on a “likely voter” basis. From what I’ve read, this is the hardest part of making a prediction. The intuition is that it’s easy to conduct a poll, but it’s harder to predict how the responses will translate into actual votes. Silver explains why this issue is important in response to a CNN poll:

Among registered voters, [Mr. Obama] led Mitt Romney by nine percentage points, with 52 percent of the vote to Mr. Romney’s 43 percent. However, Mr. Obama led by just two percentage points, 49 to 47, when CNN applied its likely voter screen to the survey.

Thus, the race is a lot closer when looking at likely voters. Polling is a complex science, but the experts suggest that the race is closer than the registered-voter polls indicate.

Jacobson’s model overwhelmingly predicts that Obama will be reelected, which is in stark contrast to other models that gave Romney a 20-30% chance of winning as of 9/16 and give him a ~15% chance of winning today (10/1). Jacobson’s model predicted an Obama landslide in 2008, which occurred. The landslide this time around seems to be due to a larger number of “safe” electoral votes for Obama in “blue” states (see the image below). Romney has to win many battleground states to win the election, and the odds of Romney winning nearly all of the battleground states necessary to win are ~0% (according to Jacobson as of 9/30). This is quite a bold prediction, but it appears to rely on state polls being accurately calibrated for voter turnout. To address this, Jacobson uses his five scenarios, which suggest that even with a strong conservative showing, Romney has little chance of winning. Silver and Intrade predict a somewhat closer race, but Obama is still the clear favorite (e.g., Intrade shows that Romney has a 24.1% chance of winning as of 10/1).
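To see why the probability of winning nearly all of the battleground states collapses toward zero, consider the back-of-the-envelope product below. The state list, the per-state probabilities, and the independence assumption are mine, not Jacobson’s; his model computes this properly through the posterior estimates and the dynamic program sketched above:

```python
# Why "sweep nearly all the battlegrounds" is so unlikely: even generous
# per-state odds multiply down quickly. States, probabilities, and the
# independence assumption are all illustrative.
from math import prod

battlegrounds = {"FL": 0.35, "OH": 0.30, "VA": 0.35, "CO": 0.40,
                 "IA": 0.35, "NV": 0.35, "NH": 0.35}

# Probability of winning every battleground, assuming independence.
p_sweep = prod(battlegrounds.values())
print(f"P(sweep all {len(battlegrounds)} battlegrounds) = {p_sweep:.5f}")
```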


Special thanks to the two political junkies who gave me feedback on a draft of this post: Matt Saltzman and my husband Court.

[Image: Sheldon Jacobson’s Election Analytics predictions as of 9/16]

