# election day blog post: moving from polls to forecasts

I will have a series of posts on elections and voting leading up to Election Day (hint: check back over the weekend!). This blog post is about the nuts and bolts of forecasting using polls and about interpreting the results.

Many of the polls are a dead heat. Why doesn’t this mean that Obama has a 50-50 chance of being reelected?

This excellent piece by Simply Statistics explains why a candidate’s small lead in the popular vote (say, 0.5%) can make that candidate a clear favorite to win the election (say, a 68% chance). Moreover, this small lead could translate into many electoral votes. The post was written about Nate Silver’s model on fivethirtyeight, and it is aimed at readers who are less statistically literate. It gives the rest of us a good way to explain how forecasting models work, and why a candidate could get so many electoral votes in a close election:

Let’s pretend, just to make the example really simple, that if Obama gets greater than 50% of the vote, he will win the election… [W]e want to know what is the “percent chance” Obama will win, taking into account what we know. So let’s run a bunch of “simulated elections” where on average Obama gets 50.5% of the vote, but there is variability because we don’t have the exact number. Since we have a bunch of polls and we averaged them, we can get an estimate for how variable the 50.5% number is… We can run 1,000 simulated elections… When I run the code, I get an Obama win 68% of the time (Obama gets greater than 50% of the vote). But if you run it again that number will vary a little, since we simulated elections. The interesting thing is that even though we only estimate that Obama leads by about 0.5%, he wins 68% of the simulated elections. The reason is that we are pretty confident in that number, with our standard deviation being so low (1%). But that doesn’t mean that Obama will win 68% of the vote in any of the elections!
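Here is a minimal Python sketch of the simulation described in the quote. It is not the original post's code; the function name, the seed, and the choice of a normal distribution for the vote share are my assumptions. It draws 1,000 simulated vote shares centered at 50.5% with a standard deviation of 1% and counts how often Obama clears 50%.

```python
import random
from statistics import NormalDist


def simulate_win_probability(mean_share=50.5, sd=1.0, n_sims=1000, seed=None):
    """Fraction of simulated elections in which the vote share exceeds 50%.

    Assumes (for illustration) that the true vote share is normally
    distributed around the poll average.
    """
    rng = random.Random(seed)
    wins = sum(rng.gauss(mean_share, sd) > 50.0 for _ in range(n_sims))
    return wins / n_sims


# One batch of 1,000 simulated elections (the seed is arbitrary).
simulated = simulate_win_probability(seed=2012)

# The same quantity without simulation noise: P(share > 50) under the normal model.
exact = 1.0 - NormalDist(mu=50.5, sigma=1.0).cdf(50.0)

print(f"simulated win probability: {simulated:.3f}")
print(f"exact win probability:     {exact:.3f}")
```

Under these assumptions the exact value is about 69%, and any single batch of 1,000 simulations lands within a few points of it, which is why the quoted post reports 68% and warns that the number will vary a little from run to run.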

This is another way to explain forecasting models from an article on Politico (the Simply Statistics blog post criticizes this article, but I like the quote from Nate Silver):

Silver cautions against confusing prediction with prophecy. “If the Giants lead the Redskins 24-21 in the fourth quarter, it’s a close game that either team could win. But it’s also not a “toss-up”: The Giants are favored. It’s the same principle here: Obama is ahead in the polling averages in states like Ohio that would suffice for him to win the Electoral College. Hence, he’s the favorite,” Silver said.

## Nuts and bolts of moving from polling data to forecasting models

This Huffington Post article by Simon Jackman of Stanford describes the process of moving from poll averages to forecasts. The entire article is worth a careful read; it is written for a more technical reader. He steps through the importance of polling bias, meaning that the polls collectively under- or over-estimate Obama’s support. He accounts for this by looking at data on historical bias, weighting recent elections more heavily than distant ones. This makes sense, because recent elections, with polls conducted via land line, text, etc., are likely to be more informative about the current one. This gives him a probability distribution for the polling bias instead of a fixed value of 0.

I adopt an approach that concedes that we simply don’t know with certainty what the error of the poll average will be; I use a (heavy-tailed) probability distribution to characterize my uncertainty over the error of the poll average. Analysis of the temporally discounted, historical data supplies information as to the shape and location of that distribution. … Our problem is that we don’t know what type of election we’ve got (at least not yet). We should buy some insurance when we move from poll averaging to talking about state-level predictions, with some uncertainty coming into our predictions via uncertainty over the bias of the poll average we might encounter this cycle.

We see the effect of this in the table below, which shows that Obama’s chance of winning different states, as implied by the poll averages, varies from 11% to 97%. When uncertainty about the bias of the polling average is taken into account, Obama’s chances in each state move closer to 50-50. On a national level, this means that Obama’s chance of winning the election (“Electoral Vote”) drops to 71% from an overly optimistic 99% when bias is not properly accounted for.

The probability that Obama will win each state when assuming a zero bias (left: “poll average”) vs. a probability distribution for the bias (right: “extra uncertainty”)
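To see why extra uncertainty about the bias pulls probabilities toward 50-50, here is a rough Python sketch. It is not Jackman's model: the poll leads, the sampling standard deviation, the Student-t distribution with 4 degrees of freedom, and the bias scale of 2 points are all made-up values chosen for illustration.

```python
import math
import random


def student_t(rng, df):
    """Draw from a Student-t distribution (a heavy-tailed choice) via Z / sqrt(chi2 / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df) is Gamma(shape=df/2, scale=2)
    return z / math.sqrt(chi2 / df)


def win_probability(poll_lead, sampling_sd, bias_scale=0.0, df=4, n_sims=20000, seed=0):
    """Simulated chance that the true lead is positive.

    bias_scale=0 treats the poll average as unbiased; bias_scale>0 adds a
    heavy-tailed, zero-centered error to the poll average (hypothetical values).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        bias = bias_scale * student_t(rng, df) if bias_scale > 0 else 0.0
        true_lead = poll_lead + rng.gauss(0.0, sampling_sd) - bias
        wins += true_lead > 0
    return wins / n_sims


# Hypothetical state-level poll leads, in percentage points.
results = {}
for lead in (-2.0, 1.0, 3.0):
    zero_bias = win_probability(lead, sampling_sd=1.0)
    extra_uncertainty = win_probability(lead, sampling_sd=1.0, bias_scale=2.0)
    results[lead] = (zero_bias, extra_uncertainty)
    print(f"lead {lead:+.1f}: zero bias {zero_bias:.2f}, extra uncertainty {extra_uncertainty:.2f}")
```

In every hypothetical state the second probability is closer to 50% than the first. A lead that looks nearly certain when the poll average is taken at face value becomes merely likely once we admit that the average itself could be off.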

## Another way to look at this

Andrew Gelman of the Statistical Modeling, Causal Inference, and Social Science blog writes a nice post in the NY Times about how the race is really too close to call. Nate Silver’s model, for example, shows that Obama has a 72% chance of winning. This means that we shouldn’t be too surprised if either Obama or Romney wins.

Let’s dig and see what this means. If we ran the election 100 times, Silver was saying that Obama would win 72 of them — but we’ll only be running it once. Silver was predicting an approximate 50.3 percent of the two-party vote share for Obama, but shifts of as large as 1 percent of the vote could happen at any time. (Ultimately, of course, we care about the Electoral College, not the popular vote. But a lot of research on polls and elections has shown that opinion swings and vote swings tend to be national.)

The online betting service Intrade gives Obama a 62 percent chance of winning, and I respect this number too, as it reflects the opinions of people who are willing to put money on the line. The difference between 63 percent and 75 percent may sound like a lot, but it corresponds to something like a difference of half a percentage point in Obama’s forecast vote share. Put differently, a change in 0.5 percent in the forecast of Obama’s vote share corresponds to a change in a bit more than 10 percent in his probability of winning. Either way, the uncertainty is larger than the best guess at the vote margin.

Let me be clear: I’m not averse to making a strong prediction, when this is warranted by the data. For example, in February 2010, I wrote that “the Democrats are gonna get hammered” in the upcoming congressional elections, as indeed they were. My statement was based on [models that suggested] that the Republicans would win by 8 percentage points (54 percent to 46 percent). That’s the basis of an unambiguous forecast. 50.5 percent to 49.5 percent? Not so much. The voters are decided; small events could swing the election one way or another, and until we actually count the votes, we won’t know how far off the polls are. Over the past couple of weeks, each new poll has provided lots of excitement (thanks, Gallup) but essentially zero information.
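Gelman's remark that the gap between a 63 percent and a 75 percent chance of winning corresponds to about half a point of vote share can be checked with a back-of-the-envelope calculation. The sketch below is my own simplification, not Gelman's calculation: it assumes the forecast vote share has normally distributed error and that winning means clearing 50% of the two-party vote (ignoring the Electoral College), and it solves for the standard deviation that makes his statement hold.

```python
from statistics import NormalDist

std_normal = NormalDist()

# z-scores at which a normal forecast gives 63% and 75% win probabilities.
z_low = std_normal.inv_cdf(0.63)
z_high = std_normal.inv_cdf(0.75)

# Gelman says the two probabilities differ by about 0.5 points of vote share.
share_gap = 0.5

# Standard deviation (in percentage points) implied by that statement.
implied_sd = share_gap / (z_high - z_low)


def win_probability(share, sd=implied_sd):
    """P(vote share > 50%) for a normal forecast centered at `share`."""
    return 1.0 - NormalDist(mu=share, sigma=sd).cdf(50.0)


# Forecast vote shares that would produce each win probability.
share_63 = 50.0 + implied_sd * z_low
share_75 = 50.0 + implied_sd * z_high

print(f"implied standard deviation: {implied_sd:.2f} points")
print(f"63% chance at a forecast share of {share_63:.2f}%")
print(f"75% chance at a forecast share of {share_75:.2f}%")
```

Under these assumptions the implied standard deviation is roughly 1.5 percentage points, which is bigger than the forecast lead itself. That is the sense in which the uncertainty is larger than the best guess at the vote margin.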

Not all election forecasting models rely on opinion polls. That will be the subject of another blog post. Stay tuned.