# no confidence in confidence intervals

With all of the upcoming primaries, I have been reading a bit about polling data. Nate Silver of the NY Times discusses how frequently a candidate's actual vote total falls within the margin of error reported with the poll data. Usually, 95% confidence intervals are reported, so you would expect a candidate's numbers to fall outside the confidence interval only ~5% of the time.

FiveThirtyEight has a database consisting of thousands of primary and caucus polls dating back to the 1970s. Each poll contains numbers for several candidates, so there are a total of about 17,000 observations. How often does a candidate’s actual vote total fall within the theoretical margin of error?

The answer is, not very often. In theory, a candidate’s actual vote total should fall outside the margin of error only 5 percent of the time. In reality, the candidate’s vote total was outside the margin of error 65 percent of the time! Part of this is because the database includes some polls conducted months before the actual voting took place. But even if you restrict the analysis to polls conducted within the final week of the campaign, about 40 percent of the vote totals fell outside the margin of error — eight times more often than is supposed to happen if you could take the margin of error at face value. [emphasis added]
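To see why a 40-65% miss rate is so damning, it helps to check what the miss rate *should* be under the textbook assumptions. Below is a minimal simulation sketch (my own illustration, not Silver's code) that draws polls by simple random sampling and counts how often the true vote share lands outside the standard 95% margin of error. Under these ideal conditions the miss rate hovers near 5%; the much larger rates Silver observes must therefore come from non-sampling error (stale polls, undecideds, response bias, late swings).

```python
import math
import random

def margin_of_error(p_hat, n, z=1.96):
    # Standard 95% margin of error for an estimated proportion
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

def miss_rate(true_p=0.40, n=1000, trials=2000, seed=42):
    """Fraction of simulated polls whose 95% interval misses true_p,
    assuming ideal simple random sampling of n respondents."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        votes = sum(rng.random() < true_p for _ in range(n))
        p_hat = votes / n
        if abs(p_hat - true_p) > margin_of_error(p_hat, n):
            misses += 1
    return misses / trials

# Under ideal sampling this comes out near 0.05 -- nowhere near the
# 0.40-0.65 miss rates seen in the FiveThirtyEight poll database.
```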

Silver argues that it is important to recalibrate the polling data based on the accuracy of past polls. To make predictions about election/primary results based on polling data, he (a) adjusts the results based on how recent the polls are (more recent = more accurate), (b) accounts for undecided voters, and (c) accounts for “momentum.” Silver’s methodology can be found here and his prediction for the New Hampshire primary can be found here.
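Silver describes his undecided-voter adjustment as a compromise between dividing undecideds evenly among the candidates and dividing them proportionately to their poll numbers. His exact formula isn't published, but the idea can be sketched with a hypothetical blend weight `lam` (which Silver would presumably fit to historical data):

```python
def allocate_undecided(poll, undecided, lam=0.5):
    """Distribute the undecided share among candidates as a blend of
    an even split (lam=0) and a split proportional to current poll
    numbers (lam=1). The blend weight lam is a placeholder, not
    Silver's actual fitted value."""
    total = sum(poll.values())
    k = len(poll)
    return {
        name: share + undecided * (lam * share / total + (1 - lam) / k)
        for name, share in poll.items()
    }

# Example: three candidates polling at 40/30/20 with 10% undecided.
poll = {"A": 0.40, "B": 0.30, "C": 0.20}
adjusted = allocate_undecided(poll, undecided=0.10)
```

With `lam=0.5`, the front-runner gets a bit more than an even share of the undecideds but a bit less than a fully proportional share, and the adjusted totals sum to 1.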


#### 4 responses to “no confidence in confidence intervals”

• prubin73

I wonder how Silver (or anyone) can adjust for non-response bias? My guess is that any “undecided” option in the poll is likely to be underutilized, because people who decide at the last minute (or are unenthused about all the candidates) are less likely to bother responding to the poll. It’s a safe bet non-response bias is present, but how to assess it?

• Laura McLay

@prubin73: From the same link, Silver has two methods for adjusting for undecideds:

“Forecasts can also be improved by assigning undecided voters to the candidates. The specific method used represents a compromise between dividing these votes evenly among the candidates and dividing them proportionately, based on what would have produced the best results on the historical data.”

• Paul A. Rubin

Dividing the undecided voters equally among candidates is hard to justify. Dividing them in proportion based on historical data confuses me a bit, as presumably different proportions best fit different historical races. Does this mean Silver has a formula (that takes into account multiple factors), or does it ultimately boil down to assigning the undecided voters either in the same proportion as the poll numbers or perhaps based on a monotonic function of poll proportions (so that front-runners get a somewhat higher-than-polling-value fraction of the undecided voters and fringe candidates do worse than proportional)?

In any case, I’m not sure the impact of non-response bias is the same for undecideds as for unenthused voters (“I’m going to vote for Romney because he’s the only electable one, even if he is a covert liberal pagan … but I can’t be bothered to fill out the poll”). And we can’t forget our friends in Florida who say they’re going to vote for X, really intend to vote for X, then end up voting for Y because the ballot was s-o-o-o confusing.

• Laura McLay

@prubin73: I agree. Unfortunately, the full methodology is not available on Silver’s website (at least I didn’t find it). I wonder if one could do an optimization over some of these “fudge factors” to find the best way to account for undecideds, momentum, etc. Presumably that is what he did to some extent, but I would like more of the gory details.