This blog post is inspired by my disappointing NCAA March Madness bracket. I used math modeling to fill my bracket, and I am currently in the 51st percentile on ESPN. On the upside, all of my Final Four picks are still active so I have a chance to win my pool. I am worried that my bracket has caused me to lose all credibility with those who are skeptical of the value of math modeling. After all, guessing can lead to a better bracket. *Isn’t Nate Silver a wizard? How come his bracket isn’t crushing the competition?* Here, I will make the case that a so-so bracket is not evidence that the math models are bad. To do so, I will discuss why it is so easy to forecast the Presidential election and so hard to forecast the NCAA basketball tournament.

Many models for the Presidential election and the basketball tournament are similar in that they use various inputs to predict the probability of an outcome. I have discussed several models for forecasting the Presidential election [Link] and the basketball tournament [Link].

All models that didn’t solely rely on economic indicators chose Obama to be the favorite, and nearly all predicted 48+ of the states correctly. In other words, *even a somewhat simplistic model to forecast the Presidential election could predict the correct outcome 96% of the time*. I’m not saying that the forecasting models out there were simplistic – but simply going with poll averages gave good estimates of the election outcomes.

The basketball tournament is another matter. Nate Silver has blogged about models that predict tournament games using similar math. Here, we can only predict the correct winner 71-73% of the time [Link]:

Since 2003, the team ranked higher in the A.P. preseason poll (excluding cases where neither team received at least 5 votes) has won 72 percent of tournament games. That’s exactly the same number, 72 percent, as the fraction of games won by the better seed. And it’s a little better than the 71 percent won by teams with the superior Ratings Percentage Index, the statistical formula that the seeding committee prefers. (More sophisticated statistical ratings, like Ken Pomeroy’s, do only a little better, with a 73 percent success rate.)

To do well in your bracket, you would need to make small marginal improvements over the naive model of always picking the better seed (72% success rate). Here, a 96% success rate would be unrealistic; an improved model that got 75% of the games correct would give you a big advantage. The big advantage means that if you used your improved method in 1000 tournaments, it would do better on average than the naive method. In any particular tournament, the improved method may still lead to a poor bracket. It’s a small sample.
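A quick simulation makes this concrete. This is only a sketch: it treats each of the 63 games in a 64-team bracket as an independent coin flip at the stated per-game accuracy, ignoring seeds and bracket scoring, and the 1000-tournament figure comes from the paragraph above.

```python
import random

random.seed(42)

GAMES = 63           # games in a 64-team single-elimination bracket
N_TOURNAMENTS = 1000

def correct_picks(p, n_games=GAMES):
    """Number of games a forecaster with per-game accuracy p calls correctly."""
    return sum(random.random() < p for _ in range(n_games))

# In a single tournament, the better model can easily come out behind.
naive_once = correct_picks(0.72)
improved_once = correct_picks(0.75)

# Averaged over many tournaments, the 3-point edge shows up reliably.
naive_avg = sum(correct_picks(0.72) for _ in range(N_TOURNAMENTS)) / N_TOURNAMENTS
improved_avg = sum(correct_picks(0.75) for _ in range(N_TOURNAMENTS)) / N_TOURNAMENTS
```

In any single run, `improved_once` can land below `naive_once`; only the long-run averages separate cleanly.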

The idea here is similar to batting averages in baseball. It is not really possible to notice the difference between a 0.250 hitter and a 0.300 hitter in a single game or even across a single week of games. The 0.250 hitter may even have the better batting average in any given week. Over the course of a 162-game season, though, the difference between the batters’ averages is quite noticeable. The NCAA tournament does not have the advantage of averaging performance over a large number of games: we are asked to predict a small set of outcomes in a single tournament, where things will not have a chance to average out (the Law of Small Numbers).
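The batting-average analogy can be simulated too. This sketch assumes roughly 25 at-bats in a week and 600 in a season (both ballpark figures I am supplying, not from the post), and counts how often the worse hitter out-hits the better one over each span.

```python
import random

random.seed(0)

def batting_avg(p, at_bats):
    """Batting average of a true-talent-p hitter over a given number of at-bats."""
    hits = sum(random.random() < p for _ in range(at_bats))
    return hits / at_bats

TRIALS = 2000

# How often does the .250 hitter out-hit the .300 hitter over one week (~25 ABs)?
week_upsets = sum(
    batting_avg(0.250, 25) > batting_avg(0.300, 25) for _ in range(TRIALS)
)

# Over a full season (~600 ABs), flipping the gap is far less likely.
season_upsets = sum(
    batting_avg(0.250, 600) > batting_avg(0.300, 600) for _ in range(TRIALS)
)
```

Over a week the upset happens in roughly a quarter of trials; over a season it becomes rare. A single-elimination tournament is the one-week case.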

It’s worth noting that actual brackets get fewer than 72% of the games correct because errors are cumulative. If you put Gonzaga in the Elite Eight and they are defeated in the (now) third round and do not make it to the Sweet Sixteen, then *one* wrong game prediction leads to *two* wrong games in the bracket.
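A back-of-the-envelope calculation shows how much the cumulative errors cost. Assume (simplifying) that each game call is independently right with probability 0.72, and that a pick in round r only counts if the team you advanced actually won all r of its games:

```python
# Expected number of correct picks in a 63-game bracket. A round-of-64 pick
# needs 1 correct call (p**1); a championship pick needs your champion to win
# 6 straight games (p**6).
p = 0.72
games_per_round = [32, 16, 8, 4, 2, 1]  # 64-team bracket, rounds in order
expected = sum(games * p ** (r + 1) for r, games in enumerate(games_per_round))
share = expected / 63
```

Under these assumptions the expected share of correct picks drops from 72% per game to roughly 57% of the bracket, which is why cascading early mistakes hurt so much.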

It’s also worth noting that some games are easier to predict than others. In the (now) second round (what most of us think of as the first round), no 1 seed has ever lost to a 16 seed, and 2 seeds have only rarely lost to 15 seeds (it has happened 7 times). Likewise, some states are easy to predict in Presidential elections (e.g., California and Oklahoma). The difference is that there are few easy-to-predict games in the tournament, whereas there are many easy-to-predict states in a Presidential election. Politico lists 9 swing states for the 2012 election; that is, one could predict the outcome in 82% of the states with a high degree of confidence using common sense. In contrast, one can confidently predict only ~12% of the games in the round of 64 using common sense (the four games involving 1 seeds). Therefore, I would argue that there is more parity in college basketball than there is in politics.

How is your bracket doing?

March 26th, 2013 at 11:32 am

To my eyes, the biggest difference is that the presidential forecasts were being updated constantly as new polls or other information came in.

The biggest challenge in forecasting the NCAA college basketball tournaments is that it’s a one shot deal. You can’t adjust as the tournament progresses.

Yahoo! does run a second chance bracket once they’re down to 16 teams / 15 games, but it’s not the headline event.

Back to Presidential election forecasts, I thought this piece was great at applying “proper scoring rules” as a metric of predictive accuracy. One thing that really bugged me about Nate Silver’s forecasts is that he gave non-negligible probabilities to things like Obama winning Mississippi or Romney winning California. But again, that analysis looks at the last prediction before the election and not at how their predictions changed over time.

http://appliedrationality.org/2012/11/09/was-nate-silver-the-most-accurate-2012-election-pundit/

March 26th, 2013 at 11:36 am

Sorry, not Mississippi. Georgia.

March 26th, 2013 at 11:36 am

Thanks for the comments and the link. These are great points.

Silver adds poll bias into his models, which allows for higher probabilities for unexpected election outcomes than some of the other models do. So, he ends up with a slightly higher (but still small) probability that Mississippi will go Democratic.

March 26th, 2013 at 11:44 am

Right, I get it, but I think it’s weaselly to hedge like that.

March 26th, 2013 at 11:46 am

63rd% (currently) on Yahoo. This year, if someone’s bracket *didn’t* get busted, they probably cheated. 😀

March 26th, 2013 at 11:49 am

If you knew in advance how everyone was going to vote, you would know the election outcome exactly. And at this point we can fairly accurately estimate how people are going to vote (in the aggregate), which allows us to fairly accurately predict the election results.

There is no corresponding “if you knew in advance” for predicting the winner of a basketball game, other than the tautological “if you knew in advance who was going to win the basketball game”. Said differently, if NCAA tournament games were decided by large-scale *voting*, then we would probably be pretty good at predicting them.

March 27th, 2013 at 12:33 pm

Used the LRMC picks, and currently in the 82nd percentile on yahoo. Still likely to lose my pool though unless things go quite well.

March 28th, 2013 at 8:46 pm

[…] McLay asks why is it so easy to forecast the Presidential election and so hard to forecast the NCAA basketball … Nate Silver famously predicted the winner of all 50 states; but if you look at the NCAA basketball […]

March 28th, 2013 at 10:00 pm

[…] “Why is it so easy to forecast the Presidential election and so hard to forecast the NCAA basketball tournament?” https://punkrockor.wordpress.com/ … […]

April 2nd, 2013 at 8:05 am

94% on yahoo, but still not going to win. combo of 2 models – 1 that accounts for defense and one for 3 point effectiveness. 3 out of 4 variations had FGCU going as far as they did, but my husband talked me out of taking that path. hey, maybe next year!