My last post was about how to choose a winning bracket in the NCAA men’s basketball tournament. I linked to several tools for predicting which team is likely to win a game. These tools
1. provide a rank ordering of the teams from best to worst,
2. compute the odds of which team would win in a matchup based on their tournament seed, or
3. provide odds of a team making it to different levels of the tournament based on specific matchups.
I linked to the methodologies used by these tools in my last post but didn’t get into the details. Here, I discuss those methodologies more closely, focusing on tools that predict how teams will fare in a specific tournament (#3 above).
Wayne Winston noted in Mathletics that there is no transitivity in matchups. That is, if team A is favored to beat team B and team B is favored to beat team C, this does not imply that team A is favored to beat team C. Thus, the team rankings (#1 above) are not a perfect tool for predicting specific matchups. He uses “power ratings” to compute how many points one team is better than the other (a point spread), which takes home field advantage and other factors into account. He then converts the point spread to the probability of winning using historical game outcomes (basically, a normal distribution with a history-derived standard deviation) or simulates the games to compute the odds of winning.
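The spread-to-probability step described above can be sketched in a few lines of Python. This is only an illustration of the normal-distribution idea, not Winston's actual code: the function names are hypothetical, and the default standard deviation of 10 points is a placeholder assumption that would need to be estimated from historical game margins.

```python
from statistics import NormalDist


def point_spread(rating_a: float, rating_b: float, home_edge: float = 0.0) -> float:
    """Expected margin (in points) by which team A beats team B.

    home_edge is a hypothetical adjustment for home advantage and
    similar factors; use 0.0 for a neutral site.
    """
    return rating_a - rating_b + home_edge


def win_probability(spread: float, sigma: float = 10.0) -> float:
    """Probability that team A wins, assuming the final margin is
    normally distributed with mean `spread` and standard deviation `sigma`.

    sigma=10.0 is an illustrative assumption, not a value from the post.
    """
    # Team A wins when the final margin is greater than zero.
    return 1.0 - NormalDist(mu=spread, sigma=sigma).cdf(0.0)


if __name__ == "__main__":
    spread = point_spread(rating_a=12.0, rating_b=7.0)  # A favored by 5
    print(f"spread = {spread:+.1f}, P(A wins) = {win_probability(spread):.3f}")
```

A spread of zero gives a 50% chance for each team, and the probability climbs toward 1 as the spread grows relative to sigma.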
Nate Silver’s model is interesting in that it takes many inputs, including the ranking tool outcomes from #1 above. His model blends four ranking models to take a more pluralistic view of who might win. I think this is a strength because it uses the wisdom of crowds (a small crowd in this case). Each of the four tools contributes 1/6 of the total power rating (a margin of victory). Seed number and whether the team was ranked in preseason polls each contribute 1/6 of the power rating. He then makes adjustments for the geography of the game and player injuries and absences. He doesn’t describe his forecast probabilities in detail, but I suspect that his approach is similar to Wayne Winston’s. A team’s power rating is adjusted in each round based on the outcomes from previous rounds to account for potential errors in the power rating, another strength of the model.
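To make the weighting concrete, here is a minimal sketch of an equal-weight blend of six components. It assumes every component has already been converted to the same scale (points of expected margin), which is my assumption; the function name and the example numbers are hypothetical, and the geography and injury adjustments are left out.

```python
from typing import Sequence


def blended_power_rating(
    ranking_ratings: Sequence[float],
    seed_rating: float,
    preseason_rating: float,
) -> float:
    """Equal-weight blend: four ranking systems, seed, and preseason poll.

    Each of the six components contributes 1/6 of the result. All inputs
    are assumed to be on a common points-of-margin scale.
    """
    if len(ranking_ratings) != 4:
        raise ValueError("expected exactly four ranking-system ratings")
    components = list(ranking_ratings) + [seed_rating, preseason_rating]
    return sum(components) / 6.0


if __name__ == "__main__":
    # Made-up inputs for one team, all in points of expected margin.
    rating = blended_power_rating([14.0, 12.5, 13.0, 15.5], 11.0, 12.0)
    print(f"blended power rating: {rating:.2f}")
```

Because the weights are equal, a single system that rates a team unusually high or low moves the blend by only one sixth of that difference.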
Finally, Luke Winn and John Ezekowitz’s model doesn’t use power ratings [methodology here]. Instead, it applies survival analysis to predict when a team may drop out of the tournament. This model computes hazard rates for each team based on the team’s RPI and Ken Pomeroy’s ranking. They also consider
- tournament experience,
- out-degree network centrality that captures the number of games played and won against other NCAA tournament teams (see picture below), and
- the negative interaction of the Experience and Out-Degree Centrality variables.
Cox proportional hazards regression was used to rerank the teams.
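For readers who haven't seen a Cox model, its general form with the variables listed above would look something like the following. The symbols are my own notation, not the authors': $t$ is the tournament round, $h_0(t)$ is a baseline hazard shared by all teams, $E_i$ is tournament experience, and $D_i$ is out-degree centrality for team $i$.

```latex
h_i(t) = h_0(t)\,\exp\!\big(
    \beta_1\,\mathrm{RPI}_i
  + \beta_2\,\mathrm{Pomeroy}_i
  + \beta_3\,E_i
  + \beta_4\,D_i
  + \beta_5\,E_i D_i
\big)
```

The exponential term is a team's relative hazard of being eliminated. Since $h_0(t)$ is common to every team, ordering teams from lowest to highest relative hazard gives a ranking from most to least likely to survive. The coefficients $\beta_1, \dots, \beta_5$ are estimated from past tournaments; I'm not reproducing any fitted values here.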
Other recommended reading on March Madness and analytical methods:
- Evelyn Lamb at Scientific American blogs about the odds of middle seeds winning
- Sheldon Jacobson’s tips for picking games based on seeds are featured in Business Week