Markov chains for ranking sports teams

My favorite talk at ISERC 2014 (the IIE conference) was “A new approach to ranking using dual-level decisions” by Baback Vaziri, Yuehwern Yih, Mark Lehto, and Tom Morin (Purdue University) [Link]. They used a Markov chain to rank Big Ten football teams by their ability to recruit prospective players. Each player accepts one of several offers. The team that gets the player is the “winner” and the other teams that made offers are the losers. We end up with a matrix P where element (i,j) is the number of times team j beats team i.

The matrix is then normalized so that each row sums to 1, which makes it the transition probability matrix of a Markov chain, and the chain is solved for its limiting distribution. The probability of being in state j (team j) in the limit is interpreted as the proportion of time that team j is the best. Therefore, the limiting distribution can be used to rank teams from best to worst.
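
To make the steps concrete, here is a minimal sketch in Python. It is my own illustration, not the authors' code. The function name `rank_teams`, the three teams, and their results are all made up. Two choices in it are assumptions rather than anything from the talk: a team with no losses keeps all of its probability, and the limiting distribution is found by power iteration on a "lazy" version of the chain (stay put half the time), which has the same limiting distribution but avoids problems with periodic chains.

```python
def rank_teams(teams, games, iterations=2000):
    """Rank teams by the limiting distribution of a win/loss Markov chain.

    teams: list of team names.
    games: list of (winner, loser) pairs.
    Returns a list of (team, probability) pairs sorted from best to worst.
    """
    n = len(teams)
    index = {team: k for k, team in enumerate(teams)}

    # counts[i][j] = number of times team j beat team i
    counts = [[0.0] * n for _ in range(n)]
    for winner, loser in games:
        counts[index[loser]][index[winner]] += 1.0

    # Normalize each row to sum to 1. A team with no losses has an empty
    # row; as an assumption, such a team keeps all of its probability.
    P = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            P.append([1.0 if j == i else 0.0 for j in range(n)])
        else:
            P.append([c / total for c in row])

    # Power iteration on the lazy chain (P + I) / 2, which shares its
    # limiting distribution with P but cannot oscillate.
    pi = [1.0 / n] * n
    for _ in range(iterations):
        step = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        pi = [0.5 * a + 0.5 * b for a, b in zip(pi, step)]

    return sorted(zip(teams, pi), key=lambda pair: pair[1], reverse=True)


# Hypothetical results for three made-up teams (not real data).
teams = ["A", "B", "C"]
games = [("A", "B"), ("A", "C"), ("B", "A"), ("B", "C"), ("C", "B")]
ranking = rank_teams(teams, games)
for team, prob in ranking:
    print(f"{team}: {prob:.3f}")
```

In this toy example, B comes out ahead of A even though A has the better record (2-1 versus 2-2), because A's only loss was to B and so all of A's probability flows to B. That is the same kind of disagreement with win-loss records that the method can produce on real data.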

Using this method with 2001-2012 data, they found that Wisconsin was ranked fourth, which is much higher than experts ranked it and helps explain why the team has been to 12 bowl games in a row. Illinois (my alma mater) was ranked second to last, only above lowly Indiana.

https://twitter.com/lauramclay/status/473557307197247488

I applied this method to regular-season 2014 Big Ten basketball wins and ended up with the following ranking. I also include the official ranking based on win-loss record for comparison. We see large discrepancies for only two teams: Michigan State (which is over-ranked by its win-loss record) and Indiana (which is under-ranked by its win-loss record). The Markov chain method ranks these two teams differently because Indiana had high-quality wins despite not winning very often and because Michigan State lost to a few bad teams when it was down a few players due to injuries.

 

Rank  MC ranking      W-L record ranking
1     Michigan        Michigan
2     Wisconsin       Wisconsin
3     Indiana         Michigan State
4     Iowa            Nebraska
5     Nebraska        Ohio State
6     Ohio State      Iowa
7     Michigan State  Minnesota
8     Minnesota       Illinois
9     Illinois        Indiana
10    Penn State      Penn State
11    Northwestern    Northwestern
12    Purdue          Purdue

More sophisticated methods build on this idea. Paul Kvam and Joel Sokol use logistic regression to estimate the conditional probabilities in the transition probability matrix of their logistic regression Markov chain (LRMC) model [Paper link here]. The logistic regression yields an estimate of the probability that a team with a margin of victory of x points at home is better than its opponent, so the model looks at margin of victory, not just wins and losses.
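
To give a feel for what such a curve looks like, here is a small sketch of a logistic function of the home margin of victory. The slope and intercept are made-up numbers for illustration only, not the values Kvam and Sokol fitted, and the function name is hypothetical. The negative intercept encodes my own assumption that a narrow home win is weak evidence, since the home team has an advantage.

```python
import math

# Hypothetical coefficients chosen only for illustration; they are not
# the values fitted by Kvam and Sokol.
SLOPE = 0.03
INTERCEPT = -0.3


def prob_home_winner_is_better(margin, slope=SLOPE, intercept=INTERCEPT):
    """Logistic curve mapping a home margin of victory to a probability."""
    return 1.0 / (1.0 + math.exp(-(slope * margin + intercept)))


# The estimated probability rises with the margin of victory.
for margin in (1, 10, 25):
    print(margin, round(prob_home_winner_is_better(margin), 3))
```

With coefficients like these, a blowout moves the estimate well above one half while a one-point home win leaves it below one half, which is how margin of victory can enter the transition probabilities instead of a simple win or loss count.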

https://twitter.com/lauramclay/status/473559305476915202

https://twitter.com/lauramclay/status/473234511732695040

 


4 responses to “Markov chains for ranking sports teams”

  • Vince

    What a cool post! Really nice approach. Is recruitment data readily available?

  • Laura McLay

    Not yet. Hopefully I will get a copy of the paper and/or data soon. They said the recruitment data was obtained from rivals.com.

  • Paul Rubin

    Estimating recruiting success probably involves a substantially larger sample size than estimating won/loss ratios, especially when some pairs of teams may play each other only once in a season. (On the other hand, recruiting is not a two-player game — schools A and B may lose the same recruit to school C.) The W-L column pretty nicely mirrors participation in the NCAA tournament, including how long the teams lasted at The Dance. The M-C rankings are not completely out of line, but they’re less accurate.
