June 4, 2014

Markov chains for ranking sports teams

My favorite talk at ISERC 2014 (the IIE conference) was “A new approach to ranking using dual-level decisions” by Baback Vaziri, Yuehwern Yih, Mark Lehto, and Tom Morin (Purdue University) [Link]. They used a Markov chain to rank Big Ten football teams by their ability to recruit prospective players. Each player accepts one of several offers. The team that gets the player is the “winner” and the other teams that made offers are the losers. We end up with a matrix P where element (i,j) is the number of times team j beats team i.

The matrix is then normalized so that each row sums to 1, which gives the transition probability matrix of a Markov chain, and the chain is solved for its limiting distribution. The limiting probability of being in state j (team j) is interpreted as the proportion of time that team j is the best. Therefore, the limiting distribution can be used to rank teams from best to worst.
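To make the normalize-and-solve step concrete, here is a minimal sketch in Python. It is not the authors' code: the three teams and their counts are made up, the function names are my own, and I use simple power iteration to approximate the limiting distribution. I also assume that a team with no losses keeps its probability mass through a self-loop, which the talk did not address.

```python
def row_normalize(counts):
    """Turn counts[i][j] (times team j beat team i) into a row-stochastic matrix."""
    n = len(counts)
    P = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Assumption: an unbeaten team gets a self-loop so its row still sums to 1.
            P.append([1.0 if j == i else 0.0 for j in range(n)])
        else:
            P.append([c / total for c in row])
    return P


def limiting_distribution(P, tol=1e-12, max_iter=10_000):
    """Approximate the limiting distribution by repeatedly applying pi <- pi P."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


# Hypothetical data: counts[i][j] = number of times team j beat team i.
teams = ["A", "B", "C"]
counts = [
    [0, 1, 1],  # A lost once to B and once to C
    [3, 0, 1],  # B lost three times to A and once to C
    [2, 2, 0],  # C lost twice to A and twice to B
]

P = row_normalize(counts)
pi = limiting_distribution(P)

# Rank teams from largest to smallest limiting probability.
ranking = [team for _, team in sorted(zip(pi, teams), reverse=True)]
print(ranking)
```

Power iteration only converges to a unique answer when the chain is irreducible and aperiodic, which holds for this toy example but would need checking on real data.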

Using this method with 2001–2012 data, they found that Wisconsin was ranked fourth, which was much higher than it was ranked by experts and explains why Wisconsin has been to 12 bowl games in a row. Illinois (my alma mater) was ranked second to last, only above lowly Indiana.

https://twitter.com/lauramclay/status/473557307197247488

I applied this method to regular season 2014 Big Ten basketball wins and ended up with the following ranking, shown next to the official ranking based on win-loss record for comparison. We see large discrepancies for only two teams: Michigan State (which its win-loss record ranks higher than the Markov chain does) and Indiana (which its win-loss record ranks lower than the Markov chain does). The Markov chain method ranks these two teams differently because Indiana had high quality wins despite not winning as often, and because Michigan State lost to a few bad teams when it was down a few players due to injuries.

Rank	MC ranking	W-L record ranking
1	Michigan	Michigan
2	Wisconsin	Wisconsin
3	Indiana	Michigan State
4	Iowa	Nebraska
5	Nebraska	Ohio State
6	Ohio State	Iowa
7	Michigan State	Minnesota
8	Minnesota	Illinois
9	Illinois	Indiana
10	Penn State	Penn State
11	Northwestern	Northwestern
12	Purdue	Purdue

More sophisticated methods add some complexity. Paul Kvam and Joel Sokol use logistic regression to estimate the conditional probabilities in the transition probability matrix for their logistic regression Markov chain (LRMC) model [Paper link here]. The logistic regression yields an estimate of the probability that a team with a margin of victory of x points at home is better than its opponent, and thus accounts for margin of victory, not just wins and losses.
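As a rough illustration of the idea (not the LRMC model itself), the sketch below maps a home team's point margin to a probability with a logistic function. The coefficients are hypothetical placeholders that I picked for illustration, not the fitted values from the paper, and the function names are my own.

```python
import math

# Hypothetical coefficients, for illustration only (NOT the values fitted in the paper).
SLOPE = 0.03       # assumed effect of each point of margin
INTERCEPT = -0.3   # assumed offset for home-court advantage


def prob_home_team_better(margin, slope=SLOPE, intercept=INTERCEPT):
    """Logistic estimate that the home team is the better team.

    margin is the home team's points minus the visitor's points,
    so it is negative when the home team lost.
    """
    return 1.0 / (1.0 + math.exp(-(slope * margin + intercept)))


def game_weights(margin):
    """Split one game's weight between the home team and the visiting team."""
    p_home = prob_home_team_better(margin)
    return p_home, 1.0 - p_home


for margin in (-10, 0, 10, 25):
    p_home, p_away = game_weights(margin)
    print(f"margin {margin:+d}: home better {p_home:.3f}, away better {p_away:.3f}")
```

Weights like these could stand in for the simple win counts when building the transition matrix, so that a blowout moves more probability toward the winner than a one-point game does.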

https://twitter.com/lauramclay/status/473559305476915202

https://twitter.com/lauramclay/status/473234511732695040

This entry was posted on Wednesday, June 4th, 2014 at 9:54 am and tagged with sports.

4 responses to “Markov chains for ranking sports teams”

Vince
June 4th, 2014 at 11:37 am

What a cool post! Really nice approach. Is recruitment data readily available?
Laura McLay
June 4th, 2014 at 12:00 pm

Not yet. Hopefully I will get a copy of the paper and/or data soon. They said the recruitment data was obtained from rivals.com.
Paul Rubin
June 7th, 2014 at 3:14 pm

Estimating recruiting success probably involves a substantially larger sample size than estimating won/loss ratios, especially when some pairs of teams may play each other only once in a season. (On the other hand, recruiting is not a two-player game — schools A and B may lose the same recruit to school C.) The W-L column pretty nicely mirrors participation in the NCAA tournament, including how long the teams lasted at The Dance. The M-C rankings are not completely out of line, but they’re less accurate.
Hitting Probabilities for Markov Chains | Eventually Almost Everywhere
June 17th, 2014 at 4:05 pm

[…] Markov chains for ranking sports teams […]