March Madness is upon us! It’s one giant orgy of college hoops, and the fact that it usually occurs during Lent makes it even more sinfully enjoyable. Yeah, it’s my favorite sports thing ever. Best of all, the tournament is rich with OR problems.
Sokol et al. Model
Joel Sokol and Paul Kvam (and later George Nemhauser) developed the Logistic Regression/Markov Chain (LRMC) model to determine team rankings, which can be used to fill out tournament brackets. The current rankings are on their site. Their approach has been very successful over the past six years. They write on their site:
- LRMC is right more often: When LRMC and other NCAA tournament ranking methods disagree, the team LRMC ranks higher wins significantly more often than the other method’s team.
- LRMC is particularly effective at sorting out the top teams, as measured by the last three rounds of the NCAA tournament, and it is more successful at identifying “surprise” Final Four teams. (We’re not perfect, though–LRMC didn’t expect George Mason’s run in 2006!)
- LRMC is more effective at picking potential bubble teams; the teams it ranks in the “last-teams-in” bubble range tend to win more games than the teams that other methods rank in that area.
Mike Trick also wrote about LRMC today.
Kvam, P. and Sokol, J.S., 2006, "A Logistic Regression/Markov Chain Model for NCAA Basketball," Naval Research Logistics 53(8), 788-803.
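The Markov-chain half of the LRMC idea can be sketched in a few lines: treat each team as a state, let a random walker drift toward teams that beat the teams it visits, and rank by the stationary distribution. The transition probability `r` below is an invented placeholder, not the logistic-regression estimate the paper actually fits from home-court margins.

```python
# Toy Markov-chain ranking in the spirit of LRMC. The transition
# probability r is hypothetical; the real model estimates it per game
# via logistic regression on score margins.
teams = ["A", "B", "C", "D"]
games = [("A", "B"), ("A", "C"), ("B", "C"),   # (winner, loser) results
         ("C", "D"), ("A", "D"), ("B", "D")]

n = len(teams)
idx = {t: i for i, t in enumerate(teams)}

played = [0] * n
for w, l in games:
    played[idx[w]] += 1
    played[idx[l]] += 1

r = 0.75                      # hypothetical "credit the winner" probability
T = [[0.0] * n for _ in range(n)]
for w, l in games:
    wi, li = idx[w], idx[l]
    T[li][wi] += r / played[li]        # loser's walker moves to winner
    T[li][li] += (1 - r) / played[li]  # ...or stays put
    T[wi][wi] += r / played[wi]        # winner's walker stays
    T[wi][li] += (1 - r) / played[wi]  # ...or concedes the upset chance

# Stationary distribution by power iteration; more mass = better team.
pi = [1.0 / n] * n
for _ in range(200):
    pi = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]

ranking = sorted(teams, key=lambda t: -pi[idx[t]])
print(ranking)
```

On this tiny round-robin (A went 3-0, D went 0-3) the stationary distribution recovers the intuitive order; the value of the real model is that its transition probabilities come from data rather than a fixed `r`.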
Kaplan and Garstka Model
In 2001, Edward Kaplan and Stanley Garstka provided models to make good bracket picks in your office pool using Markov probability models. They use all sorts of data (including RPI, Las Vegas odds, and NIT results) to help make good picks.
The most common office pool strategy is simply to pick the higher-seeded team to win each match-up. Such a strategy, Kaplan says, would have produced an overall success rate of 56 percent over the last three years of NCAA and NIT tournaments, a significant improvement over pure chance. Of the 63 games played in the NCAA tournament, for example, one would expect to get about 36 correct just by picking the higher-seeded team in each round before the tournament began… Picking the highest seeds, however, doesn't give you an advantage over other participants, since the seedings are known to all pool players in advance. The NCAA selection committee has told you, and everyone else, who the favorites are in every game.
Kaplan, Edward H. and Garstka, Stanley J., 2001, “March Madness and the Office Pool,” Management Science 47(3), 369-382.
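The "about 36 of 63" figure above is just linearity of expectation: the expected number of correct picks is the sum, over games, of the probability the higher seed wins. A flat 56% per game is a simplification of the quoted overall rate, not Kaplan and Garstka's actual per-game estimates:

```python
# Linearity of expectation: expected correct picks = sum of each game's
# probability that the higher seed wins. A flat 56% is a hypothetical
# simplification of the overall rate quoted above.
probs = [0.56] * 63
expected = sum(probs)
print(expected)   # about 35.3, consistent with "about 36" in the quote
```

With real per-game probabilities the list would be heterogeneous (a 1-vs-16 game near certainty, an 8-vs-9 game near a coin flip), but the sum works the same way.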
Smith et al. Model
Cole Smith, Barbara Fraticelli, and Chase Rainwater developed a mixed-integer programming model to determine which teams should play at which locations. The first two rounds are played the first weekend, whittling the field of 65 down to 16. The next two rounds, which determine which teams make the Final Four, occur the following weekend and are played at a second set of locations. Four seeds are assigned to a single location (a "pod") on each of these weekends. Their paper provides a
more efficient way of assigning teams to opening round locations and regions, in order to reduce the amount of travel time for both teams and fans. As the athletes involved in this tournament are college students, it is clearly a desirable goal to play games in nearby locations in order to minimize the interference with academic work.
Assigning the top seeds to nearby locations is a top priority, but the (predetermined) locations have been used to determine the seeds themselves (although not the selection of teams for the tournament), so this is an important problem.
Smith, J.C., Fraticelli, B.M.P, and Rainwater, C., “A Bracket Assignment Problem for the NCAA Men’s Basketball Tournament,” International Transactions in Operational Research, 13 (3), 253-271, 2006.
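At its core this is an assignment problem: place teams at sites to minimize total travel, subject to bracket rules. A brute-force toy with four seeds, four sites, and made-up mileages shows the objective; the actual paper formulates it as a mixed-integer program with far more teams, sites, and bracket-structure constraints than enumeration could handle.

```python
from itertools import permutations

# Toy pod assignment: four seeds to four sites, minimizing total travel.
# All distances are invented for illustration.
seeds = ["Seed1", "Seed2", "Seed3", "Seed4"]
sites = ["SiteW", "SiteX", "SiteY", "SiteZ"]
travel = [                      # travel[i][j] = miles from seed i to site j
    [100, 400, 800, 900],
    [500, 150, 700, 600],
    [900, 800, 200, 400],
    [700, 600, 300, 250],
]

# Enumerate all one-to-one assignments and keep the cheapest.
best = min(permutations(range(4)),
           key=lambda perm: sum(travel[i][perm[i]] for i in range(4)))
assignment = {seeds[i]: sites[best[i]] for i in range(4)}
total = sum(travel[i][best[i]] for i in range(4))
print(assignment, total)
```

Enumeration is fine for 4! = 24 options; the MIP formulation is what makes the full-scale version, with its additional constraints, tractable.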
Coleman and Lynch Model
Jay Coleman of the University of North Florida and Allen Lynch of Mercer University can predict which teams will make the tournament (the "at-large" berths) with 93.7-percent accuracy. Their model, called the "Dance Card," is explained in a video. They missed four teams this year (of a possible 34 at-large berths).
Using the decisions of the NCAA Tournament Selection Committee from the years 1994 through 1999, and 42 pieces of information (e.g. RPI rankings, number of wins and losses, conference records, etc.) for all teams that were candidates for at-large selections in those years, Coleman and Lynch devised the Dance Card formula as an estimate of which pieces of information were most important to the Selection Committee, and the weights that the Committee placed on those pieces of information. The Dance Card formula suggests that only six pieces of information about each team are highly important in determining whether it gets an at-large Tournament bid:
- RPI (Ratings Percentage Index) Rank
- Conference RPI Rank
- Number of wins against teams ranked from 1-25 in RPI
- Difference in number of wins and losses in the conference
- Difference in number of wins and losses against teams ranked 26-50 in RPI
- Difference in number of wins and losses against teams ranked 51-100 in RPI
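A formula of this shape can be sketched as a weighted score over those six inputs, pushed through a logistic link to give a probability-like at-large estimate. The weights below are invented for illustration; they are not Coleman and Lynch's fitted coefficients.

```python
import math

# Hypothetical "Dance Card"-style formula: invented weights over the six
# inputs listed above, with a logistic link. Lower RPI ranks are better,
# so those terms enter with negative weights.
def dance_card_score(rpi_rank, conf_rpi_rank, wins_vs_top25,
                     conf_win_diff, diff_vs_26_50, diff_vs_51_100):
    z = (4.0
         - 0.08 * rpi_rank
         - 0.05 * conf_rpi_rank
         + 0.30 * wins_vs_top25
         + 0.20 * conf_win_diff
         + 0.15 * diff_vs_26_50
         + 0.10 * diff_vs_51_100)
    return 1 / (1 + math.exp(-z))

# A strong bubble profile vs. a weak one (made-up resumes)
strong = dance_card_score(30, 5, 3, 4, 2, 3)
weak = dance_card_score(80, 12, 0, -2, -1, 1)
print(strong, weak)
```

The point of the real model is estimating those weights from the Selection Committee's 1994-1999 decisions, so the formula mimics how the Committee actually weighs each input.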
They also have a model (called the “Score Card”) for determining which teams will win in the tournament.
Using the results of the tournament games in the years 2000 through 2006, and numerous pieces of information based on (or derived from) the so-called “nitty-gritty report” for each of the two teams that played in each game (e.g., RPI, conference records, out-of-conference records, etc.), Coleman and Lynch devised the Score Card formula as an estimate of which pieces of information were related to who wins games in the tournament, and the weights that are placed on those pieces of information. The Score Card formula suggests that only four pieces of information about each team are related to performance in the tournament:
- RPI (Ratings Percentage Index) value (i.e., the “old” RPI value)
- Ranking of the conference the team comes from (using the non-conference RPI ranking of each conference)
- Whether the team won its regular season conference championship
- Number of wins in the last 10 games
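A Score Card-style game prediction can then be sketched as: score each team on those four inputs and pick the higher score. As with the sketch above, the weights and team profiles here are invented, not the fitted model.

```python
# Hypothetical "Score Card"-style comparison using the four inputs above.
# Weights and team numbers are invented for illustration.
def score_card(rpi_value, conf_rank, won_conference, wins_last_10):
    return (5.0 * rpi_value                      # RPI value on a 0-1 scale
            - 0.10 * conf_rank                   # lower conference rank is better
            + 0.50 * (1 if won_conference else 0)
            + 0.15 * wins_last_10)

def predict_winner(team_a, team_b):
    """Each team is (name, rpi_value, conf_rank, won_conf, wins_last_10)."""
    a, b = score_card(*team_a[1:]), score_card(*team_b[1:])
    return team_a[0] if a >= b else team_b[0]

team_a = ("Team A", 0.63, 1, True, 9)    # strong profile (made up)
team_b = ("Team B", 0.58, 8, False, 7)   # weaker profile (made up)
print(predict_winner(team_a, team_b))
```

Applying such a comparison to every match-up in a bracket, round by round, is how a scoring formula like this turns into a full set of tournament picks.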