Tag Archives: march madness

My bracket tips on WISC-TV Live at Four

Watch me on WISC-TV (CBS) Live at Four on 3/15/2016

I had a wonderful time on Live at Four talking about bracketology. Kudos to Susan Siman and Mark Koehn as well as producer Steve Koehn for making everything so easy. Live news moves so fast! I didn’t articulate a few points very well. I’ll try to explain here:

  1. I didn’t say what I meant by an “upset.” Defining an upset is maybe a super-professorial thing to do, but I can’t help myself. An upset is usually defined as the lower-seeded team winning. However, some teams are seeded too low and others too high, and as a result some upsets are not as surprising as the seeds suggest. I like to look at the rankings of the two teams and the win probabilities to get a better sense of how hard a game is to predict. For example, according to the FiveThirtyEight win probabilities, the Arizona (6)/Wichita State (11) game has a 50% win probability for each team despite a big seed differential. Other games may not be so evenly matched but still have win probabilities between 25% and 75%, which means that we would expect at least 1 of every 4 of these games to produce an upset, on average. I am not so surprised when this happens. I am more surprised when the underdog’s win probability is less than 10%. There are definitely upsets in the tournament, but often they are not rare events. For more, read Nate Silver’s thoughts on upsets here.
  2. Regarding the perfect bracket: while I think someone will get a perfect bracket, I don’t think it will happen this year. Or next year. Or in the next decade. Maybe we will see one within the next 100 years. There are different estimates of the probability of getting all the picks right, and many are in the range of 1-in-100 billion. But so many people fill out brackets: there were 11.5 million brackets on ESPN alone in 2015. With so many attempts to hit the target, I think one will strike…eventually. When it happens, it will happen in one of those years when there are not too many upsets. In 2014, with Final Four seeds 1, 2, 7, 8, there were only 612 brackets on ESPN with all Final Four teams picked correctly. But in 2015, with Final Four seeds 1, 1, 1, 7 (the 7 seed was tournament darling MSU), there were 182,709 brackets with all Final Four teams picked correctly. Some years are easier to forecast than others. Given all the brackets in the online pools out there, I think we will eventually see a perfect bracket even if it does not win the official prize. More here.
  3. To win your pool, you don’t need a strategy that maximizes your points; you need a strategy that gives you more points than your opponents. Those two strategies are very similar but slightly different. Here are Ken Massey’s composite rankings I mentioned in the interview for figuring out who the best teams are. And of course, here are my rankings at Badger Bracketology. Check out ESPN’s Who Picked Whom to see which teams may be overvalued and undervalued.
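To make the "1 of 4" arithmetic from the first point concrete, here is a minimal sketch. The four underdog win probabilities are made up for illustration (they are not FiveThirtyEight numbers), and the second function assumes the games are independent.

```python
# Hypothetical underdog win probabilities for four games (illustrative only).
underdog_win_probs = [0.50, 0.35, 0.30, 0.25]


def expected_upsets(probs):
    """Expected number of upsets: the sum of the underdog win probabilities."""
    return sum(probs)


def prob_at_least_one_upset(probs):
    """Chance of one or more upsets, assuming game outcomes are independent."""
    p_no_upsets = 1.0
    for p in probs:
        p_no_upsets *= 1.0 - p
    return 1.0 - p_no_upsets


print(f"Expected upsets: {expected_upsets(underdog_win_probs):.2f}")
print(f"P(at least one upset): {prob_at_least_one_upset(underdog_win_probs):.2f}")
```

Even if every underdog were only a 25% shot, four such games would still be expected to produce one upset, which is why these results are not rare events.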

It’s time for me to stop over-analyzing my bracket advice and fill out my bracket. Good luck!


Two minute bracketology. My spin on math, bracketology & #MarchMadness

The University Communications group at the University of Wisconsin-Madison asked me to film a short video about bracketology, math (what is a Markov chain?), and what we can learn from bringing math into bracketology. It was a wonderful experience, and I’m thrilled with the final product. I can’t take credit: the video was produced by Justin Bomberg and reflects his vision.

The video is just under 3 minutes long. We filmed this before the selections were made, and as you can see, I recommend picking 10 seeds to upset 7 seeds. And of course, Wisconsin ended up seeded 7. Sorry team! Also, I naively agreed to attempt to spin a basketball on my finger while the film was rolling…I think you’ll easily understand why I’m more comfortable with the math.

I’m looking forward to watching the tournament. Go Badgers!


tips for filling out your tournament bracket and winning your March Madness pool

Here are a few things I do to fill out my bracket using analytics.

1. Let’s start with what not to do. Although a great record is meaningful, I usually don’t put a whole lot of weight on a team’s record because strength of schedule matters.

I do not like RPI either. RPI is a blend of a team’s winning percentage, its opponents’ winning percentages, and its opponents’ opponents’ winning percentages (more here). It just doesn’t lead to a useful tool for making bracket picks.
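For readers who want to see the blend, here is a minimal sketch. The 25/50/25 weights are the commonly cited ones and are an assumption here, since the weights are not listed in this post; the sketch also ignores the home/away adjustments used in the official calculation. The two teams and their numbers are hypothetical.

```python
def win_pct(wins, losses):
    """Winning percentage as a fraction between 0 and 1."""
    games = wins + losses
    return wins / games if games else 0.0


def rpi(wp, owp, oowp):
    """Simplified RPI blend (assumed weights: 25% WP, 50% OWP, 25% OOWP).

    wp   = the team's own winning percentage
    owp  = its opponents' winning percentage
    oowp = its opponents' opponents' winning percentage
    """
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp


# Hypothetical teams: a gaudy record against a soft schedule,
# and a worse record against a tough schedule.
soft_schedule = rpi(win_pct(28, 4), owp=0.45, oowp=0.50)
tough_schedule = rpi(win_pct(22, 10), owp=0.62, oowp=0.55)

print(f"28-4 team:  {soft_schedule:.4f}")
print(f"22-10 team: {tough_schedule:.4f}")
```

The sketch only shows how the number is assembled from winning percentages; it says nothing about how well that number predicts tournament games.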

2. There are good math models that are helpful for picking a bracket. Use these sophisticated ranking tools. The seeding committee uses some of these ranking tools to select the seeds, so the seeds themselves reflect strength of schedule and implicitly rank teams.  Here are a few ranking tools that use math modeling.

You can use these rankings to pick the better team in your bracket. Oregon, for example, is not in the top 4 in any of these rankings.

3. Survival analysis quantifies how far each team is likely to make it in the tournament. This doesn’t give you insight into team-to-team matchups per se, but you can think of the probability that Wisconsin or MSU or whoever makes it to the Final Four as a kind of average across the different teams it might play during the tournament.

This is helpful for picking a top-down bracket, where you pick your Final Four first and then fill in the rest of your bracket from there.
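Here is a small sketch of the "average across possible opponents" idea. It is not a survival analysis model; it just propagates head-to-head win probabilities through a tiny four-team bracket. The team ratings and the rating-ratio win probability are hypothetical stand-ins for whatever ranking tool you prefer.

```python
def advance_probabilities(teams, win_prob):
    """Return {team: [P(wins round 1), P(wins round 2), ...]}.

    teams    : team names in bracket order (length must be a power of two)
    win_prob : function (a, b) -> probability that a beats b
    """
    n = len(teams)
    alive = [1.0] * n  # probability each team is still in before the round
    rounds = {team: [] for team in teams}
    block = 1  # size of the sub-bracket each team has already come out of
    while block < n:
        next_alive = [0.0] * n
        for i in range(n):
            my_block = i // block
            sibling = my_block + 1 if my_block % 2 == 0 else my_block - 1
            start = sibling * block
            # Average over every opponent that could come out of the sibling block.
            p_win = sum(alive[j] * win_prob(teams[i], teams[j])
                        for j in range(start, start + block))
            next_alive[i] = alive[i] * p_win
        alive = next_alive
        for i, team in enumerate(teams):
            rounds[team].append(alive[i])
        block *= 2
    return rounds


# Hypothetical strength ratings; P(a beats b) = rating_a / (rating_a + rating_b).
ratings = {"A": 4.0, "B": 1.0, "C": 3.0, "D": 2.0}


def rating_win_prob(a, b):
    return ratings[a] / (ratings[a] + ratings[b])


result = advance_probabilities(["A", "B", "C", "D"], rating_win_prob)
for team, probs in result.items():
    print(team, [round(p, 3) for p in probs])
```

The last number for each team is its chance of winning the mini-bracket, which is the kind of quantity you would sort by when picking a Final Four first.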

4. Look at the seeds. Only once did all four 1-seeds make the Final Four. It’s a tough road. Seeds matter a lot in the rounds of 64 and 32, not so much after that point. There will be upsets. Some seed match ups produce more upsets than others. The 7-10 and 5-12 match ups are usually good to keep an eye on (unfortunately, the Badgers are a 7 seed this year so this means I might be predicting their demise. I hope I’m wrong!).

5. Don’t ignore preseason rankings. The preseason rankings are educated guesses on who the best teams are before any games have been played. It may seem silly to consider preseason rankings at the end of the season after all games have been played (when we have much better information!) but the preseason rankings seem to reflect some of the intangibles that predict success in the tournament (a team’s raw talent or athleticism).

6. Math models are very useful, but they have their limits. Math models implicitly assume that the past is good for predicting the future. This is not usually a good assumption when a team has had any major changes, like injuries or suspensions. You can check out crowdsourcing data (who picked who in a matchup), expert opinion, and things like injury reports to make the final decision.

On the other hand, experts sometimes focus too much on who is “hot” at the moment, thereby discounting the past too much. There is probably a “right” level of discounting, but people (experts included) have short memories and may discount data points from early in the season too heavily. So while I like to use expert opinion to supplement my picks, I do so carefully.

7. My final tip is to be strategic in picking your Final Four teams with respect to the opponents in your pool. It’s hard to win your pool if everyone chooses, say, Wisconsin to win it all. Pick a unique team to win the tournament, be the runner-up, or be in the Final Four to set your bracket apart. If that team makes it, then you will have a huge advantage in terms of winning your pool. But choose wisely.

This works in moderate-sized pools, but not so much in huge pools. If you are in a big pool, then your odds of winning with analytics are diluted by someone winning by pure luck (e.g., your friend who won the 2011 pool because they liked VCU’s mascot Rodney the Ram).
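Here is a rough sketch of why a less popular champion can be the better pick. It rests on a strong simplifying assumption that is mine, not a rule of any real pool: the pool is decided entirely by the champion pick, and everyone who picked the champion splits the win evenly. All probabilities and pick rates below are hypothetical.

```python
from math import comb


def expected_share(n_opponents, pick_rate):
    """E[1 / (1 + K)], where K ~ Binomial(n_opponents, pick_rate) is the
    number of opponents who picked the same champion as you."""
    return sum(
        comb(n_opponents, k)
        * pick_rate ** k
        * (1.0 - pick_rate) ** (n_opponents - k)
        / (1 + k)
        for k in range(n_opponents + 1)
    )


def pool_value(title_prob, pick_rate, n_opponents):
    """Chance your champion wins times your expected share of the pool."""
    return title_prob * expected_share(n_opponents, pick_rate)


# Hypothetical 21-person pool (you plus 20 opponents).
favorite = pool_value(title_prob=0.30, pick_rate=0.50, n_opponents=20)
contrarian = pool_value(title_prob=0.15, pick_rate=0.05, n_opponents=20)

print(f"Popular favorite:  {favorite:.3f}")
print(f"Contrarian choice: {contrarian:.3f}")
```

In this toy setup the contrarian pick is worth more even though it is half as likely to win the title, because you would rarely have to share the win. Real pools score every round, so treat this as intuition rather than a recipe.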

This blog post is an updated version of my post from last year.


tips for filling out a statistically sound bracket

Go Badgers!!

Here are a few things I do to fill out my bracket using analytics.

1. Let’s start with what not to do. I usually don’t put a whole lot of weight on a team’s record because strength of schedule matters. Likewise, I don’t put a whole lot of weight on bad ranking tools like RPI that do not do a good job of taking strength of schedule into account.

2. Instead of records, use sophisticated ranking tools. The seeding committee uses some of these ranking tools to select the seeds, so the seeds themselves reflect strength of schedule and implicitly rank teams. Here are a few ranking tools that use math modeling.

I like the LRMC (logistic regression Markov chain) method from some of my colleagues at Georgia Tech. Again: RPI bad, LRMC good.
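To give a flavor of the Markov chain idea, here is a toy sketch. It is not the LRMC model: as I understand it, the real method estimates the transition probabilities from game results with a logistic regression, while the fixed 0.8 below is only a placeholder assumption. Picture a fan who keeps switching allegiance, usually toward the team that won each game; the long-run share of time spent with each team is its rating. The teams and results are hypothetical.

```python
def markov_rank(teams, games, p_toward_winner=0.8, iterations=1000):
    """Toy Markov chain rating from a list of (winner, loser) results."""
    idx = {team: i for i, team in enumerate(teams)}
    n = len(teams)
    transition = [[0.0] * n for _ in range(n)]
    games_played = [0] * n

    for winner, loser in games:
        w, l = idx[winner], idx[loser]
        # A fan of the loser usually jumps to the winner.
        transition[l][w] += p_toward_winner
        transition[l][l] += 1.0 - p_toward_winner
        # A fan of the winner usually stays put.
        transition[w][w] += p_toward_winner
        transition[w][l] += 1.0 - p_toward_winner
        games_played[w] += 1
        games_played[l] += 1

    # Average over each team's games so every row sums to 1.
    for i in range(n):
        if games_played[i] == 0:
            transition[i][i] = 1.0
        else:
            transition[i] = [x / games_played[i] for x in transition[i]]

    # Power iteration toward the stationary distribution.
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return dict(zip(teams, pi))


# Hypothetical results: A beat B, A beat C, B beat C.
scores = markov_rank(["A", "B", "C"], [("A", "B"), ("A", "C"), ("B", "C")])
for team, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {score:.3f}")
```

Because a win over a strong team sends more of the "fan traffic" your way than a win over a weak one, strength of schedule is built into the rating.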

3. Survival analysis quantifies how far each team is likely to make it in the tournament. This doesn’t give you insight into team-to-team matchups per se, but you can think of the probability that Wisconsin makes it to the Final Four as a kind of average across the different teams it might play during the tournament.

4. Look at the seeds. Only once did all four 1-seeds make the Final Four. It’s a tough road. Seeds matter a lot in the rounds of 64 and 32, not so much after that point. There will be upsets. Some seed match ups produce more upsets than others. The 7-10 and 5-12 match ups are usually good to keep an eye on.

5. Don’t ignore preseason rankings. The preseason rankings are educated guesses on who the best teams are before any games have been played. It may seem silly to consider preseason rankings at the end of the season after all games have been played (when we have much better information!) but the preseason rankings seem to reflect some of the intangibles that predict success in the tournament (a team’s raw talent or athleticism).

6. Math models are very useful, but they have their limits. Math models implicitly assume that the past is good for predicting the future. This is not usually a good assumption when a team has had any major changes, like injuries or suspensions. You can check out crowdsourcing data (who picked who in a matchup), expert opinion, and things like injury reports to make the final decision.


roundup of march madness sports analytics articles

  1. Michael Lopez (@StatsByLopez) uses analytics to identify which teams are over- and under-valued in the tournament.
  2. Evelyn Lamb at Scientific American blogs about the math behind a perfect bracket.
  3. Carl Bialik at FiveThirtyEight writes about the odds of getting a perfect bracket using analytical methods. It depends on how good those analytical methods are. Nate Silver claims it might be as high as 1 in 7.4 billion. Interesting.
  4. “Will a 16 seed ever beat a 1 seed?” by Ed Feng (@thepowerrank). Ed also has a bracket tool that visualizes different game outcomes.
  5. My advisor Sheldon Jacobson who maintains BracketOdds was interviewed in the News-Gazette, the local paper in Champaign-Urbana, IL.
  6. The Huffington Post has a “Predict-O-Tron” that helps you fill out your bracket using a probabilistic tool that lets you set the importance of different attributes (like seed, offensive efficiency, and even tuition) using moving sliders. It looks interesting but reeks of overfitting.
  7. I was on the local NBC 15 affiliate in Madison on March 18 to discuss the odds of a perfect bracket (video included).

Good luck perfecting your bracket!


creating a March Madness bracket using integer programming

An Associated Press article on ESPN outlines how the Division I men’s basketball committee wants to make bracket construction more fair [Link]. At present, there are 68 teams with no plans to expand the field. However, the committee has many decisions to make when it comes to who makes it in and who doesn’t, as well as each team’s seed and region. All of this together determines potential matchups. Previously, the committee tried to entirely avoid rematches in the first few rounds of the tournament. Given the large number of potential matchups depending on who wins and loses, this constrained the bracket (possibly too much).

“There have been years where we’ve had to drop a team or promote a team; there was even a year where teams dropped two seed lines. We don’t feel that’s appropriate.” – Ron Wellman, the athletic director at Wake Forest

The article doesn’t exactly hint that integer programming could be used to solve this problem, but that’s the next logical step. In fact, there is a paper on this! Cole Smith, Barbara Fraticelli, and Chase Rainwater developed a mixed integer programming model (published in 2006, back when there were 65 teams) to assign teams to seeds, regions, and pods (locations). The last issue is important: constructing the bracket is intertwined with assigning the bracket to locations for play. For example, four teams in a region in the field of 64 (e.g., the 1, 8, 9, and 16 seeds) must all play at the same location to produce a single team in the Sweet 16.

The Smith et al. model minimizes the sum of the (then) first-round travel costs (the round of 64), the (then) expected second-round travel costs  (the round of 32), and the reseeding penalty costs while considering typical assignment constraints as well as several side constraints, including:

  • no team plays on its home court (except in the Final Four – that location is selected before the tournament),
  • no intra-conference matchups occur before the regional finals (what was the fourth round). This is the constraint that may be relaxed somewhat in the new system, so the existing model could be adapted to make brackets under the proposed new system,
  • the top-seeded team from each conference must be assigned to a different region than the second- and third-highest seeded teams from that conference,
  • the best-seeded teams should be assigned to nearby pods (locations) in the first weekend (a reward for a good season!), and
  • the religious restrictions of certain universities must be obeyed (e.g., Brigham Young University cannot play on Sundays).
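To show the flavor of the problem, here is a toy sketch that assigns four teams to four pods to minimize total travel while keeping one team off its home court. It is not the Smith et al. model: the teams, pods, and mileages are made up, there are no seeds or regions, and it uses brute force over all assignments rather than an integer programming solver, which only works for tiny instances.

```python
from itertools import permutations

teams = ["Team A", "Team B", "Team C", "Team D"]
pods = ["Pod 1", "Pod 2", "Pod 3", "Pod 4"]

# Hypothetical travel distances in miles from each campus to each pod.
miles = {
    "Team A": {"Pod 1": 0, "Pod 2": 300, "Pod 3": 800, "Pod 4": 1500},
    "Team B": {"Pod 1": 400, "Pod 2": 100, "Pod 3": 600, "Pod 4": 1200},
    "Team C": {"Pod 1": 900, "Pod 2": 700, "Pod 3": 200, "Pod 4": 500},
    "Team D": {"Pod 1": 1400, "Pod 2": 1100, "Pod 3": 450, "Pod 4": 150},
}

# Side constraint: Team A hosts Pod 1, so it may not play there.
home_court = {"Team A": "Pod 1"}


def best_assignment(teams, pods, miles, forbidden):
    """Try every one-to-one assignment and keep the cheapest feasible one."""
    best_cost, best = None, None
    for perm in permutations(pods):
        if any(forbidden.get(team) == pod for team, pod in zip(teams, perm)):
            continue  # violates the home-court constraint
        cost = sum(miles[team][pod] for team, pod in zip(teams, perm))
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, dict(zip(teams, perm))
    return best_cost, best


unconstrained_cost, _ = best_assignment(teams, pods, miles, {})
cost, assignment = best_assignment(teams, pods, miles, home_court)
print(f"Total travel without the home-court rule: {unconstrained_cost} miles")
print(f"Total travel with the home-court rule:    {cost} miles")
print(assignment)
```

Comparing the two totals shows the price of a side constraint in miles, which is the kind of tradeoff the real model weighs across travel costs and reseeding penalties.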

It is worth pointing out that this model assigns the seeds to teams. A team that could be considered an 11-13 seed would be assigned its seed (11, 12, or 13) based on the total cost of the system. That may seem unfair on some level, but it might be better for a team to be a 13 seed and play nearby than to be a 12 seed and have to travel an extra 1000 miles. (Note: Nate Silver and the 538 team use travel distance in their NCAA basketball tournament prediction model because distance matters.) Flexible seeds allow for a bracket that gives more teams a fair shot at winning their games, but too much flexibility would be unfair to teams. The Smith et al. model allows for some flexibility for 6-11 seeds.

The mixed integer programming model by Cole Smith, Barbara Fraticelli, and Chase Rainwater already addresses the committee’s concerns, which raises the question: why isn’t the committee using integer programming?

OK, it’s probably pretty easy to think of a few reasons, and none of them involve math. One concern is that the general public seems to distrust models of any kind. This may be because models are black boxes to non-experts. This lack of transparency makes it hard to generate any kind of public support (Exhibit A: the debate about the model for the BCS football rankings). Perhaps marketing could improve buy-in (“The average team traveled 500 miles fewer this year than last” or “Five teams had to travel across all four US time zones last year, and none had to do so this year”). A better suggestion may be to give a few of the top integer programming solutions to the committee, who can then use and adapt (or ignore) the solutions as they see fit. Currently, the committee looks at several rankings (including the LRMC method, last time I heard), so they are already using math models to influence the decisions ultimately made by humans.

How would you use operations research and math modeling to improve the tournament selection and seeding process?

Reference:

Smith, J.C., Fraticelli, B.M.P, and Rainwater, C., “A Bracket Assignment Problem for the NCAA Men’s Basketball Tournament,” International Transactions in Operational Research, 13 (3), 253-271, 2006. [Link to journal site]


will someone create a perfect bracket this year?

Warren Buffett is offering $1B to whoever produces a perfect bracket [Link]. Here is my take.

There are about 9.2 quintillion ways to fill out a bracket. This is often mistakenly used to predict the odds of filling out a perfect bracket, but the odds are not 9-quintillion-to-1 because:

(a) the tournament isn’t like the lottery where every outcome is equally likely, and

(b) monkeys are not randomly selecting game outcomes. Instead, people are purposefully selecting outcomes.

“Good” brackets are made by people who play the odds and, for example, choose 1 seeds to beat 16 seeds in the second round. These brackets have a much better chance of reaching perfection, somewhere in the range of 128 billion-to-1 or 150 billion-to-1 (see here and here).
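Here is a quick check on these numbers. The 9.2 quintillion figure is 2 raised to the 63 games in a 64-team bracket (ignoring the play-in games). The second function is my own back-of-the-envelope framing, not how the cited estimates were derived: it asks what average per-game accuracy would produce given odds if every game were an independent pick.

```python
GAMES = 63  # games in a 64-team single-elimination bracket (play-in games ignored)

# Every game has two possible winners, so there are 2**63 ways to fill out a bracket.
total_brackets = 2 ** GAMES


def per_game_accuracy(one_in_n, games=GAMES):
    """Average per-game hit rate p with p**games == 1/one_in_n,
    assuming every game is an independent pick."""
    return (1.0 / one_in_n) ** (1.0 / games)


print(f"{total_brackets:,} possible brackets")
for odds in (128e9, 150e9):
    print(f"1 in {odds:,.0f} corresponds to about "
          f"{per_game_accuracy(odds):.1%} accuracy per game")
```

Coin flipping is 50% per game, so moving from 9.2 quintillion-to-1 down to the 100-billion range amounts to getting roughly two out of every three games right on average.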

The limitation here is that these odds give an individual likelihood of getting a perfect bracket; they give no insight into how to construct a pool of brackets that collectively has a high degree of likelihood for producing a perfect bracket.

Just like in the lottery, there is a difference between you winning the lottery and someone winning the lottery (just like in the classic Birthday Problem). Let’s say, hypothetically, that we have a perfect methodology that gives a bracket 150 million-to-1 odds. If 150M people filled out brackets, would we expect to see a perfect bracket? Probably not. If everyone used the same methodology that maximized our individual chance of getting a perfect bracket, this wouldn’t necessarily lead to a pool of brackets that collectively guarantees that someone gets a perfect bracket. The problem is that many of the brackets will be identical or almost identical if they use the same methodology (meaning that they are all perfect or they are all not perfect). There needs to be enough variation between the entries to probabilistically “cover” the possible brackets with a certain reliability level. We would expect to see more variation between entries in the lottery, where many people purchase lottery tickets with randomly generated numbers (and we can more easily estimate the odds that someone will win a lottery based on the number of tickets sold). Recall: randomly generated brackets aren’t the answer! In a nutshell: what is good for the goose isn’t necessarily good for the gander.
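Here is a sketch of the gap between "you" and "someone" using the hypothetical 150 million-to-1 odds above. The first function is the naive lottery-style estimate that treats every entry as an independent shot. The second uses the fact that two different brackets cannot both be perfect, so the chance that someone is perfect is just the total probability covered by the distinct brackets submitted. The assumption that only 1,000 distinct brackets get submitted is made up for illustration.

```python
import math


def naive_lottery_estimate(p_single, n_entries):
    """P(at least one success) if every entry were an independent shot."""
    # expm1/log1p keep the arithmetic accurate for very small p_single.
    return -math.expm1(n_entries * math.log1p(-p_single))


def coverage_probability(distinct_bracket_probs):
    """Distinct brackets are mutually exclusive outcomes, so their
    probabilities of being the perfect bracket simply add up."""
    return min(1.0, sum(distinct_bracket_probs))


p = 1 / 150e6      # hypothetical chance that one well-built bracket is perfect
n = 150_000_000    # number of entries

naive = naive_lottery_estimate(p, n)

# Hypothetical: everyone uses the same methodology, so the 150M entries
# collapse into only 1,000 distinct brackets, each as likely as the best one.
clustered = coverage_probability([p] * 1000)

print(f"Naive lottery-style estimate:    {naive:.3f}")
print(f"Only 1,000 distinct brackets:    {clustered:.8f}")
```

Duplicated entries add nothing to the coverage, which is why 150M entries built with the same methodology are nowhere near a sure thing.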

The probability of a perfect bracket depends on the tournament. Let’s look at brackets on ESPN over the last 5 years, and only at how many people correctly selected all Final Four teams:
– 182,709 of 11.57 million brackets correctly picked all Final Four teams in 2015
– 612 of 11 million brackets correctly picked all Final Four teams in 2014
– 47 of 8.15 million brackets correctly picked all Final Four teams in 2013
– 23,304 of 6.45 million brackets correctly picked all Final Four teams in 2012
– 2 of 5.9 million brackets correctly picked all Final Four teams in 2011
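Converting these counts into rates makes the year-to-year swings easier to see. The counts below are the ones listed above; nothing else is assumed.

```python
# (brackets with all Final Four teams correct, total ESPN brackets) by year
final_four_hits = {
    2015: (182_709, 11_570_000),
    2014: (612, 11_000_000),
    2013: (47, 8_150_000),
    2012: (23_304, 6_450_000),
    2011: (2, 5_900_000),
}

# Fraction of brackets that got all four teams right each year.
rates = {year: hits / entries for year, (hits, entries) in final_four_hits.items()}

for year in sorted(rates, reverse=True):
    hits, entries = final_four_hits[year]
    print(f"{year}: {rates[year]:.5%} of brackets (about 1 in {entries / hits:,.0f})")
```

The rate in 2015 is tens of thousands of times higher than in 2011, and that is just for four teams, not all 63 games.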

Both 2011 and 2013 had “Cinderella stories” in VCU and Wichita State, respectively. A single surprise can drastically reduce the number of brackets that are still correct and make it less likely for someone to have a perfect bracket. On the other hand, when a 1 seed wins the tournament, brackets have more correct picks, on average. Certain tournaments are therefore more likely than others to produce a perfect bracket.

While having a good methodology for filling out a bracket is key to maximizing your chances, chance plays a much larger role. However, while you cannot control the randomness of the tournament, you can control how you fill out a bracket. In terms of strategy, a person should use statistics, analytical methods, and expert opinions to fill out a bracket to maximize the chance of picking a perfect bracket.

It would be a mistake to look at the two best brackets in 2011 and use the methodology that went into creating those brackets in other tournaments. Basing your bracket methodology on a single tournament is not a good idea: a single tournament is a small sample, and no statistically significant conclusions can be drawn from it. If we applied the 2011 methodology to other years, we would quickly see that in the long run, we would do very poorly in March Madness office pools.

If we are acting in our own self-interests (and we are if we want that $1 billion prize!) then we should use the best models to maximize our personal odds and then hope for the best. Luckily, my colleagues have used analytics, operations research, and math to create some pretty good methods we can use to fill out brackets. This is a terrific place to start.

For my tips on filling out a bracket based on analytical methods: read my post here.

Are you participating in the Warren Buffett contest?

[updated on 3/16/2016]