# Tag Archives: march madness

## the mathematics of rare matchups in the March Madness tournament

This year’s Final Four is set in the NCAA men’s basketball tournament, with Duke, the University of North Carolina (UNC), Kansas, and Villanova facing off this weekend. This is the first time Duke and UNC will play in the tournament. At first blush this is hard to believe when considering how often these two teams have played in the tournament (a combined total of 334 games!). It’s easier to believe when considering the mathematics used to create the bracket.

I once blogged about the constraints required to seed the 68 teams in the tournament and build the bracket. The NCAA’s website indicates the same rules are still in use.

First, the 68 teams are selected, sorted, and seeded. This is a long process. Then, the 68 teams are assigned to one of the four regions to create the bracket. There are many rules for this last step. Here is the rule that explains why Duke and UNC haven’t played in the tournament before:

“Each of the first four teams selected from a conference shall be placed in different regions if they are seeded on the first four lines.”

Duke and UNC are almost always among the first four teams selected from their conference, the Atlantic Coast Conference. They typically play each other twice during the regular season and sometimes a third time in the ACC conference tournament. Duke and UNC played each other twice this season. According to the NCAA constraints for constructing a bracket, Duke and UNC are not allowed to meet in the tournament before the Final Four. That is exactly where they are meeting in the 2022 tournament. Mathematical constraints secretly guide the tournament.
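To make the rule concrete, here is a minimal sketch of how one might check a draft bracket against it. The function name, the tuple layout, and the teams and conferences are all hypothetical, and the sketch assumes teams are listed in the order they were selected. It is not how the committee actually works.

```python
from collections import defaultdict

def conference_region_conflicts(teams):
    """Return pairs of teams that break the 'different regions' rule.

    teams: list of (name, conference, seed_line, region) tuples, listed in
    selection order (an assumption of this sketch).
    """
    # Keep only the first four teams selected from each conference.
    first_four = defaultdict(list)
    for name, conf, seed_line, region in teams:
        if len(first_four[conf]) < 4:
            first_four[conf].append((name, seed_line, region))

    conflicts = []
    for members in first_four.values():
        # The rule only protects teams seeded on the first four lines.
        protected = [(n, r) for n, line, r in members if line <= 4]
        for i in range(len(protected)):
            for j in range(i + 1, len(protected)):
                if protected[i][1] == protected[j][1]:
                    conflicts.append((protected[i][0], protected[j][0]))
    return conflicts

# Hypothetical draft bracket, not real data.
bracket = [
    ("Team A", "Conf X", 1, "East"),
    ("Team B", "Conf X", 2, "West"),
    ("Team C", "Conf X", 3, "East"),  # same region as Team A
    ("Team D", "Conf Y", 1, "West"),
    ("Team E", "Conf X", 9, "East"),  # below the fourth line, so exempt
]
print(conference_region_conflicts(bracket))
```

In this made-up example only Team A and Team C are flagged. Team E shares their region but is seeded below the fourth line, so the rule does not apply to it.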

Fun fact: it is not always possible to create a feasible bracket that conforms to all of the rules.

There are several other constraints for constructing a bracket. Infeasibility can happen in real applications of mathematical optimization. Mathematical constraints do not make nuanced exceptions to the rules the way human decision makers do, so infeasible problem instances must be addressed by humans.

The selection committee addresses the problem of infeasibility by moving a team’s seed up or down by one and sometimes two. This seems like a small change, but it can drastically change a team’s path to the Final Four. The good news is that about a decade ago the rules were tweaked to change teams’ seeds less often, in a victory for the tournament and also for mathematics.

## constructing an NCAA basketball tournament bracket is NP-hard

I continue to receive a lot of questions about Wisconsin’s 8 seed (read the blog post here). The NCAA men’s basketball selection committee doesn’t just assign seeds to teams, it also puts the teams into the bracket, which involves assigning a team a seed, assigning other teams to the regions, and assigning locations to each game. This is an NP-hard problem that involves balancing several interdependent decisions. You can formulate the problem as an integer programming model subject to many assignment and distance constraints. Cole Smith, Barbara Fraticelli and Chase Rainwater published a paper on optimization models to design good tournaments that balance these constraints while getting the seeds right. As far as I know, the committee constructs the bracket by hand.

Below are the official rules for the final part of constructing the tournament called “Building the Bracket.” The emphasis is mine; I boldfaced the major rules that constrain how the bracket is built. There are a lot of rules, so many that even identifying a feasible solution may be difficult. Near the end: “A team may be moved up or down one (or in extraordinary circumstances) two lines from its true seed line when it is placed in the bracket if necessary to meet the principles.” I suspect Wisconsin was given an 8 seed to conform to all the rules. It’s worth pointing out that Wisconsin’s low seed was ultimately unfair to top seed Villanova, who had an extremely tough second round opponent (sorry not sorry, ‘Nova). Optimization models and algorithms could have constructed a bracket that was more fair to the teams.

III. BUILDING THE BRACKET

Sixteen levels are established (i.e., the seeds, 1 through 16) in the bracket that cross the four regions, permitting evaluation of four teams simultaneously on the same level.  Teams on each seed line (No. 1, No. 2, No. 3, etc.) should be as equal as possible.

Each region is divided into quadrants with four levels in each, permitting the evaluation of four different sections within each region against the same sections in each of the other regions.

The committee will assign all four teams in each bracket group (seeds 1, 16, 8, 9), (4, 13, 5, 12), (2, 15, 7, 10), (3, 14, 6, 11) to the same first-/second-round site. There will be two “pods” at each first-/second-round site which may feed into different regional sites.

Each of the first four teams selected from a conference shall be placed in different regions if they are seeded on the first four lines.

Teams from the same conference shall not meet prior to the regional final if they played each other three or more times during the regular season and conference tournament.

Teams from the same conference shall not meet prior to the regional semifinals if they played each other twice during the regular season and conference tournament.

Teams from the same conference may play each other as early as the second round if they played no more than once during the regular season and conference tournament.

Any principle can be relaxed if two or more teams from the same conference are among the last four at-large seeded teams participating in the First Four.

To recognize the demonstrated quality of such teams, the committee shall not place teams seeded on the first four lines at a potential “home-crowd disadvantage” in the first round.

The last four at-large teams on the overall seed list, as well as teams seeded 65 through 68, will be paired to compete in the First Four games on Tuesday and Wednesday following the announcement of the field. (If allowed, the last at-large team on the seed list will be paired with the second-to-last at-large team on the seed list. The other First Four games will consist of the third-to-last at-large team on the seed list playing the fourth-to-last at-large team on the seed list, as well as seed 65 versus 66; and seed 67 versus 68).

The winners of the First Four games will advance to a first- and second-round site to be determined by the committee during selection weekend. In the event a First Four site is also a first- and second-round site, the winners of the First Four games may be assigned to that site, regardless of the days of competition.

Teams will remain in or as close to their areas of natural interest as possible. A team moved out of its natural area will be placed in the next closest region to the extent possible. If two teams from the same natural region are in contention for the same bracket position, the team ranked higher in the seed list shall remain in its natural region.

A team will not be permitted to play in any facility in which it has played more than three games during its season, not including exhibitions and conference postseason tournaments.

A host institution’s team shall not be permitted to play at the site where the institution is hosting. However, the team may play on the same days when the institution is hosting.

Teams may play at a site where the conference of which it is a member is serving as the host.

A team may be moved up or down one (or in extraordinary circumstances) two lines from its true seed line (e.g., from the 13 seed line to the 12 seed line; or from a 12 seed line to a 13 seed line) when it is placed in the bracket if necessary to meet the principles.

Procedures for Placing the Teams into the Bracket
(a procedure for ensuring that the regions are roughly balanced)

1. The committee will place the four No. 1 seeds in each of the four regions, thus determining the Final Four semifinals pairings (overall 1 vs. 4; 2 vs. 3).

2. The committee will then place the No. 2 seeds in each region in true seed list order. The committee may relax the principle of keeping teams as close to their area of natural interest for seeding teams on the No. 2 line to avoid, for example, the overall No. 5 seed being sent to the same region as the overall No. 1 seed. The committee will not compromise the principle of keeping teams from the same conference in separate regions.

3. The committee will then place the No. 3 seeds in each region in true seed list order.

4. The committee will then place the No. 4 seeds in each region in true seed list order.

5. After the top four seed lines have been assigned, the committee will review the relative strengths of the regions by adding the “true seed” numbers in each region to determine if any severe numerical imbalance exists. Generally, no more than five points should separate the lowest and highest total.

6. In “true seed” order, the committee then assigns each team (and, therefore, all teams in its bracket group—e.g., seeds 1, 8, 9, 16) to first-/second-round sites.

7. The committee will then place seeds Nos. 5-16 in the bracket, per the principles. The four teams assigned to the seed line, 5 through 16, will have the same numerical value.
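The balance check in step 5 is simple arithmetic, and a small sketch makes it concrete. The region names and both placements below are hypothetical illustrations, and the function names are my own. The real committee also has to respect geography and the conference rules at the same time.

```python
def region_totals(placements):
    """Sum the overall 'true seed' numbers (1-16) placed in each region."""
    return {region: sum(seeds) for region, seeds in placements.items()}

def is_balanced(placements, max_spread=5):
    """True if the highest and lowest region totals differ by at most max_spread."""
    totals = region_totals(placements).values()
    return max(totals) - min(totals) <= max_spread

# Hypothetical placement that snakes back and forth across the regions.
snake = {
    "East": [1, 8, 9, 16],
    "West": [2, 7, 10, 15],
    "South": [3, 6, 11, 14],
    "Midwest": [4, 5, 12, 13],
}

# Hypothetical placement that always fills the regions in the same order.
straight = {
    "East": [1, 5, 9, 13],
    "West": [2, 6, 10, 14],
    "South": [3, 7, 11, 15],
    "Midwest": [4, 8, 12, 16],
}

print(region_totals(snake), is_balanced(snake))
print(region_totals(straight), is_balanced(straight))
```

The snaked placement gives every region a total of 34, while the straight placement spreads the totals from 28 to 40, well beyond the five-point guideline.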

1. If possible, rematches of non-conference regular-season games should be avoided in the First Four and first round.

2. If possible, after examining the previous two years’ brackets, teams or conferences will not be moved out of its natural region or geographic area an inordinate number of times.

3. If possible, rematches from the previous two tournaments should be avoided in the first round.

You can read the official rules for selection and seeding for the men’s tournament here. I did not find nearly so many rules or as much information about selecting and seeding the women’s tournament.

## The Math Behind March Madness

On March 14, 2017 I gave a talk about bracketology, March Madness, and the College Football Playoff in the Discovery Building on the University of Wisconsin-Madison campus. The talk was recorded and can be viewed here or here:

My slides from the talk are here:

## a few last minute thoughts on filling out a perfect bracket

Yesterday, I was on Madison’s CBS affiliate WISC-TV to talk about March Madness and filling out a bracket.

I was also on NBC15 in Madison to talk about the probability of filling out a perfect bracket.

Here is one data point from 2015:

Out of more than 11.57 million brackets entered in ESPN’s Tournament Challenge, one bracket emerged from the round of 64 of the NCAA tournament with a perfect 32-0 record. This is the first time there was a perfect first round in ESPN’s Tournament Challenge since at least 2010 (we’re still trying to find out exactly the last time it happened, but it’s been a number of years).

A little research uncovers 1 perfect bracket after the first 32 games in 54.85 million brackets at ESPN.

Last year, 25,704 of 13 million brackets (0.2%) remained perfect after the first 16 games and none were perfect after the first 32 games. None of those correctly picked the next 16 games.

The probability of a perfect bracket depends on the upsets in any given year. We can see that by observing the number of brackets that correctly selected all Final Four teams:

• 1140 of 13 million brackets correctly picked all Final Four teams in 2016
• 182,709 of 11.57 million brackets correctly picked all Final Four teams in 2015. This year stands out because the Final Four was composed of #1 Wisconsin, #1 Kentucky, #1 Duke and #7 Michigan State.
• 612 of 11 million brackets correctly picked all Final Four teams in 2014
• 47 of 8.15 million brackets correctly picked all Final Four teams in 2013
• 23,304 of 6.45 million brackets correctly picked all Final Four teams in 2012
• 2 of 5.9 million brackets correctly picked all Final Four teams in 2011

Of course, these brackets missed several games along the way so none are perfect, but from these data points we can see that it’s inherently more difficult to pick a correct bracket when there are more upsets in any given year or unlikely teams in the Final Four.
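For a rough sense of scale, here is a back-of-the-envelope sketch. It assumes every pick is independently correct with the same probability, which is a simplification of mine rather than a real model of how people fill out brackets. The only data it uses is the 2015 figure quoted above (1 perfect round of 64 in 11.57 million brackets); the function names are hypothetical.

```python
GAMES_IN_BRACKET = 63   # 64-team single elimination, First Four not counted
FIRST_ROUND_GAMES = 32

def perfect_probability(p_correct, games=GAMES_IN_BRACKET):
    """Chance of getting every game right if each pick is independently
    correct with probability p_correct (a simplifying assumption)."""
    return p_correct ** games

def implied_accuracy(num_brackets, num_perfect, games):
    """Per-game accuracy p that solves num_brackets * p**games == num_perfect."""
    return (num_perfect / num_brackets) ** (1 / games)

coin_flip = perfect_probability(0.5)
p_2015 = implied_accuracy(11.57e6, 1, FIRST_ROUND_GAMES)

print(f"coin-flip bracket: {coin_flip:.3g}")
print(f"implied per-game accuracy, 2015 round of 64: {p_2015:.2f}")
print(f"full bracket at that accuracy: {perfect_probability(p_2015):.3g}")
```

Picking by coin flip gives a chance of 1 in 2^63, or roughly 1 in 9.2 quintillion. Real brackets do much better than coin flips, but the exponent of 63 is what makes perfection so unlikely.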

## Yes, Wisconsin should have been seeded higher

This year I have fielded a lot of questions about Wisconsin’s seed. As the second place finisher in the B1G and runner up in the B1G conference championship, it’s hard to accept an 8 seed. It’s even harder given that lower ranked teams are seeded higher. Maryland finished 3rd in the B1G and received a 6 seed, Minnesota was 4th in the B1G and received a 5 seed, Northwestern finished 6th in the B1G and received an 8 seed, and Michigan received a 7 seed after winning the conference tournament. Wisconsin and Northwestern really don’t deserve the same seeds.

I can’t mathematically prove that Wisconsin was robbed, but I’ll try to step through the process to shed light on what happened.

First, I’ll say a few things about how seeding works.

The first step is selecting the field, and this is started well in advance of Selection Sunday. There are a number of teams who automatically make the tournament, and the committee first chooses who receives at-large bids. It’s safe to assume that Wisconsin was a lock to make the tournament all along.

The selection committee looks at several ratings/rankings including LRMC (Wisconsin is 22), the Pomeroy ratings (Wisconsin is 23), the Sagarin ratings (Wisconsin is 17), and ESPN’s BPI (Wisconsin is 21). Wisconsin is in the top 23 in all of these aside from RPI (where Wisconsin is 36), but RPI isn’t a very good tool and is not used to seed the teams. The committee then ranks the teams 1st to 68th (the “S curve”) to get a larger sense of what the seeds should be and to identify if the seeds in each region are balanced when the seeding is done.

Teams that are consistently ranked in the top 25 are often seeded 6th or so, which is significantly higher than the 8 seed Wisconsin was assigned.

There is not a lot of competition for the 1 seeds because so few teams can make a claim for the one seed. The ends of the distribution are easy but the middle is tougher because there is less difference between the 20th ranked team and the 40th ranked team than between the top ranked team and the 10th ranked team. Therefore, a team in the middle could reasonably be assigned to either a 5 seed or an 8 seed or anywhere in between.

Assigning seeds is not as simple as knowing that there are four of each seed and picking one for each team. There is a lot more to it than that. Scheduling the tournament is hard because there are a lot of constraints. Teams from the same conference cannot meet in the round of 64 or 32. Therefore, a team’s seed might need to be slightly malleable to make it all work while obeying these constraints. Additionally, as mentioned earlier, the seeds need to be assigned so that the strength of each of the regions is roughly balanced. For example, the same region should not contain the best 1 seed, the best 2 seed, and the best 3 seed.

There is more. Distance is taken into account when assigning the seeds. The committee assigns game locations at the same time it assigns seeds. That makes for a lot of interconnected decisions. All seeds are assigned to one of 8 locations called “pods” where the tournament games are played. For example, the 1, 8, 9, and 16 seeds must play in the same location. The goal is to minimize travel for most of the teams, especially the highest seeded teams who often basically get “home” games where their fans do not have long to travel. There are eight pods, and two of the 4-team groupings are assigned to each pod.

It’s possible to assign a team a lower seed so that they would have substantially less travel. That was not the case for Wisconsin, who was assigned an 8 seed in Buffalo. Initially, I suspected that Wisconsin was assigned an 8 seed in the Milwaukee pod, which would have been essentially a home game. That was not the case. Instead, the Badgers are getting the worst of both worlds: a bad seed and a long distance to travel. I’m puzzled by this.

You cannot change one team’s seed or a pod location without creating ripple effects that affect many other teams so these decisions are not easy to make. You can see the bracket with pod locations here. I’m sure that every year a team or two gets under/over-seeded because of these constraints. It’s easy to point out teams that are badly seeded but it’s much harder to know how to fix the problems. The tournament scheduling part is so difficult that some (like Cole Smith, Barbara Fraticelli and Chase Rainwater!) have published research papers on optimization models to design good tournaments that balance these constraints while getting the seeds right. Despite this challenge, it seems that swapping Wisconsin with another B1G team such as Minnesota would be feasible, not cause other ripple effects, and would better reflect the teams’ rankings. Scheduling might be hard, but a lot is on the line so the committee should get it right most of the time.

I also may be biased, but in sum, I cannot find a part of the seeding process where Wisconsin being assigned an 8 seed makes sense. I’m disappointed. Getting the seeds right matters because, in this case, an 8 seed for Wisconsin means playing overall top seeded Villanova in the round of 32, assuming that Villanova is not the first 1 seed to be upset in the round of 64. That makes for a rough path to the Final Four. In any case, this is all part of the game. I’m looking forward to the tournament and rooting for the Badgers.

I talked to WKOW in Madison about the seed. You can read about it and watch the video here.

## I’ll be talking about the math behind #MarchMadness on March 14 on the UW-Madison campus

I will be talking about bracketology, March Madness, and the College Football Playoff on Tuesday, March 14 at 7:30 pm in the H.F. Deluca Forum in the Discovery Building on the University of Wisconsin-Madison campus. More information can be found here. I hope to see you there!

## Bracket tips for winning your #MarchMadness pool

Today I have four tips for winning your March Madness bracket pool.

#### 1 Ignore RPI, use math based rankings instead to take strength of schedule into account.

Ken Massey has a rankings clearinghouse here: http://www.masseyratings.com/cb/compare.htm. I’m happy to say that I’m the only woman contributor to this list 🙂 My rankings are here.

#### 2 Pay attention to the seeds

The seeds matter because they determine a team’s path to the Final Four. Some seeds generate more upsets than others, such as 7-10 seeds and 5-12 seeds. Historically, 6-11 seeds go the longest before facing a 1 or 2 seed. Teams with an 8 seed face a tough Round-of-64 opponent and have to face a 1 seed next (sorry Badgers).

However, there are plenty of upsets. The Final Four has been composed of all 1 seeds only once. See BracketOdds at Illinois for more information on how the seeds have fared.

Having said that, the committee doesn’t always get it right. Some teams like SMU, Wichita State, and Xavier are underseeded and are poised to pull off upsets. Also, Villanova is the overall #1 seed and has a 15% chance of winning the entire tournament, which is low, meaning that there isn’t a strong favorite this year.

#### 3 Don’t pick Kansas to win it all

Be strategic. The point is NOT to maximize your points, it’s to get more points than your opponents. I’ve been getting in the habit of picking my Final Four first and filling in the rest later.  You can pick the eventual winner (say, Villanova) and still lose your pool if everyone else picks Villanova. FiveThirtyEight estimates that Villanova has a 15% chance of winning the tournament, meaning that another team is probably going to win.

One way to be strategic is to pick an undervalued top team to win the tournament. For example, last year Kansas was selected as the overall winner in 27% of brackets on ESPN (and in 62% of Final Fours) despite having an overall 19% chance of winning (538). On the other hand, UNC was selected as the overall winner in 8% of brackets (with a 15% win probability). Getting UNC right last year helped vault past those who picked Kansas.
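One simple way to spot undervalued teams is to divide a team’s win probability by the share of brackets that picked it. The sketch below uses only the Kansas and UNC numbers quoted above. The “leverage” name and the ratio itself are my own illustrative shorthand, not an established metric.

```python
def leverage(win_probability, pick_share):
    """Ratio above 1: the pool picks this team less often than it wins.
    Ratio below 1: the pool picks it more often than it wins."""
    return win_probability / pick_share

# (win probability, share of brackets picking the team as champion)
last_year = {
    "Kansas": (0.19, 0.27),
    "UNC": (0.15, 0.08),
}

for team, (win_prob, share) in last_year.items():
    print(f"{team}: leverage {leverage(win_prob, share):.2f}")
```

Kansas comes out below 1 (overvalued by the pool) and UNC well above 1 (undervalued), which matches the story above.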

#### 4 It’s random

The way brackets are scored means that randomness rules. It’s easy to forget that a good process does not guarantee the best outcome in any given year. A good process yields better outcomes on average but your mileage may vary any given year (at least that’s what I tell myself when I don’t win my pool!)

Small pools are better if you have a good process. The more people in a pool, the higher chance that someone will accidentally make a good bracket with a bad process. It’s like stormtroopers shooting at a target. They’re terrible, but if they take enough shots they’ll hit the target once.
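The stormtrooper effect is easy to quantify with a toy calculation. Suppose each entrant with a bad process has some small chance `q` of stumbling into a pool-winning bracket, independently of everyone else. The value `q = 0.01` below is a made-up number purely for illustration.

```python
def chance_someone_gets_lucky(q, n):
    """Probability that at least one of n independent entrants gets lucky,
    when each has probability q of doing so."""
    return 1 - (1 - q) ** n

# q = 0.01 is a hypothetical per-entrant chance, not an estimate from data.
for pool_size in (10, 100, 1000):
    print(pool_size, round(chance_someone_gets_lucky(0.01, pool_size), 3))
```

Even with a tiny per-person chance, the probability that somebody gets lucky climbs quickly with pool size, which is why a good process pays off more reliably in small pools.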

## Watch me on WISC-TV (CBS) Live at Four on 3/15/2016

I had a wonderful time on Live at Four talking about bracketology. Kudos to Susan Siman and Mark Koehn as well as producer Steve Koehn for making everything so easy. Live news moves so fast! I didn’t articulate a few points very well. I’ll try to explain here:

1. I didn’t say what I meant by an “upset.” Defining an upset is maybe a super-professorial thing to do, but I can’t help myself. An upset is usually defined as the lower seeded team winning. However, some teams are seeded too low and others too high, and as a result some upsets are not as surprising as the seeds suggest. I like to look at the rankings of the two teams and the win probabilities to get a better sense of whether a game is hard to predict. For example, according to the FiveThirtyEight win probabilities, the Arizona (6)/Wichita State (11) game has a 50% win probability for each team despite a big seed differential. Other games may not be so evenly matched but produce win probabilities between 25% and 75%, which means that we would expect at least 1 of 4 of these games to produce upsets, on average. I am not so surprised when this happens. I am more surprised when the win probabilities are less than 10%. There are definitely upsets in the tournament but often they are not rare events. For more, read Nate Silver’s thoughts on upsets here.
2. Regarding the perfect bracket. While I think someone will get the perfect bracket, I don’t think it will happen this year. Or next year. Or in the next decade. Maybe we will see one within the next 100 years. There are different estimates of the probability of getting all the picks right; many are in the range of 1-in-100 billion. So many people fill out brackets: there were 11.5 million brackets on ESPN alone in 2015. With so many attempts to hit the target, I think one will strike…eventually. When it happens, it will happen in one of those years when there are not too many upsets. In 2014 with Final Four seeds 1, 2, 7, 8 there were only 612 brackets on ESPN with all Final Four teams picked correctly. But in 2015 with Final Four seeds 1, 1, 1, 7 (the 7 seed was tournament darling MSU), there were 182,709 brackets with all Final Four teams picked correctly. Some years are easier to forecast than others. Given all the brackets in the online pools out there, I think we will eventually see a perfect bracket even if it does not win the official prize. More here.
3. To win your pool, you don’t need a strategy that maximizes your points, you need a strategy that gives you more points than your opponents. Those two strategies are very similar but slightly different. Here are Ken Massey’s composite rankings I mentioned in the interview for figuring out who the best teams are. And of course, here are my rankings at Badger Bracketology. Check out ESPN’s Who Picked Whom to see which teams may be overvalued and undervalued.
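To put numbers on the first point, the expected number of upsets in a set of games is just the sum of the underdogs’ win probabilities. The sketch below uses four hypothetical games in which each underdog has a 25% chance, the low end of the range mentioned above, and treats the games as independent.

```python
import math

def expected_upsets(underdog_probs):
    """Expected number of upsets: the sum of the underdog win probabilities."""
    return sum(underdog_probs)

def prob_at_least_one_upset(underdog_probs):
    """Chance that at least one underdog wins, assuming independent games."""
    return 1 - math.prod(1 - p for p in underdog_probs)

# Four hypothetical games where each underdog has a 25% chance of winning.
games = [0.25, 0.25, 0.25, 0.25]
print(expected_upsets(games))
print(round(prob_at_least_one_upset(games), 3))
```

With four such games the expected number of upsets is exactly 1, and the chance of seeing at least one is about 68%. Any underdog probability above 25% only pushes both numbers higher.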

It’s time for me to stop over-analyzing my bracket advice and fill out my bracket. Good luck!


## Two minute bracketology. My spin on math, bracketology & #MarchMadness

The University Communications group at the University of Wisconsin-Madison asked me to film a short video about bracketology, math (what is a Markov chain?), and what we can learn from bringing math into bracketology. It was a wonderful experience, and I’m thrilled with the final product. I can’t take credit: the video was produced by Justin Bomberg and reflects his vision.

The video is just under 3 minutes long. We filmed this before the selections were made, and as you can see, I recommend picking 10 seeds to upset 7 seeds. And of course, Wisconsin ended up seeded 7. Sorry team! Also, I naively agreed to attempt to spin a basketball on my finger while the film was rolling…I think you’ll easily understand why I’m more comfortable with the math.

I’m looking forward to watching the tournament. Go Badgers!

## tips for filling out your tournament bracket and winning your March Madness pool

Here are a few things I do to fill out my bracket using analytics.

1. Let’s start with what not to do. Although a great record is meaningful, I usually don’t put a whole lot of weight on a team’s record because strength of schedule matters.

I do not like RPI either. RPI is a blend of a team’s winning percentage and its opponents’ (and their opponents’) winning percentages (more here). It just isn’t a useful tool for making bracket picks.

2. There are good math models that are helpful for picking a bracket. Use these sophisticated ranking tools. The seeding committee uses some of these ranking tools to select the seeds, so the seeds themselves reflect strength of schedule and implicitly rank teams.  Here are a few ranking tools that use math modeling.

You can use these rankings to pick the better team in your bracket. Oregon, for example, is not in the top 4 in any of these rankings.

3. Survival analysis quantifies how far each team is likely to make it in the tournament. This doesn’t give you insight into team-to-team matchups per se, but you can think about the probability of Wisconsin or MSU or whoever making it to the Final Four as reflecting a kind of average across the different teams a team might play during the tournament.

This is helpful for picking a top-down bracket, where you pick your Final Four first and then fill in your bracket from there.

4. Look at the seeds. Only once did all four 1-seeds make the Final Four. It’s a tough road. Seeds matter a lot in the rounds of 64 and 32, not so much after that point. There will be upsets. Some seed match ups produce more upsets than others. The 7-10 and 5-12 match ups are usually good to keep an eye on (unfortunately, the Badgers are a 7 seed this year so this means I might be predicting their demise. I hope I’m wrong!).

5. Don’t ignore preseason rankings. The preseason rankings are educated guesses on who the best teams are before any games have been played. It may seem silly to consider preseason rankings at the end of the season after all games have been played (when we have much better information!) but the preseason rankings seem to reflect some of the intangibles that predict success in the tournament (a team’s raw talent or athleticism).

6. Math models are very useful, but they have their limits. Math models implicitly assume that the past is good for predicting the future. This is not usually a good assumption when a team has had any major changes, like injuries or suspensions. You can check out crowdsourcing data (who picked who in a matchup), expert opinion, and things like injury reports to make the final decision.

On the other hand, experts sometimes focus too much on who is “hot” at the moment, thereby discounting the past too much. There is probably a “right” level of discounting, but people (experts included) have a short memory and may discount data points from early in the season. So while I like experts to supplement my picks, I am also careful.

7. My final tip is to be strategic in picking your Final Four teams with respect to your opponents in your pool. It’s hard to win your pool if everyone chooses, say, Wisconsin to win it all. Pick a unique team to win the tournament, be the runner up, or be in the Final Four to set your bracket apart. If that team makes it, then you will have a huge advantage in terms of winning your pool. But choose wisely.

This works in moderate sized pools, but not so much in huge pools. If you are in a big pool then your odds of winning with analytics are diluted by someone winning by pure luck (e.g., your friend who won the 2011 pool because they liked VCU’s mascot Rodney the Ram).
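For readers curious about the RPI blend mentioned in the first tip, here is a minimal sketch using the commonly cited 25/50/25 weights. Those weights are not stated in this post, so treat them as an assumption; the sketch also ignores any adjustments for game location. The two teams are hypothetical.

```python
def rpi(wp, owp, oowp):
    """Basic RPI blend (assumed weights): 25% a team's own winning percentage,
    50% its opponents', 25% its opponents' opponents'."""
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp

# Two hypothetical teams: one with a tougher schedule, one with a better record.
tough_schedule = rpi(wp=0.70, owp=0.60, oowp=0.55)
great_record = rpi(wp=0.85, owp=0.45, oowp=0.50)

print(round(tough_schedule, 4), round(great_record, 4))
```

Under these weights, three quarters of the number comes from who a team played and only one quarter from its own record, and nothing in the formula looks at how a team actually performed in those games beyond wins and losses.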
