Tag Archives: sports

Sports scheduling meets business analytics: why scheduling Major League Baseball is really hard

Mike Trick of Carnegie Mellon University came to the Industrial and Systems Engineering department at UW-Madison to give a colloquium entitled “Sports scheduling meets business analytics.”

How hard is it to schedule a 162-game season for each of the 30 MLB teams? It’s really, really hard.

Mike Trick stepped through what makes for a “good” schedule. Schedules obey many constraints, some of which include:

  • Half of each team’s games are home, half are away.
  • Teams cannot have more than three series in a row away or at home.
  • Teams cannot have three home weekends in a row.
  • Teams in the same division play six series: two early on, two in the middle of the season, and two late, with one home and one away each time.
  • Teams play all other teams in at least two series.
  • Schedules should have a good flow, with about one week home followed by one week away.
  • Teams that fly from the west coast to the east coast have a day off in between series.

Teams can make additional scheduling requests. Every team, for example, asks for a home game on Father’s Day, and this can be achieved for only half of the teams in any given year. Mike addresses this by ensuring that no team plays away on Father’s Day more than two years in a row.
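To give a flavor of how rules like these become mathematics, here is an illustrative integer programming encoding of two of the constraints above. The variable definitions are my own sketch for exposition, not necessarily those of the real model.

```latex
% Sketch only: h_{t,g} = 1 if team t plays game g at home, 0 if away;
% H_{t,s} = 1 if team t plays series s at home.

% Half of each team's 162 games are at home:
\sum_{g=1}^{162} h_{t,g} = 81 \qquad \forall t

% No more than three consecutive series at home or away
% (every window of four consecutive series has at least one of each):
1 \le \sum_{s=k}^{k+3} H_{t,s} \le 3 \qquad \forall t,\ k
```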

Mike illustrated how hard it is to create a feasible schedule from scratch. You cannot complete a feasible schedule by trying something intuitive like scheduling the weekends first and filling out the rest of the schedule later; this leads to infeasible schedules 99% of the time. One of the challenges is that integer programming algorithms do not quickly identify when a model is infeasible and instead branch and bound for a long while.

Additionally, it is equally hard to change a small piece of a feasible schedule based on a new requirement and easily get another feasible schedule. For example, let’s say the pope decides to visit the United States and wants to use the baseball stadium on a day scheduled for a game. You cannot simply swap that game out with another. Changing the schedule to free up the stadium on that one day leads to a ripple of changes across the entire schedule for the other teams, because changing that one game affects the other visiting team’s schedule and leads to violations in the above constraints (e.g., half of each team’s games are at home, etc). This led to Mike’s development of a large neighborhood search algorithm that efficiently reschedules large parts of the schedule (say, a month) during the schedule generation process.
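Large neighborhood search follows a destroy-and-repair pattern. The toy sketch below shows that pattern on a small traveling-salesman-style problem, since the MLB model itself is not reproducible here; in Mike’s setting, the “destroy” step frees up something like a month of the schedule and the repair step re-solves it, presumably with an integer programming solver, while keeping the rest fixed.

```python
import random

def tour_length(tour, dist):
    """Total length of a closed tour given a distance matrix."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def lns(dist, n_iters=2000, destroy_size=4, seed=0):
    """Toy large neighborhood search: repeatedly remove a chunk of the
    solution (destroy) and greedily rebuild it (repair), keeping improvements."""
    rng = random.Random(seed)
    best = list(range(len(dist)))
    for _ in range(n_iters):
        removed = rng.sample(best, destroy_size)        # destroy a neighborhood
        partial = [c for c in best if c not in removed]
        for c in removed:                               # repair greedily
            pos = min(range(len(partial) + 1),
                      key=lambda i: tour_length(partial[:i] + [c] + partial[i:], dist))
            partial.insert(pos, c)
        if tour_length(partial, dist) < tour_length(best, dist):
            best = partial                              # accept improving repairs
    return best

# Example with a random symmetric distance matrix:
random.seed(1)
n = 12
dist = [[0 if i == j else random.randint(1, 99) for j in range(n)] for i in range(n)]
dist = [[dist[min(i, j)][max(i, j)] for j in range(n)] for i in range(n)]
best = lns(dist)
print(best, tour_length(best, dist))
```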

Mike found that how he structured his integer programming models made a big difference. He did not use the standard approach to defining variables. Instead, he borrowed an idea from branch and price and embedded more structure in the variables (which introduced many more of them) so that commercial integer programming solvers could solve the problem more efficiently. This led to a model with 6 million variables that allowed him to embed objectives such as travel costs.

In most real-world problems, Mike noted, there is no natural objective function. MLB schedules are a function of travel distance and “flow,” where flow reflects the goal of alternating home and away weeks. He cannot require each team to travel the same amount: Seattle travels a minimum of 48,000 miles per season no matter the schedule because it is far from most other cities. Requiring other teams to travel 48,000 miles in a season leads to schedules where teams fly from coast to coast on adjacent series just to match Seattle’s mileage. That is bad.

Mike ultimately included revenue in his objective, where revenue reflects attendance, which he modeled using linear regression. He acknowledged that this is a weakness, because attendance does not equal profit. Teams can sell out afternoon games by discounting ticket prices, for example, but the children those promotions draw fill the stands without buying beer, so a sellout does not necessarily generate the most revenue.
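As a sketch of what such an attendance regression might look like (the features, data, and library choice here are entirely my own invention for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical game features: [day_game, weekend, rival_opponent, discount_promo]
X = np.array([
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
])
y = np.array([34000, 41000, 22000, 45000, 30000])  # made-up attendance figures

model = LinearRegression().fit(X, y)
print(model.predict([[0, 1, 1, 1]]))  # predicted attendance for a hypothetical game
```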

Mike summarized the keys to his success, which included:

  1. Computing power improved over time
  2. Commercial solvers improved
  3. He solved the right problem
  4. He structured the problem in an effective way
  5. He identified a way to get quick solutions for part of the schedule (useful for when something came up and a game had to change).
  6. He developed a large neighborhood search algorithm that efficiently retools large parts of the schedule.

Three years ago I wrote a blog post about Mike Trick’s keynote talk on Major League Baseball (MLB) scheduling at the German Operations Research Conference (blog post here). That post contains some background information.

 


Ranking the B1G

I post weekly NCAA men’s basketball rankings over at Badger Bracketology. Every week I also post the rankings of the Big Ten conference teams. Here are the rankings right now. They differ from the rankings on the site because I made a small change to how I score overtime games: I now count them as having been decided by a single point to capture the closeness of those games. Here is where the B1G teams end up in the overall rankings:

5 Michigan St
11 Indiana
14 Purdue
19 Maryland
21 Iowa
32 Wisconsin
39 Michigan
52 Ohio St
63 Northwestern
90 Nebraska
93 Illinois
102 Penn St
167 Minnesota
210 Rutgers


These rankings reflect all games across the season without discounting games earlier in the season.

The committee will look at how teams did down the stretch, so I wanted to get a sense of how I would rank teams based only on the conference games played in the second half of the season. This approach still doesn’t discount games, and it completely ignores the early non-conference schedule, but it suggests how the teams should be ranked based on the quality of their wins and losses in conference play.

The results are below using my Modified Logistic Regression Markov Chain method. I only rank the 14 B1G teams because I am only considering B1G games. My B1G rankings are really close to the official ranking based on the standings (in parentheses). I am able to rank order the four teams tied for third place.

Ranking just B1G conference games (official ranking based on the standings in parentheses):

  1. Indiana (1)
  2. Michigan St (2)
  3. Iowa (3)
  4. Maryland (3)
  5. Wisconsin (3)
  6. Purdue (3)
  7. Michigan (8)
  8. Ohio St (7)
  9. Northwestern (9)
  10. Nebraska (11)
  11. Penn St (10)
  12. Illinois (12)
  13. Minnesota (13)
  14. Rutgers (14)

We can see a few differences between the win/loss standings and my rankings. Michigan State is the top-ranked B1G team when considering all games, but it finishes second when considering just the conference games. Wisconsin, which played poorly early in the season, finishes 5th when only conference games are considered.
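For readers curious about the mechanics, here is a stripped-down illustration of the Markov-chain ranking idea: each game sends weighted “votes” from the loser to the winner, and teams are ranked by the stationary distribution of the resulting chain. The logistic regression weighting and other refinements of my actual modified LRMC method are omitted; the function below is a toy.

```python
import numpy as np

def markov_rank(games, n_teams, eps=1e-3):
    """Rank teams by the stationary distribution of a voting Markov chain.
    `games` is a list of (winner, loser, margin) tuples with team ids 0..n-1.
    A bare-bones illustration, not the modified LRMC method itself."""
    V = np.full((n_teams, n_teams), eps)  # small smoothing keeps the chain irreducible
    for winner, loser, margin in games:
        V[loser, winner] += margin        # loser 'votes' for winner, weighted by margin
    P = V / V.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])  # eigenvector for eigenvalue 1
    pi = pi / pi.sum()
    return np.argsort(-pi)  # team ids from strongest to weakest

# Example: team 0 beat team 1 by 12; the overtime game counts as a 1-point margin.
print(markov_rank([(0, 1, 12), (1, 2, 1), (0, 2, 7)], n_teams=3))
```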

I’ll post my final B1G conference rankings after the B1G conference tournament.

Check out Badger Bracketology for more updates.

 


sports analytics featured in the latest INFORMS Editor’s Cut

An Editor’s Cut on Sports Analytics, edited by Scott Nestler and Anne Robinson, is available. The volume is a collection of sports analytics articles published in INFORMS journals. Some of the articles are free to download for a limited time if you don’t have a subscription. But there is more to the Editor’s Cut than academic papers.

Here are some of my favorite articles from the volume.

Technical Note—Operations Research on Football [pdf] by Virgil Carter and Robert E. Machol, 1971. This is my favorite. This article may be the first sports analytics paper ever and it was written in an operations research journal (w00t!). It’s written by an NFL player who used data to estimate the “value” of field position and down by watching games on film and jotting down statistics. For example, first and 10 on your opponent’s 15 yard line is worth 4.572 expected points, whereas first and 10 on your 15 yard line is worth -0.673 expected points. This idea is used widely in sports analytics and by ESPN’s Analytics team to figure out things like win probabilities. This paper was way ahead of its time. You can listen to a podcast with Virgil Carter here (it’s my favorite sports analytics podcast).
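The computation behind Carter and Machol’s numbers is easy to sketch: for every (down, field position) state, average the point value of the next scoring event over all plays observed from that state. The data below are made up purely to show the bookkeeping.

```python
from collections import defaultdict

# Hypothetical plays: (down, yards_from_opponent_goal, points_of_next_score_for_offense)
plays = [
    (1, 15, 7), (1, 15, 3), (1, 15, 3), (1, 15, 0),
    (1, 85, 0), (1, 85, -7), (1, 85, 3), (1, 85, 0),
]

totals = defaultdict(lambda: [0.0, 0])
for down, yards, next_score in plays:
    totals[(down, yards)][0] += next_score  # accumulate next-score points
    totals[(down, yards)][1] += 1           # count plays from this state

for (down, yards), (points, n) in sorted(totals.items()):
    print(f"down {down}, {yards} yards from the opponent's goal: "
          f"{points / n:+.2f} expected points")
```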

An Analysis of a Strategic Decision in the Sport of Curling by Keith A. Willoughby and Kent J. Kostuk, 2005. This is a neat paper. I have never curled but can appreciate the strategy selection at the end of a game. In curling, the choice is between taking a single point or blanking an end in the latter stages of a game. Willoughby and Kostuk use decision trees to evaluate the benefits and drawbacks associated with each strategy. Their conclusion is that blanking the end is the better alternative. However, North American curlers make the optimal strategy choice whereas European curlers often choose the single point.
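The heart of the decision-tree analysis is an expected-value comparison of the two strategies. The probabilities and values below are invented for illustration; Willoughby and Kostuk estimate the real ones from game data.

```python
# Toy expected-value comparison for the late-end curling decision.
# All numbers are made up; the paper estimates them from game data.
p_two = 0.45          # hypothetical P(score >= 2 next end, keeping the hammer)
p_one = 0.30          # hypothetical P(score exactly 1 next end)
p_steal = 0.25        # hypothetical P(opponent steals next end)
hammer_value = 0.35   # hypothetical expected-point value of holding the hammer

# Taking the single scores 1 now but hands the hammer to the opponent.
ev_take_single = 1.0 - hammer_value
# Blanking scores 0 now but keeps the hammer for a two-point try.
ev_blank = 2 * p_two + 1 * p_one - 1 * p_steal

print(f"take the single point: {ev_take_single:.2f}")  # 0.65
print(f"blank the end:         {ev_blank:.2f}")        # 0.95 -> blanking wins
```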

Scheduling Major League Baseball Umpires and the Traveling Umpire Problem by Michael A. Trick, Hakan Yildiz, and Tallys Yunes, 2011. This paper develops a new network optimization model for scheduling Major League Baseball umpires. The goal is to minimize umpire travel, but league rules are at odds with this: each crew must work games involving every team, but never two series in a row involving the same team. As a result, umpires typically travel more than 35,000 miles per season without having a “home base” during the season. The work here helps meet the league goals while making life better for the crews.

A Markov Chain Approach to Baseball by Bruce Bukiet, Elliotte Rusty Harold, and José Luis Palacios, 1997. This paper develops and fits a Markov chain to baseball (you had me at Markov chains!). The model is then used for a number of purposes, such as optimizing the batting lineup and forecasting run distributions. They find that the optimal lineup slot for the “slugger” is not fourth and that the pitcher should not bat last, despite most teams making those choices.
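The workhorse calculation in a baseball Markov chain is standard absorbing-chain algebra: expected future runs E from each transient state satisfy E = r + QE, so E = (I - Q)^{-1} r. Here is a heavily simplified sketch with made-up numbers that tracks only outs; the real model has 24 transient (outs, base runners) states per half-inning.

```python
import numpy as np

# Heavily simplified half-inning chain with made-up numbers.
# Transient states: 0, 1, 2 outs (base runners ignored for brevity);
# probability mass missing from a row leaks to the absorbing 3-out state.
Q = np.array([
    [0.30, 0.70, 0.00],   # from 0 outs: batter reaches (stay) or an out is recorded
    [0.00, 0.30, 0.70],
    [0.00, 0.00, 0.30],
])
r = np.array([0.15, 0.10, 0.05])  # made-up expected runs scored per plate appearance

# Expected future runs E from each state satisfy E = r + Q E,
# so E = (I - Q)^(-1) r  (standard absorbing Markov chain algebra).
E = np.linalg.solve(np.eye(3) - Q, r)
print(E)  # expected runs remaining from 0, 1, and 2 outs
```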

The Loser’s Curse: Decision Making and Market Efficiency in the National Football League Draft by Cade Massey and Richard H. Thaler, 2013. Do National Football League teams overvalue the top players picked early in the draft? The answer: yes, by a wide margin.

There are a couple of dozen papers that examine topics such as decision-making within a game, recruitment and retention issues (e.g., draft preparation), bias in refereeing, and the identification of top players and their contributions. Check it out.

~~~

The Editor’s Cut isn’t just a collection of articles; there are videos, podcasts, and industry articles. A podcast with Sheldon Jacobson is included in the collection, in which Sheldon talks about bracketology, March Madness, and the quest for the perfect bracket.

A TED talk by Rajiv Maheswaran on YouTube, “The Math Behind Basketball’s Wildest Moves,” is also included in the collection. It describes how machine learning can recognize what is happening on a basketball court at any given moment (is that a pick and roll or not?).

Other sports tidbits from around the web:

Read the previous INFORMS Editor’s Cut on healthcare analytics.


Who do you think will win the Super Bowl: the Carolina Panthers or the Denver Broncos? Did you make your pick based on analytics?


Punk Rock OR was on the Advanced Football Analytics podcast

I am thrilled to have been a guest on the Advanced Football Analytics podcast hosted by Dave Collins (@DaveKCollins) to talk about Badger Bracketology and football analytics.

Listen here [iTunes link].


You can also read some of Badger Bracketology’s press coverage here.


Should a football team run or pass? A game theory and linear programming approach

Last week I visited Oberlin College to deliver the Fuzzy Vance Lecture in Mathematics (see post here). In addition, I gave two lectures to Bob Bosch’s undergraduate optimization course. My post about my lecture on ambulance location models is here.

My second lecture was about how to solve two-player zero-sum games using linear programming, applied to a sports analytics question: should a football team run or pass? The purpose of the lecture was to introduce zero-sum games (a new topic to most students) and to show how to solve games with two decision-makers using linear programming.
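Here is roughly what that lecture exercise looks like in code: a two-by-two run/pass game with a hypothetical payoff matrix of expected yards, solved as a linear program for the offense’s optimal mixed strategy. The payoffs are invented for illustration.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical payoff matrix: offense's expected yards gained.
#            defense expects run   defense expects pass
A = np.array([
    [3.0,  9.0],   # offense runs
    [8.0,  4.0],   # offense passes
])

# Offense picks a mixed strategy p to maximize the game value v:
#   max v  s.t.  sum_i p_i * A[i, j] >= v for every defensive choice j,
#                sum_i p_i = 1,  p >= 0.
# Variables are [p_run, p_pass, v]; linprog minimizes, so the objective is -v.
c = [0.0, 0.0, -1.0]
A_ub = np.column_stack([-A.T, np.ones(A.shape[1])])  # v - p^T A[:, j] <= 0
b_ub = np.zeros(A.shape[1])
A_eq = [[1.0, 1.0, 0.0]]
b_eq = [1.0]
bounds = [(0, None), (0, None), (None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
p_run, p_pass, value = res.x
print(f"run {p_run:.0%}, pass {p_pass:.0%}, expected yards {value:.2f}")
```

With these made-up payoffs the offense should run 40% of the time and pass 60% of the time, for a game value of 6 expected yards; the same LP structure works for any two-player zero-sum game.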

This lecture tied into my Badger Bracketology work, but since I do not use optimization in my college football playoff forecasting model, I selected another football application.

 



the NFL football draft and the knapsack problem

In this week’s Advanced Football Analytics podcast, Brian Burke talked about the knapsack problem and the NFL draft [Link]. I enjoyed it. Brian has a blog post explaining the concept of the knapsack problem as it relates to the NFL draft here. The idea is that the draft is a capital budgeting problem for each team: the team’s salary cap space is the knapsack budget, the potential players are the items, the players’ salaries against the cap are the item weights, and the players’ values (hard to estimate!) are the item rewards. Additional constraints are needed to ensure that all positions are covered; otherwise, the optimal solution might be a team of only quarterbacks and running backs. Brian talks a bit about analytics and estimating value. I’ll let you listen to the podcast to get all the details.
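To make the analogy concrete, here is a toy version of the draft-as-knapsack computation with invented players, salaries, and values; a plain dynamic program suffices at this scale.

```python
# Toy draft-as-knapsack: pick players to maximize total value without
# exceeding the cap budget. Names, salaries, and values are invented.
def knapsack(players, budget):
    """0/1 knapsack by dynamic programming over integer cap dollars.
    best[b] holds the best (value, picks) achievable with capacity <= b."""
    best = [(0.0, [])] * (budget + 1)
    for name, salary, value in players:
        new_best = best[:]
        for b in range(salary, budget + 1):
            prev_value, prev_picks = best[b - salary]
            if prev_value + value > new_best[b][0]:
                new_best[b] = (prev_value + value, prev_picks + [name])
        best = new_best
    return best[budget]

players = [  # (name, cap hit in $M, estimated value) -- all hypothetical
    ("QB prospect", 6, 9.0), ("RB prospect", 4, 5.5),
    ("WR prospect", 3, 4.0), ("LT prospect", 5, 6.5),
    ("CB prospect", 2, 3.0),
]
value, picks = knapsack(players, budget=10)
print(value, picks)
```

Position-coverage constraints like the ones Brian mentions are easiest to bolt on with an integer programming solver, but the core budget tradeoff is exactly this.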

During the podcast, Brian gave OR a shout out and added a side note about how knapsack problems are useful for a bunch of real applications and can be very difficult to solve in the real world (thanks!). I appreciated this aside, since cute applications of OR on small problem instances sometimes give the impression that our tools are trivial and silly. The reality is that optimization algorithms are incredibly powerful and have allowed us to solve extremely difficult real-world problems.

Optimization has gotten sub-optimal coverage in the press lately. My Wisconsin colleagues Michael Ferris and Stephen Wright wrote a defense of optimization in response to an obnoxious anti-optimization article in the New York Times Magazine (“A sucker is optimized every minute.” Really?). Bill Cook, Nathan Brixius, and JF Puget wrote nice blog posts in response to coverage of a TSP road trip application that failed to touch on the bigger picture (the TSP is useful for routing and gene sequencing, not just planning imaginary road trips!). I didn’t write my own defense of optimization since Bill, Nathan, and JF did such a good job, but needless to say, I am with them (and with optimization) all the way. It’s frustrating when our field misses opportunities to market what we do.

If you enjoy podcasts, football, and analytics, I recommend the Advanced Football Analytics podcast that featured Virgil Carter, who published his groundbreaking football analytics research in Operations Research [Link].


 


Some thoughts on the College Football Playoff

After a fun year of Badger Bracketology, I wanted to reflect upon the college football playoff.

Nate Silver reflects upon the playoff in an article on FiveThirtyEight, and he touches on the two most salient issues in the playoff:

  • False negatives: leaving teams with a credible case for being named national champion out of the playoff.
  • False positives: “undeserving” teams in the playoff.

As the number of teams in the playoff increases, the number of false negatives decreases (good – this allows us to have a chance of selecting the “right” national champion) and the number of false positives increases (bad).

One of my concerns with the old Bowl Championship Series (BCS) system, with its single national championship game, was that exactly two teams were invited to that game. This was a critical assumption in the old system that was rarely discussed. There were rarely exactly two “deserving” teams, where deserving is usually equated with being undefeated and in a major conference. Out of 16 BCS tournaments, that situation occurred only four times (25% of championship games), leading to controversy in the remaining 75%. This is not a good batting average, and most of the 12 controversial years had too many false negatives and no false positives.

The new College Football Playoff (CFP) system has a new assumption: the number of “deserving” teams does not exceed four teams.

If you look at the BCS years, this assumption was never violated: there were never more than four undefeated major-conference teams, nor controversy surrounding more than three potential “deserving” teams. Controversy surrounded the third team that was left out, a team that would now be invited to the playoff. At face value, a four-team playoff seems about right.

But given the title of Nate Silver’s article (“Expand The College Football Playoff”) and the excited discussion of an eight-team playoff in 2008 after a controversial national championship game, I can safely say that most people want more than four teams in the playoff. TCU’s dominance in its bowl game supports these arguments, and the fact that we’ve had a controversial seeding in the CFP’s single year so far is a sign that maybe four isn’t the right playoff size. What is the upper bound on the number of deserving teams?

Answering this question is tricky, because there is a relationship between the number of teams in the playoff and our definition of “deserving.” There will always be teams on the bubble, but as the playoff becomes larger, this becomes less of an issue. Thoughts on this topic are welcome in blog comments.

It’s worth mentioning the impact on academics and injuries. As a professor of operations research, I believe that every decision requires balancing tradeoffs, and the tradeoffs in the college football playoff should not only be about false positives, false negatives, fan enjoyment, and ad revenue. Maybe this is trivial, since it’s an extra game for a mere eight teams, but I will be disappointed if the full impact on the student-athletes and their families, such as academics and injuries, is not part of the conversation.