I have long been interested in football analytics, and I enjoy crunching numbers while watching the games. This year is the first season for the NCAA football playoff, where four teams will play to determine the National Champion. It’s a small bracket, but it’s a start in the right direction.
The first step to being becoming the national champion is to make the playoff. To do so, a team must be one of the top four ranked teams at the end of the season. A selection committee manually ranks the teams, and they are given a slew of information and other rankings to make their decisions.
I wanted to see if I could forecast the playoff ahead of time by simulating the rest of the season rather than waiting until all season’s games have been played. Plus, it’s a fun project that I can share with my undergraduate simulation simulation that I teach in the spring.
Here is how my simulation model works. The most critical part is the ranking method, which uses the completed game results to rate and then rank the teams so that I can forecast who the top 4 teams will be at the end of the season. I need to do this solely using math (no humans in the loop!) in each of 10,000 replications. Here is how it works. I start with the outcomes of the games played so far, starting with at least 8 weeks of data. This is used to come up with a rating for each team that I then rank. The ranking methodology uses a connectivity matrix based on Google’s PageRank algorithm (similar to a Markov chain). So far, I’ve considered three variants of this model that take various bits of information account like who a team beats, who it loses to, and the additional value provided by home wins. I used data from the 2012 and 2013 seasons to tune the parameters needed for the models.
The ratings along with the impact of home field advantage are then used to determine a win probability for each game. From previous years, we found that the home team won 56.9% of games later in the season (week 9 or later), which accounts for an extra boost in win probability of ~6.9% for home teams. This is important since there are home/away games as well as games on neutral sites, and we need to take this into account. The simulation selects winners in the next week of games by essentially flipping a biased coin with. Then, the teams are re-ranked after each week of simulated game outcomes. This is repeated until we get to the end of the season. Finally, I identify and simulate the conference championship games played (these are the only games not scheduled in advance). And then we end up with a final ranking. Go here for more details.
There are many methods for predicting the outcome of a game in advance. Most of the sophisticated methods use additional information that we could not expect to obtain weeks ahead of time (like the point spread, point outcomes, yards allowed, etc.). Additionally, some of the methods simply return win probabilities and cannot be used to identify the top four teams at the end of the season. My method is simple, but it gives us everything we need without being so complex that I would be suspicious of overfitting. The college football season is pretty short, so our matrix is really sparse. At present, teams have played 8 weeks of football in sum, but many teams have played just 6-7 games. Additional information could be used to help make better predictions, and I hope to further refine and improve the model in coming years. Suggestions for improving the model will be well-received.
Our results for our first week of predictions are here. Check back each week for more predictions.
Our twitter handle is: @badgerbrackets
Your thoughts and feedback are welcome!
- Markov chains for ranking sports teams
- Will the NYT Fourth Down Bot change how football is played?
- Before Sabermetrics, there was football analytics
- Football decision-making on fourth down