The Packers should have gone for it on 4th and goal

The Green Bay Packers were defeated by the Tampa Bay Buccaneers last night. The Packers trailed 31-23 when it was fourth down and goal with 2:22 to go in the fourth quarter. The Packers decided to kick a field goal instead of trying for a touchdown. The decision was universally criticized. Without crunching the numbers, I knew it would be better to go for it and attempt to get a touchdown, even though either decision was a longshot. The Packers lost 31-26.

Since the game ended, I have crunched the numbers.

Here is how I approached the decision. First, the Packers needed a series of events to occur, with all or nearly all events working in their favor to win. The probability of the intersection of multiple events is likely to be small. I examine the pathways to winning below. I left out some fluke ways to win because their probabilities were negligible. My calculations are in this spreadsheet.

Decision #1: Go for it on fourth down. There are two ways to win in this scenario.

  1. Score a touchdown.
  2. Make the two point conversion to tie the game.
  3. Stop the Buccaneers defensively (a TB field goal means the Packers lose).
  4. Win by scoring within regulation or in overtime if time expires.

I estimate that the Packers had a probability of 0.6 of scoring a touchdown based on Aaron Rodgers’s pass completion numbers. Teams have a probability of 0.48 of getting the two point conversion. Teams have a probability of 0.68 of stopping their opponent from scoring on a possession. There was not much time on the clock, so this may have been an underestimate. However, both teams had multiple timeouts to stop the clock, and the two minute warning had not yet occurred. Winning in overtime for two evenly matched teams is 50-50. Winning within regulation with very little time left has a small probability (say, 0.03). Putting this together, I estimate that the Packers had a win probability of 0.104.

Decision #2: Make a field goal attempt. There are also two ways to win in this scenario:

  1. Make the field goal.
  2. Stop the Buccaneers defensively while leaving enough time on the clock to score.
  3. Win by scoring a touchdown within regulation.

or

  1. Miss the field goal.
  2. Stop the Buccaneers defensively while leaving enough time on the clock to score.
  3. Score a touchdown within regulation, make the two point conversion to tie, and win in overtime (see Decision #1).

I estimate that the Packers had a probability of 0.96 of making the field goal. Teams normally have a probability of 0.68 of stopping their opponent from scoring, but I lowered that to 0.5 here because the stop needed to happen in such a way that the Packers had enough time for one last drive. That is likely an optimistic estimate. I estimate that the Packers could score a touchdown with a probability of 0.15 with the remaining time (Rodgers had an MVP-worthy season). The second way to win involved missing the field goal, tying the game in regulation with a last second touchdown, and later winning in overtime. Putting this together, I estimate that the Packers had a win probability of 0.076. I believe this is optimistic.
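The arithmetic behind the two estimates can be sketched in a few lines of Python. This is a reconstruction, not the spreadsheet itself: the probabilities are the ones quoted above, but the way they are combined is an assumption chosen because it reproduces the reported figures. Specifically, the sketch assumes the two ways to win after a defensive stop are simply added (0.03 + 0.5), and that the missed field goal pathway is approximated by reusing the Decision #1 win probability. The function and constant names are hypothetical.

```python
# Probability estimates quoted in the post.
P_TOUCHDOWN_ON_FOURTH = 0.60   # score a touchdown on 4th and goal
P_TWO_POINT = 0.48             # make the two point conversion
P_DEFENSIVE_STOP = 0.68        # stop the opponent on a possession
P_WIN_IN_REGULATION = 0.03     # score again before time expires
P_WIN_IN_OVERTIME = 0.50       # evenly matched teams in overtime
P_FIELD_GOAL = 0.96            # make the field goal
P_STOP_WITH_TIME_LEFT = 0.50   # stop that leaves time for a last drive
P_LATE_TOUCHDOWN = 0.15        # touchdown with the remaining time


def go_for_it():
    """Decision #1: touchdown, conversion, stop, then win.

    Assumption: the regulation and overtime wins are added directly,
    which reproduces the 0.104 reported in the post.
    """
    p_win_after_stop = P_WIN_IN_REGULATION + P_WIN_IN_OVERTIME
    return (P_TOUCHDOWN_ON_FOURTH * P_TWO_POINT
            * P_DEFENSIVE_STOP * p_win_after_stop)


def kick_field_goal():
    """Decision #2: make the kick, get a stop, score a late touchdown.

    Assumption: the missed-kick pathway is approximated as
    P(miss) times the Decision #1 win probability.
    """
    make_path = P_FIELD_GOAL * P_STOP_WITH_TIME_LEFT * P_LATE_TOUCHDOWN
    miss_path = (1 - P_FIELD_GOAL) * go_for_it()
    return make_path + miss_path


if __name__ == "__main__":
    print(f"Go for it:       {go_for_it():.3f}")
    print(f"Kick field goal: {kick_field_goal():.3f}")
    print(f"Difference:      {go_for_it() - kick_field_goal():.3f}")
```

Changing any single constant shows how sensitive the comparison is to that estimate, for example raising the probability of a defensive stop that leaves time for a last drive.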

Takeaways

  1. Going for a touchdown increases the win probability by about 3 percentage points compared to kicking a field goal. It’s not a huge difference, but it’s also not insignificant.
  2. Either way, the Packers were unlikely to win. So while the decision was bad, it wasn’t a decision that likely cost the Packers the game.
  3. Kicking the field goal (Decision #2) could make sense with high confidence in getting a defensive stop or in scoring a TD as time expires. For the best defensive team in the NFL, Decision #2 might be the better option. If Tampa Bay had, say, the worst defense in the league, especially with a weak secondary, Decision #2 would be more attractive.
  4. The Packers had two bad choices.

Reflections on 2020 and New Year’s resolutions for 2021

A new year begins tomorrow. I’m taking the opportunity to reflect upon the past year. 2020 was a historic and terrible year in many ways. The COVID-19 pandemic changed life as we know it and demanded many sacrifices. I lost my sabbatical (read my sabbatical posts here).

But 2020 was not entirely a bad year. I took on new hobbies, habits, and challenges. As 2020 comes to an end, I am reflecting upon what I was able to achieve this year.

  • I started new research related to the pandemic and critical infrastructure resilience. It has been a creative year.
  • I did more media outreach to improve public understanding of risk management.
  • I wrote my first op-ed. Actually, I wrote four.
  • I was selected as an IISE Fellow and an AAAS Fellow.
  • I learned about best practices for inclusive teaching in online environments and updated my teaching materials and improved my pedagogy. I am a better teacher now than I was a year ago.
  • I developed a new routine at home that helped my productivity.
  • Virtual K12 school at home is not easy for my three kids, but they are doing about as well as anyone can.
  • I started new hobbies, including jigsaw puzzles and tennis. I even went to the driving range and (sort of) golfed for the first time.
  • I expanded my vegetable garden and was able to grow a lot more than in the past.
  • I love being able to cook and bake. Working from home means I can knead bread dough between meetings and cook elaborate and healthy dinners. I have been eating very well.
  • Extra quality time with my family has been wonderful.
  • I have been able to appreciate the small things all year long.

New Year’s resolutions in 2021

  1. Less doom scrolling.
  2. Create more, consume less.
  3. Continue high levels of public outreach through media appearances and public lectures.
  4. Fewer Zoom meetings. I often did not meet my goal of 4 hours or less of meetings in 2020.
  5. Replace one-on-one Zoom meetings with phone calls, where I can go on a walk and stretch my legs during the call.
  6. Write and edit my writing every day, even if only for a few minutes.
  7. Become a better vegetable gardener. I’m good at growing tomatoes and herbs. I want to learn how to grow more vegetables, including the cool weather vegetables like greens and root vegetables.
  8. Go on vacation.

For more reading, check out my New Year’s resolutions in 2018 and 2019. Dijkstra’s 10 commandments of academic research also serve as potential New Year’s resolutions.


How to use the title “Dr.” in academia: possible best practices

I was upset to read the Wall Street Journal op/ed entitled, “Is There a Doctor in the White House? Not if You Need an M.D. Jill Biden should think about dropping the honorific, which feels fraudulent, even comic.” The op/ed was upsetting because it suggested that anyone who has earned a degree that comes with the title of “Dr.,” such as those with a PhD or Ed.D., should not use the titles they earned.

This is concerning because research has shown that women doctors are less likely to be called by their titles than men, almost half of Black and Latina professors report having been mistaken for janitorial staff, and women and BIPOC professors routinely have their credentials ignored. Women over-invest in credentials, in part because research has shown that women need more credentials than men to be considered for awards and promotions.

The problem is not with Dr. Biden, it is with the cultural construct of expertise, who is presumed to have it, and who is given permission to wield the terms of power that signify it. In dominant culture, the construct of “expert” is based on false hierarchies – crafted to exclude the vast majority of the world’s knowledge (including the expertise of women and people of color).

Katie Orenstein, from a Twitter thread about the WSJ article.

Mis-titling and de-titling professors is an equity issue. I gave some thought as to how to address this issue. I have a few suggestions below that are based on my experiences.

Here is some background. I used to ask students in my research group to use my title and last name. Students in other research groups often called me by my first name without my permission, and I found it strange that they addressed me in a casual way even after hearing the students in my research group address me in a formal way. There seemed to be two causes. (1) Students on a first name basis with their advisors and possibly other professors incorrectly assume that all professors let students call them by their first name. (2) Other professors, with whom I am on a first name basis, refer to me using only my first name in front of other students, which gives the students “permission” to call me by my first name. But I did not give permission. The students’ advisors in these situations have almost entirely been male, which possibly reflects societal constructs of power. Men inadvertently signal to students when it is acceptable to de-title and mis-title others, and these signals carry a lot of weight, especially if the person in question is a woman and/or is BIPOC. It seems that it is worth explicitly addressing these two mechanisms to reduce the chances that other professors are mis-titled or de-titled.

I now ask students in my research group to call me by my first name. I wanted to make sure that all students knew what to call me while also not de-titling other professors, since new students have joined my group. In this conversation, I was surprised that not everyone knew about this rule, so I was glad we revisited this so I could make corrections and make sure that no one feels singled out.

I discussed the article with the students in my lab and this is what I suggest.

  1. On a regular basis, such as when new students join the lab, use a group meeting to remind all students how you would like to be addressed. This can also be included in a lab compact.
  2. Use professors’ titles (Professor or Dr.) in informal settings unless they say otherwise. If they have given you permission to call them by their first name, it is still appropriate to sometimes use their titles, such as when there are other professors or students in a conversation.
  3. Use professors’ titles in formal settings, such as when introducing a speaker or in a committee meeting.
  4. When in doubt, ask someone what they want to be called.

What else is missing from this list?

In full disclosure, I have not always followed these rules in practice, and I will make a conscious effort to do better. I am a work in progress. I try to learn and make adjustments on a regular basis for continuous improvement.

For more reading, read my post about changing my name:


PhD development seminar: Time management and work-life balance

I am teaching a PhD development seminar for first year PhD students in industrial engineering and related disciplines. The purpose of this course is to prepare students for dissertation research in industrial and systems engineering. The course helps set expectations, introduces campus resources to students, and creates a cohort of students so they can connect with their peers.

Last week, a student panel composed of three senior PhD students discussed time management and work-life balance. The panelists were fantastic. Below are some highlights from the panel.

I am creating a series of blog posts featuring some of the classes from the semester. Those, along with previous PhD related posts, are tagged with the “PhD support” tag.

Other posts in this series:


Time management and work-life balance for (new) academics

I was on a panel about time management for the 2020 INFORMS New Faculty Colloquium (NFC). I recorded a video sharing my tips for time management with assistant professors in mind. I posted my video on YouTube below.

The live Q&A was fantastic, and I learned a lot from my fellow panelists Professors Tom Sharkey and Jonathan Helm. I want to give a big thank you to Professor Siqian Shen, who organized the NFC.


Presidential election forecasting: a case study

I am sharing several of the case studies I developed for my courses. This example is a spreadsheet model that forecasts outcomes of an election using data from the 2012 Presidential election.

Presidential Election Forecasting

There are a number of mathematical models for predicting who will win the Presidential Election. Many popular forecasting models use simulation to forecast the state-level outcomes based on state polls. The most sophisticated models (like 538) incorporate phenomena such as poll biases, economic data, and momentum. However, even sophisticated models are often built using spreadsheets.

For this case study, we will look at state-level poll data from the 2012 Presidential election when Barack Obama ran against Mitt Romney. The spreadsheet contains realistic polling numbers from before the election. Simulation is a useful tool for translating the uncertainty in the polls to potential election outcomes. There are 538 electoral votes: whoever gets 270 or more votes wins.

Assumptions:

  1. Everyone votes for one of two candidates (i.e., no third party candidates – every vote that is not for Obama is for Romney).
  2. The proportion of votes that go to a candidate is normally distributed according to a known mean and standard deviation in every state. We will track Obama’s proportion of the votes since he was the incumbent in 2012.
  3. Whoever gets more than 50% of the votes in a state wins all of the state’s electoral votes. [Note: most but not all states do this].
  4. The votes cast in each state are independent, i.e., the outcome in one state does not affect the outcomes in another.

It is well known that the polls are biased, and that these biases are correlated. This means that there is dependence between state outcomes (lifting assumption #4 above). Let’s assume four of the key swing states have polling bias (Florida, Pennsylvania, Virginia, Wisconsin). A bias here means that the poll average for Obama is too high, so the bias is subtracted from the poll average. Let’s consider biases of 0%, 0.5%, 1%, 1.5%, and 2%. For example, the mean fraction of votes for Obama in Wisconsin is 52%. This mean would range from 50% to 52% depending on the amount of bias.

Using the spreadsheet, simulate the proportion of votes in each state that are for Obama for these 5 scenarios. Run 200 iterations for each simulation. For each iteration, determine the number of electoral votes in each state that go to Obama and Romney and who won.

Outputs:

  1. The total number of electoral votes for Obama
  2. An indicator variable to capture whether Obama won the election.
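The simulation described above can also be sketched outside a spreadsheet. The Python below is a minimal illustration under the four stated assumptions, not the case study's spreadsheet. The four swing states carry their 2012 electoral vote counts and Wisconsin's 52% mean comes from the text above, but every other mean, every standard deviation, and the two aggregate blocks standing in for the remaining states are hypothetical placeholders. The function names are also hypothetical.

```python
import random

# (name, electoral votes, mean Obama vote share, standard deviation)
# Only the swing-state electoral votes and Wisconsin's mean come from
# the case study; all other numbers are hypothetical placeholders.
STATES = [
    ("Florida", 29, 0.505, 0.02),
    ("Pennsylvania", 20, 0.52, 0.02),
    ("Virginia", 13, 0.51, 0.02),
    ("Wisconsin", 10, 0.52, 0.02),
    ("Obama-leaning block (hypothetical)", 237, 0.56, 0.02),
    ("Romney-leaning block (hypothetical)", 229, 0.44, 0.02),
]
BIASED_STATES = {"Florida", "Pennsylvania", "Virginia", "Wisconsin"}


def simulate(states, biased_states, bias, iterations=200, seed=0):
    """Return Obama's electoral vote total for each iteration."""
    rng = random.Random(seed)
    totals = []
    for _ in range(iterations):
        electoral_votes = 0
        for name, votes, mean, sd in states:
            # A bias means the poll average for Obama is too high.
            adjusted_mean = mean - bias if name in biased_states else mean
            share = rng.gauss(adjusted_mean, sd)
            # Winner-take-all: more than 50% wins every electoral vote.
            if share > 0.5:
                electoral_votes += votes
        totals.append(electoral_votes)
    return totals


def win_probability(totals, needed=270):
    """Fraction of iterations with at least `needed` electoral votes."""
    return sum(t >= needed for t in totals) / len(totals)


if __name__ == "__main__":
    for bias in (0.0, 0.005, 0.01, 0.015, 0.02):
        totals = simulate(STATES, BIASED_STATES, bias)
        print(f"bias={bias:.3f} mean={sum(totals) / len(totals):.1f} "
              f"min={min(totals)} max={max(totals)} "
              f"P(win)={win_probability(totals):.3f}")
```

Because the same seed is used for every scenario, differences between the five scenarios come from the bias alone rather than from different random draws.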

Tasks:

(1) Create a figure showing the distribution of the total number of electoral votes that go to Obama. Report the probability that he gets 270 or more electoral votes.

(2) Paste the model outputs (the electoral vote average, min, max) and the probability that Obama wins for each of the five bias scenarios.

(3) What is the probability of a tie (exactly 269 votes)? 

Modeling questions to think about:

  1. Obama took 332 electoral votes compared to Romney’s 206. Do you think that this outcome was well-characterized in the model or was it an unexpected outcome?
  2. Look at the frequency plot of the number of electoral votes for Obama (choose any of the simulations). Why do some electoral vote totals like 307, 313, and 332 occur more frequently than the others?
  3. Why do you think a small bias in 4 states would disproportionately affect the election outcomes?
  4. How do you think the simplifying assumptions affected the model outputs?
  5. No model is perfect, but an imperfect model can still be useful. Do you think this simulation model was useful?

More reading from Punk Rock Operations Research:

How FiveThirtyEight’s forecasting model works: https://fivethirtyeight.com/features/how-fivethirtyeights-2020-presidential-forecast-works-and-whats-different-because-of-covid-19/

Files

  1. The assignment
  2. A shell spreadsheet with basic data to share with students
  3. A spreadsheet with the solutions

More teaching case studies


SIR models: A teaching case study to use in a course about probability models

This past summer, I created a few examples about COVID-19 to use in my course on probability models. I’ll post those materials here as I teach with them. Here is the first case study that introduces SIR models for modeling the spread of infectious disease. SIR models are widely used in epidemiology.

Infectious disease modeling: framing and modeling

Assume we have a constant population with N individuals. We can partition the population into three groups:

  1. Those who are susceptible to disease (S[n], i.e., not infected).
  2. Those who are infected (I[n]).
  3. Those who are recovered (R[n]).

We assume a discrete time model, where we are interested in how the numbers of susceptible, infected, and recovered individuals vary over time. Therefore, we start at time n=0 and index these values by n. The time between time n and n+1 could represent, say, a week.

A new strain of influenza or a novel coronavirus emerges. Susceptible individuals can become infected after exposure, and infected individuals can recover. Recovered individuals have immunity from reinfection.

New infecteds result from contact between the susceptibles and the infecteds, with contact rate beta/N, which represents the proportion of contacts an infected individual has. Infecteds recover at rate gamma, so the number who recover in each period is proportional to the number of infecteds. Those individuals become recovered.

Question #1: Come up with an expression to relate N to S[n], I[n], and R[n].

Question #2: Develop recursive expressions for S[n+1] based on S[n] and perhaps other variables.

Question #3: Then, do the same for I[n+1] and R[n+1].

Question #4: What are the boundary conditions?

Question #5: How would you estimate the total number who become infected by time n? 

Discussion questions:

  1. What other diseases fit this model?
  2. What are some possible ways to reduce the infection rate?
  3. What are some possible ways to increase the recovery rate?
  4. How does a vaccine affect this model?
  5. There is an interruption in the production of the vaccine, and your state will only receive 20% of the vaccines that you need before influenza season begins. Vaccines will slowly be released after this level. What are some criteria we could use to decide how to distribute these vaccines? What else can you do?

The second part performs computation in a spreadsheet. The assignment is here. We use the CDC 2004-5 data from a population of 157,759 samples taken from individuals with flu-like symptoms and 3 initial infections. Let n=0 represent the last week in September, the beginning of influenza season. Then, we compute these numbers in a spreadsheet to see how the disease may evolve. Next, we fit the model parameters (beta and gamma) to the collected data by minimizing the sum of squared errors (SSE). Finally, we assess the impact of a vaccine.
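For readers who prefer code to a spreadsheet, here is a minimal Python sketch of the same kind of computation. It uses the standard discrete-time SIR recursion, which is consistent with the description above (new infections beta*S[n]*I[n]/N and recoveries gamma*I[n]), but it is not taken from the solution file. The population and the 3 initial infections come from the text. The values of beta and gamma are hypothetical placeholders rather than fitted parameters, and the function name is made up for illustration.

```python
def simulate_sir(population, initial_infected, beta, gamma, periods):
    """Iterate a discrete-time SIR model and return lists S, I, R."""
    s = [float(population - initial_infected)]
    i = [float(initial_infected)]
    r = [0.0]
    for n in range(periods):
        # New infections come from contact between S[n] and I[n].
        new_infections = beta * s[n] * i[n] / population
        # A fraction gamma of the infecteds recovers each period.
        new_recoveries = gamma * i[n]
        s.append(s[n] - new_infections)
        i.append(i[n] + new_infections - new_recoveries)
        r.append(r[n] + new_recoveries)
    return s, i, r


if __name__ == "__main__":
    # beta and gamma are hypothetical, chosen only for illustration.
    s, i, r = simulate_sir(population=157_759, initial_infected=3,
                           beta=0.9, gamma=0.6, periods=30)
    for week in range(0, 31, 5):
        print(f"week {week:2d}: S={s[week]:10.1f} "
              f"I={i[week]:9.1f} R={r[week]:10.1f}")
```

Fitting beta and gamma would mean wrapping this function in a search that minimizes the SSE between the simulated and observed infections. That step is left out here because the observed data are in the assignment files.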

Files:

  1. The assignment.
  2. The solution.
  3. The assignment for the computational part.
  4. A Google spreadsheet with the calculations (create a copy or download)

More examples


PhD development seminar: getting started with research

I am teaching a PhD development seminar for first year PhD students in industrial engineering and related disciplines. The purpose of this course is to prepare students for dissertation research in industrial and systems engineering. The course helps set expectations, introduces campus resources to students, and creates a cohort of students so they can connect with their peers.

I am creating a series of blog posts featuring some of the classes from the semester. Those, along with previous PhD related posts, are tagged with the “PhD support” tag.

Other posts in this series:


PhD development seminar: first steps in writing

I am teaching a PhD development seminar for first year PhD students in industrial engineering and related disciplines. The purpose of this course is to prepare students for dissertation research in industrial and systems engineering. The course helps set expectations, introduces campus resources to students, and creates a cohort of students so they can connect with their peers.






I am creating a series of blog posts featuring some of the classes from the semester. Those, along with previous PhD related posts, are tagged with the “PhD support” tag.

Other posts in this series:


Pooled testing: a teaching case study to use in a course about probability models

This summer, I created a few examples about COVID-19 to use in my course on probability models. I’ll post those materials here as I teach with them. Here is the first example.

Pooled testing to expand testing capacity

In July 2020, many states struggled to process COVID-19 tests quickly, with some states taking more than a week to process tests. Many statisticians have proposed pooled testing to process tests more quickly and effectively expand testing capacity to up to four times the regular capacity. Pooled testing works when few tests come back positive.

Pooled testing came about in the 1940s, when government statisticians needed a more efficient way to screen World War II draftees for syphilis. “The Detection of Defective Members of Large Populations,” by R. Dorfman in 1943 contains a methodology for pooled testing.

Pooled testing works as follows:

  • Samples are pooled into groups of n, where each sample comes from one individual.
  • Pooled test results are either positive or negative. They come back positive if at least 1 of the n individual samples is positive.
  • For pooled tests that come back positive, the samples are retested individually, using the unused portions of the original samples, to see which individuals test positive. This achieves the same results but faster. A total of n+1 tests are performed.
  • For pooled tests that come back negative, no further testing is needed. We conclude all individuals are negative. Only one test is performed in total, which reduces the overall number of tests.
  • When pooling is not used, one test per individual yields n tests for the group.

Consider a group of 40 asymptomatic individuals that are tested for COVID-19 in pooled groups of size n. Let g denote the number of groups tested, and let X capture the number of groups that test positive (a random variable). We assume that an individual tests positive for COVID-19 with probability q (New York data from July 2020).

  • Express g as a function of n.
  • Express X and its distribution based on g, n, and q.
  • Let the random variable T denote the total number of tests run. Derive an expression for T as a function of X as well as the fixed parameters n and g.
  • Consider test groups of size n = 4, 5, 8, 10, 20. Which group size yields the fewest tests performed, on average? (Hint: Find E[T]).
  • How does your answer to the last question change if q = 0.02, 0.02, 0.075? (Note: Dane County had q = 0.02 and Wisconsin had q = 0.075 at the end of July 2020. At the time of writing in early October 2020, more than 20% of COVID tests were coming back positive in Wisconsin).
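The expected number of tests can be checked numerically with a short Python sketch. It follows directly from the pooled testing rules above, assuming individual results are independent: each of the g groups needs one pooled test, and a group that tests positive (with probability 1 - (1 - q)^n) needs n more individual tests. This is one way to set up the calculation and is not taken from the solution file. The function names are hypothetical, and the q values are the Dane County and Wisconsin figures quoted above.

```python
def expected_tests(population, group_size, q):
    """Expected total tests when pooling `population` people in groups.

    Assumes individual results are independent with positive
    probability q and that group_size divides population evenly.
    """
    if population % group_size != 0:
        raise ValueError("group_size must divide population evenly")
    groups = population // group_size
    # A pooled test is positive if at least one sample is positive.
    p_group_positive = 1 - (1 - q) ** group_size
    # One pooled test per group plus group_size retests per positive group.
    return groups + group_size * groups * p_group_positive


def best_group_size(population, group_sizes, q):
    """Group size with the smallest expected number of tests."""
    return min(group_sizes, key=lambda n: expected_tests(population, n, q))


if __name__ == "__main__":
    sizes = [4, 5, 8, 10, 20]
    for q in (0.02, 0.075):
        print(f"q = {q}")
        for n in sizes:
            print(f"  n = {n:2d}: E[T] = {expected_tests(40, n, q):.2f}")
        print(f"  best group size: {best_group_size(40, sizes, q)}")
```

Testing everyone individually takes 40 tests, so any E[T] below 40 is a saving. The saving shrinks as q grows.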

You can read more in the New York Times article that inspired this case study.

Files:

  1. The assignment.
  2. The solution.
  3. A Google spreadsheet with the calculations (create a copy or download)