Tag Archives: probability

bus accidents are a Poisson process

The fourth school bus accident in the Richmond, Virginia area occurred this morning. Everyone wants to know, what does this mean?!?

Here’s what I think it means: bus accidents can be modeled as a Poisson process. Equivalently, the time between bus accidents can be modeled using the exponential distribution. This modeling paradigm is appropriate if bus accidents “randomly” occur independently of one another, which is a reasonable assumption.

If the time between bus accidents is exponentially distributed, then we expect that sometimes bus accidents occur in groups of three or four. Example exponential probability distributions are below. The exponential distribution has parameter lambda, where 1/lambda is the average time between arrivals (bus accidents in this case). Most of the “meat” of the distribution is close to zero, even if the average time between arrivals is very large. This means that we would expect to sometimes observe several small interarrival times and then go a long time before the next arrival.
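To make "most of the meat is close to zero" concrete, here is a minimal sketch that evaluates the exponential distribution function P(T <= t) = 1 - exp(-lambda * t). The rate of one accident every four weeks is only an illustrative assumption, and the function name is made up for this example.

```python
import math

def prob_gap_at_most(t, rate):
    """P(T <= t) for an exponential interarrival time T with the given rate."""
    return 1.0 - math.exp(-rate * t)

rate = 1 / 4          # assumed for illustration: one accident every four weeks
mean_gap = 1 / rate   # average time between accidents, in weeks

p_within_week = prob_gap_at_most(1, rate)         # next accident within a week
p_below_mean = prob_gap_at_most(mean_gap, rate)   # gap shorter than the average gap

print(f"P(gap <= 1 week)   = {p_within_week:.3f}")
print(f"P(gap <= mean gap) = {p_below_mean:.3f}")
```

Whatever the rate, the second number is 1 - exp(-1), so under this model roughly 63% of gaps are shorter than the average gap. That is why short gaps, and therefore clusters, are common.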

Let’s put this in terms of bus accidents. If bus accidents occur as a result of chance or coincidence, then we would sometimes expect to observe four bus accidents in a week and then go months before the next bus accident. Four bus accidents in a week does not necessarily imply that something nefarious is going on.

This reasoning can also be used to explain why completely unrelated celebrity deaths sometimes occur in threes.

Example exponential distributions (probability density functions). The average time between arrivals is lambda^-1.

How rare are four bus accidents in a week? Let’s assume that bus accidents occur once every four weeks on average (lambda=1/4 per week), so the number of accidents in a week is Poisson with mean 1/4. The probability of observing 4+ accidents in a week is 0.01%. Pretty rare. But that’s any one week. The school year is 36 weeks long, which means that we would have 36 chances to have 4+ accidents in a week. Using the Binomial distribution, we find that the probability of having at least one week with 4+ accidents is 0.5% (once every 200 years).

What about a slightly less extreme week? The probability of observing 3+ accidents in a week is 0.2%. Over the course of a school year, the probability of having at least one week with 3+ accidents is 7.5% (once every 13 years).
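The numbers above can be reproduced with a few lines of Python. This is a sketch under the post's assumptions (a Poisson count with mean 1/4 per week and 36 independent weeks per school year), and the function names are made up for this example.

```python
import math

def poisson_tail(k, mean):
    """P(X >= k) for a Poisson count X with the given mean."""
    below = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below

def prob_at_least_one_week(p_week, weeks):
    """P(at least one of `weeks` independent weeks has the event)."""
    return 1.0 - (1.0 - p_week) ** weeks

rate_per_week = 1 / 4   # one accident every four weeks on average
school_weeks = 36

for k in (3, 4):
    p_week = poisson_tail(k, rate_per_week)
    p_year = prob_at_least_one_week(p_week, school_weeks)
    print(f"{k}+ accidents: {p_week:.4%} per week, {p_year:.2%} per school year, "
          f"about once every {1 / p_year:.0f} years")
```

The "at least one week" step is the Binomial calculation: one minus the probability that none of the 36 weeks has that many accidents.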

the license plate game: the raw numbers

My last post discussed how one might estimate how many state license plates one would expect to see on a road trip. I made a spreadsheet to compute the probability of seeing each state license plate.

Assumptions

1. The probability of seeing a state license plate A in another state B depends on the distance between their state capitals. It is scaled by the number of licensed drivers in state A. (This indirectly means that the probability does not depend on how long we are in a state.)
2. Within a given state D, sightings of license plates from states A, B, etc. are independent of one another.
3. Sightings of a given state license plate A are independent across the states B, C, … that we drive through.
4. We do not adjust for round trips.

The distance between state capitals was found here. The number of licensed drivers per state is here. I estimated the probability of seeing a license plate from state A in state B with this formula:

P = exp(-K * (Distance from A to B in miles) / # of licensed drivers)

with K = 7000 – 2000*Summer01 – 1000*ExpensiveGas01. Summer01 is 1 if it is summer break and 0 otherwise. ExpensiveGas01 is 1 if gas is “expensive” (and AAA predicts that road trips will be down) and 0 otherwise. I didn’t have time to properly identify a meaningful formula or calibrate the parameters. Suggestions here are welcome!
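Here is a rough sketch of how the formula might be turned into a trip prediction. Only the formula for P and the definition of K come from the post. The step that combines states along a route (one minus the probability of missing the plate in every state, following assumption 3) is my reading of "cumulative probability", not something taken from the spreadsheet. The predicted count is taken to be the sum of the per-plate probabilities, which is consistent with the winter-trip table summing to roughly 26.8. All distances and driver counts below are made-up placeholders.

```python
import math

def k_factor(summer, expensive_gas):
    """K = 7000 - 2000*Summer01 - 1000*ExpensiveGas01, as in the post."""
    return 7000 - 2000 * int(summer) - 1000 * int(expensive_gas)

def prob_see_plate(distance_miles, licensed_drivers, k):
    """P = exp(-K * distance / licensed drivers) for one plate in one state."""
    return math.exp(-k * distance_miles / licensed_drivers)

def prob_see_on_trip(distances_miles, licensed_drivers, k):
    """Chance of seeing the plate at least once over the whole route.

    Assumption: each state driven through is an independent chance,
    so we multiply the per-state probabilities of missing the plate.
    """
    p_miss_all = 1.0
    for d in distances_miles:
        p_miss_all *= 1.0 - prob_see_plate(d, licensed_drivers, k)
    return 1.0 - p_miss_all

# Hypothetical inputs, for illustration only (not the spreadsheet's data).
k = k_factor(summer=False, expensive_gas=False)
plates = {
    "State X": {"drivers": 5_000_000, "distances": [300, 450, 600]},
    "State Y": {"drivers": 800_000, "distances": [1500, 1700, 1900]},
}

per_plate = {
    name: prob_see_on_trip(v["distances"], v["drivers"], k)
    for name, v in plates.items()
}
expected_count = sum(per_plate.values())  # linearity of expectation

for name, p in per_plate.items():
    print(f"{name}: {p:.3f}")
print(f"Expected number of distinct plates: {expected_count:.2f}")
```

Summing the probabilities gives the expected number of distinct plates even if the sightings were not independent, so that part of the sketch does not rely on assumption 2.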

Validation

• We predicted 28.3 states for our summer trip from Richmond to Chicago. We saw ~35. Here, the discrepancy seemed to come from the amount of time we spent in each state. We went through fewer states, but were in each state (especially Kentucky and Indiana) a relatively long time.
• We predicted 26.8 license plates for our winter trip from Richmond to Vermont. We saw 26. Not bad!

The results make me conclude that the first assumption is probably not true: the probabilities do depend on how long we are in a state. When driving to Vermont, we went through many (8) little states. When driving to Chicago, we went through fewer (5) states but were in each state for longer.  Moreover, many of the Midwest states are not “destination” states. Take Indiana for instance. I love Hoosiers as much as the next person, but Indiana truly is the “Crossroads of America”–it’s a state that many people from other states drive through. It’s a better place to spot license plates than, say, Delaware. I didn’t take that into account.

Below is a detailed review of our winter trip numbers. It indicates the predicted probability of seeing each state license plate and whether we actually saw it. An asterisk (*) indicates that the model is “off”: either we (1) did not see a state with probability greater than 0.5 or (2) saw a state with a probability of 0.5 or lower.

A copy of my spreadsheet is here if you want to see how I computed the numbers.

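| State | Cumulative probability of seeing each state | States we saw | Off (*) |
|---|---|---|---|
| Alabama | 0.671 | | * |
| Alaska | 0 | Yes | * |
| Arizona | 0.065 | | |
| Arkansas | 0.060 | | |
| California | 0.961 | Yes | |
| Colorado | 0.083 | Yes | * |
| Connecticut | 1 | Yes | |
| Delaware | 1 | Yes | |
| District of Columbia | 1 | Yes | |
| Florida | 0.999 | Yes | |
| Georgia | 0.971 | Yes | |
| Hawaii | 0 | | |
| Idaho | 0 | | |
| Illinois | 0.973 | Yes | |
| Indiana | 0.950 | | * |
| Iowa | 0.056 | | |
| Kansas | 0.028 | | |
| Kentucky | 0.710 | | * |
| Louisiana | 0.236 | | |
| Maine | 0.565 | Yes | |
| Maryland | 1 | Yes | |
| Massachusetts | 1 | Yes | |
| Michigan | 0.990 | | * |
| Minnesota | 0.269 | | |
| Mississippi | 0.060 | Yes | * |
| Missouri | 0.563 | Yes | |
| Montana | 0 | | |
| Nebraska | 0.001 | | |
| Nevada | 3.53E-06 | | |
| New Hampshire | 0.911 | Yes | |
| New Jersey | 1 | Yes | |
| New Mexico | 3.14E-05 | | |
| New York | 1 | Yes | |
| North Carolina | 0.999 | Yes | |
| North Dakota | 0 | | |
| Ohio | 0.998 | Yes | |
| Oklahoma | 0.032 | Yes | * |
| Oregon | 0.0006 | | |
| Pennsylvania | 0.999 | Yes | |
| Rhode Island | 0.863 | | * |
| South Carolina | 0.878 | Yes | |
| South Dakota | 0 | | |
| Tennessee | 0.841 | | * |
| Texas | 0.983 | Yes | |
| Utah | 5.61E-05 | | |
| Vermont | 1 | Yes | |
| Virginia | 1 | Yes | |
| Washington | 0.037 | | |
| West Virginia | 0.416 | | |
| Wisconsin | 0.671 | Yes | |
| Wyoming | 0 | | |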

My family took a lot of road trips when I grew up. To combat boredom, we tried to see how many state license plates we would see on our trip. On a trip to see Mount Rushmore, we found almost all of the states.

As an adult and geek, the license plate game has (subtly?) changed. Now, I combat boredom by talking with my husband about how to come up with a probability distribution for how many state license plates we would expect to see on a road trip from point A to point B.

We took two road trips this year: one from Richmond, VA to Chicago, IL over the summer, the second from Richmond, VA to Burlington, VT over the winter break. We saw ~35 states in our first trip and ~25 states in our second trip.  My husband and I immediately noticed that we accrued license plates at a slower rate on our winter trip, which we suspect was from fewer people making road trips over the winter as compared to summer.

We wondered if one could estimate how many license plates you would expect to see in a road trip based on

• the states you drive through,
• the time of year (more people take road trips in the summer)

The state that you are in determines how likely you are to see other state license plates based on their relative distances as well as the number of licensed drivers in other states.

We simplified the problem to avoid looking at how long you drove through a state as well as interstate connectivity issues. That is, there is no difference between driving through West Virginia on I-70 and driving through Pennsylvania on I-80. Additionally, if you are on I-80 in Illinois, you are connected to neighbor states Iowa and Indiana but not neighbor states Missouri and Wisconsin, and therefore, one might expect to see more Iowa and Indiana plates. We ignored this and just noted that you would be in Illinois, which gives the likelihood of seeing license plates from other states regardless of “route distance.”

My next post summarizes the model, the assumptions, and the results.

Have you tallied license plates on road trips? What do you think are the salient aspects of this problem to include in a probability model?

what are the odds of winning the lottery two times?

A Chicago area man won the lottery for the second time. The Chicago Tribune reports:

Scott Anetsberger duplicated his $1 million win of nine years ago in the same instant Merry Millionaire game, lottery spokesman Mike Lang said.

Despite long odds, Anetsberger isn’t the first two-time $1 million instant winner. Kimberly Pleticha of Villa Park won $1 million twice in the instant Cash Jackpot game–the first time in August 2010 and the second only six months later in February.

Lottery officials could not instantly compute the odds against multiple winners, but did note there have been a dozen or more two-time Little Lotto winners over the years.

What would the odds of winning the lottery twice be? Well, it depends on how frequently one plays the lottery.

Winning the Illinois Lottery requires picking six correct numbers, where the numbers range from 1 to 52. The odds of getting all six numbers correct are 1 in 20,358,520. It costs $0.50 to play the lottery, and there are three lotteries per week. Assuming that each lottery is independent (a reasonable assumption), one would have to play the lottery 20,358,520 times, on average, to win (using the geometric distribution). If one plays the lottery three times per week, then it would take about 130,500 years to win the lottery once at a cost of more than $10M.

Winning the lottery twice can be modeled as a negative binomial random variable. Assuming that our lottery winner plays the lottery three times per week before and after winning the lottery, then it takes ~261,000 years, on average, to win twice.
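The arithmetic behind these waiting times is short enough to check directly. This sketch assumes one ticket per drawing, three drawings a week, and 52 weeks a year. The variable names are made up for this example.

```python
import math

combinations = math.comb(52, 6)   # ways to pick 6 numbers out of 52
p_win = 1 / combinations          # chance that a single ticket wins
plays_per_year = 3 * 52           # three drawings a week
ticket_price = 0.50

# Geometric distribution: expected plays until the first win is 1/p.
plays_to_first_win = 1 / p_win
# Negative binomial: expected plays until the r-th win is r/p (here r = 2).
plays_to_second_win = 2 / p_win

years_first = plays_to_first_win / plays_per_year
years_second = plays_to_second_win / plays_per_year
cost_first = plays_to_first_win * ticket_price

print(f"Odds of winning: 1 in {combinations:,}")
print(f"Years to first win:  {years_first:,.0f}")
print(f"Years to second win: {years_second:,.0f}")
print(f"Cost of first win:   ${cost_first:,.0f}")
```

The expected wait for the second win is simply twice the wait for the first, because the drawings are independent and the lottery has no memory of earlier wins.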

Since it is only newsworthy to report additional wins by those who have already won the lottery, we are really only interested in the odds that a lottery winner would win the lottery again. This is a different question. Assuming that our lottery winner continues to play the lottery three times per week, then the odds of winning again are the same as the odds of someone else winning the lottery for the first time: 1 in 20,358,520 per lottery. That is, it would take our lottery winner an additional 130,500 years, on average, to win the lottery again.

If someone plays the lottery more than three times per week, then the odds of winning go up.

Of course, many people play the lottery, so the odds that someone wins the lottery twice over their lifetime is much, much higher. I tell my students every semester, “Someone will win the lottery. Just not you.” If 130,500 people buy one lottery ticket per game, then there would be a two-time winner every 2 years, on average.

Little Lotto involves picking five correct numbers, where the numbers range from 1 to 39. It is easier to win, but it has a lower payout. The odds of winning are 1 in 575,757, which means that one is 35 times as likely to win the Little Lotto as the regular lottery. It would take 3691 years, on average, to win Little Lotto once (by playing three times per week) and 7382 years to win it twice.

Given that there have been 12 two-time winners in Little Lotto in its 23 years of existence, there is approximately one two-time winner every two years. Given my assumptions, this would suggest that ~3691 people buy a Little Lotto ticket every time. That seems a bit low to me. But I have a head cold and maybe it has temporarily impaired my mathematical abilities.
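The Little Lotto numbers follow the same pattern. The sketch below repeats the back-of-the-envelope reasoning above. It assumes three plays per week and treats every player as having played the whole time. That is a rough simplification, not a careful model of two-time winners.

```python
import math

little_lotto = math.comb(39, 5)   # ways to pick 5 numbers out of 39
regular = math.comb(52, 6)        # ways to pick 6 numbers out of 52
plays_per_year = 3 * 52

ratio = regular / little_lotto                   # how much easier Little Lotto is
years_once = little_lotto / plays_per_year       # expected years to win once
years_twice = 2 * little_lotto / plays_per_year  # expected years to win twice

# Rough estimate: if each player needs `years_twice` years on average and
# a two-time winner shows up about every two years, how many players is that?
years_between_two_time_winners = 2
implied_players = years_twice / years_between_two_time_winners

print(f"Odds of winning Little Lotto: 1 in {little_lotto:,}")
print(f"Easier than the regular lottery by a factor of {ratio:.1f}")
print(f"Years to win once: {years_once:,.0f}, twice: {years_twice:,.0f}")
print(f"Implied number of regular players: {implied_players:,.0f}")
```

Using the observed gap of 23/12 years instead of the rounded two years moves the implied number of players only slightly, so the estimate stays in the low thousands either way.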

A seven-time lottery winner’s advice for winning the lottery is to invest more (not less!) of one’s money into buying lottery tickets, as long as one can afford it. He also recommends treating the lottery as a job: the lottery is a skill, and one can improve at it after investing a lot of time. While skill plays a role in playing the lottery (identifying which numbers to pick and identifying which games have the best payoff), I’m pretty sure that this is bad advice. The expected payoff for the lottery is negative, meaning that in the long run, you should expect to come out behind. The variance in earnings is large, meaning that it is still possible to come out ahead by luck. But given that one comes out ahead, it would be foolish to attribute one’s success to skill. But maybe I’m missing something.

For the record, I do not recommend gambling or routinely playing the lottery.

For more, read Mike Trick’s post on conditional probabilities and March Madness odds.

the World Cup and probability: a lost opportunity?

I was disappointed in the amount of press given to silly predictions during the World Cup.  Or rather, I was disappointed that the silly predictions did not lead to a greater discussion of probability and conditional probability.

First, we have Paul the Octopus. Wikipedia claims that he was correct in all eight of his World Cup predictions, and that he is 12/14 overall. Getting all eight predictions right has a probability of roughly (1/3)^3 * (1/2)^5 = 1/864 (the first three predictions could have resulted in a tie). Not bad. But Paul the Octopus gained international fame after his first four correct predictions, which means the conditional probability that his last four predictions are correct given that his first four predictions were correct is (1/2)^4 = 1/16. It was unexpected that he would continue to make correct predictions after attaining such fame, but given that there were certainly more than 864 bizarre ways to make World Cup predictions, it is not so surprising that one of them got everything right. Still, the octopus is a great mascot.
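These fractions are easy to verify, and the same few lines show why a perfect record somewhere is not shocking. The number of independent "bizarre predictors" below is a made-up illustration, and each one is assumed to guess at random.

```python
from fractions import Fraction

# Three matches with three possible outcomes, five with two.
p_all_eight = Fraction(1, 3) ** 3 * Fraction(1, 2) ** 5
# Last four picks, given that the first four were already right.
p_last_four = Fraction(1, 2) ** 4

def prob_someone_perfect(n_predictors, p_perfect):
    """P(at least one of n independent random guessers gets everything right)."""
    return 1 - (1 - float(p_perfect)) ** n_predictors

n_predictors = 864  # hypothetical number of oddball predictors
p_someone = prob_someone_perfect(n_predictors, p_all_eight)

print(f"P(all eight right)         = {p_all_eight}")
print(f"P(last four | first four)  = {p_last_four}")
print(f"P(someone perfect, n={n_predictors}) = {p_someone:.3f}")
```

With as many random guessers as there are possible outcomes, the chance that at least one has a perfect record is about 63%, and it only goes up with more guessers.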

Next, we have Mick Jagger, who was declared a jinx after all three of the teams he supported lost. Of course, this was done after the fact. Given the large number of celebrities who attended World Cup games, it is not surprising that a few saw their team(s) win more than others. So that’s my way of saying that I’m not seeing why Mick Jagger was newsworthy.

Are there any other examples of probability–good or bad–that hit the mainstream during the World Cup?  Was there any useful discussion of probability during the World Cup?  How did you make World Cup predictions (aside from relying on octopuses)?

Bulgarian lottery – what are the odds?

The same six winning lottery numbers turned up in two consecutive drawings in the Bulgaria lottery earlier in the month (1 chance in 5.2 million).  Carl Bialik in the WSJ writes about the odds of this happening.  He notes that “With so many numbers colliding each week, the lottery might be the ideal proving ground for something that statisticians have long recognized: Given enough opportunities, the seemingly impossible becomes plausible.”  He explores several lottery issues in more detail in the Numbers Guy blog.  Statistician David Smith also blogged about the Bulgarian lottery.

Although the lottery is random, the people who play it are not.  I had always intuitively known this, but the picture below illustrates this quite nicely.  Apparently, people making lottery picks based on birthdays, for example, skews the picks toward smaller numbers.

Lottery numbers as chosen by lottery players are far from random

The lotteries are designed such that the expected winnings are negative when accounting for the price of the ticket, since the probability of winning is so low (E[winnings] = P(win)*Jackpot – Ticket Price). When the jackpot grows large enough, the “average” lottery player can come out ahead (although there really is no one at the average – there are a couple of winners who really skew the average). In March 1992, the Virginia lottery almost guaranteed a true winner. It offered a jackpot of $27M to a single winner whereas it cost $7.5M to purchase all Choose(44, 6) combinations of possible tickets (by picking six of 44 numbers). Of course, this strategy could backfire if there were many winners. However, a group of 2500 people accepted this challenge and pooled their resources. They ended up being the single winner, and after a legal struggle, they were awarded the jackpot. The Virginia lottery was subsequently changed to be less lucrative.
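To see why the Virginia jackpot was so tempting, here is a small sketch using the figures quoted above. The per-ticket price is backed out from the $7.5M total rather than taken from any official source. The calculation ignores taxes and payout schedules and assumes the jackpot is split evenly among winners. The function names are made up for this example.

```python
import math

def expected_winnings(p_win, jackpot, ticket_price):
    """E[winnings] = P(win) * Jackpot - Ticket Price, as in the post."""
    return p_win * jackpot - ticket_price

def buy_all_profit(jackpot, cost_all, other_winners):
    """Profit from buying every combination when the jackpot is split
    evenly with `other_winners` other winning tickets (an assumption)."""
    return jackpot / (1 + other_winners) - cost_all

tickets = math.comb(44, 6)   # every way to pick six of 44 numbers
jackpot = 27_000_000
cost_all = 7_500_000         # total cost quoted in the post
implied_ticket_price = cost_all / tickets

ev_per_ticket = expected_winnings(1 / tickets, jackpot, implied_ticket_price)
print(f"Combinations: {tickets:,}")
print(f"Expected winnings per ticket (sole winner): ${ev_per_ticket:.2f}")

for others in range(4):
    profit = buy_all_profit(jackpot, cost_all, others)
    print(f"{others} other winner(s): profit ${profit:,.0f}")
```

Under these assumptions, buying every combination stays profitable with up to two other winners and turns into a loss once the jackpot is split four ways.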

Do you play the lottery?