# Tag Archives: stochastic processes

## chocolate chip cookies are Poisson distributed

I asked for examples of things that are Poisson distributed in class. One student said the number of chocolate chips in a cookie are Poisson distributed. He’s right.

Here is the intuition of when you have a Poisson distribution. First, you should have a counting process where you are interested in the total number of events that occur by time t or in space s.  If each of these events is independent of the others, then the result is a Poisson distribution.

Let’s consider the Poisson process properties of a chocolate chip cookie. Let N(t) denote the number of chocolate chips in a cookie of size t. N(t) is a Poisson process with rate y if all four of the following events are true:

1) The cookie has stationary increments, where the number of chocolate chips in a cookie is proportional to the size of the cookie. In other words, a cookie with twice as much dough should have twice as many chocolate chips (N(t) ~ Poisson (y*t)). That is a reasonable assumption.

2) The cookie has independent increments. The number of chocolate chips in a cookie does not affect the number of chocolate chip cookies in the next cookie.

3) A cookie without any dough cannot have any chocolate chips (N(0)=0)).

4) The probability of finding two or more chocolate chips in a cookie of size h is o(h). In other words, you will find at most one chocolate chip in a tiny amount of dough.

All of these assumptions appear to be true, at least in a probabilistic sense. Technically there may be some dependence between chips if we note that bags of chocolate chips have a finite population (whatever is in the bag). There is some dependence between the number of chocolate chips in one cookie to the next if we note that how many chips we have used thus far gives us additional knowledge about how many chips are left. This would violate the independent increments assumption. However, the independence assumption is approximately true since the frequency of chocolate chips in the cookie you are eating is roughly independent of the frequency of chocolate chips in the cookies you have already eaten. As a result, I expect the Poisson is be an excellent approximation. Picture courtesy of Betty Crocker

## the umbrella problem in my office Once you learn about certain models used in operations research and industrial engineering, you start seeing them everywhere. I see the umbrella problem and its variants everywhere (read about the umbrella problem in this post).

I try to keep a few pens in my laptop bag at all times. These pens will drift in and out of my bag when I work. Every now and then, the pens entirely move out of my bag and I am left without a much needed pen. This happened earlier in March when I was on a trip and had planned to review a paper on my flight.

It has been a very cold winter in Wisconsin. I learned the hard way that it’s useful to keep a spare sweater or two in my office. I often forget to wear them home, and so sweaters and scarves have been accumulating in my office. The city of Madison tries to be pedestrian friendly. They have these pedestrian flags (see picture below) that are supposed to help you cross the street safely. Each intersection has a bunch of flags in two bins with on either side of the street. I drive by a few of these intersections on the way home, and sometimes I see an intersection where all the flags are in one of the two bins. It always makes me smile.

This isn’t exactly the umbrella problem since the same pedestrian doesn’t go from one side of the street to the other indefinitely as in the umbrella problem, but the umbrella problem only needs to be slightly modified to capture the real problem here and to provide insight into how many flags should be stocked to maintain a certain reliability level.

## my stochastic processes midterm: Wisconsin edition

Based on the curiosity over my cheese-related exam question on twitter, I have decided to post the midterm for my graduate level course on stochastic processes. My favorite questions are #1 and #2. I should note that #1 was inspired by actual leftover cheese that is packaged and sold at a discount at the Babock Dairy Store on campus (picture is below). If there is enough extra leftover cheese, it is poured into a bag, leading to cheese that has layers like an onion. It is apparently not cost-effective to repress the leftover cheese into a smooth brick of cheese. As someone who didn’t grow up wearing a foam cheese hat, I find that cheese production, quality control, and inventory is the right avenue for me to learn about cheese.

# The Midterm

1. The dairy plant in Babcock Hall makes one batch of cheese six days per week. The amount of cheese (in pounds) left over after each batch is distributed according to an exponential distribution with parameter 1. Cheese production on each day is independent. The Babcock Dairy store will package and sell any leftover cheese in a batch (i.e., a day) that is more than 1 pound—a “cheese factory second.” Let Ei be the event that there is more than 1 pound of cheese available on day i, with i = 1, 2, 3, 4, 5, 6.

Therefore, Ei is a random variable:
Ei = 1 if there are cheese factory seconds for the type of cheese produced on day i and 0 otherwise (i.e., ignore how much cheese is leftover – we are instead interested in the binary outcome of whether cheese is leftover).

E = the total number of types of cheese factory seconds across the 6 day week.

a) Define E in terms of Ei.
b) What is the sample space for E?
c) What is the probability that there is at least a pound of leftover cheese on day 1?
d) What is the probability that four or more days in the week produce cheese factory seconds?

2. The Lightsaber Manufacturing Company (LMC) operates their manufacturing plant in a galaxy far, far away. They need to decide to how many lightsabers to stock for the next Jedi Convention, where they will sell lightsabers to Jedi apprentices. Due to the expense associated with interstellar travel, LMC will discard the unsold lightsabers after the convention. Lightsabers cost 413 galactic credits to produce and are sold for 795 galactic credits (the unit of currency in the Empire). If the demand for lightsabers follows an exponential distribution with an average of 75 lightsabers, how many lightsabers should the LMC bring to the Jedi Convention to maximize its profit?

3. A student has a hard time figuring out how to get started on homework for Stochastic Modeling Techniques. The student randomly selects one of 3 potential places to start a homework problem with equal probability. The first approach is not fruitful; the student will return to the starting point after 1 hour of work. The second approach will return the student to the starting point after 3 hours. The third will lead to the solution in 15 minutes (1/4 hour). The student is confused, so he/she always chooses from all 3 available approaches each time. What is the expected amount of time it takes this diligent student to solve the homework problem?

4. A mysterious illness called “badgerpox” has affected the local badger population near Madison. The exposure level (X) largely determines whether a badger contracts the disease (D). The probability distribution for the exposure level and the conditional probability of disease given the exposure level are given in the table below.

 Exposure level (Xi) P(Xi) P(D | Xi) 10 0.7 0.001 100 0.15 0.005 1000 0.12 0.12 10000 0.03 0.78

Find:

(a) The conditional distribution P(Xi | D) for each value of Xi
(b) The probability that a badger contracts the disease P(D)
(c) The expected exposure level for badgers that have contracted the disease.

5. A student likes to come to ISyE 624 on time, which is possible as long as the student can travel from left to right in the diagram below. There are two paths to class; the student can pass through if and only if all components along its path are open. Due to construction, the probability that component i is open has probability pi, i = 1,2,3,4. Assume components are independent. If there is not a path to class, the student will arrive to class late, and the professor will be sad. What is the probability the student gets to class on time? 6. The Wisconsin-Minnesota football rivalry dates back to 1890. The teams play once per year for the trophy of “Paul Bunyan’s Axe,” which replaced the first trophy (the “Slab of Bacon,” 1930-1943)*. The teams are unevenly matched, with Wisconsin winning 16 of the last 20 games. Let’s say that Wisconsin wins each game independently with probability 0.8. The teams play next on 11/23/13.

(a) What is the expected number of games/years until Wisconsin loses next?
(b) What is the expected number of games/years until Wisconsin loses 2 games in a row?
(c) What is the probability that it Wisconsin loses for the 3rd time in the 5th game in the series?

* I didn’t make that up! ## how to win at Russian Roulette

Every time I teach stochastic processes, we discuss whether to play Russian Roulette (don’t!). In the off chance one absolutely has to play, we determine the best time to take a turn and whether it is best to spin the barrel.

I believe in teaching important life lessons in class along with operations research. But in this case, I seriously hope none of my students consider this example on Russian roulette an important life lesson. This example is good for exploring how we can quantify probabilities to confirm our intuition.

Just for fun, here are the relevant probabilities of death based on order and spinning strategy. (I like to mix it up with some dark humor in my classes).

First, consider the odds with not spinning the barrel.

Let Ei = the event that the ith person survives (based on order of play). The first person has a 5/6 chance of survival: $P(E1) = 5/6$

If we condition on the first outcome, the second person also has a 5/6 chance of survival: $P(E2) = P(E2|E1)P(E1) + P(E2|E1')P(E1') = (4/5)(5/6) + (1)(1/6) = 5/6$

If we condition on the first two outcomes, the third person has a 5/6 chance of survival: $P(E3) = P(E3|E1,E2)P(E2|E1)P(E1) + P(E3|E1')P(E1') + P(E3|E1,E2')P(E2'|E1)P(E1) = (3/4)(4/5)(5/6) + (1)(1/6) + (1)(1/5)(5/6) = 5/6$

This makes intuitive sense. The bullet goes into one chamber where it is “preassigned” to one player of the game.

Next, consider the odds with spinning the barrel.

Let Ei = the event that the ith person survives (based on order of play). The first person has a 5/6 chance of survival: $P(E1) = 5/6$

If we condition on the first outcome, the second person also has improved chance of survival: $P(E2) = P(E2|E1)P(E1) + P(E2|E1')P(E1') = (5/6)(5/6) + (1)(1/6) = 31/36$

If we condition on the first two outcomes, the third person has an even more improved chance of survival: $P(E3) = P(E3|E1,E2)P(E2|E1)P(E1) + P(E3|E1')P(E1') + P(E3|E1,E2')P(E2'|E1)P(E1) = (5/6)(5/6)(5/6) + (1)(1/6) + (1)(1/6)(5/6) = 191/216$

Continuing in this way, we can compute the odds of death based on each player’s order.
Without spinning the barrel, someone will lose. If every players spins the barrel prior to his/her turn, there is a 33.5% chance that everyone will walk away from the game. Such a small action greatly affects the outcome of the game, especially for those who are among the last to go.

Do not spin the barrel:

 Order P(die) 1 0.1667 2 0.1667 3 0.1667 4 0.1667 5 0.1667 6 0.1667 P(someone dies) 1.0

Spin the barrel:

 Order P(die) 1 0.1667 2 0.1389 3 0.1157 4 0.0965 5 0.0904 6 0.0670 P(someone dies) 0.665

We can see here that there is no guaranteed way to win at Russian roulette. However, going last after everyone spins the barrel lowers your probability of losing by 60%. ## The Birthday Problem with a mating season: A simulation approach

My last blog post illustrated how unlikely the equal birthday likelihood assumption is. I wrote a short simulation code to consider the impact of unequal birthdays. I modeled unequal birthdays as a mating season that results in three months (90 days) that are more likely birthdays than the remaining 365-90 days. This corresponds to July – September in my earlier post.

Let R = (Likelihood of being born in the “hot” 3 months) / (Likelihood of being born in the remaining 9 months).

The Birthday Problem assumes that R = 1. I consider 1 <= R <= 2. This post courtesy of Chris Rump indicates that R < 1.2, meaning that humans don’t have much of a mating season.

The simulations below show the average value of P(n), where P(n) = the probability that someone shares a birthday in a group of n people. The simulations are performed over 1M replications for each value of n. The probability of shared birthday goes up when people are more likely to be born in the birth months associated with “mating season.” But the effects are small, as can be seen by a fairly compressed y-scale. The simulations were performed in Matlab and the program is here.

## The Birthday Problem

Many of you have seen The Birthday Problem: Given a group of n people, what is the probability that someone shares a birthday?

Here, we are only concerned with birth day and month (not year). The solution assumes that a person is equally born on any of the 365 days in the year, thus ignoring leap years.

Let P(n) = the probability that someone shares a birthday in a group of n people and let Q(n) = the probability that everyone has unique birthdays. There are 365^n ways for n people to be born on any of the 365 days.Then

P(n) = 1 – Q(n) = 1 – (365*364*…*(365-n+1))/365^n.

P(n)

P(2) = 0.0028

P(5) = 0.0271

P(10) = 0.1169

P(20) = 0.4114

P(30) = 0.7063

P(40) = 0.8912

P(50) = 0.9704

P(60) = 0.9941 –> in a room with 60 people, you are almost certain to have at least two people that share a birthday!

The key assumption is that all birth dates are equally likely. This NPR article shows that humans have a “mating season” that makes July – September birthdays more likely. I posted the image below.

This will, of course, change our answer above. The probabilities depend on who is in the room. Have you simulated the Birthday Problem with an unequal birthday distribution? If so, please shed light on realistic numbers for P(n).

On a side note, the image below suggests that babies are induced on December 27-30 for a tax break. I’m not sure how I feel about that.

## it’s raining and I have an umbrella: the Midwest edition

This is a sequel to my last post about the Umbrella Problem. As a Midwesterner, one of the problems with using an umbrella is that they get blown inside out, since it’s really windy in the Midwest. As a result, umbrellas have a short lifespan. Here, I extend the Umbrella problem to consider an umbrella being destroyed with probability q. (Only when the umbrella is used, of course. It still rains with probability p.)

In the Markov chain, let our state reflect the total number of umbrellas (this part is new) and the number of available umbrellas at the professors current location (either home or her office). Then the Markov chain has Ux(U+1) states, given by (U,A), where A=number of available umbrellas. The transition probability matrix for U = 2 is:

```         (0,0) (1,0) (1,1) (2,0) (2,1) (2,2)
P= (0,0) [1     0     0     0     0     0   <-- when there are no umbrellas available at all
(1,0)  0     0     1     0     0     0   <--in this state and in (2,0), we don't currently have any umbrellas to move
(1,1)  qp    1-p  (1-q)p 0     0     0   <-- in this state and in (2,1) and (2,2), we can move or lose am umbrella
(2,0)  0     0     0     0     0     1
(2,1)  0    qp     0     0     1-p  (1-q)p
(2,2)  0    qp     0    1-p   (1-q)p 0]```

This Markov chain has an absorbing state of (0,0), meaning that in the long-run, we will have zero umbrellas with certainty. Therefore, we cannot identify the steady state proportion of time the professor gets wet as we did before.

However, Midwesterners periodically replenish their umbrella supplies. This would lead to the design of a (u, U) inventory model, where the number of umbrellas must be replenished to have U umbrellas when their number dwindles down to a mere u umbrellas.Let’s consider a (0,2) inventory model, which means that the professor immediately buys two umbrellas when her last umbrella is destroyed. The Markov chain is then ergodic (all states are in a single, recurrent class, aperiodic). The changes are in boldface:

```         (0,0) (1,0) (1,1) (2,0) (2,1) (2,2)
P= (0,0) [0     0     0     0     0     1
(1,0)  0     0     1     0     0     0
(1,1)  qp    1-p  (1-q)p 0     0     0
(2,0)  0     0     0     0     0     1
(2,1)  0    qp     0     0     1-p  (1-q)p
(2,2)  0    qp     0    1-p   (1-q)p 0]```

Now we can analyze this Markov chain by solving for the limiting distribution π in the usual way. The probability that the professor gets wet is now p(π[0,0]+π[1,0]+π[2,0]). This assumes that when the umbrella brakes, the professor is able to stay dry. That may be a dubious assumption, but Midwesterners are crafty like that.

A sensitivity analysis on p with q = 0.1 yields the following results: A sensitivity analysis on q with p = 0.25 yields the following results:  This isn't one of my umbrellas, but sights like this are common after blustery storms in the Midwest. I've lost many a good umbrella.