Tag Archives: stochastic processes

chocolate chip cookies are Poisson distributed

I asked for examples of things that are Poisson distributed in class. One student said the number of chocolate chips in a cookie are Poisson distributed. He’s right.

Here is the intuition of when you have a Poisson distribution. First, you should have a counting process where you are interested in the total number of events that occur by time t or in space s.  If each of these events is independent of the others, then the result is a Poisson distribution.

Let’s consider the Poisson process properties of a chocolate chip cookie. Let N(t) denote the number of chocolate chips in a cookie of size t. N(t) is a Poisson process with rate y if all four of the following events are true:

1) The cookie has stationary increments, where the number of chocolate chips in a cookie is proportional to the size of the cookie. In other words, a cookie with twice as much dough should have twice as many chocolate chips (N(t) ~ Poisson (y*t)). That is a reasonable assumption.

2) The cookie has independent increments. The number of chocolate chips in a cookie does not affect the number of chocolate chip cookies in the next cookie.

3) A cookie without any dough cannot have any chocolate chips (N(0)=0)).

4) The probability of finding two or more chocolate chips in a cookie of size h is o(h). In other words, you will find at most one chocolate chip in a tiny amount of dough.

All of these assumptions appear to be true, at least in a probabilistic sense. Technically there may be some dependence between chips if we note that bags of chocolate chips have a finite population (whatever is in the bag). There is some dependence between the number of chocolate chips in one cookie to the next if we note that how many chips we have used thus far gives us additional knowledge about how many chips are left. This would violate the independent increments assumption. However, the independence assumption is approximately true since the frequency of chocolate chips in the cookie you are eating is roughly independent of the frequency of chocolate chips in the cookies you have already eaten. As a result, I expect the Poisson is be an excellent approximation.

Picture courtesy of Betty Crocker

 


the umbrella problem in my office

20140314-161742.jpg

Once you learn about certain models used in operations research and industrial engineering, you start seeing them everywhere. I see the umbrella problem and its variants everywhere (read about the umbrella problem in this post).

I try to keep a few pens in my laptop bag at all times. These pens will drift in and out of my bag when I work. Every now and then, the pens entirely move out of my bag and I am left without a much needed pen. This happened earlier in March when I was on a trip and had planned to review a paper on my flight.

It has been a very cold winter in Wisconsin. I learned the hard way that it’s useful to keep a spare sweater or two in my office. I often forget to wear them home, and so sweaters and scarves have been accumulating in my office.

The city of Madison tries to be pedestrian friendly. They have these pedestrian flags (see picture below) that are supposed to help you cross the street safely. Each intersection has a bunch of flags in two bins with on either side of the street. I drive by a few of these intersections on the way home, and sometimes I see an intersection where all the flags are in one of the two bins. It always makes me smile.

This isn’t exactly the umbrella problem since the same pedestrian doesn’t go from one side of the street to the other indefinitely as in the umbrella problem, but the umbrella problem only needs to be slightly modified to capture the real problem here and to provide insight into how many flags should be stocked to maintain a certain reliability level.

 

Look for a few posts about March Madness next week!


my stochastic processes midterm: Wisconsin edition

Based on the curiosity over my cheese-related exam question on twitter, I have decided to post the midterm for my graduate level course on stochastic processes. My favorite questions are #1 and #2. I should note that #1 was inspired by actual leftover cheese that is packaged and sold at a discount at the Babock Dairy Store on campus (picture is below). If there is enough extra leftover cheese, it is poured into a bag, leading to cheese that has layers like an onion. It is apparently not cost-effective to repress the leftover cheese into a smooth brick of cheese. As someone who didn’t grow up wearing a foam cheese hat, I find that cheese production, quality control, and inventory is the right avenue for me to learn about cheese.

https://twitter.com/lauramclay/status/392640686094155776

The Midterm

1. The dairy plant in Babcock Hall makes one batch of cheese six days per week. The amount of cheese (in pounds) left over after each batch is distributed according to an exponential distribution with parameter 1. Cheese production on each day is independent. The Babcock Dairy store will package and sell any leftover cheese in a batch (i.e., a day) that is more than 1 pound—a “cheese factory second.” Let Ei be the event that there is more than 1 pound of cheese available on day i, with i = 1, 2, 3, 4, 5, 6.

Therefore, Ei is a random variable:
Ei = 1 if there are cheese factory seconds for the type of cheese produced on day i and 0 otherwise (i.e., ignore how much cheese is leftover – we are instead interested in the binary outcome of whether cheese is leftover).

E = the total number of types of cheese factory seconds across the 6 day week.

a) Define E in terms of Ei.
b) What is the sample space for E?
c) What is the probability that there is at least a pound of leftover cheese on day 1?
d) What is the probability that four or more days in the week produce cheese factory seconds?

2. The Lightsaber Manufacturing Company (LMC) operates their manufacturing plant in a galaxy far, far away. They need to decide to how many lightsabers to stock for the next Jedi Convention, where they will sell lightsabers to Jedi apprentices. Due to the expense associated with interstellar travel, LMC will discard the unsold lightsabers after the convention. Lightsabers cost 413 galactic credits to produce and are sold for 795 galactic credits (the unit of currency in the Empire). If the demand for lightsabers follows an exponential distribution with an average of 75 lightsabers, how many lightsabers should the LMC bring to the Jedi Convention to maximize its profit?

3. A student has a hard time figuring out how to get started on homework for Stochastic Modeling Techniques. The student randomly selects one of 3 potential places to start a homework problem with equal probability. The first approach is not fruitful; the student will return to the starting point after 1 hour of work. The second approach will return the student to the starting point after 3 hours. The third will lead to the solution in 15 minutes (1/4 hour). The student is confused, so he/she always chooses from all 3 available approaches each time. What is the expected amount of time it takes this diligent student to solve the homework problem?

4. A mysterious illness called “badgerpox” has affected the local badger population near Madison. The exposure level (X) largely determines whether a badger contracts the disease (D). The probability distribution for the exposure level and the conditional probability of disease given the exposure level are given in the table below.

Exposure level (Xi) P(Xi) P(D | Xi)
10 0.7 0.001
100 0.15 0.005
1000 0.12 0.12
10000 0.03 0.78

Find:

(a) The conditional distribution P(Xi | D) for each value of Xi
(b) The probability that a badger contracts the disease P(D)
(c) The expected exposure level for badgers that have contracted the disease.

5. A student likes to come to ISyE 624 on time, which is possible as long as the student can travel from left to right in the diagram below. There are two paths to class; the student can pass through if and only if all components along its path are open. Due to construction, the probability that component i is open has probability pi, i = 1,2,3,4. Assume components are independent. If there is not a path to class, the student will arrive to class late, and the professor will be sad. What is the probability the student gets to class on time?

midterm1_prob1

6. The Wisconsin-Minnesota football rivalry dates back to 1890. The teams play once per year for the trophy of “Paul Bunyan’s Axe,” which replaced the first trophy (the “Slab of Bacon,” 1930-1943)*. The teams are unevenly matched, with Wisconsin winning 16 of the last 20 games. Let’s say that Wisconsin wins each game independently with probability 0.8. The teams play next on 11/23/13.

(a) What is the expected number of games/years until Wisconsin loses next?
(b) What is the expected number of games/years until Wisconsin loses 2 games in a row?
(c) What is the probability that it Wisconsin loses for the 3rd time in the 5th game in the series?

* I didn’t make that up!

20131024-104034.jpg


The Birthday Problem with a mating season: A simulation approach

My last blog post illustrated how unlikely the equal birthday likelihood assumption is. I wrote a short simulation code to consider the impact of unequal birthdays. I modeled unequal birthdays as a mating season that results in three months (90 days) that are more likely birthdays than the remaining 365-90 days. This corresponds to July – September in my earlier post.

Let R = (Likelihood of being born in the “hot” 3 months) / (Likelihood of being born in the remaining 9 months).

The Birthday Problem assumes that R = 1. I consider 1 <= R <= 2. This post courtesy of Chris Rump indicates that R < 1.2, meaning that humans don’t have much of a mating season.

The simulations below show the average value of P(n), where P(n) = the probability that someone shares a birthday in a group of n people. The simulations are performed over 1M replications for each value of n. The probability of shared birthday goes up when people are more likely to be born in the birth months associated with “mating season.” But the effects are small, as can be seen by a fairly compressed y-scale. The simulations were performed in Matlab and the program is here.

The Birthday Problem probability for n=5.

The Birthday Problem probability for n=10.

The Birthday Problem probability for n=20.

The Birthday Problem probability for n=30.

The Birthday Problem probability for n=40.

The Birthday Problem probability for n=50.

 


The Birthday Problem

Many of you have seen The Birthday Problem: Given a group of n people, what is the probability that someone shares a birthday?

Here, we are only concerned with birth day and month (not year). The solution assumes that a person is equally born on any of the 365 days in the year, thus ignoring leap years.

Let P(n) = the probability that someone shares a birthday in a group of n people and let Q(n) = the probability that everyone has unique birthdays. There are 365^n ways for n people to be born on any of the 365 days.Then

P(n) = 1 – Q(n) = 1 – (365*364*…*(365-n+1))/365^n.

P(n)

P(2) = 0.0028

P(5) = 0.0271

P(10) = 0.1169

P(20) = 0.4114

P(30) = 0.7063

P(40) = 0.8912

P(50) = 0.9704

P(60) = 0.9941 –> in a room with 60 people, you are almost certain to have at least two people that share a birthday!

The key assumption is that all birth dates are equally likely. This NPR article shows that humans have a “mating season” that makes July – September birthdays more likely. I posted the image below.

This will, of course, change our answer above. The probabilities depend on who is in the room. Have you simulated the Birthday Problem with an unequal birthday distribution? If so, please shed light on realistic numbers for P(n).

On a side note, the image below suggests that babies are induced on December 27-30 for a tax break. I’m not sure how I feel about that.

How likely are people to be born on different birth dates?


it’s raining and I have an umbrella: the Midwest edition

This is a sequel to my last post about the Umbrella Problem. As a Midwesterner, one of the problems with using an umbrella is that they get blown inside out, since it’s really windy in the Midwest. As a result, umbrellas have a short lifespan. Here, I extend the Umbrella problem to consider an umbrella being destroyed with probability q. (Only when the umbrella is used, of course. It still rains with probability p.)

In the Markov chain, let our state reflect the total number of umbrellas (this part is new) and the number of available umbrellas at the professors current location (either home or her office). Then the Markov chain has Ux(U+1) states, given by (U,A), where A=number of available umbrellas. The transition probability matrix for U = 2 is:

         (0,0) (1,0) (1,1) (2,0) (2,1) (2,2)
P= (0,0) [1     0     0     0     0     0   <-- when there are no umbrellas available at all
   (1,0)  0     0     1     0     0     0   <--in this state and in (2,0), we don't currently have any umbrellas to move
   (1,1)  qp    1-p  (1-q)p 0     0     0   <-- in this state and in (2,1) and (2,2), we can move or lose am umbrella
   (2,0)  0     0     0     0     0     1
   (2,1)  0    qp     0     0     1-p  (1-q)p
   (2,2)  0    qp     0    1-p   (1-q)p 0]

This Markov chain has an absorbing state of (0,0), meaning that in the long-run, we will have zero umbrellas with certainty. Therefore, we cannot identify the steady state proportion of time the professor gets wet as we did before.

However, Midwesterners periodically replenish their umbrella supplies. This would lead to the design of a (u, U) inventory model, where the number of umbrellas must be replenished to have U umbrellas when their number dwindles down to a mere u umbrellas.Let’s consider a (0,2) inventory model, which means that the professor immediately buys two umbrellas when her last umbrella is destroyed. The Markov chain is then ergodic (all states are in a single, recurrent class, aperiodic). The changes are in boldface:

         (0,0) (1,0) (1,1) (2,0) (2,1) (2,2)
P= (0,0) [0     0     0     0     0     1
   (1,0)  0     0     1     0     0     0
   (1,1)  qp    1-p  (1-q)p 0     0     0
   (2,0)  0     0     0     0     0     1
   (2,1)  0    qp     0     0     1-p  (1-q)p
   (2,2)  0    qp     0    1-p   (1-q)p 0]

Now we can analyze this Markov chain by solving for the limiting distribution π in the usual way. The probability that the professor gets wet is now p(π[0,0]+π[1,0]+π[2,0]). This assumes that when the umbrella brakes, the professor is able to stay dry. That may be a dubious assumption, but Midwesterners are crafty like that.

A sensitivity analysis on p with q = 0.1 yields the following results:

A sensitivity analysis on q with p = 0.25 yields the following results:

This isn't one of my umbrellas, but sights like this are common after blustery storms in the Midwest. I've lost many a good umbrella.


is anything really exponentially distributed?

I am teaching stochastic processes again this semester. After enjoying a humorous exchange on twitter about the exponential distribution, I wondered about the practicality of the exponential distribution. Most of the examples from my notes seem a little idealized.  For example, I’m almost positive that light bulb times are not exponentially distributed. Is anything really exponentially distributed?

Things that are almost certainly exponentially distributed:

  • The amount of ink and graffiti on dollar bills that have been in circulation more than one year (a student who works for the Federal Reserve Bank provided this useful tifbit!).
  • The useful life of things made from polyurethane foam, such as Nerf balls, car seats, mattress pads, and carpet pads (the useful life occurs after the break-in period and prior to the break-down period).
  • The time between 911 calls (see below).
  • The time between celebrity deaths.

I looked at some emergency medical 911 data.  The call volume changes over the course of the day, but the call volume is constant for a large part of the day.  Looking just at that time period, I examined the interarrival times of the 911 calls.  The exponential distribution is more or less a perfect fit!  I don’t have any data for my celebrity deaths hypothesis, but since they are independent, like most 911 calls, then I would expect celebrity deaths to follow an exponential distribution.

The time between emergency medical 911 calls

The time between emergency medical 911 calls is exponentially distributed

What else is exponentially distributed?  Do you know how light bulb lifetimes are distributed?


robust Christmas shopping

I usually more or less finish my Christmas shopping before Black Friday, so I usually do not worry about optimal shopping strategies. I enjoyed reading about Aurelie Thiele’s latest paper on robust timing of markdowns.  Her blog post summarizes her new paper (she provides a link to the paper), where Aurelie and her collaborators propose a method for dynamically timing markdowns over a finite horizon using robust optimization models.

This got me thinking about how to reverse-engineer the process to get the best deals.  If you’re not willing to do Black Friday shopping (the getting up at 3am and waiting in long lines do not make up for the extra savings), then shopping for a few key items is like a stopping problem (e.g., the Secretary Problem), where a series of deals are offered (some at the same time, such as in the Sunday ads) and the consumer eventually decides to purchase exactly one item with a deadline of December 24.  When should I purchase the item?  I suppose it depends on how retailers adjust the prices from one time period to the next.

These are a few rules of thumb that I’ve learned (beware: no modeling or algorithms used here).  With a bad economy, I see no drawback to a wait and see approach, since sales continually improve and store coupons become available as shoppers stay home.  I have found that in the last three years or so, the pre-Christmas sales (not just Black Friday) are often better than the post-Christmas clearance sales.  So I stock up and do some personal shopping before the holidays.  However, the clearance sales become lucrative in February and insanely good in March (for clothes, at least).

How do you do your Christmas shopping?

Related posts:


plantains and coupon collecting

I must have stochastic processes on the mind after wrapping up another semester.  I will admit that this is a rather silly post. I was sauteing a couple of plantains last night.  I cut the plantains into thin slices and then saute them in a little olive oil in a cast iron skillet until they are browned.  It’s hard to flip plantain slices over, and only a few slices can come into contact with the skillet.  As a result, it’s hard to evenly cook the slices.

As I was cooking, I starting to mentally model how I would estimate how long it would take to brown all of the plantain slices.  Since I flip plantains at semi-regular intervals (about once every minute or two), it provides a discrete-time structure to the problem.  The more flips have occurred, the lower the probability that any unbrowned plantain comes into contact with the skillet (since it becomes more likely that already-browned slices come in contact with the skillet).  This is not unlike the coupon collecting problem, where there is a set of n coupons from which coupons are collected one at a time with replacement.  The set of n coupons is my set of plantain slices, plantain slices are collected when they come in contact with the skillet, and plantain slices can come into contact with the skillet more than once.  I do not collect coupons one by one, but rather I collect a subset of coupons at each flip–the number of plantain slices that can come in contact with the skillet.  The bottom line is that it takes an increasingly long time to brown each additional plantain slice.

Of course, I could always game the system and manually exchange browned plantain slices with unbrowned plantain slices to sleep along the cooking process and encourage uniformity.  But (a) I don’t have the time to devote to manually flipping with two kids to feed and (b) that’s a little obsessive.

So that leaves me with the solution to the coupon collecting problem: the plantains would burn before they would all be browned. I suppose I’ll have to work on my plantain-turning technique and accept a few undercooked plantains.

What algorithm do you use for skillet cooking?  How do you cook plantains? (I love ’em, but I’m a plantain novice. I’d love some new recipes).

Plantains

Links:

Related posts:


OR and H1N1

This is the second of three posts about the INFORMS Annual Meeting.

I enjoyed a talk by Dr. Richard Larson of MIT about the timely topic of H1N1 and operations research.  I tuned out much of the alarmist news prior to the conference (to keep my sanity) and instead adopted a rigorous handwashing regimen.  Larson’s talk highlighted the many opportunities for addressing H1N1 issues using operations research, including:

  • Queuing for vaccinations.
  • Reneging on vaccinations (some health care workers are refusing required vaccinations).
  • Timing the vaccinations (before the prevalence peaks) is important for reducing risks, since youths are particularly susceptible to dying from H1N1..
  • Locating facilities to manage surge capacity when the epidemic hits.
  • Correctly diagnosing and isolating cases of H1N1.
  • Supply chains for vaccinations.

Larson and his collaborator Dr. Stan Finkelstein takes a different kind of focus, looking at personal choices, such as hand washing, coughing into sleeves, avoiding handshakes, and avoiding crowds.  They examine this issue through non-pharmaceutical interventions.  Someone infected with H1N1 infects about 1.5 people in the next 24 hours (on average).  This value is the mean of a random variable, which depends on personal choices (like handwashing).  He examines the conditions under which the average number of infections decreases below 1.0, when the virus essentially dies out (Similar to my reasoning on vampire populations).

Finkelstein, a medical doctor, discussed some of the policy results.  Initial reports suggested that H1N1 has a fatality rate of about 50% (Spanish flu has a FR of 3%).  After an initial panic, flu fatigue set in.  And the first wave of H1N1 resemble seasonal rather than pandemic flu.  But after the recent panicking, many of us simply have not been motivated to improve our personal choices to reduce H1N1 transmission.  Case in point, elbow bumping pictured below (instead of hand shaking) did not catch on at the conference as I had hoped. And the anti-bacterial hand gel was not located in useful places at the conference, so I used my own personal stash of anti-bacterial lotion after shaking hands.

I hope some of this research is used to lessen the impact of H1N1 this year before I am transformed into a germ-a-phobe.

Link:  Flu101@MIT

Karima Nigmatulina, after successfully defending the first PhD thesis on our flu research project, bumps congratulatory elbows with advisor Richard Larson as Anna Teytelman looks on. 	 	 CESF Venn CESF embraces problems operating at the Venn diagram intersection of ‘traditional engineering,’ management (broadly interpreted) and social science.

Karima Nigmatulina, after successfully defending the first PhD thesis on our flu research project, bumps congratulatory elbows with advisor Richard Larson.