# The Birthday Problem

Many of you have seen the Birthday Problem: given a group of n people, what is the probability that at least two of them share a birthday?

Here, we are only concerned with birth day and month (not year). The solution assumes that a person is equally likely to be born on any of the 365 days in the year, thus ignoring leap years.

Let P(n) be the probability that at least two people share a birthday in a group of n people, and let Q(n) be the probability that everyone has a unique birthday. There are 365^n ways for n people to be born on any of the 365 days, and 365*364*…*(365-n+1) of those give everyone a different birthday. Then

P(n) = 1 - Q(n) = 1 - (365*364*…*(365-n+1))/365^n.

| n | P(n) |
| --- | --- |
| 2 | 0.0027 |
| 5 | 0.0271 |
| 10 | 0.1169 |
| 20 | 0.4114 |
| 30 | 0.7063 |
| 40 | 0.8912 |
| 50 | 0.9704 |
| 60 | 0.9941 |

In a room with 60 people, you are almost certain to have at least two people who share a birthday!
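As a quick check, the formula for P(n) can be evaluated directly. The Python sketch below is only an illustration and not part of the original calculation; the function name `p_shared` and the `days` parameter are my own choices.

```python
def p_shared(n, days=365):
    """Probability that at least two of n people share a birthday,
    assuming every one of `days` dates is equally likely."""
    q = 1.0  # running value of Q(n): probability that all birthdays differ
    for i in range(n):
        q *= (days - i) / days
    return 1.0 - q


for n in (2, 5, 10, 20, 30, 40, 50, 60):
    print(f"P({n}) = {p_shared(n):.4f}")
```

Multiplying the ratios one at a time avoids forming the huge numbers 365*364*… and 365^n separately.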

The key assumption is that all birth dates are equally likely. This NPR article shows that humans have a “mating season” that makes July – September birthdays more likely. I posted the image below.

This will, of course, change our answer above. The probabilities depend on who is in the room. Have you simulated the Birthday Problem with an unequal birthday distribution? If so, please shed light on realistic numbers for P(n).
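For anyone who wants to try this, here is a minimal Monte Carlo sketch in Python. It is an assumption-laden illustration: the function name `simulate_match_probability` is hypothetical, and the `seasonal` weights (July through September made 10% more likely) are invented for demonstration and are not real birth data. Swap in actual daily birth frequencies to get realistic numbers.

```python
import random
from itertools import accumulate


def simulate_match_probability(n, weights, trials=20_000, seed=0):
    """Estimate the probability that at least two of n people share a birthday.

    weights[d] is the relative likelihood of being born on day d;
    the weights do not need to sum to 1.
    """
    rng = random.Random(seed)
    days = range(len(weights))
    cum_weights = list(accumulate(weights))  # computed once, reused every trial
    matches = 0
    for _ in range(trials):
        birthdays = rng.choices(days, cum_weights=cum_weights, k=n)
        if len(set(birthdays)) < n:  # a repeated day means a shared birthday
            matches += 1
    return matches / trials


uniform = [1.0] * 365
# Hypothetical weights, NOT real data: day indices 181..272 (July 1 through
# September 30 in a non-leap year) are 10% more likely than the other days.
seasonal = [1.1 if 181 <= d < 273 else 1.0 for d in range(365)]

for n in (10, 23, 40):
    print(n,
          simulate_match_probability(n, uniform),
          simulate_match_probability(n, seasonal))
```

With only 20,000 trials per estimate the sampling noise is around 0.003 to 0.004, so small differences between the two columns need more trials to resolve.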

On a side note, the image below suggests that babies are induced on December 27-30 for a tax break. I’m not sure how I feel about that.

How likely are people to be born on different birth dates?

#### 12 responses to “The Birthday Problem”

• meagangracie

That’s really interesting to see – I’d like to see the distribution over years as well. Like before inductions became as common, as well as more advanced birth control. With Will I was “encouraged” to consider induction so he wouldn’t be born while the doctors were on their Thanksgiving holidays, but he came on his own a week before.

• Chris Rump

This heat map is based on rankings (http://www.nytimes.com/2006/12/19/business/20leonhardt-table.html), not the actual frequencies, which would make for a nice sim.

• fbahr

Apparently, Dec 25th is the least common birthdate in the U.S., followed by Jan 1st. (Ok, actually Feb 29th is the least common one.) – http://www.nytimes.com/2006/12/19/business/20leonhardt-table.html?_r=1
But: http://minnesota.publicradio.org/display/web/2009/12/29/january-1-birthdays/ (not to mention: it’s probably the most common birthdate used in online service registrations.)

• Chris Rump

Using the frequency data referenced here (http://www.panix.com/~murphy/bday.html), I found no significant difference from the theoretical value (assuming uniformity) for P(23) = 0.507. I just presented this as a teaser on Mon night to kick off the summer term of an MBA course; I’ll hit them up with this “update” in an hour.

• matforddavid

It is an undergraduate exercise to show that if one date has probability 1/365 + delta and another 1/365 – delta, with all other dates having probabilities 1/365, then the probability that there is at least one match in a group of n is greater than if all probabilities are equal. By extension, if the probabilities are unequal, then the probability of a match in a group of n is greater than if all probabilities are equal. The extreme case is, of course, everyone being born on the same date!
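One way to sketch the two-date case (an illustration added here, not necessarily the argument the commenter had in mind; the symbols a, r, and e_k are introduced only for this sketch). Write a = 1/365, let the two perturbed dates have probabilities p_1 = a + δ and p_2 = a − δ, and let r = (p_3, …, p_365) be the remaining probabilities. The probability that n people all have different birthdays is n! times the elementary symmetric polynomial e_n of the date probabilities. Splitting off the two perturbed dates gives

$$
Q_\delta(n) = n!\,\bigl[\,e_n(r) + (p_1 + p_2)\,e_{n-1}(r) + p_1 p_2\,e_{n-2}(r)\,\bigr]
            = Q_0(n) - n!\,\delta^2\,e_{n-2}(r),
$$

because p_1 + p_2 = 2a does not depend on δ while p_1 p_2 = a² − δ². Since e_{n-2}(r) > 0 for 2 ≤ n ≤ 365, the probability of no match falls and the probability of at least one match rises by n! δ² e_{n-2}(r). For n = 2 this is the familiar statement that the match probability is the sum of the squared date probabilities, 1/365 + 2δ².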

• matt

Matforddavid – proof or it didn’t happen

• Eric

Ran a quick and dirty monte carlo simulation in Matlab. Here’s what I got.

```
P( 2) = 0.002725000000000
P( 5) = 0.027205000000000
P(10) = 0.117384000000000
P(20) = 0.412472000000000
P(30) = 0.707320000000000
P(40) = 0.891747000000000
P(50) = 0.970613000000000
P(60) = 0.994158000000000
```

Here’s the code I ran. Not the best but it’ll do:

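```
% p: relative birth frequency for each day of the year. This vector was not
% defined in the posted snippet; uniform weights are used here only as a
% placeholder. Replace it with empirical daily birth frequencies.
p = ones(365,1);

groupSizes = [2 5 10 20 30 40 50 60];
nTrials = 1e6;
edges = [0; cumsum(p(:))/sum(p)];    % CDF bin edges for sampling from p
hit = zeros(numel(groupSizes),1);    % trials in which all birthdays were unique

for idx = 1:numel(groupSizes)
    k = groupSizes(idx);
    for i = 1:nTrials
        [~,x] = histc(rand(1,k), edges);     % draw k birthdays from p
        if length(unique(x)) == length(x)    % no shared birthday
            hit(idx) = hit(idx) + 1;
        end
    end
end

1 - hit/nTrials    % estimated P(n) for each group size
```

• iamreddave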

Could babies be induced to avoid being born in the “wrong” Chinese year? I am told some animals are good to be born under and some, like the pig, are bad.

Is there a bump caused by Valentine’s Day? Would you expect the Super Bowl to cause an increase or a drop?

There is a chapter in Wiseman’s book Quirkology where he talks about the parents of churchmen faking their birthday to be on December 25th.

• Paul A. Rubin

Regarding your conjecture about tax breaks, I’d be more inclined to suspect it’s a backlog of planned C-sections due to doctors taking off the preceding few days (very low frequency compared to neighboring points, no doubt a holiday “seasonal” effect).

• David Curran

“The probabilities depend on who is in the room” One other issue is some groups are more likely to share birthdays than average. Professional sports people tend to be born at a time that means they are old for their underage games. So they are on the old end of 10 in the under 11’s. This means they are usually born in the first three months of the year.

Kary Mullis, in his support for astrology, says: “A recent scientific study of the distribution of medical students in birth months discovered that a lot of medical students were born in late June.” http://www.crawfordperspectives.com/documents/IAMACAPRICORN_000.pdf

I don’t know the paper, but it implies that there could be many professions that cluster in birthdate.

• chmullig

I coincidentally did a simulation the other day. The answer is essentially unchanged, but for medium groups (10-50 people) reality seems to be very slightly favored, to the tune of 0.15% more likely to find a match. My simulation: The CDC data includes birth day for 1969 to 1988.

I actually did a very similar simulation a few days ago using that full data and found that they were nearly identical. Slightly more likely in reality than the simulation, but only 0.14% more likely at n=23 (and n=23 is still the minimum group size necessary for >= 50%).