Birthday Paradox and Probability of Collisions

The birthday paradox is the phenomenon where even with a small fraction of the number of days in the year the probability of two people having the same birthday becomes large.  For instance if there are 36 people in a room (10% of 365) there is a 83% probability that two of them have the same birthday.  With only 23 people we are already over 50%.

This same logic doesn’t only apply, though, to birthdays.  One can calculate the probability of collision for any set of elements given the number of samples.  For instance, assume you want to assign a unique number to each of your customers.  And let’s say you were going to choose the number at random from a huge pool of numbers say 999,999,999. If you anticipate having 100,000 customers then one would expect with probability 99.3% to sample a number that was previously sampled for another customer, even though you have only selected .01% of the available numbers. I thought it would be fun to write an F# program to calculate this.

The probability is calculated as follows P = 1 – Probability of no collisions

It is easier to calculate the probability that there aren’t collisions and subtract that from one. We can use simple counting principles to calculate the probability of choosing n samples without any collisions. For example if we have 365 days as the elements and we have 3 people (samples), then the first person has 365 possibble days he could choose without collision. The second person has 364 possible days out of 365 and the third 363 possible days. So

Probabiliby of No Collisions = (365/365)*(364/365)*(363/365)
and
Probabiliby of Collision = 1 – (365/365)*(364/365)*(363/365)

Here is the following code which implements the above algorithm in F#

let collisionProbability  elements samples =
 let probNthSample n =
  let a = (float  elements)
  let b = (float  (elements - n + 1))
  b / a
 let rec probNoCollision  counter answer =
  match counter with
  | 0 -> answer
  | _ -> probNoCollision (counter - 1) (answer *  (probNthSample counter))
 in
  1.0 - (probNoCollision samples 1.0)

Some things I like about F# are its terseness. You don’t have to provide type annotations for everything unlike in C# or Java. The compiler infers them based on how they are used. Here, I am also using nested functions, which are great for breaking up an algorithm into smaller bits while still maintaining the scope of the functions to where they will be used. Also, nested functions can use values in scope. For instance elements is not passed into probNthSample. This algorithm was coded using tail-recursion to iterate over the different samples. Typically one would use a loop for this in imperative programming, but in a functional style recursion is used more often. (One could also make a list of the samples and fold_left.) Tail-recursion prevents the stack size from growing by not keeping previously called procedures on the stack if there is no additional work to be done. In the inner probNoCollision method, this is the case. The last thing that is used here is pattern matching. Stylistically, I like pattern matching better than if statements.

The probNoCollision function calculates what it name says. It multiplies together the probability of all the samples not colliding.

probNthSample is a helper function which just gets a float of the probability of the nth sample not colliding. For instance, in the birthday problem for the 2nd sample this method would return 364/365.

The last step just returns 1-probNoCollision.

To run this you could load this in the F# interactive window. And then run the command
collisionProbability 365 23;;
collisionProbability 999999999 100000;;

I am really enjoying F# and OCAML programming so far. I will be posting an article soon on permutations that I also coded in F#. This will also be in a functional style and will show more techniques of code reuse that are powerful, that one doesn’t typically enjoy in languages like C#.

Update –

I have rewritten the above function in a more functional style.


let collisionProb2 elements samples =
 let probNoCollision answer n =
  let a = (float elements)
  let b = (float (elements - n + 1))
  answer * b / a
 let answer = seq { 1 .. samples } |> Seq.fold probNoCollision 1.0
 1.0 - answer

This version of the code is more succinct and uses the fold function, as opposed to explicitly recursing over the elements.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s