2

I am working with a case of the Birthday problem with near misses. (i.e. The probability that there are 2 or more collisions after selecting $n$ elements uniformly from a set of $m$ total elements where a collision is defined as falling within $k$ elements of any previously selected element)

I have the eq. from Wikipedia for the near miss formulation:

$$ p(n,k,m) = 1 - \frac{(m-nk-1)!}{m^{n-1}(m-n(k+1))!} $$

My problem lies in computing the factorials. I am working with numbers on the order of:

$$ m = 1000 * (2^{32}-1), n = 1000, k = 10000 $$

Even Wolfram Alpha gives up with those values. Are there any formulations for this problem that don't involve the factorial or any approximation methods that could make this a reasonably commutable problem? As an extension, is it then also possible to compute/approximate the expected number of collisions after selecting $n$ elements?

1 Answers1

1

You can write $ \frac{(m-nk-1)!}{(m-n(k+1))!}=\frac{(m-nk-1)!}{(m-nk-n)!}$ and note that there are $n-1$ terms that do not cancel out. As $n \ll m$ they are all about the same size. They average $m-nk-\frac n2$, so the fraction is about $(m-nk-\frac n2)^{n+1}$

A better approximation is Stirling's, $m! \approx (\frac me)^m \sqrt {2 \pi m}$ but the $m^m$ part of it may kill you. You can take the log and evaluate the log of your expression.

Ross Millikan
  • 383,099
  • I think you may have a typo in your second eq. in the denominator. I believe it should be $m-nk-n$ instead of $m-nk+n$ along with an extra ')'. – majorpain1588 Feb 13 '17 at 15:50
  • @majorpain1588: You are right. The minus sign carries on downward to the end of the paragraph. All the terms also are now in the numerator. I have fixed it. – Ross Millikan Feb 13 '17 at 15:54