
To give more details about the title question, assume that we have a collection of $n$ distinct elements. At each step (steps are numbered from zero), we choose one of them uniformly at random, with replacement. I am interested in the distribution of the random variable $X_n$, the index of the first step at which some element is selected for the second time, i.e. the first collision. Specifically, I am interested in whether $X_n$ is more likely to be even or odd.
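A minimal Monte Carlo sketch of this process, assuming the elements are labelled $0, \dots, n-1$. The functions `first_collision_step` and `estimate_odd_probability` are hypothetical helpers, not from any library. Because steps are numbered from zero and step $0$ can never repeat an element, the simulated value always lies in $\{1, \dots, n\}$.

```python
import random

def first_collision_step(n, rng):
    """Simulate one run: draw uniformly from {0, ..., n-1} with replacement,
    numbering steps from zero, and return the first step at which an
    already-seen element is drawn again (one sample of X_n)."""
    seen = set()
    step = 0
    while True:
        x = rng.randrange(n)
        if x in seen:
            return step
        seen.add(x)
        step += 1

def estimate_odd_probability(n, trials=100_000, seed=0):
    """Monte Carlo estimate of P(X_n is odd) using a seeded generator."""
    rng = random.Random(seed)
    odd = sum(first_collision_step(n, rng) % 2 for _ in range(trials))
    return odd / trials

if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        print(n, estimate_odd_probability(n))
```

The estimates are only approximate; the exact values follow from the formula for $p_k(n)$.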

It is easy to see that $$p_k(n) := \mbox{P}(X_n=k) = \Big(1-\frac{1}{n}\Big)\Big(1-\frac{2}{n}\Big) \dots \Big(1-\frac{k-1}{n}\Big)\cdot\frac{k}{n} = \binom{n-1}{k-1} \cdot \frac{k!}{n^k}$$ for all $1\le k \le n$. I do not know if this distribution has a common name, but it is closely related to the birthday problem. In particular, it is known that the expected value $\mbox{E}(X_n)$ is asymptotically equal to $\sqrt{\frac{\pi}{2}n}$ for large $n$, see e.g. this Math.SE post, Wikipedia, or the paper 'Bounds on Birthday Attack Times' by Michael J. Wiener.
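As a sanity check of the two expressions for $p_k(n)$, here is a small exact computation with Python's `fractions.Fraction` (and `math.comb`, Python 3.8+). The helper names are illustrative. The mean is computed from the tail-sum formula $\mbox{E}(X_n) = \sum_{k \ge 1} \mbox{P}(X_n \ge k)$ with $\mbox{P}(X_n \ge k) = (1-\frac{1}{n})\cdots(1-\frac{k-1}{n})$; its ratio to $\sqrt{\frac{\pi}{2}n}$ should approach 1.

```python
from fractions import Fraction
from math import comb, factorial, pi, sqrt

def pmf_product(n, k):
    """p_k(n) from the product (1 - 1/n)(1 - 2/n)...(1 - (k-1)/n) * k/n."""
    p = Fraction(k, n)
    for i in range(1, k):
        p *= 1 - Fraction(i, n)
    return p

def pmf_binomial(n, k):
    """p_k(n) from the closed form C(n-1, k-1) * k! / n^k."""
    return Fraction(comb(n - 1, k - 1) * factorial(k), n ** k)

def expected_value(n):
    """E(X_n) in floating point via E(X_n) = sum_{k>=1} P(X_n >= k)."""
    total, survival = 0.0, 1.0  # survival = P(X_n >= k), starting at k = 1
    for k in range(1, n + 1):
        total += survival
        survival *= 1 - k / n   # now P(X_n >= k + 1)
    return total

if __name__ == "__main__":
    for n in (10, 100, 10_000):
        print(n, expected_value(n) / sqrt(pi * n / 2))
```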

Let $P_0(n)$ and $P_1(n)$ be the probabilities that $X_n$ is even and odd, respectively. Namely, we have $$ P_0(n) = \sum_{\substack{k=1 \\ k \scriptsize\mbox{ is even}}}^{n} p_k(n), \ \ \ P_1(n) = \sum_{\substack{k=1 \\ k \scriptsize\mbox{ is odd}}}^{n} p_k(n).$$

For $1 \le n \le 6$, the values of $P_1(n)$ are $1, \frac{1}{2}, \frac{5}{9}, \frac{17}{32}, \frac{329}{625}, \frac{169}{324}$, respectively. These values (and further numerical experiments that I omit here) suggest the following question.
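The listed values can be reproduced exactly with a short script like the following, a sketch in which `pmf` and `parity_probabilities` are hypothetical names. It builds $p_k(n)$ from the running product $(1-\frac{1}{n})\cdots(1-\frac{k-1}{n})$ and splits the sum by the parity of $k$.

```python
from fractions import Fraction

def pmf(n):
    """Exact list [p_1(n), ..., p_n(n)] as Fractions."""
    probs, survival = [], Fraction(1)
    for k in range(1, n + 1):
        # survival = (1 - 1/n)...(1 - (k-1)/n) = P(X_n >= k)
        probs.append(survival * Fraction(k, n))
        survival *= 1 - Fraction(k, n)
    return probs

def parity_probabilities(n):
    """Return (P_0(n), P_1(n)) = (P(X_n even), P(X_n odd)) exactly."""
    probs = pmf(n)
    p_odd = sum(probs[0::2], Fraction(0))   # k = 1, 3, 5, ...
    p_even = sum(probs[1::2], Fraction(0))  # k = 2, 4, 6, ...
    return p_even, p_odd

if __name__ == "__main__":
    for n in range(1, 7):
        print(n, parity_probabilities(n)[1])
```

Run as a script, it prints one line per $n$, e.g. `3 5/9` and `6 169/324`.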

Question. Is it true that for all $n>2$ we have $P_1(n)>P_0(n)$, i.e. that $X_n$ is more likely to be odd than even?
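A finite exact check of the question for small $n$ (evidence only, not a proof). `odd_minus_even` and `first_counterexample` are illustrative names, and the cut-off $n \le 80$ is arbitrary.

```python
from fractions import Fraction

def odd_minus_even(n):
    """Exact P_1(n) - P_0(n) = sum_k (-1)^(k+1) p_k(n)."""
    diff, survival = Fraction(0), Fraction(1)
    for k in range(1, n + 1):
        p_k = survival * Fraction(k, n)  # p_k(n)
        diff += p_k if k % 2 == 1 else -p_k
        survival *= 1 - Fraction(k, n)
    return diff

def first_counterexample(n_max):
    """Smallest n with 3 <= n <= n_max and P_1(n) <= P_0(n), or None."""
    for n in range(3, n_max + 1):
        if odd_minus_even(n) <= 0:
            return n
    return None

if __name__ == "__main__":
    print(first_counterexample(80))
```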

I think the numerical data support an even stronger statement, namely that $P_1(n) = \frac{1}{2}+\frac{1+o(1)}{8n}$ as $n \to \infty$, but I don't know how to estimate this sum with such accuracy.
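To probe the conjectured rate numerically, note that $P_0(n) = 1 - P_1(n)$, so $P_1(n) - \frac{1}{2} = \frac{1}{2}\big(P_1(n) - P_0(n)\big)$ and the conjecture is equivalent to $4n\big(P_1(n) - P_0(n)\big) \to 1$. The sketch below (hypothetical helper `odd_minus_even_float`) evaluates this in double precision, assuming floating point is adequate; that seems reasonable since the difference is of order $1/n$ while the accumulated rounding error is far smaller.

```python
def odd_minus_even_float(n):
    """P_1(n) - P_0(n) in double precision."""
    diff, survival = 0.0, 1.0
    for k in range(1, n + 1):
        p_k = survival * k / n       # p_k(n)
        diff += p_k if k % 2 else -p_k
        survival *= 1 - k / n        # P(X_n >= k + 1)
        if survival == 0.0:          # remaining terms are zero (or underflowed)
            break
    return diff

if __name__ == "__main__":
    for n in (10, 100, 1_000, 10_000):
        # 4n (P_1 - P_0) equals 8n (P_1 - 1/2); the conjecture says it tends to 1
        print(n, 4 * n * odd_minus_even_float(n))
```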

Finally, what happens if there are more than two 'players', i.e. if we look at $X_n \bmod m$ for some fixed $m>2$ (for instance, modulo 3)? Which residue would have an advantage in this case for large values of $n$?
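For the mod-$m$ version, the same running product gives the whole residue distribution. The sketch below, with the hypothetical helper `residue_probabilities`, only computes $\mbox{P}(X_n \equiv r \bmod m)$ and the deviations from $1/m$ for exploration; it makes no claim about which residue has the advantage.

```python
def residue_probabilities(n, m):
    """Return [P(X_n = r mod m) for r = 0, ..., m-1] in double precision."""
    probs = [0.0] * m
    survival = 1.0  # P(X_n >= k)
    for k in range(1, n + 1):
        probs[k % m] += survival * k / n   # add p_k(n) to residue k mod m
        survival *= 1 - k / n
    return probs

if __name__ == "__main__":
    m = 3
    for n in (10, 100, 1_000, 10_000):
        deviations = [p - 1 / m for p in residue_probabilities(n, m)]
        print(n, [f"{d:+.3e}" for d in deviations])
```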

1 Answer


Yes. For $n > 2$, we have

\begin{align*}
&\sum_{k = 0}^{\lfloor (n - 1)/2\rfloor} p_{2k + 1}(n) - \sum_{k = 1}^{\lfloor n/2\rfloor} p_{2k}(n) \\
&= \sum_{k = 0}^{\lfloor (n - 1)/2\rfloor} p_{2k + 1}(n) - \sum_{k = 1}^{\lfloor n/2\rfloor} \frac{n - 2k}{2(n - k)} p_{2k}(n) - \sum_{k = 1}^{\lfloor n/2\rfloor} \frac{n}{2(n - k)} p_{2k}(n) \\
&= \sum_{k = 0}^{\lfloor (n - 1)/2\rfloor} p_{2k + 1}(n) - \sum_{k = 1}^{\lfloor n/2\rfloor} \frac{n - 2k}{2(n - k)} p_{2k}(n) - \sum_{k = 0}^{\lfloor n/2\rfloor - 1} \frac{n}{2(n - k - 1)} p_{2k + 2}(n) \\
&= \sum_{k = 0}^{\lfloor (n - 1)/2\rfloor} \left(p_{2k + 1}(n) - \frac{n - 2k}{2(n - k)}p_{2k}(n) - \frac{n}{2(n - k - 1)}p_{2k + 2}(n)\right) \\
&= \sum_{k = 0}^{\lfloor (n - 1)/2\rfloor} \frac{(n - 1)!}{(n - 2k - 1)!\,n^{2k}} \left(\frac{2k + 1}{n} - \frac{k}{n - k} - \frac{(k + 1)(n - 2k - 1)}{n(n - k - 1)}\right) \\
&= \sum_{k = 0}^{\lfloor (n - 1)/2\rfloor} \frac{k\,(n - 1)!}{(n - k)(n - k - 1)(n - 2k - 1)!\,n^{2k}} \\
&> 0.
\end{align*}

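(In the fourth line, we’ve used $p_0(n) = p_{n + 1}(n) = 0$, together with the fact that the coefficient $\frac{n - 2k}{2(n - k)}$ vanishes at $k = n/2$ when $n$ is even, to adjust the endpoints of the sums to match. The final inequality holds because the $k = 0$ term is zero, every term with $1 \le k \le \lfloor (n - 1)/2 \rfloor$ is positive, and there is at least one such term since $n \ge 3$.)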
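The algebra can be double-checked term by term with exact rational arithmetic. In this sketch `p`, `term`, `closed_form` and `check_identity` are illustrative names: `term(n, k)` is the $k$-th summand of the fourth line (weights $\frac{n - 2k}{2(n - k)}$ and $\frac{n}{2(n - k - 1)}$), and `closed_form(n, k)` is the $k$-th summand of the last line.

```python
from fractions import Fraction
from math import factorial

def p(n, k):
    """Exact p_k(n) = k (n-1)! / ((n-k)! n^k); zero outside 1 <= k <= n."""
    if not 1 <= k <= n:
        return Fraction(0)
    return Fraction(k * factorial(n - 1), factorial(n - k) * n ** k)

def term(n, k):
    """k-th summand of the fourth line of the derivation."""
    a = Fraction(n - 2 * k, 2 * (n - k))  # share of p_{2k} kept in summand k
    b = Fraction(n, 2 * (n - k - 1))      # share of p_{2k+2} moved into summand k
    return p(n, 2 * k + 1) - a * p(n, 2 * k) - b * p(n, 2 * k + 2)

def closed_form(n, k):
    """k-th summand of the last line: k (n-1)! / ((n-k)(n-k-1)(n-2k-1)! n^(2k))."""
    return Fraction(k * factorial(n - 1),
                    (n - k) * (n - k - 1) * factorial(n - 2 * k - 1) * n ** (2 * k))

def check_identity(n):
    """Assert the per-term identity, the total, and positivity for one n >= 3."""
    ks = range((n - 1) // 2 + 1)
    assert all(term(n, k) == closed_form(n, k) for k in ks)
    odd_minus_even = sum(p(n, j) if j % 2 else -p(n, j) for j in range(1, n + 1))
    assert odd_minus_even == sum(closed_form(n, k) for k in ks)
    assert odd_minus_even > 0

if __name__ == "__main__":
    for n in range(3, 41):
        check_identity(n)
    print("identity verified for 3 <= n <= 40")
```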

  • does this get (anywhere close to) the conjectured $\frac{1+o(1)}{8n}$ difference? – mathworker21 Sep 14 '22 at 16:07
  • @mathworker21 I think so: for large $n$, the final sum can be approximated by $\int_0^\infty \frac{k}{n^2} e^{-2k^2/n}\,dk = \frac{1}{4n}$, although I haven’t tried to bound the error. – Anders Kaseorg Sep 14 '22 at 23:50
  • Nice trick! Could you maybe clarify a bit how you came up with the partition in the second line? It looks very unexpected to me. – Arsenii Sagdeev Sep 18 '22 at 14:40
  • @ArseniiSagdeev I did the calculation for an arbitrary split $p_{2k}(n) = a_kp_{2k}(n) + (1 - a_k)p_{2k}(n)$, worked out the properties $a_k$ would need for every resulting term to be positive, and constructed a simple formula for $a_k$ that satisfied them. – Anders Kaseorg Sep 20 '22 at 05:30
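The integral approximation from the comments can be checked numerically. This sketch (hypothetical helpers `final_sum` and `gaussian_sum`) evaluates the answer's final positive sum and the same sum with each summand replaced by $\frac{k}{n^2}e^{-2k^2/n}$. Both are multiplied by $4n$, so values near 1 are consistent with $P_1(n) - P_0(n) \approx \frac{1}{4n}$. It is a numerical illustration, not an error bound.

```python
import math

def final_sum(n):
    """Float value of the answer's last sum over 0 <= k <= (n-1)/2."""
    if n < 3:
        raise ValueError("the closed form is used here only for n >= 3")
    total = 0.0
    prod = 1.0  # (n-1)! / ((n-2k-1)! n^(2k)) = prod_{i=1}^{2k} (1 - i/n)
    for k in range((n - 1) // 2 + 1):
        total += k * prod / ((n - k) * (n - k - 1))
        prod *= (1 - (2 * k + 1) / n) * (1 - (2 * k + 2) / n)
    return total

def gaussian_sum(n):
    """Same sum with each summand replaced by (k / n^2) exp(-2 k^2 / n)."""
    return sum(k / n**2 * math.exp(-2 * k * k / n) for k in range((n - 1) // 2 + 1))

if __name__ == "__main__":
    for n in (10, 100, 1_000, 10_000):
        print(n, 4 * n * final_sum(n), 4 * n * gaussian_sum(n))
```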