4

A total of $r$ keys are to be put, one at a time, in $k$ boxes, with each key independently put in box $i$ with probability $p_i$ (hence, $\sum_{i=1}^k p_i = 1$). Each time a key is put in a non-empty box, we say that a collision occurs. Find the expected number of collisions.

Let random variable $X$ be the number of collisions. I first tried to derive the PMF of $X$, but got stuck. How would I do this?

  • To find the expected number of collisions you can write $X=X_1+\cdots+X_k$ where $X_i$ denotes the number of collisions that occur in box $i$. Then with linearity of expectation and symmetry you find $\mathbb EX=k\mathbb EX_1$ and it remains to find $\mathbb EX_1$. So actually you don't need that PMF. Do you still want to find it (as the title of your question suggests)? – drhab Mar 09 '18 at 11:22
  • Ahh thanks. Yeah, I wanted to see if I could derive it. How would I find EX_1 without the PMF? – Mr. Bromwich I Mar 09 '18 at 11:33
  • Sorry, but I misinformed you. Since the $p_i$ are distinct we cannot use symmetry and end up with $\mathbb EX=\sum_{i=1}^k\mathbb EX_i$. Then for each $i$ we must find $\mathbb EX_i$. – drhab Mar 09 '18 at 12:10
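For a concrete point of reference, here is a minimal Monte Carlo sketch (the values of $r$, $k$ and the $p_i$ are arbitrary illustrative choices) that estimates $\mathbb EX$ by simulating the placement process directly; it can be compared against the analytic answers below.

```python
import random

def estimate_expected_collisions(r, k, p, trials=100_000):
    """Estimate E[X] by simulating the key-placement process."""
    total = 0
    for _ in range(trials):
        counts = [0] * k
        for _ in range(r):
            # put one key in box i with probability p[i]
            i = random.choices(range(k), weights=p)[0]
            if counts[i] > 0:      # box already non-empty: a collision
                total += 1
            counts[i] += 1
    return total / trials

if __name__ == "__main__":
    # e.g. r = 5 keys, k = 3 boxes
    print(estimate_expected_collisions(5, 3, [0.5, 0.3, 0.2]))  # ≈ 2.527
```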

3 Answers

5

In this answer I will not find the PMF of $X$ but the expectation $\mathsf EX_i$, where $X_i$ denotes the number of collisions that take place in box $i$ (see my comment on the question).

If $Y_i$ denotes the number of keys that are put in box $i$ then $\mathsf EY_i$ is easy to find by linearity of expectation (in fact $Y_i\sim\mathsf{Bin}(r,p_i)$, so $\mathsf EY_i=rp_i$). But unfortunately we do not have $X_i=Y_i-1$ (which is about the first thing that comes to mind). This is because $X_i$ takes value $0$ when $Y_i=0$.

That fact forces us to find the PMF of $X_i$.

  • $\mathsf P(X_i=0)=\mathsf P(Y_i=0)+\mathsf P(Y_i=1)= (1-p_i)^r+rp_i(1-p_i)^{r-1}$
  • $\mathsf P(X_i=n)=\mathsf P(Y_i=n+1)=\binom{r}{n+1}p_i^{n+1}(1-p_i)^{r-n-1}$ for $n=1,\dots, r-1$

Then it remains to find $\mathsf EX_i=\sum_{n=0}^{r-1}n\mathsf P(X_i=n)$.

And after that $\mathsf EX=\sum_{i=1}^k\mathsf EX_i$.
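A short sketch of that computation (illustrative parameters only), evaluating $\sum_{n\ge 1} n\,\mathsf P(X_i=n)$ from the PMF above for each box and summing over $i$:

```python
from math import comb

def expected_collisions_from_pmf(r, k, p):
    """E[X] = sum_i E[X_i], with E[X_i] computed from the PMF of X_i."""
    total = 0.0
    for pi in p:
        # P(X_i = n) = P(Y_i = n + 1) for n = 1, ..., r - 1; the n = 0 term
        # contributes nothing to the expectation
        total += sum(n * comb(r, n + 1) * pi ** (n + 1) * (1 - pi) ** (r - n - 1)
                     for n in range(1, r))
    return total

print(expected_collisions_from_pmf(5, 3, [0.5, 0.3, 0.2]))  # ≈ 2.527
```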


edit:

Note that $X_i=Y_i-1+\mathbf1_{Y_i=0}$ and $Y_i\sim\mathsf{Bin}(r,p_i)$ so that:$$\mathsf EX_i=\mathsf EY_i-1+\mathsf P(Y_i=0)=rp_i-1+(1-p_i)^r$$

So we end up with:$$\mathsf EX=\sum_{i=1}^k[rp_i-1+(1-p_i)^r]=r-k+\sum_{i=1}^k(1-p_i)^r$$
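For a quick concrete check (numbers chosen only for illustration), with $r=5$, $k=3$ and $(p_1,p_2,p_3)=(0.5,0.3,0.2)$ this gives $$\mathsf EX=5-3+\left(0.5^5+0.7^5+0.8^5\right)=2.527$$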

drhab
  • 153,781
  • This answer was downvoted. However, no explanation was given. A sad phenomenon on this site. – drhab Jan 25 '22 at 06:36
  • 1
    +1 to your answer. Didn't notice it before. Astonished that anyone would downvote it. There is no quality control on downvoting. – user2661923 Jan 26 '22 at 07:42
2

Keeping in mind that we are only after expectations, not full distributions, let us consider $N_t$, the number of nonempty boxes after $t$ keys are placed in the boxes. Then $N_{t+1}\in\{N_t,N_t+1\}$, and $N_{t+1}=N_t$ signals a collision, hence the total number of collisions after $r$ keys are used is

$$X_r=r-N_r$$

On the other hand, $N_r=k-E_r$, where $E_r$ denotes the number of empty boxes after $r$ keys are used. Thus, $E(E_r)=q_1+\cdots+q_k$, where $$q_i=P(\text{box $i$ is empty})=\left(1-p_i\right)^r$$ All this leads to

$$E(X_r)=r-k+\sum_{i=1}^k(1-p_i)^r$$

For example, if the boxes are chosen uniformly randomly, $$E(X_r)=r-k+k\left(1-\frac1k\right)^r$$
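A minimal simulation sketch of this argument (uniform boxes, arbitrary parameters): it checks the identity $X_r=r-N_r$ on every sample path and compares the empirical mean with $r-k+k\left(1-\frac1k\right)^r$.

```python
import random

def one_run(r, k):
    """Place r keys uniformly at random in k boxes; return the collision count."""
    counts = [0] * k
    collisions = 0
    for _ in range(r):
        i = random.randrange(k)
        if counts[i] > 0:
            collisions += 1
        counts[i] += 1
    nonempty = sum(c > 0 for c in counts)
    assert collisions == r - nonempty   # the identity X_r = r - N_r
    return collisions

r, k, trials = 7, 4, 100_000
empirical = sum(one_run(r, k) for _ in range(trials)) / trials
print(empirical, r - k + k * (1 - 1 / k) ** r)  # the two values should be close
```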

Did
  • 284,245
  • a collision occurs each time a key is put in a non-empty box. I read that as one collision, not depending on the number of keys in that box. – drhab Mar 11 '18 at 11:15
  • @drhab And you are quite right about it. Thanks for the comment, and please see revised version. – Did Mar 11 '18 at 11:32
  • The revised version is nice and the answer matches mine (see the edit of my question). – drhab Mar 11 '18 at 13:36
0

Another method:

There are $\binom{r}{2}$ pairs of keys. Each pair lands in the same box (a pairwise collision) with probability $\sum_{i\in[k]} p_i^2$, where $[k] = \{1,\dots,k\}$.

Thus the expected number of collisions between pairs is $\binom{r}{2} \sum_{i\in[k]} p_i^2$. But this overcounts, since any set of $3$ keys landing in the same box is counted multiple times. A given triple lands in the same box with probability $\sum_{i\in[k]} p_i^3$, and there are $\binom{r}{3}$ triples, so the expected number of collisions among sets of $3$ is $\binom{r}{3} \sum_{i\in[k]} p_i^3$.

In general, the expected number of collisions among sets of size $j$ is $$\binom{r}{j} \sum_{i\in[k]} p_i^j$$

Now using the inclusion-exclusion principle, the total expected number of collisions is $$\binom{r}{2} \sum_{i\in[k]} p_i^2 - \binom{r}{3} \sum_{i\in[k]} p_i^3 + \dots + (-1)^r \binom{r}{r} \sum_{i\in[k]} p_i^r$$

But this can be simplified further by adding and subtracting $\binom{r}{0}\sum_{i\in[k]} (p_i^0) - \binom{r}{1}(\sum_{i\in[k]} p_i) = k-r$.

Then the above expression equals: \begin{align*} &\binom{r}{2} \sum_{i\in[k]} p_i^2 - \binom{r}{3} \sum_{i\in[k]} p_i^3 + \dots + (-1)^r \binom{r}{r} \sum_{i\in[k]} p_i^r\\ &= - \left(k - r\right) + \left(\binom{r}{0} \sum_{i\in[k]} p_i^0 - \binom{r}{1} \sum_{i\in[k]} p_i + \binom{r}{2} \sum_{i\in[k]} p_i^2 - \dots + (-1)^r \binom{r}{r} \sum_{i\in[k]} p_i^r\right)\\ &= r-k + \sum_{i\in[k]}\left(\sum_{j=0}^r (-1)^j \binom{r}{j}p_i^j\right)\\ &= r-k + \sum_{i\in[k]}\left(1-p_i\right)^r \end{align*}
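A small numeric sanity check of this simplification (illustrative values only): the alternating sum starting at $j=2$ and the closed form $r-k+\sum_{i\in[k]}(1-p_i)^r$ should agree.

```python
from math import comb

r, k = 5, 3
p = [0.5, 0.3, 0.2]

alternating = sum((-1) ** j * comb(r, j) * sum(pi ** j for pi in p)
                  for j in range(2, r + 1))
closed_form = r - k + sum((1 - pi) ** r for pi in p)
print(alternating, closed_form)  # both ≈ 2.527
```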

Rahul Madhavan
  • 2,939