0

A total of $r$ keys are to be put, one at a time, in $k$ boxes, with each key independently being put in box $i$ with probability $p_i$, where $\sum_{i=1}^k p_i =1 $. Each time a key is put in a nonempty box, we say that a collision occurs. Find the expected number of collisions.

The 2nd answer by @drhab in this link here (what's after the edit) conflicts with my understanding of the problem. Expected number of collisions. Specifically, I dont understand how $X_i = Y_i-1+ \mathbb{I}_{Y_i=0}$ represents the collision in a box, which would be $\max(0,Y_i-1)$. For $Y_i$ being the number of keys in the box.

Originally I thought the collision in box $i$ is $X_i=(E(key|i)-1) \mathbb{I}(key|i>0)$. I think what's wrong with this logic is that $E(key|i)$ takes $-1$ into the mean, but I'd also like to confirm that this deduction is correct.

Golden_Ratio
  • 12,834
smaillis
  • 580
  • I think the reason that you see a conflict is that the linked problem has $k$ items going into $k$ boxes, but your problem has $r$ items going into $k$ boxes. That said, the basic idea still stands; each empty box implies exactly one collision... but additionally there are $r-k$ additional (or fewer) collisions. That would seem to imply that $E(collisions) -r +k = E(empty boxes)$. And I feel like finding $E(empty boxes)$ should be the simplest route. It might not be though. – Eric Snyder Jan 25 '22 at 08:06
  • Are you sure that your link is okay? Because you mention me I would rather expect a link to this question. – drhab Jan 25 '22 at 09:39
  • Did you downvote my answer? If not then please forgive me for asking. If you did then please tell me why. – drhab Jan 25 '22 at 10:05
  • Do you agree that $X_i=\max(0,Y_i-1)$? If so then what exactly is your problem? It is quite easy to verify that: $$\max(0,Y_i-1)=Y_i-1+1_{Y_i=0}$$Discern the cases $Y_i=0$ (both sides give value $0$ then) and $Y_i\geq1$ (both sides give value $Y_i-1$). The advantage of LHS is that it allows you to find the expectation by means of linearity of expectation. – drhab Jan 25 '22 at 10:27
  • @drhab no i haven't downvoted you. I'm able to find equivalency by writing out the expectation as an explicit sum of all the cases but your further explaination still makes no sense to me. – smaillis Jan 25 '22 at 23:49
  • Good to hear that and sorry for my suspicion again. How about the link that you provide. Was it not meant to be a link to a spot where persons can find my answer (you referred to me)? It is not though. – drhab Jan 26 '22 at 07:08

2 Answers2

2

Addendum-2 added to respond to the comment of drhab


I am not really qualified to critique your work. However, it seems to me that there is an alternative approach, that seemed to have (also) been overlooked by the article that you referenced.

You can focus on the expected number of collisions in each box. By linearity of expectation, it is irrelevant that the number of collisions in each box are not independent events.

With $p_i$ equal to the chance that a key went into box $(i)$, let $q_i = 1 - p_i$. Let $E(i)$ denote the expected number of collisions in box $(i)$. Then, the overall expected number of collisions is

$$\sum_{i=1}^{k} E(i).$$

So, the problem reduces to computing $E(i)$.

The number of possible keys that went into box $i$ is some element in $a \in \{0,1,2,\cdots,r\}.$ So, if $g(a) = ~$ the probability that $a$ keys went into box $(i)$, then

$$E(i) = \sum_{a=2}^r (a-1)g(a).$$

However, $\displaystyle g(a) = \binom{r}{a} p_i^a q_i^{(r-a)}.$

Therefore the final computation is

$$\sum_{i = 1}^k \left[ ~\sum_{a=2}^r (a-1)\binom{r}{a}p_i^a q_i^{(r-a)} ~\right].$$


Addendum
I originally went off the rails. I am leaving my original wrong answer in the Addendum, for a reference.

The reason that the answer below is wrong is that if (for example) there are already $2$ keys in a box, and a $3$rd key is added to that box, that counts as $1$ extra collision rather than $2$ extra collisions.


Instead of focusing on the boxes, focus on the keys.

There are $\binom{r}{2}$ possible key-collisions. By linearity of expectation, it is irrelevant that the $\binom{r}{2}$ events are not independent.

The chance of any one of those key-collisions occurring equals the sum of the probabilities that two keys went into the same box.

Therefore, the expected number of collisions is

$$\binom{r}{2} \times \left[\sum_{i=1}^k \left(p_i\right)^2\right].$$


Addendum-2
Response to the comment of drhab.

$E(i) = rp_i - 1 + (1 - p_i)r.$

I was able to confirm this. Analysis below.

Given that $r \in \Bbb{Z_{\geq 2}}.$

Let $p$ denote $p_i$.

Let $\displaystyle S = (1 - p)^r + \binom{r}{1}p - 1.$

Let $\displaystyle T = \sum_{a=2}^r (a-1) \binom{r}{a} p^a (1-p){(r-a)}.$

To Prove: $S = T$.


$\underline{\text{Preliminary Results}}$

$$ \sum_{a=0}^n \binom{n}{a} (-1)^a = 0. \tag{R-1} $$

Proof:

Binomial expansion on $\displaystyle ~ 0^n = [1 + (-1)]^n$.


$$ \sum_{a=1}^{n-1} \binom{n}{a} (-1)^a = (-1) + (-1)^{(n-1)}. \tag{R-2}$$

Proof:

Using (R-1),

$~\displaystyle \sum_{a=1}^{n-1} \binom{n}{a} (-1)^a = 0 - \left[(-1)^0 \binom{n}{0}\right] - \left[(-1)^n \binom{n}{n}\right] = (-1) - (-1)^n.$


$$ \sum_{a=2}^{n} \binom{n}{a} (-1)^a = (n-1). \tag{R-3}$$

Proof:

Using (R-1),

$~\displaystyle \sum_{a=2}^{n} \binom{n}{a} (-1)^a = 0 - \left[(-1)^0 \binom{n}{0}\right] - \left[(-1)^1 \binom{n}{1}\right] = (-1) + n.$


$$ \sum_{a=2}^{n} \binom{n}{a} (-1)^{(n-a)} = (-1)^n (n-1). \tag{R-4}$$

Proof:

$\displaystyle (-1)^{(n-a)} = (-1)^n \times (-1)^{(-a)} = (-1)^n \times (-1)^a.$

$~\displaystyle \sum_{a=2}^{n} \binom{n}{a} (-1)^{(n-a)} = (-1)^n \times \sum_{a=2}^{n} \binom{n}{a} (-1)^a.$

Using (R-3), this equals $~\displaystyle (-1)^n \times (n-1).$


$$ \sum_{a=2}^{n} \binom{n-1}{a-1} (-1)^{(n-a)} = (-1)^n. \tag{R-5}$$

Proof:

$\displaystyle (-1)^{(n-a)} = (-1)^n \times (-1)^{(-a)} = (-1)^n \times (-1)^a.$

$~\displaystyle \sum_{a=2}^{n} \binom{n-1}{a-1} (-1)^{(n-a)} = (-1)^n \times \sum_{a=2}^{n} \binom{n-1}{a-1} (-1)^a.$

$\displaystyle = (-1)^n \times \sum_{a=1}^{n-1} \binom{n-1}{a} (-1)^{a+1}$

$\displaystyle = (-1)^{(n+1)} \times \sum_{a=1}^{n-1} \binom{n-1}{a} (-1)^a$.

$\displaystyle = (-1)^{(n+1)} \times \left\{ ~\left[ \sum_{a = 0}^{(n-1)} \binom{n-1}{a} (-1)^a \right] - \binom{n-1}{0}(-1)^0 ~\right\}$

$\displaystyle = -1^{(n+1)} \times \{0 - 1\} = (-1)^n.$


$$ (1 - p)^n = \sum_{a=0}^{n} \binom{n}{a} (-1)^a p^a. \tag{R-6}$$

Proof:

Immediate, by binomial expansion.


$$ \text{For} ~r \in \Bbb{Z_{\geq 2}}, ~j \in \{2,3,\cdots, r\}, ~a \in \{2,3,\cdots, j\}, $$

$$~\binom{r}{a} \times \binom{r-a}{J-a} = \binom{r}{J} \times \binom{J}{a}. \tag{R-7} $$

Proof:

$\displaystyle \frac{r!}{a! ~(r-a)!} \times \frac{(r-a)!}{(J-a)! ~(r-J)!} ~=~ \frac{r!}{a! ~(J-a)! ~(r-J)!} $

$\displaystyle = \frac{r!}{J! ~(r-J)!} \times \frac{J!}{a! ~(J-a)!}.$


$$~a \times \binom{r}{a} \times \binom{r-a}{J-a} = J \times \binom{r}{J} \times \binom{J-1}{a-1}. \tag{R-8} $$

Proof:

$\displaystyle a \times \frac{r!}{a! ~(r-a)!} \times \frac{(r-a)!}{(J-a)! ~(r-J)!} ~=~ \frac{r!}{(a-1)! ~(J-a)! ~(r-J)!} $

$\displaystyle = \frac{r!}{J! ~(r-J)!} \times \frac{J!}{(a-1)! ~(J-a)!} = \binom{r}{J} \times J \times \binom{J-1}{a-1}.$


$$ \sum_{a=2}^r \binom{r}{a} p^a (1-p)^{(r-a)} = \sum_{J=2}^r (J-1) \times \binom{r}{J} ~(-1)^J ~p^J. \tag{R-9} $$

Proof:

Each term in $\displaystyle ~\sum_{a=2}^r \binom{r}{a} p^a (1-p)^{(r-a)}~$ will have a factor of
$~\displaystyle p^J ~$ in it, where $~J~$ is some element in $~\{2, 3, \cdots, r\}.$

Therefore $\displaystyle ~\sum_{a=2}^r \binom{r}{a} p^a (1-p)^{(r-a)}~$ can be expressed as

$\displaystyle \sum_{J=2}^r p^J \times f(J),~$ where $~f(J)~$ needs to be computed.

Using (R-6), for $~J \in \{2,3,\cdots,r\},~$ and $~a \in \{2,3,\cdots,J\}$,
the $~\displaystyle p^J~$ term in $~\displaystyle \binom{r}{a} p^a (1-p)^{(r-a)}$

is given by

$\displaystyle \binom{r}{a} p^a \times \binom{r-a}{J-a} p^{(J-a)} (-1)^{(J-a)}.$

Using (R-7), this equals

$\displaystyle p^J \times \binom{r}{J} \times \binom{J}{a} (-1)^{(J-a)}.$

Therefore,

$\displaystyle \sum_{J=2}^r p^J \times f(J) = \sum_{J=2}^r \left\{ ~\binom{r}{J} \times p^J \times \left[\sum_{a=2}^J \binom{J}{a} (-1)^{(J-a)}\right] ~\right\}.$

Using (R-4), this equals

$\displaystyle \sum_{J=2}^r \binom{r}{J} \times p^J \times (J-1) \times (-1)^J.$


$$ \sum_{a=2}^r a \times \binom{r}{a} p^a (1-p)^{(r-a)} = \sum_{J=2}^r J \times \binom{r}{J} ~(-1)^J ~p^J. \tag{R-10} $$

Proof:

Analysis very similar to (R-9).

$\displaystyle ~\sum_{a=2}^r a \times \binom{r}{a} p^a (1-p)^{(r-a)}~$ will equal

$\displaystyle \sum_{J=2}^r p^J \times g(J),~$ where $~g(J)~$ needs to be computed.

Using (R-6), for $~J \in \{2,3,\cdots,r\},~$ and $~a \in \{2,3,\cdots,J\}$,
the $~\displaystyle p^J~$ term in $~ \displaystyle a \times \binom{r}{a} p^a (1-p)^{(r-a)}$

is given by

$\displaystyle a \times \binom{r}{a} p^a \times \binom{r-a}{J-a} p^{(J-a)} (-1)^{(J-a)}.$

Using (R-8), this equals

$\displaystyle p^J \times J \times \binom{r}{J} \times \binom{J-1}{a-1} (-1)^{(J-a)}.$

Therefore,

$\displaystyle \sum_{J=2}^r p^J \times g(J) = \sum_{J=2}^r \left\{ ~J \times \binom{r}{J} \times p^J \times \left[\sum_{a=2}^J \binom{J-1}{a-1} (-1)^{(J-a)}\right] ~\right\}.$

Using (R-5), this equals

$\displaystyle \sum_{J=2}^r J \times \binom{r}{J} \times p^J \times (-1)^J.$




Let $~\displaystyle S = rp - 1 + (1-p)^r.$

Let $~\displaystyle T = \sum_{a=2}^r (a-1) \times \binom{r}{a} \times p^a \times (1-p)^{(r-a)}.$

To Prove: $~ S = T$.

Using (R-6),

$\displaystyle S = \left[\sum_{i=0}^r \binom{r}{i} ~(-1)^i ~p^i\right] - \left[\sum_{i=0}^1 \binom{r}{i} ~(-1)^i ~p^i\right]$

$\displaystyle S = \sum_{i=2}^r \binom{r}{i} ~(-1)^i ~p^i.$

Let $~\displaystyle T_1 = \sum_{a=2}^r a \times \binom{r}{a} \times p^a \times (1-p)^{(r-a)}.$

Let $~\displaystyle T_2 = \sum_{a=2}^r \binom{r}{a} \times p^a \times (1-p)^{(r-a)}.$

Then $T = T_1 - T_2.$

By (R-10),

$\displaystyle T_1 = \sum_{J=2}^r p^J \times \binom{r}{J} \times (-1)^J \times (J).$

By (R-9),

$\displaystyle T_2 = \sum_{J=2}^r p^J \times \binom{r}{J} \times (-1)^J \times (J-1).$

Therefore,

$\displaystyle T_1 - T_2 = \sum_{J=2}^r p^J \times \binom{r}{J} \times (-1)^J = S.$

user2661923
  • 42,303
  • 3
  • 21
  • 46
1

I'm going to expand my comment above into an answer. Let's take a pair of ridiculously simple examples. First, consider $k=20, r=15$. Instead or playing them randomly, we're going to put them in evenly. In this case, we end up with $5$ empty boxes--there aren't enough keys to put one in every box. Now, move a key to a box with another key. Now we have one collision and one more empty box. You can do this as long as you like; we will always have $B_0-C= 5$.

($B_0$ is the number of empty boxes, $C$ is the number of collisions.)

Now consider $k=12, r=15.$ This time, spaced out evenly, we can't have fewer than three collisions. Remove a key from one of the boxes with only one key, drop it in another box, and we see the same thing: $B_0-C$ stays constant, though here the difference is $-3$.

So we can safely say $B_0-C=k-r$ for all possible $k$ and $r$. Since that must always hold at the end of each random process, it should remain true for the average of those processes. $k$ and $r$ are constants, therefore

$$E(C) = r-k+E(B_0)$$

We need to find $E(B_0)$. How? Well, each box has an independent probability, so we can just find $E_i(em)$ for each box, and add those together. That's easy enough: $ E_i(em) = (1 - p_i)^r$.

Then for all the boxes, we have

$$E(B_0) = \sum_{i=1}^k (1 - p_i)^r$$

$$E(C)=r-k+ \sum_{i=1}^k (1 - p_i)^r$$

We can do a sanity check with extreme conditions. If all but one box has $p_i=0$, and the last box has $p_i=1$, then our sum is $k-1$. We had $r-1$ collisions in the last box--the first key doesn't count-- we get $r-1=r-k+k-1$, which is correct.

Using $k=r=10$ and $p=0.1$ for all the boxes, we get about $3.5$ expected collisions, which seems like a good estimate to me, though that's intuition.

So I'll stand by my answer, though obviously there may be holes. I hope it helps!

Eric Snyder
  • 3,147
  • 1
    Your outcome agrees with the answers on this question among which mine. The OP refers to it but the provided link is not to that question. Also note that about 3/4 hours ago my answer was downvoted by someone who did not explain why. – drhab Jan 25 '22 at 09:57
  • drhab Hm, yes, the link provided by OP doesn't have an answer by you! Glad that my result agrees with that answer. No idea why anyone would have downvoted it... alas, one of those things the system is really opaque about. – Eric Snyder Jan 29 '22 at 23:14
  • Well, anyhow. In the mean time it received 2 upvotes as a rich compensation :-). – drhab Jan 30 '22 at 07:02