
Consider the following problem: every second we receive a random number from the set $ A = \{1, \ldots, n\} $. We stop when we have received all $ n $ numbers at least once. We want to know the expected value of the time it will take to receive all $ n $ numbers.

This can be modeled with the following algorithm:

ALGO(n)
1: cnt = 0
2: S = ∅
3: while |S| < n
4:     j = RANDOM(1, n)
5:     S = S ∪ {j}
6:     cnt = cnt + 1
7: return cnt
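For concreteness, here is a direct Python translation of the pseudocode (a sketch; `random.randint` plays the role of RANDOM(1, n)):

```python
import random

def algo(n):
    """Draw uniformly from {1, ..., n} until every value appears; return the draw count."""
    cnt = 0                       # line 1: cnt = 0
    s = set()                     # line 2: S = empty set
    while len(s) < n:             # line 3: while |S| < n
        j = random.randint(1, n)  # line 4: RANDOM(1, n), uniform on {1, ..., n}
        s.add(j)                  # line 5: S = S ∪ {j}
        cnt += 1                  # line 6: cnt = cnt + 1
    return cnt                    # line 7: return cnt
```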

Here RANDOM(1, n) returns an integer between 1 and $ n $ inclusive, chosen uniformly at random. Let's define the following random variables and state a standard lemma:

  • $ X $: Number of executions of line 4.
  • $ X_i $: Number of executions of line 4 when $ |S| = i $.

Lemma: Given an experiment with a success probability $ p > 0 $, if we repeatedly perform the experiment independently, the expected number of attempts to get a success for the first time is $ 1/p $.
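The lemma can be checked empirically with a quick Monte Carlo estimate (a sketch; the choices $p = 0.25$ and 200,000 trials are arbitrary):

```python
import random

def attempts_until_success(p):
    """Run independent Bernoulli(p) trials; return the number of attempts
    up to and including the first success."""
    attempts = 0
    while True:
        attempts += 1
        if random.random() < p:
            return attempts

random.seed(0)
p = 0.25
trials = 200_000
avg = sum(attempts_until_success(p) for _ in range(trials)) / trials
# avg should be close to 1/p = 4
```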

How can we determine the expected value of $ X $, the total number of executions of line 4 until all $ n $ numbers have been received at least once?

The answer I got was $ n \ln(n) $, but I'm not sure if that's correct.

RobPratt
  • Given that once you have $n-1$ numbers, the chance that the next number will be new is $\frac1n$ (and thus the expected wait time will be $n$), I expect that the $n\log n$ is a bit of an underestimate... – 5xum Jun 08 '24 at 06:07
  • This is the coupon collector problem. The expected value is $n H_n = n(1 + 1/2 + \cdots + 1/n)$, so a bit larger. – kodlu Jun 08 '24 at 06:10
  • https://en.wikipedia.org/wiki/Coupon_collector%27s_problem There are many questions on the site if you search for "coupon collector". – Ross Millikan Jun 08 '24 at 20:25

1 Answer


I leave a solution to this problem. Consider the random variable $X_k$, which denotes the number of distinct values received by second $k$. Clearly $X_0 = 0$, and $(X_k)_{k\in\mathbb{N}}$ is a Markov chain with transition probabilities $$\mathbb{P}(X_{k+1}=i+1\mid X_k=i) = \frac{n-i}{n}, \qquad \mathbb{P}(X_{k+1}=i\mid X_k=i) = \frac{i}{n},$$ and $\mathbb{P}(X_{k+1}=j\mid X_k=i) = 0$ when $j\notin\{i,i+1\}$.

Define the stopping time $T_i := \inf\{k\geq 0 : X_k = i\}$; we would like to calculate $\mathbb{E}(T_n)$. We observe that for all $i<n$ and $\ell\geq 0$, \begin{equation} \begin{aligned} \mathbb{E}(T_{i+1}\mid T_i = \ell) = \sum_{j> \ell} j\,\mathbb{P}(T_{i+1} = j\mid T_i=\ell) &= \sum_{j> \ell}j\,\mathbb{P}(X_j = i+1,X_{j-1} = i, \ldots, X_{\ell} = i\mid X_{\ell} = i)\\ &=\sum_{j>\ell} j \cdot \frac{n-i}{n}\cdot \left(\frac{i}{n}\right)^{j-\ell-1}\\ &= \ell + 1 + \frac{i/n}{1-i/n}, \end{aligned} \end{equation} where the last step substitutes $j = \ell + m$ with $m \geq 1$ and uses $\sum_{m\geq 1} m\,p\,(1-p)^{m-1} = 1/p$ with $p = \frac{n-i}{n}$, i.e. the mean of a geometric distribution, which is exactly the lemma above.

We conclude that $\mathbb{E}(T_{i+1}\mid T_i) = T_i + 1 + \frac{i}{n-i}$ for all $i<n$. Taking expectations gives $\mathbb{E}(T_{i+1}) = \mathbb{E}(T_i) + \frac{n}{n-i}$; summing these equalities from $i=0$ to $i=n-1$ (with $\mathbb{E}(T_0) = 0$) yields $$\mathbb{E}(T_n) = \sum_{i=0}^{n-1}\frac{n}{n-i} = n\sum_{i=1}^n\frac{1}{i} = nH_n.$$
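A short simulation agrees with $nH_n$ and shows that $n\ln n$ undershoots it (a sketch; the choices $n = 20$ and 20,000 trials are arbitrary):

```python
import math
import random

random.seed(0)

def collect(n):
    """One coupon-collector run: uniform draws from {1, ..., n} until all values are seen."""
    seen, cnt = set(), 0
    while len(seen) < n:
        seen.add(random.randint(1, n))
        cnt += 1
    return cnt

n = 20
exact = n * sum(1 / i for i in range(1, n + 1))  # n * H_n
trials = 20_000
simulated = sum(collect(n) for _ in range(trials)) / trials
# simulated should be close to exact, while n * math.log(n) falls short of both
```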

juancodmw