0

It is the problem 1.2.3 of Karlin's book An introduction to stochastic modeling:

A population having $N$ distinct elements is sampled with replacement. Because of repetitions a sample of size r can have fewer than r different elements. Let $S_r$ be the sample size necessary to have r different elements. Show that

$$\Bbb E[S_r]=N\left(\frac1N+\frac{1}{N-1}+...+\frac{1}{N-r+1}\right)$$

But I have a different solution. I did

$$\Pr[S_r=r]=\frac NN\frac{N-1}{N}\dots\frac{N-r+1}{N}=\frac{(N)_r}{N^r}$$

$$\Pr[S_r=r+1]=\frac{(N)_r}{N^r}\frac rN$$

And in general

$$\Pr[S_r=r+k]=\frac{(N)_r}{N^r}\left(\frac rN\right)^k$$

Now

$$\Bbb E[S_r]=\sum_{k\ge0}(r+k)\Pr[S_r=r+k]$$

Doing some algebra I get the solution

$$\Bbb E[S_r]=\sum_{k\ge0}(r+k)\Pr[S_r=r+k]=\frac{(N)_r}{N^r}\frac{(N-r+1)Nr}{(N-r)^2}$$

This expression dont seem equivalent to the textbook solution. Can someone show me my mistake or the way to get the solution of the book? Thank you in advance.


EDITION: I see a weird mistake. I assumed that for $S_r>r$ the others values may repeat... but this is not true, they can be different too. Anyway this dont seem going in the good direction cause it complicate a lot the expression. Anyway I will see what I get fixing that.

2 Answers2

2

Re-thinking the problem and seeing the "population" and "sample" as a dice of $N$ sides and $S_r$ throws respectively I get the correct answer.

Then the reformulated problem is: what is the expected number of throws to get r different results for a dice of $N$ sides?

This is some kind of Markov chain that I solved some time ago. We thrown a die a undetermined number of times (the quantity doesnt matter) and we have seen $d$ different faces of the die, so for the next throw the probability to see a new face (the probability to change from state $d$ to state $d+1$) is $p=(N-d)/N$, and the probability that this doesnt happen is $q=1-p$.

Hence the expected value of the number of throws to see a new face of the die (i.e. to change from state $d$ to state $d+1$) is

$$\Bbb E[d\to d+1]=\sum_{k\ge 1} kq^{k-1}p=p\sum_{k\ge 1}kq^{k-1}=\\=p\frac{\rm d}{{\rm d}k}\sum_{k\ge 0}q^k=p\frac{\rm d}{{\rm d}k}\frac{1}{1-q}=\frac{p}{(1-q)^2}=\frac1p=\frac{N}{N-d}$$

Then the expected number of throws, i.e. $\Bbb E[S_r]$, will be the sum of the expected number of throws for each consecutive change of state

$$\Bbb E[S_r]=\sum_{d=0}^{r-1}\frac{N}{N-d}=N\left(\frac1N+...+\frac{1}{N-r+1}\right)=N(H_N-H_{N-r})$$

being $H_k$ the $k$-th harmonic number.

0

$S_r$ is the size of selection to have at least one of $r$ distinct elements. This requires that the last element sampled is distinct.

We need to find the probability that: one from $r$ of the $N$ elements is selected last, and exactly $r-1$ distinct items have been selected in a sample of size $r+k-1$ from a population of $N-1$ (each at least once).

$$\begin{align} \mathsf P(S_r=r+k) &\;=\;\binom{N}{r}\frac r N\binom{N-1}{r-1}\frac{(r+k-1)^{r-1}-(r+k-1)^{r-2}+\cdots(r+k-1)^1}{(r+k-1)^{N-1}} \end{align}$$

Graham Kemp
  • 133,231