5

It's been a long time since I've dealt with probability, so I thought I would ask here. I'm sampling elements independently and uniformly, with repetition, from a population. Given that the population is of size $n$, how many tries (in expectation) would it take me to gather $x$ unique elements?

Thank you :)

3 Answers

3

If we have $n$ things we are choosing from, see that we always have $1$ unique element after the first draw. From here, we now are dealing with a Geometric Distribution with probability of success being $\dfrac{n-1}{n}$. The expected number of tries here is $\dfrac{n}{n-1}$. Thus the expected number of draws until you get $2$ unique elements from a pool of size $n$ is $$E(1)+E(2) = 1+\dfrac{n}{n-1}$$

Where $E(m) = \dfrac{n}{n-m+1}$ is the expected number of draws after finding the $(m-1)^{th}$ unique element until you've successfully found the $m^{th}$ unique element. We take $E(1) = \dfrac{n}{n-1+1} = 1$ to be the expected number of draws until the first unique element is found, which is just the first draw.

This pattern generalizes: the expected number of draws until you have $x \le n$ unique elements is $$\sum_{i=1}^x E(i) = E(1)+E(2)+\dots+E(x) = \sum_{i=1}^x \dfrac{n}{n-i+1}$$
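Here is a minimal Python sketch of that sum, in case it helps to plug in numbers. The function name `expected_draws` is my own choice for illustration, and it uses exact fractions so no rounding creeps in.

```python
from fractions import Fraction


def expected_draws(n: int, x: int) -> Fraction:
    """Expected number of uniform draws (with repetition) from n items
    until x distinct items have been seen: sum of E(m) = n / (n - m + 1)."""
    if not 0 <= x <= n:
        raise ValueError("need 0 <= x <= n")
    # E(m) is the expected wait for the m-th unique element.
    return sum((Fraction(n, n - m + 1) for m in range(1, x + 1)), Fraction(0))


if __name__ == "__main__":
    print(expected_draws(10, 2))  # 1 + 10/9
    print(expected_draws(10, 3))  # 1 + 10/9 + 10/8
```

For example, `expected_draws(10, 2)` evaluates to `Fraction(19, 9)`, which matches $1+\dfrac{n}{n-1}$ with $n=10$.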

WaveX
  • 5,580
1

If you have already collected $k$ unique items, the probability that the next item drawn will be different is ${n-k\over n}.$ We have a geometric distribution, so the expected number of draws until we get a different item is ${n\over n-k}.$ The expected number of draws until we get $x$ different items is $$\sum_{k=0}^{x-1}{n\over n-k}=n\sum_{k=0}^{x-1}{1\over n-k}$$
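If you want to convince yourself numerically, a rough Monte Carlo check along these lines should land close to the formula. This is only a sketch: the function names, the trial count, and the seed are arbitrary choices of mine, not part of the derivation.

```python
import random


def simulate_draws(n, x, rng):
    """Draw uniformly with repetition from range(n) until x distinct values are seen."""
    seen = set()
    draws = 0
    while len(seen) < x:
        seen.add(rng.randrange(n))
        draws += 1
    return draws


def estimate_expected_draws(n, x, trials=20000, seed=0):
    """Average the number of draws over many independent simulated runs."""
    rng = random.Random(seed)
    return sum(simulate_draws(n, x, rng) for _ in range(trials)) / trials


def exact_expected_draws(n, x):
    """The closed-form sum: n/(n-0) + n/(n-1) + ... + n/(n-(x-1))."""
    return sum(n / (n - k) for k in range(x))


if __name__ == "__main__":
    print(estimate_expected_draws(10, 5), exact_expected_draws(10, 5))
```

The two printed numbers should agree to within sampling noise; increasing `trials` tightens the agreement.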

saulspatz
  • 53,824
  • 1
    I think there's something wrong with what's on top: the $n-x+1$. If you plug in $x=1$, your sum simplifies to $$\sum_{k=0}^{n}{n\over n-k}$$ and we ultimately have a $\frac{n}{n-n}$ term – WaveX Mar 28 '19 at 21:33
  • @WaveX I got it backwards somehow. Thanks. – saulspatz Mar 29 '19 at 13:08
0

Not 100% sure this is the right solution, but I think you basically have to use a sequence of geometric random variables (https://en.wikipedia.org/wiki/Geometric_distribution)

So let's suppose $n = 10$. If $x = 1$, then we are interested in the number of trials to draw one unique value. Hopefully we don't need to demonstrate that you only need one trial to do this.

Now, if $x = 2$, then we need the number of trials to draw one unique value ($y_1$), and then the number of trials to draw another unique value ($y_2$). The first success happens on the first trial with probability 1, and then the probability that it takes $k$ more trials to get the second success is given by $P(y_2 = k) = (1 - 9/10)^{k-1}(9/10)$ for $k = 1, 2, \dots$, since each further draw gives a second unique value with probability 9/10.

Notice that each trial is independent here, and the only thing that changes is the success probability, which is $1$ for the first trial, $(n-1)/n$ for all trials until we get our second success, and so on. So basically we have $x$ geometric random variables here, and we want to find the expectation of their sum.

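For $i \in \{1, \dots, x\}$, let $y_i$ denote the number of trials needed to draw the $i$-th unique value once $i-1$ unique values have been found. Because $y_i$ is a geometric random variable with success probability $p_i = (n-i+1)/n$, its expectation is $E[y_i] = 1/p_i$, and by linearity of expectation we have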

\begin{align} E[\text{Number of trials}] &= \sum_{i = 1}^x E[y_i]\\ &= \sum_{i = 1}^x\frac{1}{p_i}\\ &= \sum_{i = 1}^x\frac{n}{n-i+1} \end{align}
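As a quick sanity check of the formula, here it is worked out for $n = 10$ and $x = 3$ (values picked only for illustration):

\begin{align} E[\text{Number of trials}] &= \frac{10}{10} + \frac{10}{9} + \frac{10}{8}\\ &= \frac{36 + 40 + 45}{36}\\ &= \frac{121}{36} \approx 3.36 \end{align}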

Bayesic
  • 139