If I generate uniform random integers from 1 to K and count how many unique numbers I get $n_\mathrm{unique}$, I empirically obtain:
- the mean is: $\frac{2K}{\pi}$
- the variance is $\frac{K}{\pi^{2}}$.
Matching these with the mean and variance formulas of a binomial distribution, I get:
$$p = 1 - 1 / (2 \pi)$$ $$n=\left\lfloor \frac{2K}{\pi-\frac{1}{2}}\right\rfloor$$
This works, but is not perfect.
Is there a way to derive the probability distribution of $n_\mathrm{unique}$?
Attempt
The first integer is always unique and added to the training set. The second has a probability 1/\left(K-1\right) to be added, and a probability 1/K of not being added.
Therefore probability is $$P_{2}(n)=\begin{cases}1/K & n=1\\(K-1)/K & n=2\end{cases}$$
Subsequently, we have two cases, either in iteration j nothing is added, which happens with probability ($n_{j}+1)/K$, where $n_{j}$ is the size of the set so far, or something is added, which happens with probability $$1-(n_{j}+1)/K=(K-n_{j}+1)/K$$.
Let's then consider the probability of obtaining k unique numbers: $$P(X=k)=\frac{K}{K}\times\frac{K-1}{K}\times\frac{K-2}{K}\times...\times\frac{K-k+1}{K}$$
This is wrong or at least incomplete.
Related problems
This is related to the generalised birthday problem and the number of expected collisions, previously discussed in these posts, which however obtain different formulas, possibly because they focus on the expected number of people with shared birthdays, not the number of shared birthdays:
- birthday problem - expected number of collisions
- Probability distribution for collisions / birthday problem
- birthday problem - which solution for expected value of collisions is correct?
- Birthday Problem: Big Numbers and Distribution of the Number of Samples involved in Collisions
The coupon's collector problem seems related, but this has the number of trials unbounded. I am essentially looking for the coupon's collector's success after n trials of collecting n coupons.