1

In a lottery $n$ numbers are selected from the $N$ numbers $1,2,\cdots,N.$ Find the variance of the sum $S_n$ of the selected numbers.

My idea:

We want to find $P(S_n=k)$. This is proportional to the number of solutions of $$x_1+x_2+\dots+x_n=k$$ satisfying $x_i\ne x_j$ for $i\neq j$ and $1\le x_i\leq N$ for all $i=1,2,\cdots, n$. But how do I count these solutions? This is a brute-force idea and there may be a better one, but I can't solve this equation and have no better ideas. Can someone help me? Thanks a lot.
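For small $N$ and $n$ the count described above can be done by direct enumeration. The following is only a minimal Python sketch, assuming every set of $n$ distinct numbers is equally likely; the function name `sum_distribution` is made up for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def sum_distribution(N, n):
    """Exact pmf of S_n when n distinct numbers are drawn from 1..N.

    Assumes every n-element subset is equally likely.
    """
    # Count how many subsets give each possible sum.
    counts = Counter(sum(c) for c in combinations(range(1, N + 1), n))
    total = sum(counts.values())
    # Divide by the number of subsets to turn counts into probabilities.
    return {k: Fraction(v, total) for k, v in sorted(counts.items())}


pmf = sum_distribution(5, 2)
for k, p in pmf.items():
    print(k, p)
```

This only scales to small cases, since the number of subsets grows as $\binom{N}{n}$, but it gives exact values to compare any closed form against.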

shadow10
  • 5,737

3 Answers

5

In lotteries the numbers $(X_k)_{1\leqslant k\leqslant n}$ are distinct and selected uniformly, thus, for every $k$, $$E(X_k)=\frac1N\sum\limits_{i=1}^Ni=\frac{N+1}2,\quad E(X_k^2)=\frac1N\sum\limits_{i=1}^Ni^2=\frac{(N+1)(2N+1)}6,$$ and, for every $k\ne\ell$, $$E(X_kX_\ell)=\frac1N\sum_{i=1}^Ni\frac1{N-1}\sum_{j\ne i}j=\frac1{N(N-1)}\left(\sum_{i=1}^Ni\right)^2-\frac1{N(N-1)}\sum_{i=1}^Ni^2,$$ that is, $$E(X_kX_\ell)=\frac1{N(N-1)}\left(\frac{N(N+1)}2\right)^2-\frac1{N(N-1)}\frac{N(N+1)(2N+1)}6=\frac{(N+1)(3N+2)}{12}.$$ Thus, $$E(S_n)=nE(X_1)=n\frac{N+1}2,$$ and $$\mathrm{var}(S_n)=nE(X_1^2)+n(n-1)E(X_1X_2)-n^2E(X_1)^2,$$ that is, $$\mathrm{var}(S_n)=n\frac{(N+1)(2N+1)}6+n(n-1)\frac{(N+1)(3N+2)}{12}-n^2\frac{(N+1)^2}4,$$ which can be simplified as $$\mathrm{var}(S_n)=n(N-n)\frac{N+1}{12}.$$
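The closed form $n(N-n)\frac{N+1}{12}$ can be sanity-checked against exhaustive enumeration for small parameters. This is a hedged sketch, not part of the derivation: the helper names `exact_variance` and `closed_form` are hypothetical, and exact rational arithmetic is used so the comparison involves no rounding.

```python
from fractions import Fraction
from itertools import combinations


def exact_variance(N, n):
    """Variance of S_n computed by listing every n-subset of 1..N."""
    sums = [sum(c) for c in combinations(range(1, N + 1), n)]
    mean = Fraction(sum(sums), len(sums))
    second_moment = Fraction(sum(s * s for s in sums), len(sums))
    return second_moment - mean ** 2


def closed_form(N, n):
    """The simplified expression n (N - n) (N + 1) / 12."""
    return Fraction(n * (N - n) * (N + 1), 12)


# Compare the two for every 1 <= n <= N <= 8.
all_match = all(
    exact_variance(N, n) == closed_form(N, n)
    for N in range(1, 9)
    for n in range(1, N + 1)
)
print(all_match)
```

Note that the formula gives $0$ when $n=N$, as it should: if every number is drawn, the sum is always $N(N+1)/2$.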

Did
  • 284,245
  • Thanks a lot for answering, but doesn't the distribution of $X_k$ depend on the preceding $X_i$'s? It would be helpful if you could explain this. Thanks a lot. – shadow10 Nov 03 '14 at 15:54
  • @shadow10 There are "not independent" relationships between $X_k$ and every other $X_i$, not just the "preceding" $X_i$s. That is why we must consider $E(X_kX_\ell)$ for every $k\neq\ell$, as this answer does. – David K Oct 14 '23 at 16:27
2

If $X_1, \ldots, X_n$ are the numbers selected, $S_n = \sum_{k=1}^n X_k$ so $\text{Var}(S_n) = \sum_{i=1}^n \sum_{j=1}^n \text{Cov}(X_i, X_j)$. There are just two cases to consider, $i=j$ (which occurs $n$ times) and $i \ne j$ ($n^2 - n$ times), so $\text{Var}(S_n) = n \text{Var}(X_1) + (n^2-n) \text{Cov}(X_1,X_2)$.

$X_1$ is equally likely to be any of $1,2,\ldots,N$. Write $X_1 = \sum_{i=1}^N i I_{\{X_1 = i\}}$ where $I_E$ denotes the indicator of event $E$ (i.e. $1$ if $E$ occurs, $0$ if not). We have $E[I_{\{X_1=i\}}] =E[I_{\{X_1=i\}}^2]= 1/N$ so $\text{Var}(I_{\{X_1=i\}}) = 1/N - 1/N^2$, while $I_{\{X_1=i\}} I_{\{X_1=j\}} = 0$ for $i \ne j$ so $\text{Cov}(I_{\{X_1=i\}}, I_{\{X_1=j\}}) = - 1/N^2$ for $i \ne j$. Thus $$\text{Var}(X_1) = \sum_{i,j} i j \text{Cov}(I_{\{X_1=i\}}, I_{\{X_1=j\}}) = \sum_{i=1}^N \dfrac{i^2}{N} - \dfrac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N ij = \dfrac{N^2-1}{12}$$ Similarly, $E[I_{\{X_1=i\}} I_{\{X_2=j\}}] = \dfrac{1}{N(N-1)}$ for $i\ne j$, $0$ for $i=j$, so $$\text{Cov}(I_{\{X_1=i\}}, I_{\{X_2=j\}}) = \begin{cases}\dfrac{1}{N(N-1)} - \dfrac{1}{N^2} = \dfrac{1}{N^2(N-1)} & \text{for } i \ne j\\ - \dfrac{1}{N^2} & \text{otherwise}\end{cases}$$ so $$ \text{Cov}(X_1,X_2) = \sum_{i=1}^N \sum_{j=1}^N \dfrac{ij}{N^2(N-1)} - \sum_{i=1}^N \dfrac{i^2}{N(N-1)} = -\dfrac{N+1}{12}$$ and thus $$ \text{Var}(S_n) = n \dfrac{N^2-1}{12} - (n^2-n) \dfrac{N+1}{12} = \dfrac{n (N+1)(N-n)}{12} $$
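The two ingredients $\text{Var}(X_1)$ and $\text{Cov}(X_1,X_2)$ can also be checked numerically and then combined as $n \text{Var}(X_1) + (n^2-n)\text{Cov}(X_1,X_2)$. The sketch below is only illustrative and assumes $N \ge 2$ so that a second draw exists; the function names `first_two_draw_moments` and `variance_of_sum` are invented for this example.

```python
from fractions import Fraction
from itertools import permutations


def first_two_draw_moments(N):
    """Return (Var(X_1), Cov(X_1, X_2)) for draws without replacement from 1..N."""
    values = range(1, N + 1)
    mean = Fraction(sum(values), N)
    var_x1 = Fraction(sum(i * i for i in values), N) - mean ** 2
    # Ordered pairs (i, j) with i != j are the equally likely outcomes of (X_1, X_2).
    pairs = list(permutations(values, 2))
    e_x1x2 = Fraction(sum(i * j for i, j in pairs), len(pairs))
    return var_x1, e_x1x2 - mean ** 2


def variance_of_sum(N, n):
    """Var(S_n) = n Var(X_1) + (n^2 - n) Cov(X_1, X_2)."""
    var_x1, cov_x1x2 = first_two_draw_moments(N)
    return n * var_x1 + (n * n - n) * cov_x1x2


var_x1, cov_x1x2 = first_two_draw_moments(5)
print(var_x1, cov_x1x2, variance_of_sum(5, 2))
```

For $N=5$ this gives $\text{Var}(X_1)=2=(N^2-1)/12$ and $\text{Cov}(X_1,X_2)=-1/2=-(N+1)/12$. The negative covariance reflects that drawing a large number first leaves smaller numbers for the second draw.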

Robert Israel
  • 470,583
0

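If they are chosen independently of each other (that is, with replacement, so the same number may come up more than once), then the variance of the sum is the sum of the variances, so it would be $n$ times the variance of a number chosen uniformly at random from among $1,\ldots,N$, namely $n(N^2-1)/12$.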

In fact, that is the reason why standard deviations are used as a measure of dispersion, rather than the seemingly simpler and more obvious mean absolute distance from the average. If a number among $1,2,3,4,5$ is chosen at random, each having probability $1/5$ of being chosen, then the average is $3$ and the distances from $3$ are: $$ \begin{align} 1 & \mapsto |1-3|=2 \\ 2 & \mapsto |2-3|=1 \\ 3 & \mapsto |3-3|=0 \\ 4 & \mapsto |4-3|=1 \\ 5 & \mapsto |5-3|=2 \end{align} $$ The average of these is $(2+1+0+1+2)/5=1.2$. Why not use that, rather than the more complicated standard deviation, as a measure of dispersion? The answer is that if you choose a thousand numbers that way and add them up, there is no easy way to find the corresponding quantity for the sum. But with variances, it's easy: just do what I said in the first paragraph above. And from the variance you get the standard deviation.
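The contrast can be seen already with two independent draws from $1,\ldots,5$. The following is a rough Python sketch, assuming independent uniform draws (so repeats are allowed); the helper name `dispersion_of_sum` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def dispersion_of_sum(values, m):
    """Exact variance and mean absolute deviation of the sum of m
    independent uniform draws from `values`."""
    # Every m-tuple is equally likely because the draws are independent.
    sums = [sum(t) for t in product(values, repeat=m)]
    mean = Fraction(sum(sums), len(sums))
    variance = sum((s - mean) ** 2 for s in sums) / len(sums)
    mad = sum(abs(s - mean) for s in sums) / len(sums)
    return variance, mad


one_var, one_mad = dispersion_of_sum(range(1, 6), 1)
two_var, two_mad = dispersion_of_sum(range(1, 6), 2)
print(one_var, one_mad)
print(two_var, two_mad)
```

The variance doubles from $2$ to $4$, while the mean absolute deviation goes from $6/5$ to $8/5$, which is not twice the single-draw value and follows no comparably simple rule.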