
"Let us assume that X denotes the first pick and Y denotes the second pick from the set of numbers $x_1,x_2,\dots,x_N$. What is the correlation coefficient between X and Y?"

I believe that $P(X=x_i)=1/N$ and $P(Y=x_j)= \sum P(Y=x_j|X=x_i)P(X=x_i) = N \frac{1}{N-1}\frac{1}{N}=\frac{1}{N-1}$

We know that $\rho=\frac{Cov(X,Y)}{\sqrt{Var(X)Var(Y)}}$, and after some calculations I think I obtained the following:

$Cov(X,Y)= -\frac{1}{N(N-1)}\sum_{i=1}^{N}x_i^2$

I am stuck at this point: I couldn't write the denominator in a nice form, and I am not sure about the formula in the numerator.

EDIT: I realised that $P(Y=x_j)= \sum_{i} P(Y=x_j|X=x_i)P(X=x_i) = (N-1) \frac{1}{N-1}\frac{1}{N}=\frac{1}{N}$, since $P(Y=x_j|X=x_i)= 0$ for $i=j$. Let's take this one step further and consider the last ($N$-th) pick. For it to be $x_j$, we must not pick $x_j$ until it is the only card remaining. Then

$P(Y=x_j)= \frac{N-1}{N}\frac{N-2}{N-1}\frac{N-3}{N-2}\cdots\frac{1}{2} = \frac{1}{N}$
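The claim that every pick has the same $\frac{1}{N}$ distribution can be checked by brute force. The following is only a minimal sketch: it assumes the numbers are distinct, enumerates all orderings of a small example set, and uses exact fractions. The function name `pick_distribution` and the sample values are made up for illustration.

```python
from fractions import Fraction
from itertools import permutations


def pick_distribution(values, k):
    """Exact distribution of the k-th pick (1-based) when all of
    `values` are drawn one by one without replacement.

    Assumes the entries of `values` are distinct.
    """
    orders = list(permutations(values))  # every order is equally likely
    weight = Fraction(1, len(orders))
    dist = {v: Fraction(0) for v in values}
    for order in orders:
        dist[order[k - 1]] += weight
    return dist


values = [2, 5, 7, 11]  # hypothetical example set, N = 4
for k in range(1, len(values) + 1):
    dist = pick_distribution(values, k)
    # every value should have probability 1/N at every position
    assert all(p == Fraction(1, len(values)) for p in dist.values())
```

Enumeration grows like $N!$, so this is only practical for very small $N$.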

So we may think of this process as having a multinomial distribution with $n=1$ and $p_i=\frac{1}{N}$ for all $i$. Then

$Cov(X_i,X_j)=-np_ip_j=-\frac{1}{N}\frac{1}{N}$

$Var(X_i)=Var(X_j)=np(1-p)=\frac{1}{N}\frac{N-1}{N}$

$\rho=\frac{Cov(X,Y)}{\sqrt{Var(X)Var(Y)}}=\frac{-\frac{1}{N}\frac{1}{N}}{\frac{1}{N}\frac{N-1}{N}}=\frac{-1}{N-1}$

Is this reasoning correct, or am I making a mistake?

  • Hello, welcome to MSE! Please, try to make your title your question and leave out the hypotheses: http://meta.math.stackexchange.com/questions/9959/how-to-ask-a-good-question#10144 – Samuel M. A. Luque Jun 13 '21 at 10:18
  • Please do not change the language of the question after an answer has been posted. You can edit to add new stuff, of course, but keep the original. – StubbornAtom Jun 13 '21 at 19:09

1 Answer


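Let $X_1,\dots,X_N$ be the population values and let $x_i$ denote the $i$-th of $n$ draws made without replacement. Suppose the population mean is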

$$\mu=\operatorname E(x_i)=\frac1N\sum_{\alpha=1}^N X_\alpha$$

The population variance is then

$$\sigma^2=\operatorname{Var}(x_i)=\frac1N\sum_{\alpha=1}^N (X_\alpha-\mu)^2$$

Let $(N)_n$ be the number of $n$-permutations of $N$, i.e. $$(N)_n=N(N-1)(N-2)\cdots(N-n+1)$$

For $i\ne j$,

$$P(x_i=X_{\alpha},x_j=X_{\alpha'})=\begin{cases}\frac{(N-2)_{n-2}}{(N)_n}=\frac1{N(N-1)} &,\text{ if }\alpha\ne \alpha' \\ 0&,\text{ if }\alpha= \alpha' \end{cases}$$
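As a sanity check on this joint probability, here is a minimal Python sketch that enumerates every ordered sample of size $n$ from $N$ labelled items and compares the resulting frequency with $(N-2)_{n-2}/(N)_n$. The function names `falling_factorial` and `joint_probability` and the choice $N=5$, $n=3$ are illustrative assumptions, and positions and labels are 0-based in the code.

```python
from fractions import Fraction
from itertools import permutations


def falling_factorial(N, n):
    """(N)_n = N (N-1) ... (N-n+1), the number of n-permutations of N."""
    result = 1
    for k in range(n):
        result *= N - k
    return result


def joint_probability(N, n, i, j, a, b):
    """P(draw i is item a, draw j is item b) for an ordered sample of
    size n taken without replacement from items 0..N-1 (0-based i, j)."""
    samples = list(permutations(range(N), n))  # equally likely ordered samples
    hits = sum(1 for s in samples if s[i] == a and s[j] == b)
    return Fraction(hits, len(samples))


N, n = 5, 3
p = joint_probability(N, n, 0, 2, 1, 4)
assert p == Fraction(falling_factorial(N - 2, n - 2), falling_factorial(N, n))
assert p == Fraction(1, N * (N - 1))
# the same item cannot appear in two different draws
assert joint_probability(N, n, 0, 2, 1, 1) == 0
```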

Therefore,

\begin{align} \operatorname{Cov}(x_i,x_j)&= \operatorname E\left[(x_i-\mu)(x_j-\mu)\right] \\&=\sum_{\alpha=1}^N \sum_{\alpha'=1}^N (X_{\alpha}-\mu)(X_{\alpha'}-\mu)P(x_i=X_{\alpha},x_j=X_{\alpha'}) \\&= \frac1{N(N-1)}\sum_{\alpha=1}^N (X_{\alpha}-\mu)\sum_{\alpha'(\ne \alpha)=1}^N (X_{\alpha'}-\mu) \\&=\frac1{N(N-1)}\sum_{\alpha=1}^N (X_{\alpha}-\mu)\left\{\sum_{\alpha'=1}^N (X_{\alpha'}-\mu)-(X_{\alpha}-\mu) \right\} \\&=-\frac1{N(N-1)}\sum_{\alpha=1}^N (X_{\alpha}-\mu)^2 \\&= -\frac{\sigma^2}{N-1} \qquad,\,\, i\ne j \end{align}

Since $\operatorname{Var}(x_i)=\operatorname{Var}(x_j)=\sigma^2$, the correlation between $x_i$ and $x_j$ for every $i\ne j$ is $\frac{-\sigma^2/(N-1)}{\sigma^2}=-\frac1{N-1}$.
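The value $-\frac1{N-1}$ can also be confirmed numerically. The sketch below is not part of the derivation; it computes the exact correlation between the first and second draw by enumerating all ordered pairs of positions with exact fractions. The function name `draw_correlation` and the example values are hypothetical, and the values are assumed not to be all equal (otherwise the variance is zero).

```python
from fractions import Fraction
from itertools import permutations


def draw_correlation(values):
    """Exact correlation between the 1st and 2nd draw when two items
    are drawn without replacement from `values`.

    Assumes `values` are not all equal, so the variance is nonzero.
    """
    pairs = list(permutations(values, 2))  # ordered pairs of distinct positions
    w = Fraction(1, len(pairs))            # each pair has probability 1/(N(N-1))
    mean_x = sum(w * a for a, _ in pairs)
    mean_y = sum(w * b for _, b in pairs)
    cov = sum(w * (a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum(w * (a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum(w * (b - mean_y) ** 2 for _, b in pairs)
    # both draws have the same marginal distribution, so
    # sqrt(var_x * var_y) equals var_x and no square root is needed
    assert var_x == var_y
    return cov / var_x


values = [1, 4, 6, 10]  # hypothetical population, N = 4
assert draw_correlation(values) == Fraction(-1, len(values) - 1)
```

The result depends only on $N$, not on the particular values, which matches the closed form.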

StubbornAtom