
I'm trying to compute the variance of a geometric RV $X$ with parameter $p$, using the Law of Total Expectation. The RV $Y$ represents the first trial, which is either a success with probability $p$ or a failure with probability $(1-p)$.

$$\text{Var}(X) = E(X^2) - E(X)^2$$

$$\begin{align} E(X) = E(E(X|Y)) &= pE(X|Y=\text{success}) + (1-p)E(X|Y=\text{fail})\\ &= 1 \cdot p + (1-p)(1+E(X)) \\ \implies E(X) &= \frac{1}{p} \end{align}$$

The expected number of trials conditioned on the first trial being successful is $1$, and the expected number of trials conditioned on the first trial being unsuccessful is $1 + E(X)$, since the trials are independent.

The second step would be to compute $E(X^2)$ the same way, which is where I ran into trouble: conditioning analogously doesn't produce the correct result. Any help would be much appreciated.

$E(X^2) = E(E(X^2|Y)) = pE(X^2|Y=\text{success}) + (1-p)E(X^2|Y=\text{fail})$

Filip

3 Answers


Your strategy will work, but it needs some additional thought. The key is to understand what exactly happened when you calculated $\operatorname{E}[X \mid Y = \text{fail}]$, the conditional expectation of $X$ given that the first trial was a failure.

You correctly reasoned that, by the memorylessness property of the geometric distribution, when the first trial is a failure, the expected number of additional trials needed to observe the first success remains the same quantity as if the first trial had not occurred; i.e., $$\operatorname{E}[X \mid Y = \text{fail}] = 1 + \operatorname{E}[X].$$ But what is actually happening is that $$\operatorname{E}[X \mid Y = \text{fail}] = \operatorname{E}[1+X].$$ Since $1$ is constant, it can be pulled out of the expectation operator, so this is the same thing; but the subtlety here is that we can think of the conditional variable on the left, $X \mid Y = \text{fail}$, as a new random variable, say $W$, that counts the total number of trials (including the first failed trial) needed to observe the first success. This variable has PMF $$\Pr[W = w] = (1-p)^{w-2} p, \quad w \in \{2,3,4, \ldots\}.$$ The reason is that the first trial is, by construction, a failure; but it still counts. So $W$ is at least $2$, as the soonest we could succeed is on the $2^{\rm nd}$ trial. When $W = 2$, this means the second trial was a success, which occurs with probability $p$. We don't count the probability of the first trial because it is stipulated to have failed.

But if you look at the form of the PMF of $W$, it is also clear that it is just a location-shifted geometric distribution. Specifically, if $$\Pr[X = x] = (1-p)^{x-1} p, \quad x \in \{1, 2, 3, \ldots \},$$ then replacement of $x$ with $x-1$ gives us $$\Pr[X = x-1] = (1-p)^{(x-1)-1} p = (1-p)^{x-2} p, \quad x \in \{2, 3, 4, \ldots \},$$ which is functionally identical to the PMF of $W$. So $$\Pr[W = w] = \Pr[X = w-1] = \Pr[X+1 = w].$$ In other words, we have algebraically shown that we can define $W$ in two ways: $$W = (X \mid Y = \text{fail}),$$ or $$W = X+1.$$ They are the same random variable, so their expectations are the same.
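If a numerical sanity check helps, here is a small Monte Carlo sketch (not part of the argument; the sampler and variable names are my own) comparing the first two conditional moments of $X$ given a failed first trial against the moments of $X+1$:

```python
import random

def sample_geometric(p):
    """Number of Bernoulli(p) trials up to and including the first success."""
    trials = 1
    while random.random() >= p:  # failure: keep trying
        trials += 1
    return trials

p, n = 0.3, 200_000
samples = [sample_geometric(p) for _ in range(n)]

# The first trial fails exactly when X >= 2, so condition on that event.
failed_first = [x for x in samples if x >= 2]

mean_given_fail = sum(failed_first) / len(failed_first)
mean_shifted = sum(x + 1 for x in samples) / n
sq_given_fail = sum(x * x for x in failed_first) / len(failed_first)
sq_shifted = sum((x + 1) ** 2 for x in samples) / n

print(f"E[X   | fail] ~ {mean_given_fail:.3f}  vs  E[X+1]     ~ {mean_shifted:.3f}")
print(f"E[X^2 | fail] ~ {sq_given_fail:.3f}  vs  E[(X+1)^2] ~ {sq_shifted:.3f}")
```

Both pairs should agree up to simulation noise, reflecting that $(X \mid Y = \text{fail})$ and $X+1$ are the same random variable.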

Now you may wonder what this has to do with calculating $$\operatorname{E}[X^2 \mid Y = \text{fail}].$$ The insight is that, from the above reasoning, we should have $$\operatorname{E}[X^2 \mid Y = \text{fail}] = \operatorname{E}[(X+1)^2].$$ And we can see this must be true because, given the first trial was a failure, the additional number of trials needed is still distributed as $X$, so the total number of trials is $X+1$ as we said before; so the square of the total number of trials needed must be $(X+1)^2$.

(When the first trial is a success, $X = 1$, so $X^2 = 1$.)

Therefore, your computation would be $$\boxed{\begin{align} \operatorname{E}[X^2] &= p \operatorname{E}[X^2 \mid Y = \text{success}] + (1-p)\operatorname{E}[X^2 \mid Y = \text{fail}] \\ &= p \cdot 1 + (1-p) \operatorname{E}[(X+1)^2] \\ &= p + (1-p)\operatorname{E}[X^2 + 2X + 1] \\ &= p + (1-p)\left(\operatorname{E}[X^2] + 2\operatorname{E}[X] + 1\right)\\ &= p + (1-p)\left(\operatorname{E}[X^2] + \frac{2}{p} + 1\right), \end{align}}$$ where in the penultimate step we used the linearity of expectation, and in the final step, we used the earlier result you obtained, $\operatorname{E}[X] = 1/p$. Now all that remains is to solve for $\operatorname{E}[X^2]$, which I leave to you as an exercise.
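If you'd like to check your final answer numerically after doing that algebra, a quick sketch (solving the boxed recursion as written, without simplifying it by hand; the parameter choices are mine) could look like:

```python
import random

def sample_geometric(p):
    """Trials up to and including the first success."""
    t = 1
    while random.random() >= p:
        t += 1
    return t

p = 0.3

# Solve E[X^2] = p + (1-p) * (E[X^2] + 2/p + 1) for E[X^2], unsimplified.
m = (p + (1 - p) * (2 / p + 1)) / (1 - (1 - p))

n = 200_000
est = sum(sample_geometric(p) ** 2 for _ in range(n)) / n
print(f"E[X^2] from the recursion : {m:.4f}")
print(f"E[X^2] from simulation    : {est:.4f}")
```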

heropup

Let $X$ be a geometric random variable parameterized by $p$. Furthermore, let $Y$ be a random variable such that $Y=1$ iff a success is obtained on the first trial and $Y=0$ otherwise.

To find $ \text{Var}[X] = \text{E}[X^2] - \text{E}[X]^2 $ using the law of total expectation, recognize that whenever $Y=1$ we have $X=1$, and whenever $Y=0$ the random experiment is still associated with a geometric random variable parameterized by $p$, but with $1$ extra trial. In other words, whenever $Y=0$ we have $X = Z+1$, where $Z$ is a geometric random variable parameterized by $p$.

First, find $\text{E}[X]$ as follows

$ \text{E}[X] = \text{E}[\text{E}[X|Y]] = \sum_{i=0}^1 \text{E}[X|Y=i]P(Y=i) $

$ = \text{E}[X|Y=0]P(Y=0) + \text{E}[X|Y=1]P(Y=1) $

$ = \text{E}[Z+1|Y=0]P(Y=0) + \text{E}[1|Y=1]P(Y=1) $

$ = (\text{E}[Z]+\text{E}[1])(1-p) + 1 \cdot p = (\text{E}[Z]+1)(1-p) + p$

Since $Z$ is identically distributed to $X$, $\text{E}[Z] = \text{E}[X]$, so solving $\text{E}[X] = (\text{E}[X]+1)(1-p) + p$ for $\text{E}[X]$ gives

$\text{E}[X] = \frac{1}{p}$

Then find $\text{E}[X^2]$ as follows

$\text{E}[X^2] = \text{E}[\text{E}[X^2|Y]] = \sum_{i=0}^1 \text{E}[X^2|Y=i]P(Y=i)$

$= \text{E}[X^2|Y=0]P(Y=0) + \text{E}[X^2|Y=1]P(Y=1)$

$= \text{E}[(Z+1)^2|Y=0]P(Y=0) + \text{E}[1^2|Y=1]P(Y=1)$

$= \text{E}[Z^2 + 2Z + 1|Y=0](1-p) + 1^2p$

$= (\text{E}[Z^2] + \text{E}[2Z] + \text{E}[1])(1-p) + p$

$= (\text{E}[Z^2] + 2\text{E}[Z] + 1)(1-p) + p$

Now, recall that $Z$ is identically distributed to $X$, so $\text{E}[Z^2]=\text{E}[X^2]$ and $\text{E}[Z]=\text{E}[X]$. Thus,

$\text{E}[X^2] = (\text{E}[X^2] + 2\text{E}[X] + 1)(1-p) + p = (\text{E}[X^2] + 2\frac{1}{p} + 1)(1-p) + p$

Use basic algebra to isolate $\text{E}[X^2]$ on one side of the equation and obtain

$\text{E}[X^2] = \frac{2-p}{p^2}$
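Spelled out, with the shorthand $m := \text{E}[X^2]$ (a symbol of my choosing), that algebra is

$$\begin{align} m &= \left(m + \frac{2}{p} + 1\right)(1-p) + p \\ pm &= (1-p)\left(\frac{2}{p} + 1\right) + p = \frac{2}{p} - 2 + 1 - p + p = \frac{2-p}{p} \\ m &= \frac{2-p}{p^2}, \end{align}$$

so that $\text{Var}[X] = \frac{2-p}{p^2} - \frac{1}{p^2} = \frac{1-p}{p^2}$.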

You can also find $\text{E}[Z^2]=\frac{2-p}{p^2}$ via the moment generating function, using $\text{E}[Z^n]=\frac{d^n}{dt^n} \text{E}[e^{tZ}] \big|_{t=0}$ with $n=2$. Hence, $\text{E}[Z^2]=\frac{d^2}{dt^2} \text{E}[e^{tZ}] \big|_{t=0} = \frac{d^2}{dt^2} \frac{pe^t}{1-(1-p)e^t} \Big|_{t=0} = \frac{2-p}{p^2}$.
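If you'd rather not take those derivatives by hand, a symbolic check (using sympy, one tool among many) confirms the result:

```python
import sympy as sp

t, p = sp.symbols('t p', positive=True)

# MGF of a geometric RV on {1, 2, ...}: E[e^{tZ}] = p e^t / (1 - (1-p) e^t)
mgf = p * sp.exp(t) / (1 - (1 - p) * sp.exp(t))

# Second moment: second derivative of the MGF, evaluated at t = 0.
second_moment = sp.simplify(sp.diff(mgf, t, 2).subs(t, 0))
print(second_moment)  # an expression equivalent to (2 - p)/p**2
```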

If you really want to nerd out, then use the definition of expectation

$\text{E}[Z^2] = \sum_{i=1}^\infty i^2 P(Z=i) = \sum_{i=1}^\infty i^2 (1-p)^{i-1}p$

$= \frac{p}{1-p} \sum_{i=1}^\infty i^2 (1-p)^i$

This can be solved analytically (but with a lot more work) by letting $a=1-p$ and observing $|a|=|1-p|<1$ for $0 < p \le 1$. Then, by substitution, we can find $\sum_{i=1}^\infty i^2 a^i$ as follows: $(1)$ take the derivative of the $n$th partial sum of the geometric series $\sum_{i=1}^n a^i= \frac{1-a^{n+1}}{1-a} -1$ and multiply it by a factor of $a$, $(2)$ take the derivative once again and multiply it by a factor of $a$, $(3)$ find the limit of this result as $n \to \infty$ (you may have to use L'Hôpital's Rule).
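For reference, the procedure above lands on the standard identity

$$\sum_{i=1}^\infty i^2 a^i = \frac{a(1+a)}{(1-a)^3}, \qquad |a| < 1,$$

so with $a = 1-p$,

$$\text{E}[Z^2] = \frac{p}{1-p} \cdot \frac{(1-p)(2-p)}{p^3} = \frac{2-p}{p^2},$$

in agreement with the other two derivations.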


Well, you could do that, or do this:

We have $X\sim\mathcal{Geo}_1(p)$ as the count of trials until the first success: the first trial, plus, if that first trial is a failure, the count of subsequent trials until success.

Let $Y\sim\mathcal{Bern}(p)$ indicate whether the first trial is a success. As such, the following facts about this Bernoulli distribution may be useful:

$$\begin{align}\mathsf E(Y)&=p\\\mathsf E(Y^2)&=p\\\mathsf E(1-Y)&=1-p\\\mathsf E((1-Y)^2)&=1-p\end{align}$$

Let $Z\sim\mathcal{Geo}_1(p)$ be the count of subsequent trials until success. This $Z$ is independent of $Y$ and identically distributed to $X$. However, since $Z$ is only included in the count for $X$ when the first trial is a failure, and $Y$ is Bernoulli distributed, we have $\boxed{X = 1+(1-Y)Z}$. Thus, using the independence of $Y$ and $Z$: $$\begin{align}\mathsf E(X) &= \mathsf E(1+(1-Y)Z)\\ &= 1+\mathsf E(1-Y)\,\mathsf E(Z) = 1+(1-p)\mathsf E(X)\\\therefore\quad\mathsf E(X) &=1/p\end{align}$$

As you have found.
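If it helps, the boxed identity can be sanity-checked with a short simulation (names and parameters are my own choices), building $X$ from $Y$ and $Z$ and comparing against direct geometric draws:

```python
import random

def geometric(p):
    """Trials up to and including the first success."""
    t = 1
    while random.random() >= p:
        t += 1
    return t

p, n = 0.3, 200_000

# Build X via the decomposition X = 1 + (1 - Y) Z, with Y ~ Bern(p), Z ~ Geo_1(p).
decomposed = []
for _ in range(n):
    y = 1 if random.random() < p else 0  # Y: was the first trial a success?
    decomposed.append(1 + (1 - y) * geometric(p))

direct = [geometric(p) for _ in range(n)]

print(f"mean via decomposition: {sum(decomposed)/n:.3f}")
print(f"mean via direct draws : {sum(direct)/n:.3f}   (theory: {1/p:.3f})")
```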

Similarly:

$$\begin{align}\mathsf E(X^2) &= \mathsf E((1+(1-Y)Z)^2)\\&=\mathsf E(1+2(1-Y)Z+(1-Y)^2Z^2)\\ &=1+2(1-p)/p+(1-p)\mathsf E(X^2)\\&~~\vdots\end{align}$$

Graham Kemp