
I want to understand something about the derivation of $\text{Var}(X) = E[X^2] - (E[X])^2$

Variance is defined as the expected squared difference between a random variable and the mean (expected value): $\text{Var}(X) = E[(X - \mu)^2]$

Then:

$\operatorname{Var}(X) = E[(X - \mu)^2]$

$\operatorname{Var}(X) = E[(X - E[X])^2]$

$\operatorname{Var}(X) = E[(X - E[X])(X - E[X])]$

$\operatorname{Var}(X) = E[X^2 - 2XE[X] + (E[X])^2]$

$\operatorname{Var}(X) = E[X^2] - 2E[XE[X]] + E[(E[X])^2]$

$\operatorname{Var}(X) = E[X^2] - 2E[E[X]E[X]] + E[(E[X])^2]$

$\operatorname{Var}(X) = E[X^2] - 2(E[X])^2 + (E[X])^2$

$\operatorname{Var}(X) = E[X^2] - (E[X])^2$

What I don't quite understand is the steps that get us from $E[XE[X]]$ to $E[E[X]E[X]]$ to $(E[X])^2$, also $E[(E[X])^2]$ to $(E[X])^2$.

While I'm sure these jumps are intuitive and obvious, I would still like to understand how we can (more formally) make these jumps / consider them mathematically equivalent.
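(As a sanity check, not a proof, the identity can be verified numerically on a small discrete distribution. A fair six-sided die is used below as an arbitrary example.)

```python
# Sanity check: E[(X - mu)^2] equals E[X^2] - (E[X])^2 for a fair six-sided die.
# The die is an arbitrary example distribution.

values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

ex = sum(v * p for v, p in zip(values, probs))       # E[X] = 3.5
ex2 = sum(v * v * p for v, p in zip(values, probs))  # E[X^2]

var_def = sum((v - ex) ** 2 * p for v, p in zip(values, probs))  # E[(X - mu)^2]
var_short = ex2 - ex ** 2                                        # E[X^2] - (E[X])^2

assert abs(var_def - var_short) < 1e-12  # both equal 35/12, about 2.9167
```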

  • E[c] = c when c is a constant. E[X] is a constant itself, so E[E[X]] = E[X]. – o0BlueBeast0o Jul 18 '16 at 20:37
  • I prefer to think of it as $E[X~E[X]] = E[X]\cdot E[X]$. The expectation operator $E[~]$ is linear, so $E[X+Y] = E[X]+E[Y]$. Also, $E[\alpha X] = \alpha E[X]$ for constant $\alpha$. As $E[X]$ is a constant, the constant can be pulled out of $E[X~E[X]]$. – JMoravitz Jul 18 '16 at 20:39
  • Isn't anything a constant assuming we know the answer on the right-hand side of an equation? – user6596353 Jul 18 '16 at 20:42
  • And how can we prove that $E[c] = c$ for constant $c$? – user6596353 Jul 18 '16 at 20:43
  • No... if the word "constant" is confusing you, think of it as "$E[X]$ is a 'number.'" $X$ is a function (a random variable), $E[X]$ is simply a number. As for your very last comment... that follows directly from the definition of expectation. How was $E[\cdot]$ defined/introduced to you? – Clement C. Jul 18 '16 at 20:43
  • I always assumed $X$ was a "random variable," as in "if we were to pluck a number at random from the distribution of possible values", or is that sort of a function in itself? – user6596353 Jul 18 '16 at 20:44
  • @ClementC. It was not introduced to me as I am just a self-taught 30-something at this point. I just know it roughly as "the average over infinitely many trials / the weighted sum of (value x probability of that value) over all values". I don't have a formal understanding of expectation or variance or anything. – user6596353 Jul 18 '16 at 20:46
  • The (standard) way to formally define a random variable is as a function from a probability space to a measurable set. (Without actual definitions, it's nigh-impossible to do anything rigorous: "if we were to pluck a number at random from the distribution of possible values" does not quite match the level of rigor needed.) – Clement C. Jul 18 '16 at 20:47
  • If you want an explanation of "$E[c]=c$ for constant $c$" without going through the measure-theoretic definition of probabilities (though you ought to look into that), see it this way: $c$ is itself a random variable that takes a specific value (namely... $c$) with probability one. Sum over all outcomes with the corresponding weights: you get $1\cdot c$. – Clement C. Jul 18 '16 at 20:49
  • @ClementC. Yeah, that's the problem I've been encountering. Whenever I try to understand what something is doing, suddenly boom, all these abstract concepts that are hard for me to intuit. – user6596353 Jul 18 '16 at 20:50
  • But then in order to understand the abstract concepts, I can only do it in terms of real-life examples... which those formalisms exist to explain in the first place. Reminds me of that Feynman talk about electromagnetism and rubber bands. – user6596353 Jul 18 '16 at 20:52
  • If the random variable $W$ is the amount you get from one play of a gambling game, then $E(W)$ is the average amount you get. If $W$ is constant, say $c$, then every time you get $c$, so on average you get $c$. – André Nicolas Jul 18 '16 at 20:55
  • You will find everything quite a bit easier (to understand and to type) if for the calculation you replace $E[X]$ by the letter $\mu$. – André Nicolas Jul 18 '16 at 21:02
  • @AndréNicolas But then if I want to show that (for example) Var(aX) = a^2 * Var(X), wouldn't it make more sense to use the function because Var(aX) = E((aX - E(aX))^2)? – user6596353 Jul 18 '16 at 21:40
  • Doesn't make much difference, if $Y=aX$ then the variance of $Y$ is $E(a^2X^2)-a^2\mu^2$. – André Nicolas Jul 18 '16 at 21:59
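The two facts used in the comments above — $E[c]=c$ for a constant $c$, and $\operatorname{Var}(aX)=a^2\operatorname{Var}(X)$ — can be checked numerically. The distribution and the constant $a=3$ below are arbitrary example choices, not anything from the thread.

```python
# Checking two facts from the comments numerically:
# E[c] = c for a constant c, and Var(aX) = a^2 * Var(X).
# The distribution and the constant a = 3 are arbitrary example choices.

values = [0, 1, 2]
probs = [0.2, 0.5, 0.3]
a = 3

def E(f):
    """Expectation of f(X): the probability-weighted sum over all outcomes."""
    return sum(f(v) * p for v, p in zip(values, probs))

# A "constant random variable" takes the value c on every outcome,
# so its expectation is c times the total probability, i.e. c * 1 = c.
c = 7.0
assert abs(E(lambda x: c) - c) < 1e-9

# Var(aX) = a^2 * Var(X), computed via Var(X) = E[X^2] - (E[X])^2:
var_X = E(lambda x: x ** 2) - E(lambda x: x) ** 2
var_aX = E(lambda x: (a * x) ** 2) - E(lambda x: a * x) ** 2
assert abs(var_aX - a ** 2 * var_X) < 1e-9
```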

3 Answers

18

$\newcommand{\E}{\operatorname{E}}$It should not have been written as $$ \E[X\E[X]] = \E[\E[X]\E[X]]. $$ Instead, it should have said $$ \E[X\E[X]] = \E[X] \E[X]. $$ The justification is this: $$ \E[X\cdot5] = 5\E[X], $$ and similarly for any other constant besides $5$. And in this context, "constant" means "not random". So just treat $\E[X]$ the same way you treat $5$, because it's a constant.
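The answer's point — that $\operatorname{E}[X]$ can be pulled out of an expectation exactly like the constant $5$ — can be illustrated numerically. The distribution below is an arbitrary choice for the sketch.

```python
# Numeric illustration: E[X * E[X]] = E[X] * E[X], treating E[X] like the constant 5.
# The distribution below is an arbitrary example.

values = [1, 2, 4]
probs = [0.25, 0.25, 0.5]

def E(f):
    """Expectation of f(X): the probability-weighted sum over all outcomes."""
    return sum(f(v) * p for v, p in zip(values, probs))

mean = E(lambda x: x)        # E[X] -- once computed, just a number
lhs = E(lambda x: x * mean)  # E[X * E[X]]
rhs = mean * mean            # E[X] * E[X]
assert abs(lhs - rhs) < 1e-12

# The same mechanism with a literal constant:
assert abs(E(lambda x: 5 * x) - 5 * E(lambda x: x)) < 1e-12
```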

1

We start from the fundamental definition: $$\operatorname{Var}(X)=E[(X-\mu)^2]$$ $$\operatorname{Var}(X)=E[X^2-2\mu X+\mu^2]$$ $$\operatorname{Var}(X)=E[X^2]-E[\mu(2X-\mu)]$$ Because $\mu$ is just a constant, we can take it out: $$\operatorname{Var}(X)=E[X^2]-\mu\,E[2X-\mu]$$ $$\operatorname{Var}(X)=E[X^2]-\mu\,(E[2X]-E[\mu])$$ $$\operatorname{Var}(X)=E[X^2]-\mu(2\mu-\mu)$$ $$\operatorname{Var}(X)=E[X^2]-\mu^2$$ Since $\mu=E[X]$, we arrive at the end result: $$\operatorname{Var}(X)=E[X^2]-(E[X])^2$$

0

For those interested in a different type of proof:

Suppose we sampled some data $X = x_1, x_2, \ldots, x_m$ from some Gaussian distribution. Then our sample mean, which I will denote as $E[X]$, is:

$$E[X] = \frac{1}{m}\sum_{i=1}^m x_i$$

Our sample variance is (I'm assuming you are familiar with the variance equation):

$$\begin{aligned} \frac{1}{m}\sum_{i=1}^m (x_i-E[X])^2 &= \frac{1}{m}\sum_{i=1}^m \left(x_i^2-2x_iE[X] + E[X]^2\right) \\ &= \frac{1}{m}\sum_{i=1}^m x_i^2 - 2E[X]\,\frac{1}{m}\sum_{i=1}^m x_i + \frac{1}{m}\sum_{i=1}^m E[X]^2 \\ &= E[X^2] - 2E[X]^2 + E[X]^2 \\ &= E[X^2] - E[X]^2 \end{aligned}$$
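The finite-sample identity above can also be checked on simulated data. The sketch below uses Python's `statistics.pvariance` (which divides by $m$, matching the $1/m$ convention in this answer); the mean, standard deviation, and sample size are arbitrary choices.

```python
import random
import statistics

# Checking the sample-variance identity on simulated Gaussian data.
# Mean 10, standard deviation 3, and sample size 10,000 are arbitrary choices.
random.seed(0)
xs = [random.gauss(10, 3) for _ in range(10_000)]

m = len(xs)
mean = sum(xs) / m                    # sample mean, E[X] in the notation above
mean_sq = sum(x * x for x in xs) / m  # E[X^2]

lhs = statistics.pvariance(xs)  # (1/m) * sum of (x_i - mean)^2
rhs = mean_sq - mean ** 2       # E[X^2] - E[X]^2
assert abs(lhs - rhs) < 1e-6
```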