
$ \operatorname{Var}(X) = E[X^2] - (E[X])^2 $

I have seen and understand (mathematically) the proof for this. What I want to understand is: intuitively, why is this true? What does this formula tell us? From the formula, we see that if we subtract the square of the expected value of $x$ from the expected value of $x^2$, we get a measure of dispersion in the data (or, in the case of standard deviation, the square root of this value gives us a measure of dispersion in the data).

So it seems that there is some linkage between the expected value of $ x^2 $ and $ x $. How do I make sense of this formula? For example, the formula

$$ \sigma^2 = \frac 1n \sum_{i = 1}^n (x_i - \bar{x})^2 $$

makes perfect intuitive sense. It simply gives us the average of squares of deviations from the mean. What does the other formula tell us?

WorldGov

4 Answers


Some time ago, a professor showed me this right triangle:

[Figure: a right triangle whose legs squared are $\text{Var}[X]$ and $\mathbb{E}^2[X]$, and whose hypotenuse squared is $\mathbb{E}[X^2]$.]

The formula you reported can be seen as an application of the Pythagorean theorem:

$$P = \mathbb{E}[X^2] = \text{Var}[X] + \mathbb{E}^2[X].$$

Here, $P = \mathbb{E}[X^2]$ (which is the second uncentered moment of $X$) is read as the "average power" of $X$. Indeed, there is a physical explanation.

In physics, energy and power are related to the square of some quantity (e.g. $X$ can be velocity for kinetic energy, current for Joule's law, etc.).

Suppose that these quantities are random (indeed, $X$ is a random variable). Then, the average power $P$ is the sum of two contributions:

  1. The square of the expected value of $X$;
  2. Its variance (i.e. how much it varies from the expected value).

It is clear that, if $X$ is not random (i.e. it is a constant), then $\text{Var}[X] = 0$ and $\mathbb{E}^2[X] = X^2$, so that:

$$P = X^2,$$

which is a typical physical definition of energy/power (in this case it is exact, not an average). When randomness is present, we must use the whole formula

$$P = \mathbb{E}[X^2] = \text{Var}[X] + \mathbb{E}^2[X]$$

to evaluate the average power of the signal.
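As a quick numerical illustration of this decomposition (a sketch, assuming NumPy is available; the signal values are purely illustrative), simulate a "signal" that is a constant level plus zero-mean noise and compare the average power with the sum of the two contributions:

```python
import numpy as np

rng = np.random.default_rng(0)
# A "signal": constant level 3.0 plus zero-mean noise with standard deviation 2.0,
# so E[X] = 3, Var[X] = 4, and the average power should be near 4 + 9 = 13.
x = 3.0 + rng.normal(0.0, 2.0, size=1_000_000)

avg_power = np.mean(x**2)               # E[X^2]
decomposed = np.var(x) + np.mean(x)**2  # Var[X] + E^2[X]

print(avg_power, decomposed)  # both close to 13, equal up to rounding
```

The two quantities agree to floating-point precision on the same sample, because `np.var` is itself the mean of squared deviations.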

As a final remark, the average power of $X$ can be seen as the squared length of a vector whose components are the expected value of $X$ and its variability (standard deviation).


P.S. A further clarification: the values $P$, $\text{Var}[X]$ and $\mathbb{E}^2[X]$ represent the squares of the sides of the triangle, not their lengths.

the_candyman
    +1, I love this interpretation! I never saw it before. – Sean Roberson Dec 05 '18 at 00:50
  • I wonder what physical meaning the professor would've given for the triangle being right. Why isn't it oblique, or acute? This is just my guess, but it could be justified perhaps by saying that, for a random variable, the mean and the variance are independent of each other. Meaning to say, in an arbitrary distribution, these two measures may vary freely (such as a Gaussian being a two-parameter distribution in which the parameters of mean and standard deviation are completely independent). That would make mean and variance like a two-component vector whose norm is rms value of the variable. – Jonathan Clark Nov 14 '24 at 18:41

Easy! Expand by the definition. Variance is the mean squared deviation, i.e., $V(X) = E((X-\mu)^2).$ Now:

$$ (X-\mu)^2 = X^2 - 2X \mu + \mu^2$$

and use the fact that $E(\cdot)$ is a linear function and that $\mu$ (the mean) is a constant.
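Carrying that expansion through (with $\mu$ a constant, so $E(\mu^2) = \mu^2$ and $E(X) = \mu$):

$$E((X-\mu)^2) = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - 2\mu^2 + \mu^2 = E(X^2) - (E(X))^2.$$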

The shortcut computes the same thing, but expresses it as the difference between the mean of the squares and the square of the mean.

Sean Roberson
  • How can one prove that the expected value is a linear function? – Zacky Dec 04 '18 at 22:57
  • 3
    It follows from writing it as a sum: $$E(kX + Y) = \sum (kxP(X = x) + yP(Y = y)) = k\sum xP(X = x) + \sum yP(Y = y)$$ – Sean Roberson Dec 04 '18 at 22:59
  • 2
    Just to add to this, and take this with a grain of salt since I don't know probability: That this is a good definition for variance follows from wanting to get a sense of the distance you expect values of your random variable to be from the mean, one might naively choose the absolute value, but squaring is better as a smooth operation. – operatorerror Dec 04 '18 at 23:50

The other formula tells you exactly the same thing as the one you have given with $x$, $x^2$ and $n$. You say you understand that formula, so I assume you also see that the variance is just the average of all the squared deviations.

Now, $\mathbb{E}(X)$ is just the average of all the $x_i$'s, which is to say that it is the mean of the $x_i$'s.

Let us now define a deviation using the expectation operator: $$D = X-\mathbb{E}(X),$$ and the squared deviation is $$D^2 = (X-\mathbb{E}(X))^2.$$

Now that we have the deviation, let's find the variance. Using the above-mentioned definition of variance, you should be able to see that

$$\text{Variance} = \mathbb{E}(D^2).$$ Since $\mathbb{E}(\cdot)$ takes the average, the above equation is just the average of the squared deviations.

Substituting the value of $D^2$, we get $$\operatorname{Var}(X) = \mathbb{E}\left[(X-\mathbb{E}(X))^2\right] = \mathbb{E}\left[X^2+\mathbb{E}(X)^2-2X\,\mathbb{E}(X)\right] = \mathbb{E}(X^2)+\mathbb{E}(X)^2-2\mathbb{E}(X)^2 = \mathbb{E}(X^2)-\mathbb{E}(X)^2.$$ Hope this helps.
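To make the identity concrete, here is a minimal check (the distribution is my own toy example) on a small discrete random variable, computing the variance both as the average squared deviation and via the shortcut, using exact fractions to rule out rounding:

```python
from fractions import Fraction as F

# A small discrete random variable: value -> probability (probabilities sum to 1)
dist = {0: F(1, 4), 1: F(1, 2), 3: F(1, 4)}

E_X  = sum(x * p for x, p in dist.items())      # E(X)
E_X2 = sum(x**2 * p for x, p in dist.items())   # E(X^2)

# Variance as the average squared deviation E[(X - E(X))^2] ...
var_deviation = sum((x - E_X)**2 * p for x, p in dist.items())
# ... and via the shortcut E(X^2) - E(X)^2
var_shortcut = E_X2 - E_X**2

print(var_deviation, var_shortcut)  # both 19/16
```

Both routes give exactly the same fraction, as the algebra above guarantees.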

user601297

One intuitive way of measuring the variation of $X$ would be to look at how far, on average, $X$ is from its mean, $E(X)=\mu$. That is, we want to compute $E(X-\mu)$. However, mathematically, it's "inconvenient" to use $E(X-\mu)$, so we use the more convenient $E((X-\mu)^{2})$.

To add, the formula you gave above, $\frac{1}{n}\sum_{i=1}^{n}(x_{i}-\bar{x})^2$, is what you would use when you have finitely many data points; there is nothing random once you have your data. $Var(X)$ is for a random variable, which can take on finitely many values, countably infinitely many values, or values on a continuum.
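For the finite-data case, the same identity holds sample-wise; a quick check with Python's standard library (a sketch, using `statistics.pvariance`, which implements the population formula, on made-up data):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data points; mean is 5

mean = statistics.fmean(data)
mean_of_squares = statistics.fmean(x**2 for x in data)

# Population variance: the average squared deviation from the mean
var_direct = statistics.pvariance(data)
# Shortcut: mean of the squares minus the square of the mean
var_shortcut = mean_of_squares - mean**2

print(var_direct, var_shortcut)  # both 4.0
```

Here the random variable is "pick one of the data points uniformly at random", so the sample formula and $Var(X)$ coincide.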

  • I'll just comment that "inconvenient" is a bit of an understatement, as $E(X-\mu)$ is always zero, because the expected value is linear: $E(X-\mu) = E(X)-E(\mu) = \mu - \mu = 0$. An alternative to squaring that also works is the Mean Absolute Deviation (MAD), which uses the absolute value instead, but also has some shortcomings as compared to the variance — mainly that it isn't "smooth", as mentioned in another comment by @operatorerror. – yoniLavi Jul 31 '22 at 15:01