A typical statistics course will define the standard deviation as

"the average of the difference between the data set and the mean".

So if we try to express that definition mathematically, we get the formula (∑(|x-mean|))/n

However, the known formula for the standard deviation is sqrt((∑(x-mean))^2⁄n). It doesn't make sense to me why we take the square root of the whole variance expression [(∑(x-mean))^2⁄n] instead of the square root of only the squared part.

For example, if I have a list of 2 numbers: [14, 6]

The arithmetic mean = 10

The average distance of each value from the mean should be 4 (14 - 4 = 10, 6 + 4 = 10).

But the standard deviation formula gives 4 × sqrt(2) (and 14 - 4 sqrt(2) != 10).

So, given these results, can someone give a solid definition of the standard deviation and the intuition behind it?

  • I would hope that a typical statistics course would not define it that way. – Robert Israel Apr 12 '21 at 20:29
  • The square is a much nicer function than the absolute value. So it is not surprising at all that it appears to be the correct choice. – user Apr 12 '21 at 20:48
  • I understand that the square is a much richer function than the absolute value; my question is why we divide by sqrt(n), as in 'SD', and not by n, as in 'MD'. – Peter Farhat Apr 12 '21 at 20:59
  • If that is your only complaint, then use Variance instead of Standard Deviation. $\text{Var}(X) = \dfrac{\sum\limits_{k=1}^N(x_k-\mu)^2}{N}$. The standard deviation happens to be the square root of the variance. "Why use standard deviation instead of variance then in so many formulas?" For the same reason why we use $\pi$ instead of $\tau$ in so many formulas... historical reasons, simplification of arithmetic and various expressions, personal preference... They both convey the same information. – JMoravitz Apr 12 '21 at 21:02
  • Sir, I am not complaining; I am asking about the intuition behind the SD. – Peter Farhat Apr 12 '21 at 21:04
  • We gave it. It is useful, it has many nice properties, more nice properties than the mean absolute deviation, it appears in many natural problems which well describe what is seen in nature (e.g. normal distribution), and so on... – JMoravitz Apr 12 '21 at 21:05
  • @JMoravitz : No, it's not historical reasons. If $X_1,\ldots,X_k$ are independent random variables, then $$ \operatorname{var}(X_1+\cdots+X_k) = \operatorname{var}(X_1)+\cdots+\operatorname{var}(X_k). $$ Nothing like that works with the mean absolute deviation. – Michael Hardy Apr 12 '21 at 21:05
  • @MichaelHardy read again. That line was in reference to using Variance versus Standard Deviation. The user was complaining about the division by $N$ being inside of the square root for standard deviation rather than outside of the square root. My point was that it is not inside of a square root when talking about variance, so the complaint was invalid and the point was moot. – JMoravitz Apr 12 '21 at 21:06
  • @PeterFarhat : $$ \operatorname{var}(X_1+\cdots+X_k) = \operatorname{var}(X_1)+\cdots+\operatorname{var}(X_k). \qquad\longleftarrow \text{This is the “intuition.”} $$ – Michael Hardy Apr 12 '21 at 21:06
  • @JMoravitz : The O.P. edited. Look at the question the way it was originally written. – Michael Hardy Apr 12 '21 at 21:08
  • @MichaelHardy My response talking about variance versus standard deviation was not in reply to something directly asked by OP but was rather a preemptive reply to a possible complaint that my earlier reply was irrelevant... that my pointing out the usage of Variance rather than Standard Deviation doesn't answer the question about the formula for standard deviations – JMoravitz Apr 12 '21 at 21:10
  • Currently this question says the definition is sqrt((∑(x-mean))^2⁄n). But it should say $$ \sqrt{\frac{\sum_x(x-\text{mean})^2}{n}} $$ The $n$ is inside the radical. – Michael Hardy Apr 12 '21 at 21:10
  • You and I are in complete agreement about the usefulness of variance over the mean absolute deviation. You are incorrectly reading my comments thinking I am comparing variance to mean absolute deviation. I am not. I am comparing variance to standard deviation in the quoted passage. This whole back and forth is not helpful and should be deleted – JMoravitz Apr 12 '21 at 21:12
  • @MichaelHardy And yes, it is for historical reasons that we use $\text{Var}(X)$ instead of $\sigma^2$ in some formulas and $\sigma$ instead of $\sqrt{\text{Var}(X)}$ in others... just as we use $\pi=3.14159\dots=\dfrac{C}{d}$ as the circle constant in some formulas instead of $\frac{1}{2}\tau=\dfrac{1}{2}\cdot \dfrac{C}{r}$... Both give the same information, just presented differently. We could just as easily have $f(x)=\dfrac{1}{\sqrt{\text{Var}(X)2\pi}}e^{-\frac{(x-\mu)^2}{2\text{Var}(X)}}$ – JMoravitz Apr 12 '21 at 21:25
  • @JMoravitz : On another matter: When you type 2\text{Var}(X) then you will see $2\text{Var}(X)$ instead of $2\operatorname{Var}(X),$ which is coded as 2\operatorname{Var}(X). The point is not just that some horizontal space is added, but rather that the spacing varies with the context, so that you see more space to the right of $\operatorname{Var}$ in $2\operatorname{Var}X$ than in $2\operatorname{Var}(X). \qquad$ – Michael Hardy Apr 12 '21 at 22:00
  • @MichaelHardy I appreciate your help sir, but I still can't understand what the addition property has to do with the meaning of SD – Peter Farhat Apr 12 '21 at 22:03
  • @PeterFarhat : It came about like this: In the first half of the 18th century, Abraham de Moivre considered this problem: If you toss a coin $1800$ times, what is the probability that the number of "heads" is between specified numbers? He found that he could approximate that very closely by an area under what some now call the "bell-shaped curve" $y= C e^{-x^2/2}$ (where $C$ is a constant that he could compute numerically, and that was somewhat later found to be $1/\sqrt{2\pi\,}$). But what interval in the range from $0$ to $1800$ should correspond to $\ldots$ – Michael Hardy Apr 12 '21 at 22:11
  • But what interval in the range from $0$ to $1800$ should correspond to what interval of $x\text{-}$values? That problem is what he solved by computing the variance resulting from a single coin toss (where the number of "heads" is either $0$ or $1$) and multiplying it by $1800.$ That cannot be done without the additive property of variances. And that is done all the time in similar problems today. – Michael Hardy Apr 12 '21 at 22:13
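
The coin-toss computation described in the last two comments can be sketched numerically. This is only an illustration, assuming a fair coin; the interval 879 to 921 and all function names are hypothetical choices made here, not anything de Moivre or the commenters specified. The variance of one toss is multiplied by $1800$ (additivity), and the resulting standard deviation is used to map an interval of head counts onto the bell curve.

```python
from math import comb, erf, sqrt

N_TOSSES = 1800  # number of tosses, as in the comment

# Variance of a single fair toss (heads = 1, tails = 0):
# E[X^2] - E[X]^2 = 1/2 - 1/4 = 1/4.
var_one_toss = 0.25

# Additivity for independent tosses: the variances simply add up.
var_total = N_TOSSES * var_one_toss
mean_total = N_TOSSES * 0.5
sd_total = sqrt(var_total)


def normal_cdf(z):
    """Area under the standard bell curve to the left of z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def exact_prob(lo, hi):
    """Exact probability that the number of heads lies in [lo, hi]."""
    favourable = sum(comb(N_TOSSES, k) for k in range(lo, hi + 1))
    return favourable / 2 ** N_TOSSES


def normal_approx(lo, hi):
    """Bell-curve approximation, with a half-unit continuity correction."""
    z_lo = (lo - 0.5 - mean_total) / sd_total
    z_hi = (hi + 0.5 - mean_total) / sd_total
    return normal_cdf(z_hi) - normal_cdf(z_lo)


# Hypothetical interval, roughly one standard deviation either side of 900.
LO, HI = 879, 921
exact = exact_prob(LO, HI)
approx = normal_approx(LO, HI)
print(var_total, sd_total, exact, approx)
```

The step that turns head counts into $x$-values is the division by `sd_total`, which is only available because the variance of the total is the sum of the per-toss variances.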

1 Answer

A typical statistics course will define the standard deviation as "the average of the difference between the data set and the mean ".

That is false. I doubt that you've seen that in any textbook on probability or statistics.

So if we tried to describe the definition mathematically we should derive this equation (∑(|x-mean|))/n

That is NOT the standard deviation. That is the mean absolute deviation. It is quite intuitive, but it lacks a useful property: the variance (i.e. the square of the standard deviation) of the sum of independent random variables is the sum of their variances.
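
A small check of that property, as a sketch rather than a proof: assume two independent variables that each take the values 6 and 14 with equal probability (this setup and the helper names are illustrative assumptions). Independence means every pair of values is equally likely, so the sum can be enumerated directly.

```python
from itertools import product
from math import isclose


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance: average squared deviation from the mean."""
    m = mean(values)
    return mean([(v - m) ** 2 for v in values])


def mean_abs_dev(values):
    """Mean absolute deviation: average distance from the mean."""
    m = mean(values)
    return mean([abs(v - m) for v in values])


# Equally likely values of two independent variables X and Y.
xs = [6, 14]
ys = [6, 14]

# All equally likely outcomes of X + Y.
sums = [x + y for x, y in product(xs, ys)]

# Variances add: 16 + 16 = 32.
print(variance(xs) + variance(ys), variance(sums))
assert isclose(variance(sums), variance(xs) + variance(ys))

# Mean absolute deviations do not: 4 + 4 = 8, but the sum has 4.
print(mean_abs_dev(xs) + mean_abs_dev(ys), mean_abs_dev(sums))
```

In this example the sum takes the values 12, 20, 20, 28, so its variance is 32 while its mean absolute deviation is only 4.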

Both the standard deviation and the mean absolute deviation are measures of dispersion in that: (1) they don't change if one number is added to all of the numbers in the list and (2) if you multiply all of the numbers in the list by one number, then you multiply the measure of dispersion by the absolute value of that number.
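
Points (1) and (2) can be checked on the list from the question. The shift of 100 and the factor of -3 below are arbitrary illustrative choices, and the helper names are assumptions of this sketch.

```python
from math import isclose, sqrt


def mean(values):
    return sum(values) / len(values)


def std_dev(values):
    """Population standard deviation (divides by n, not n - 1)."""
    m = mean(values)
    return sqrt(mean([(v - m) ** 2 for v in values]))


def mean_abs_dev(values):
    """Mean absolute deviation from the mean."""
    m = mean(values)
    return mean([abs(v - m) for v in values])


data = [14, 6]
shifted = [v + 100 for v in data]  # (1) add one number to every entry
scaled = [-3 * v for v in data]    # (2) multiply every entry by one number

for measure in (std_dev, mean_abs_dev):
    # (1) Shifting leaves the measure unchanged.
    assert isclose(measure(shifted), measure(data))
    # (2) Scaling multiplies the measure by the absolute value of the factor.
    assert isclose(measure(scaled), abs(-3) * measure(data))
    print(measure.__name__, measure(data), measure(shifted), measure(scaled))
```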

while, the known law for standard deviation is (∑(|x-mean|))/sqrt(n)

No, it is not. Where did you find that? The square root of $n$ in the denominator shows up when you talk about the standard deviation of a sample mean, but there's nothing like that in a definition of standard deviation.
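
To see where that $\sqrt{n}$ comes from, here is a short derivation, assuming $X_1,\ldots,X_n$ are independent and each has variance $\sigma^2$ (the symbols $\bar X$ and $\sigma$ are notation introduced here, not taken from the question). It combines the additivity of variances with the fact that multiplying a variable by a constant $c$ multiplies its variance by $c^2$:

$$ \operatorname{var}(\bar X) = \operatorname{var}\left(\frac{X_1+\cdots+X_n}{n}\right) = \frac{1}{n^2}\bigl(\operatorname{var}(X_1)+\cdots+\operatorname{var}(X_n)\bigr) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}, \qquad\text{so}\qquad \operatorname{SD}(\bar X) = \frac{\sigma}{\sqrt{n}}. $$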

Example if I have a list of 2 numbers [14,6]

the arithmetic mean = 10

the average of distance of each value from the mean should = 4 , (14-4 = 10, 6+4 = 10)

Correct.

while the standard deviation law will calculate

It will give you $4.$

So, regarding the results shown can someone define a solid definition for the standard deviation and the intuition behind it?

It is the square root of the average of the square of the difference between the realized values and their average.
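
Written as a formula, with $x_1,\ldots,x_n$ the values and $\bar x$ their average (this notation is an assumption for illustration), and applied to the list $[14, 6]$:

$$ \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^2}, \qquad \bar x = \frac{14+6}{2} = 10, \qquad \sigma = \sqrt{\frac{(14-10)^2+(6-10)^2}{2}} = \sqrt{\frac{16+16}{2}} = \sqrt{16} = 4. $$

Dividing by $n-1$ rather than $n$, as the sample standard deviation does, would instead give $\sqrt{32} = 4\sqrt{2}$; that may be where the value in the question comes from, though the question does not say.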

It is done that way because that makes the standard deviation a quantity that satisfies points (1) and (2) above while also having the "useful property" referred to above.

  • Thanks for the information above, but it still doesn't clearly describe the intuition of SD, at least for me. In other words, I can't really persuade myself why we shouldn't use the mean deviation. – Peter Farhat Apr 12 '21 at 20:56
  • @PeterFarhat Because mean deviation does not have the "useful property" that Michael Hardy talks about in his post, $\text{Var}(X+Y)=\text{Var}(X)+\text{Var}(Y)$ for independent variables $X$ and $Y$. – JMoravitz Apr 12 '21 at 20:57
  • I am not saying the mean deviation is useful; what I am asking is what SD actually measures. I don't want a perfect answer, I just want to understand its meaning, but so far it seems to me that it is just better for calculations and lacks physical meaning. – Peter Farhat Apr 12 '21 at 22:06
  • @PeterFarhat : What it measures is dispersion. As does the mean absolute deviation. Anything that satisfies (1) and (2) in the stated answer measures a kind of dispersion. – Michael Hardy Apr 12 '21 at 22:14