26

We know that the standard deviation (SD) measures the dispersion of a distribution. A distribution concentrated on a single value (e.g., 1,1,1,1) therefore has an SD of zero. Such a distribution also requires very little information to describe. On the other hand, a distribution with a high SD seems to require many bits of information to describe, so we might say its entropy level is high.

http://upload.wikimedia.org/wikipedia/commons/thumb/f/f9/Comparison_standard_deviations.svg/612px-Comparison_standard_deviations.svg.png

So my question: is SD the same as entropy?

If not, what relationship exists between these two measures?

  • 2
    see https://en.wikipedia.org/wiki/Entropic_uncertainty section "Entropy versus variance bounds" – lowtech Mar 27 '19 at 03:07

4 Answers

30

They are not the same. Take a bimodal distribution and let the spacing between its two peaks vary: the standard deviation increases as the peaks move farther apart. However, the entropy $$H(f) = -\int f(x) \log f(x) dx$$ doesn't care about where the peaks are located, so the entropy stays the same.
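To make this concrete: for an equal mixture of two unit-variance Gaussian peaks centred at $\pm d$ (here $\varphi$ denotes the standard normal density), $$f(x) = \tfrac12\,\varphi(x+d) + \tfrac12\,\varphi(x-d), \qquad \operatorname{Var}(X) = 1 + d^2,$$ so the variance (and hence the SD) grows with the separation, while, once the peaks are far enough apart that they barely overlap, $$H(f) \approx \ln 2 + \tfrac12\ln(2\pi e),$$ i.e., approximately the entropy of a single Gaussian component plus one bit for the (equiprobable) choice of peak, independent of $d$.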

Nick Alger
  • 19,977
14

More counterexamples:

  1. Let $X$ be a discrete random variable taking the two values $-a$ and $a$ with equal probability. Then the variance $\sigma_X^2=a^2$ increases with $a$, but the entropy is constant: $H(X)=1$ bit.

  2. Let $X$ be a discrete rv taking values in $\{1, \dots, N\}$ with some arbitrary non-uniform distribution $p(X)$. If we permute the probabilities $p(X)$ among the values, the variance will change (it decreases if we move the larger probabilities towards the center), but the entropy is constant.

  3. Let $X$ be a continuous rv with uniform density on the interval $[-1,1]$, $p_X(x)=1/2$. Modify it so that the density (on the same support) is bigger towards the extremes: say, $p_Y(y)=|y|$. Then $\sigma^2_Y > \sigma_X^2$ but $H(Y)< H(X)$ (the uniform distribution maximizes the entropy for a fixed compact support); a quick computation is sketched below.
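Checking example 3 directly: $$\sigma_X^2 = \int_{-1}^{1} \tfrac12\, x^2\,dx = \tfrac13, \qquad \sigma_Y^2 = \int_{-1}^{1} |y|\, y^2\,dy = \tfrac12,$$ while $$H(X) = \ln 2 \approx 0.69 \text{ nats}, \qquad H(Y) = -\int_{-1}^{1} |y|\ln|y|\,dy = -2\int_0^1 y\ln y\,dy = \tfrac12 \text{ nats},$$ so the variance increases while the (differential) entropy decreases.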

leonbloy
  • 66,202
10

Entropy and Standard Deviation are certainly not the same, but for many common parametric families the Entropy is an increasing function of the Standard Deviation. Two examples:

For the Exponential distribution, with density function $$\lambda e^{-\lambda x},\;\; x\ge 0, \qquad SD=1/\lambda,$$ we have

$$H(X) = 1-\ln\lambda = 1+\ln SD$$

So as the SD increases, so does the (here differential) Entropy.
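The formula follows from a short computation: since $\ln f(x) = \ln\lambda - \lambda x$, $$H(X) = -\mathbb E\big[\ln f(X)\big] = -\ln\lambda + \lambda\,\mathbb E[X] = 1-\ln\lambda = 1 + \ln SD.$$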

For the Normal distribution, with density function $$\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2 \sigma^2}}, \;\; SD = \sigma$$ we have

$$H(X) = \frac12 \ln(2 \pi e \, \sigma^2) = \frac12 \ln(2 \pi e) +\ln SD $$ so again differential Entropy increases with SD.
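Again this is a short computation: $\ln f(x) = -\ln(\sigma\sqrt{2\pi}) - \frac{(x-\mu)^2}{2\sigma^2}$, so $$H(X) = -\mathbb E\big[\ln f(X)\big] = \ln\!\big(\sigma\sqrt{2\pi}\big) + \frac{\mathbb E\big[(X-\mu)^2\big]}{2\sigma^2} = \frac12\ln(2\pi\sigma^2) + \frac12 = \frac12 \ln(2 \pi e \, \sigma^2).$$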

(Note that differential Entropy can be negative).

0

The standard deviation and the entropy are not the same, but a transformation of the standard deviation, the coefficient of variation ($CV_Y := \frac{\sigma_Y}{\mu_Y}$), is part of the single-parameter generalized entropy family of inequality measures (or, technically, a transformation of the CV is part of the entropy family). There was a vibrant literature on this in British econometrics in the 1980s; the standard papers are by Shorrocks (1980, 1982, 1983), but Cowell's textbook Measuring Inequality (various editions) is a good source.

Generally, if $\theta$ is our parameter, we can write the generalized entropy formula as follows (Cowell has a really good discussion of the meaning of $\theta$ that I will avoid here):

$$ \begin{align*} E_\theta = \frac{1}{\theta(\theta-1)} \frac{1}{N} \sum_{i=1}^N \left\{ \frac{y_i}{\mu_Y}\right\}^\theta - 1, \theta \notin \{0, 1\} \end{align*} $$

For $\theta=2$, this is equal to half the square of the $CV$, less one half (some people call this square the relvariance, although I don't see the term often; Kish's famous book on sampling methods uses it).

$$ \begin{align*} E_2 &= \frac{1}{2} \frac{1}{N} \sum_{i=1}^N \left\{ \frac{y_i}{\mu_Y} \right\}^2 -1 \cr &= \frac{1}{2} \frac{1}{N} \sum_{i=1}^N \left\{ \frac{y_i-\mu_Y}{\mu_Y} \right\}^2 - \frac{1}{2} \cr &= \frac{1}{2} [CV]^2 - \frac{1}{2} \end{align*} $$

To get from the first line to the second, note that to complete the square, we have to add back $\frac{1}{2}\frac{1}{N} \sum_i 2\frac{Y_i \mu_Y}{\mu_Y^2} = \frac{1}{2}\frac{1}{N\mu_Y} 2N \mu_Y = 1$ and subtract off $\frac{1}{2}\frac{1}{N} \sum_i (\frac{\mu_Y}{\mu_Y})^2 = \frac{1}{2} \frac{1}{N} N = \frac{1}{2}$.[^1]
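As a quick sanity check on that algebra, take a made-up toy vector $y = (1, 2, 3, 6)$, so $N=4$ and $\mu_Y = 3$. Then $$\frac12 \frac1N \sum_{i} \left\{\frac{y_i}{\mu_Y}\right\}^2 - 1 = \frac12\cdot\frac{50}{36} - 1 = -\frac{11}{36} \qquad \text{and} \qquad \frac12 [CV]^2 - \frac12 = \frac12\cdot\frac{7}{18} - \frac12 = -\frac{11}{36},$$ where $[CV]^2 = \frac{14/4}{9} = \frac{7}{18}$ uses the uncorrected (divide-by-$N$) variance.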

To get the more familiar entropy formula, we need L'Hôpital's rule to evaluate the limit as $\theta \rightarrow 1$, differentiating the sum and the $\theta(\theta-1)$ factor with respect to $\theta$.

$$ \begin{align*} E_1 := \lim_{\theta \rightarrow 1} E_\theta &= \left. \frac{\frac{\text{d}}{\text{d}\theta} \frac{1}{N}\sum_{i=1}^N \left\{ \frac{y_i}{\mu_Y} \right\}^\theta}{\frac{\text{d}}{\text{d}\theta}\, \theta(\theta-1)} \right|_{\theta = 1} \cr &= \left\{ \frac{1}{N(2\theta-1)} \sum_{i=1}^N \left\{ \frac{y_i}{\mu_Y} \right\}^\theta \ln\frac{y_i}{\mu_Y} \right\}|_{\theta=1} \cr &= \frac{1}{N} \sum_{i=1}^N \frac{y_i}{\mu_Y} \ln\frac{y_i}{\mu_Y} \cr \end{align*} $$

Finally, if we just move the $N$ inside the summation and treat it as a coefficient on the mean, then that term, $\frac{y_i}{N\mu_Y}$, becomes the share of person $i$ in total income:

$$ \begin{align*} E_1 &= \sum_{i=1}^N \frac{y_i}{N\mu_Y} \ln\frac{y_i}{\mu_Y} \cr \end{align*} $$

If you call $\pi_i := \frac{y_i}{N\mu_Y}$ the "probability of a dollar of national income belonging to $i$" (so that $\frac{y_i}{\mu_Y} = N\pi_i$), then we have, finally, the standard entropy formula up to an additive constant:

$$ \begin{align*} E_1 &= \sum_{i=1}^N \pi_i \ln (N\pi_i) = \ln N + \sum_{i=1}^N \pi_i \ln \pi_i = \ln N - H(Y) \cr \end{align*} $$
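For instance, under perfect equality ($y_i = \mu_Y$ for every $i$) each share is $\pi_i = 1/N$, so $H(Y) = \ln N$ is at its maximum and $E_1 = \ln N - \ln N = 0$: maximal entropy corresponds to zero inequality.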

So, TL;DR, the generalized entropy formula allows you to recover a simple transformation of the standard deviation as well as (up to a sign and an additive constant) the "regular" entropy formula, with a slightly modified interpretation.

[^1]: I have seen this fact reported as "the entropy of degree two is half the relvariance", but the algebra and simulations (see the Stata code below) make me think that it is missing the subtraction of $\frac{1}{2}$ (in the Stata code, I use the uncorrected denominator for the variance so that it agrees with the entropy formula).

* compare the "entropy of degree two" with half the relvariance on the auto data
sysuse auto, clear
qui sum price, d
* coefficient of variation of price
local cv = r(sd)/r(mean)
* half the relvariance, rescaled to the uncorrected (divide-by-N) variance
local halfrelvar = 0.5*(`cv'^2 * (r(N)-1)/r(N))
* squared ratio of each price to the mean price
gen meansqprice = (price/r(mean))^2
qui sum meansqprice, d
* (1/2)*(1/N)*sum of (y_i/mu)^2 -- the (1/(N-1)) and (N-1) factors cancel
local e2 = (1/2)*(1/(r(N)-1))*(r(N)-1)*r(mean)
* subtract 0.5 for comparison with half the relvariance
local ent = `e2' - 0.5
di "The entropy of degree two minus 0.5 is `ent'"
di "Half the relvar is `halfrelvar'"
gjmb
  • 33