
When I first got into information theory, information was measured in terms of Shannon entropy; in other words, most books I had read talked about Shannon entropy. Today someone told me there is another kind of information called Fisher information, and I got quite confused. I tried to google them. Here are the links: Fisher information: https://en.wikipedia.org/wiki/Fisher_information and Shannon entropy: https://en.wikipedia.org/wiki/Entropy_(information_theory).

What are the differences and the relationship between Shannon entropy and Fisher information? Why do these two kinds of information exist?

Currently, my impression is that Fisher information reflects a statistical view, while Shannon entropy reflects a probabilistic view.

Any comments or answers are welcome. Thanks.

2 Answers


Fisher information is related to the asymptotic variability of a maximum likelihood estimator. The idea is that higher Fisher information is associated with lower estimation error.
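To make this concrete, here is a minimal Python sketch for a Bernoulli($p$) model (my own choice of example): the MLE is the sample mean, the per-observation Fisher information is $I(p) = 1/(p(1-p))$, and the empirical variance of the MLE should come out close to $1/(n\,I(p))$.

```python
# Minimal sketch: for a Bernoulli(p) sample of size n, the MLE is the sample
# mean and its variance p(1-p)/n equals 1/(n * I(p)), the inverse of the
# total Fisher information.  The simulation just checks this numerically.
import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 0.3, 200, 20_000

# Fisher information of a single Bernoulli(p) observation
fisher_info = 1.0 / (p * (1.0 - p))

# Draw many samples of size n and compute the MLE (the sample mean) for each
samples = rng.binomial(1, p, size=(reps, n))
mle = samples.mean(axis=1)

print("empirical Var(MLE) :", mle.var())               # ~ 0.00105
print("1 / (n * I(p))     :", 1.0 / (n * fisher_info))  # = 0.00105
```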

Shannon information is totally different, and refers to the content of the message or distribution, not its variability. Higher entropy distributions are assumed to convey more information because they require more bits to transmit.
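As a small illustration (again a Bernoulli source, chosen for simplicity), the entropy in bits gives the minimum average number of bits per symbol that any lossless code needs, by the source coding theorem:

```python
# Sketch: Shannon entropy of a Bernoulli(p) source in bits per symbol.
# A fair coin (p = 0.5) carries 1 bit per symbol; a heavily biased coin
# (p = 0.1) carries less, so it can be compressed into fewer bits on average.
import numpy as np

def bernoulli_entropy_bits(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

for p in (0.5, 0.1):
    print(f"p = {p}: H = {bernoulli_entropy_bits(p):.3f} bits/symbol")
# p = 0.5: H = 1.000 bits/symbol
# p = 0.1: H = 0.469 bits/symbol
```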

However, there is a relationship between Fisher Information and Relative Entropy/KL Divergence, as discussed on Wikipedia.
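A minimal sketch of that local relationship, again for a Bernoulli model: for a small perturbation $\delta$, $D\left(p_\theta \,\|\, p_{\theta+\delta}\right) \approx \tfrac{1}{2} I(\theta)\,\delta^2$, where $I(\theta)$ is the Fisher information.

```python
# Sketch of the local link between KL divergence and Fisher information:
# D(p_theta || p_{theta+delta}) ~ 0.5 * I(theta) * delta^2 for small delta.
# Here the model is Bernoulli(theta), for which I(theta) = 1/(theta*(1-theta)).
import numpy as np

def kl_bernoulli(p, q):
    """KL divergence D(Bernoulli(p) || Bernoulli(q)) in nats."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

theta, delta = 0.3, 0.01
fisher_info = 1.0 / (theta * (1.0 - theta))

print("exact KL divergence:", kl_bernoulli(theta, theta + delta))  # ~ 2.35e-4
print("0.5 * I * delta**2 :", 0.5 * fisher_info * delta**2)        # ~ 2.38e-4
```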

  • What is the definition of asymptotic variability? – Bear and bunny Nov 11 '15 at 13:48
  • @Bearandbunny Basically, the Fisher information can be interpreted as the inverse of the squared standard error, but only when the log-likelihood is quadratic (i.e., the Gaussian log-likelihood). In the vast majority of cases we are not dealing with a Gaussian population, but the log-likelihood around the MLE will often rapidly converge to a quadratic, especially within about $\pm2$ standard deviations. In this case, treating the inverse of the Fisher information as an estimated precision of an estimate will be approximately correct. –  Nov 11 '15 at 14:05
  • @Bearandbunny Here is a paper that outlines the theory: http://sites.stat.psu.edu/~sesa/stat504/Lecture/lec3_4up.pdf –  Nov 11 '15 at 14:07
  • I see. Will upvote. – Bear and bunny Nov 11 '15 at 14:17
  • It is not true that "higher entropy distributions ... can be transmitted in fewer bits". For instance, a Bernoulli distribution with $p=1/2$ has an entropy of 1 bit, while a Bernoulli with $p=1/10$ has about 0.47 bit, and this is exactly because the latter conveys less information. – sztal Apr 23 '19 at 15:44

The Fisher information is related to differential entropy through a normal perturbation, and in that case, entropy can be considered as an integral of Fisher information.


The following theorem (de Bruijn’s identity) is from Section 17.7 of Elements of Information Theory.

Let $X$ be any random variable with a finite variance and a density $f(x)$. Let $Z \sim N(0,1)$ be an independent, normally distributed random variable with zero mean and unit variance. Then $$ \frac{\partial}{\partial t}h_e(X + \sqrt{t}Z) =\frac{1}{2}J(X +\sqrt{t}Z), $$ where $h_e$ is the differential entropy to base $e$, and $J(Y)= \int_{-\infty}^{+\infty}f(y) \left(\frac{\frac{\partial}{\partial y}f(y)}{f(y)}\right)^2 dy$ denotes the Fisher information at $\theta=0$ for the location family of densities $f_{\theta}(y)=f(y-\theta)$.
In particular, if the limit exists as $t \to 0$, we have $$ \frac{\partial}{\partial t}h_e(X + \sqrt{t}Z)|_{t=0} =\frac{1}{2}J(X)$$
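As a quick sanity check of the identity, take $X \sim N(0,\sigma^2)$ (a convenient special case, since everything is available in closed form). Then $X + \sqrt{t}Z \sim N(0,\sigma^2+t)$, so $$ h_e(X+\sqrt{t}Z)=\frac{1}{2}\ln\bigl(2\pi e(\sigma^2+t)\bigr), \qquad \frac{\partial}{\partial t}h_e(X+\sqrt{t}Z)=\frac{1}{2(\sigma^2+t)}, $$ while the location-family Fisher information of a $N(0,\sigma^2+t)$ density is $J(X+\sqrt{t}Z)=1/(\sigma^2+t)$, so both sides of de Bruijn's identity agree.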


- The relationship is also discussed in this paper: Entropy and the Central Limit Theorem

- The following answer may also be relevant: Fisher information and the "surface area of the typical set"