
Show that the entropy of the multivariate Gaussian $N(x|\mu,\Sigma)$ is given by \begin{align} H[x] = \frac12\ln|\Sigma| + \frac{D}{2}(1 + \ln(2\pi)) \end{align} where $D$ is the dimensionality of $x$.

My solution:

Entropy of the multivariate normal distribution:

\begin{align}
H[x] &= -\int_{\mathbb{R}^D} N(x|\mu,\Sigma)\ln N(x|\mu,\Sigma)\, dx &&\text{by the definition of entropy}\\
&= -E[\ln N(x|\mu,\Sigma)]\\
&= -E\left[\ln\left((2\pi)^{-\frac{D}{2}} |\Sigma|^{-\frac12} e^{-\frac12(x - \mu)^T\Sigma^{-1}(x - \mu)}\right)\right] &&\text{by the definition of the multivariate Gaussian}\\
&= \frac{D}{2}\ln(2\pi) + \frac12\ln |\Sigma| + \frac12 E[(x - \mu)^T\Sigma^{-1}(x - \mu)] &&\text{the log of a product is the sum of the logs.}
\end{align}
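
As a quick sanity check on this decomposition (a numerical sketch only, not part of the derivation), the target formula from the exercise can be compared against SciPy's built-in Gaussian entropy; the mean, covariance, and seed below are arbitrary test values:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
D = 3
mu = rng.normal(size=D)
A = rng.normal(size=(D, D))
Sigma = A @ A.T + D * np.eye(D)  # arbitrary positive-definite test covariance

# Closed form from the exercise: H[x] = (1/2) ln|Sigma| + (D/2)(1 + ln(2*pi))
_, logdet = np.linalg.slogdet(Sigma)
H_closed = 0.5 * logdet + 0.5 * D * (1.0 + np.log(2.0 * np.pi))

# SciPy's entropy of the same Gaussian (in nats, i.e. natural log)
H_scipy = multivariate_normal(mean=mu, cov=Sigma).entropy()

print(H_closed, H_scipy)  # the two values agree to floating-point precision
```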

Consider the third term:

\begin{align}
\frac12 E[(x - \mu)^T\Sigma^{-1}(x - \mu)] &= \frac12 E[x^T\Sigma^{-1}x - x^T\Sigma^{-1}\mu - \mu^T\Sigma^{-1}x + \mu^T\Sigma^{-1}\mu]\\
&= \frac12 E[x^T\Sigma^{-1}x] - \frac12 E[2\mu^T\Sigma^{-1}x] + \frac12 E[\mu^T\Sigma^{-1}\mu] &&\text{since $x^T\Sigma^{-1}\mu = \mu^T\Sigma^{-1}x$ is a scalar}\\
&= \frac12 E[x^T\Sigma^{-1}x] - \mu^T\Sigma^{-1}E[x] + \frac12 \mu^T\Sigma^{-1}\mu\\
&= \frac12 E[x^T\Sigma^{-1}x] - \mu^T\Sigma^{-1}\mu + \frac12 \mu^T\Sigma^{-1}\mu &&\text{since $E[x] = \mu$}\\
&= \frac12 E[x^T\Sigma^{-1}x] - \frac12 \mu^T\Sigma^{-1}\mu.
\end{align}
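
Before simplifying the remaining expectation analytically, a Monte Carlo sketch can suggest what it should evaluate to (NumPy only; $\mu$, $\Sigma$, the seed, and the sample size are arbitrary test choices, not part of the exercise):

```python
import numpy as np

rng = np.random.default_rng(1)
D = 4
mu = rng.normal(size=D)
A = rng.normal(size=(D, D))
Sigma = A @ A.T + D * np.eye(D)  # arbitrary positive-definite test covariance
Sigma_inv = np.linalg.inv(Sigma)

# Draw samples from N(mu, Sigma) and estimate E[x^T Sigma^{-1} x]
x = rng.multivariate_normal(mu, Sigma, size=200_000)
E_quad = np.mean(np.einsum('ni,ij,nj->n', x, Sigma_inv, x))

# The combination left over in the derivation above
value = 0.5 * E_quad - 0.5 * mu @ Sigma_inv @ mu
print(value, D / 2)  # the estimate hovers around D/2
```

The estimate hovers around $D/2$, which is exactly the constant the target formula needs, so the remaining task is to show this analytically.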

How can I simplify the term $E[x^T\Sigma^{-1}x]$?

Andreo
  • When working with Gaussians, the integrals are usually easier if one leaves terms like $(x - \mu)$ intact instead of pulling $\mu$ out.
  • Assuming you follow the previous comment, try the substitution $z = \Sigma^{-1/2} (x - \mu)$ in the integral $\int_{\mathbb{R}^D} (x - \mu)^T \Sigma^{-1} (x-\mu) \exp \left( -\frac12 (x - \mu)^T \Sigma^{-1} (x-\mu) \right) \mathrm{d}x$ (note that $\Sigma$ is positive definite, which makes life kinda easy here). – stochasticboy321 Nov 25 '16 at 05:26
  • Thanks! I got it:

    $\Sigma^{-1} = \sum_{i=1}^D \frac{1}{\lambda_i} e_i e_i^T$ Then: $(x - \mu)^T\Sigma^{-1}(x - \mu) = \sum_{i=1}^D \frac{1}{\lambda_i} (x - \mu)^T e_i e_i^T (x - \mu) = \sum_{i=1}^D \frac{y_i^2}{\lambda_i}$

    where $y_i = e_i^T (x - \mu)$ is a scalar. Then we can change variables to the $y_i$ coordinates, and after simplification we get just $D$ (a numerical sketch of this appears after the comments below).

    – Andreo Nov 26 '16 at 07:39
  • @Andreo What are your $\lambda_i$? It seems like you're claiming that $\Sigma^{-1}$ is diagonal. – Eric Auld Feb 14 '18 at 03:51
  • @EricAuld $\lambda_i$ is an eigenvalue of $\Sigma$. $\Sigma^{-1}$ can be factorized as $U \Lambda U^T$, where $\Lambda$ is the diagonal matrix of its eigenvalues (the $1/\lambda_i$) and the columns of $U$ are the eigenvectors. – Andreo Feb 15 '18 at 21:42
  • @Andreo Decomposing $\Sigma^{-1}$ as this eigenvalue-weighted sum of outer products of orthonormal vectors also explains how $|\Sigma^{-1/2}|$ can be extracted as the determinant of the Jacobian matrix when computing the normalization factor by substitution in multiple variables. – Kuo Oct 16 '22 at 11:07
  • "where $D$ is the dimensionality of $x$" -- what does "dimensionality" mean? Could you please link to a definition? – Tomas Apr 08 '25 at 08:49