
I'm reading the book Pattern Recognition and Machine Learning by Christopher Bishop, and on page 80, with regard to the multivariate Gaussian distribution:

$$ \mathcal{N}(\mathbf{x} | \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{D/2}}\frac{1}{| \boldsymbol{\Sigma}|^{1/2}}~ \exp \biggl \{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\mathrm{T}}~ \boldsymbol{\Sigma}^{-1}~(\mathbf{x} - \boldsymbol{\mu}) \biggr \} $$ it says:

First of all, we note that the matrix $ \boldsymbol{\Sigma} $ can be taken to be symmetric, without loss of generality, because any antisymmetric component would disappear from the exponent.

It's not clear to me what this means. Can someone explain?

When I plot such a distribution (e.g. using Octave) with a non-symmetric matrix, I still get a valid-looking distribution out. E.g. if I use $ \boldsymbol{\Sigma}$ = [1, 0.25; 0.5, 1], I get something that looks half-way between $ \boldsymbol{\Sigma}$ = [1, 0.25; 0.25, 1] and $ \boldsymbol{\Sigma}$ = [1, 0.5; 0.5, 1].

Does the phrase "without loss of generality" here simply imply that for any asymmetric $ \boldsymbol{\Sigma}$ there is an equivalent symmetric one which would have resulted in the exact same Mahalanobis distance, and therefore we might as well only deal with the symmetric versions for mathematical convenience?
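(For what it's worth, that reading can be checked numerically. The sketch below, in Python/NumPy rather than Octave and purely illustrative, symmetrizes $\Sigma^{-1}$ for the example matrix above and verifies that the exponent's quadratic form is unchanged; it says nothing about the normalizing factor $|\boldsymbol{\Sigma}|^{1/2}$.)

```python
import numpy as np

# The non-symmetric matrix from the question
Sigma = np.array([[1.0, 0.25],
                  [0.5, 1.0]])
A = np.linalg.inv(Sigma)     # A = Sigma^{-1}
B = 0.5 * (A + A.T)          # symmetric part of A

rng = np.random.default_rng(0)
mu = np.zeros(2)
for _ in range(5):
    x = rng.standard_normal(2)
    d = x - mu
    # the quadratic forms in the exponent agree exactly
    assert np.isclose(d @ A @ d, d @ B @ d)
print("quadratic forms agree for all test points")
```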

  • Hmm, it is clear that you can replace $\Sigma^{-1}$ by $\frac12(\Sigma^{-1}+(\Sigma^{-1})^T)$ in the $\exp\{\cdots\}$ factor and get the same result. But it is not obvious that this can always be effected by a change in $\Sigma$ -- for example, if $\Sigma=\begin{pmatrix}0&1\\-1&0\end{pmatrix}$, then $\frac12(\Sigma^{-1}+(\Sigma^{-1})^T)=0$, which is not the inverse of any possible $\Sigma$. Perhaps there are some known conditions on $\Sigma$ in the context that prevent this from happening -- for example, is it positive definite? – hmakholm left over Monica May 02 '17 at 13:03
  • @HenningMakholm yes, for the distribution to be well-defined (as per Bishop) it needs to be positive definite, or at least positive semi-definite, so this holds. So, the quoted phrase is as I understood it then? It's not that the distribution is undefined for an antisymmetric Sigma, but just that we might as well use its symmetric "equivalent"? – Tasos Papastylianou May 02 '17 at 13:13
  • @HenningMakholm By the way, how is it "clear" that I can do that replacement? Could you elaborate on that a bit more? The symmetry of the situation wasn't obvious to me; I had to expand everything in a 2x2 and 3x3 example to spot the pattern. Am I missing something obvious? In any case, if you'd like to convert the above comments to an answer to that effect I would be happy to accept it. – Tasos Papastylianou May 02 '17 at 13:39
  • Yes, "without loss of generality" means something like "we might as well assume that such-and-such is the case, because all relevant situations can be achieved by something of that shape". – hmakholm left over Monica May 02 '17 at 14:12
  • The "clear" substitution is general for quadratic forms, where you have something of the form $v^TAv$ with $v$ a single column. Then $v^TAv$ is $1\times 1$ and therefore equals its transpose $(v^TAv)^T=v^TA^Tv^{TT}=v^TA^Tv$. Therefore the arithmetic mean of $v^TAv$ and $v^TA^Tv$ is again the same as each of those, and by the distributive law, $$ \tfrac12 v^TAv+ \tfrac12 v^TA^Tv =v^T(\tfrac12 A+\tfrac12 A^T) v $$ And clearly $\tfrac12 A+\tfrac12 A^T$ is symmetric. – hmakholm left over Monica May 02 '17 at 14:16
  • I'm not sure this is an actual answer to your question, though, because I'm not sure how the $|\Sigma|^{1/2}$ would be handled. – hmakholm left over Monica May 02 '17 at 14:19
  • Thank you, that makes it very clear. I think with regard to that quote, it would be fine to assume positive definite matrices for this discussion, as that would seem to fit that particular context. – Tasos Papastylianou May 02 '17 at 14:30
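As an aside, Henning's counterexample from the comments can be made concrete: for that rotation-like $\Sigma$, the symmetrized inverse is the zero matrix, so no covariance matrix could produce it, which is exactly why positive definiteness matters. An illustrative Python/NumPy check:

```python
import numpy as np

# Henning's counterexample: an antisymmetric (hence not positive-definite) Sigma
Sigma = np.array([[0.0, 1.0],
                  [-1.0, 0.0]])
A = np.linalg.inv(Sigma)     # A = Sigma^{-1} = [[0, -1], [1, 0]]
B = 0.5 * (A + A.T)          # symmetric part of A

# B is the zero matrix: not invertible, so no Sigma has B as its inverse.
# For a positive-definite Sigma this cannot happen, since the symmetric
# part of Sigma^{-1} is then itself positive definite, hence invertible.
assert np.allclose(B, 0)
print(B)
```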

1 Answer


Write (2.44) as $\Delta^2 = (x-\mu)^T A\; (x-\mu)$, where $A = \Sigma^{-1}$.

We know that $A = \frac{1}{2} (A + A^T) + \frac{1}{2} (A - A^T)$.

Let $B = \frac{1}{2} (A + A^T), ~C = \frac{1}{2} (A - A^T)$, then $B$ is symmetric, and $C$ is anti-symmetric, $c_{ij} = - c_{ji}$.

So $\Delta^2 = (x-\mu)^T B\; (x-\mu) + (x-\mu)^T C\; (x-\mu)$, in which (the diagonal terms vanish because $c_{ii} = 0$, and the off-diagonal terms pair up): $$\begin{array}{l l l}(x-\mu)^T C\; (x-\mu) & = & \displaystyle \sum_{i=1}^D \sum_{j=1}^D c_{ij} (x-\mu)_i (x-\mu)_j \\ & = & \displaystyle \sum_{i=1}^D\sum_{j=i+1}^D (c_{ij}+c_{ji}) (x-\mu)_i (x-\mu)_j \\ & = & 0\end{array}$$

So $\Delta^2 = (x-\mu)^T B\; (x-\mu)$, where $B = \frac{1}{2} (A + A^T)$ is a symmetric matrix. That is, even if $\Sigma^{-1}$ isn't symmetric, there is a symmetric matrix $B$ such that $(x-\mu)^T \Sigma^{-1} (x-\mu) = (x-\mu)^T B\; (x-\mu)$, so we might as well assume the symmetric version from the start.
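A quick numerical illustration of this decomposition (Python/NumPy, with arbitrary made-up values):

```python
import numpy as np

rng = np.random.default_rng(1)
D = 3
A = rng.standard_normal((D, D))   # an arbitrary, generally non-symmetric A
B = 0.5 * (A + A.T)               # symmetric part
C = 0.5 * (A - A.T)               # antisymmetric part

x = rng.standard_normal(D)
# The antisymmetric component contributes nothing to the quadratic form...
assert np.isclose(x @ C @ x, 0.0)
# ...so the full form equals the symmetric part's form
assert np.isclose(x @ A @ x, x @ B @ x)
print("antisymmetric part vanishes from the exponent")
```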

– Yuz
  • Thank you for your answer. I'm accepting this since Henning did not care to provide a formal answer and this is essentially along the same lines (while also nicely explaining the phrase "any antisymmetric component will disappear"). – Tasos Papastylianou Jun 26 '17 at 09:32