1

If I have a random vector $X$ with expectation $\mathbb{E} (X) =: \mu \in \mathbb{R}^n$, it is not that difficult to show that its covariance matrix is positive semidefinite: given

$$ \operatorname{Cov} (X) = \mathbb{E} \left[ (X-\mu) (X-\mu)^T \right] $$

For any vector $z\in \mathbb{R}^n$, we have

$$ z^T \operatorname{Cov} (X) z = z^T \mathbb{E} \left[ (X-\mu) (X-\mu)^T \right] z = \mathbb{E} \left[ z^T (X-\mu) (X-\mu)^T z \right] $$

which is just the expectation of a squared inner product,

$$ \mathbb{E} \left[ z^T(X-\mu) (X-\mu)^T z \right] = \mathbb{E}\left[ \langle z, X - \mu \rangle^2 \right] \geq 0 $$

and hence always greater than or equal to $0$. I don't really understand, however, why a non-PSD matrix cannot function as a covariance matrix on an intuitive level. Suppose you have a non-PSD matrix. Can you prove by contradiction that it cannot be the covariance matrix of some random vector $X$?
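
As a quick numerical sanity check (a small Python sketch, with an arbitrary mixing matrix for the samples), the quadratic form $z^T \operatorname{Cov}(X)\, z$ computed from data matches the sample variance of $\langle z, X \rangle$ and is non-negative:

```python
import numpy as np

rng = np.random.default_rng(0)

# 10,000 samples of a random vector X in R^3 (the mixing matrix A is arbitrary).
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.5, 0.3, 0.1]])
X = rng.standard_normal((10_000, 3)) @ A.T

cov = np.cov(X, rowvar=False)       # empirical covariance matrix
z = rng.standard_normal(3)          # an arbitrary direction

quad_form = z @ cov @ z             # z^T Cov(X) z
var_proj = np.var(X @ z, ddof=1)    # sample variance of <z, X>, i.e. Var(z^T X)

print(quad_form, var_proj)          # the two agree, and both are >= 0
```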

lpnorm
  • I think that is related to the fact that the diagonal elements of a PSD matrix are non-negative. In this way you have non-negative variances. This could be useful too: https://math.stackexchange.com/questions/927902/does-a-positive-semidefinite-matrix-always-have-a-non-negative-trace – Enrico Aug 23 '23 at 11:26
  • Any covariance matrix is of the form you consider and, as you showed, positive semidefinite. In other words: there is no covariance matrix that is not PSD. Not on an intuitive level and not on a rigorous level. – Kurt G. Aug 23 '23 at 13:34
  • See this answer of mine on stats.SE. – Dilip Sarwate Aug 24 '23 at 13:32

2 Answers

3

Basically, I use the same argument as you from a slightly different perspective. Suppose that $\Sigma=\operatorname E[(X-\operatorname EX)(X-\operatorname EX)']$ is the covariance matrix of a random vector $X$ such that $\operatorname E\|X\|^2<\infty$ (this is just to ensure that the covariance matrix is well-defined). Now suppose that $\Sigma$ is not positive semi-definite. This means that there exists some vector $a$ such that
$$ a'\Sigma a<0. $$
It follows that
\begin{align*} a'\Sigma a &=a'\operatorname E[(X-\operatorname EX)(X-\operatorname EX)']a\\ &=\operatorname E[a'(X-\operatorname EX)(X-\operatorname EX)'a]\\ &=\operatorname{Var}(a'X)\\ &<0. \end{align*}
Observe that
$$ a'X = a_1X_1+\ldots+a_dX_d. $$
This means that there exists some linear combination of the entries of $X$ such that the variance of this linear combination is negative, which of course does not make sense.
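
To see the contradiction on a concrete (arbitrarily chosen) non-PSD matrix, here is a short Python sketch:

```python
import numpy as np

# An arbitrary symmetric matrix that is NOT positive semi-definite.
sigma = np.array([[1.0, 2.0],
                  [2.0, 1.0]])

eigvals, eigvecs = np.linalg.eigh(sigma)
print(eigvals)            # [-1.  3.] -> one negative eigenvalue, so not PSD

a = eigvecs[:, 0]         # eigenvector belonging to the negative eigenvalue
print(a @ sigma @ a)      # -1.0: this would have to equal Var(a'X), which is impossible
```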

I hope this is useful.

Cm7F7Bb
  • Great answer! I'm still trying to wrap my head around why the linear combination having negative variance relates back to the multivariate RV as a whole intuitively, or what we might see when sampling the full RV that doesn't make sense. Would you happen to have a simple example or explanation that illustrates this, or is this just a case of "trust the math"? – Vityou Sep 03 '23 at 21:40
  • @Vityou Thanks! A random vector with a covariance matrix which is not positive semi-definite does not exist. Since the quadratic form satisfies $a'\Sigma a=\operatorname{Var}(a'X)$, we see that the positive semi-definiteness of a covariance matrix is related to the non-negativity of the variance of some random variable. We cannot think about a random variable with a negative variance. This connection is not so transparent if we just look at the definition of a covariance matrix, but the way the covariance matrix is defined leads to the fact that the quadratic form $a'\Sigma a$ needs to be non-negative. – Cm7F7Bb Sep 04 '23 at 07:26
2

On an intuitive, hand-waving physics level.

A covariance matrix is, in physics, the analogue of the inertia matrix: it is a second central moment
(cf. https://en.wikipedia.org/wiki/Moment_(mathematics)).

So if $\omega$ is the angular velocity (a vector) of a rigid system and $I_C$ its inertia matrix
(cf. https://en.wikipedia.org/wiki/Moment_of_inertia#Motion_in_space_of_a_rigid_body,_and_the_inertia_matrix),
the rotational kinetic energy is $\frac 1 2 \omega^T I_C \omega$.
The eigenvectors are orthogonal, and they form the principal axes of the inertia ellipsoid.

When the system is rotating, it necessarily (from a physical viewpoint) has more energy than when it is not: we have to provide energy to the system to set it in motion. So $\omega^T I_C \omega$ is necessarily $\ge 0$.
Moreover, the energy we provide is always $>0$, except if there is no mass around the rotation axis, i.e. the whole object lies on the rotation axis.
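
A small numerical sketch of this (a handful of point masses, all with positive mass, at arbitrary positions):

```python
import numpy as np

rng = np.random.default_rng(1)

# A few point masses (all positive) at arbitrary positions.
m = rng.uniform(0.5, 2.0, size=5)
r = rng.standard_normal((5, 3))
r -= np.average(r, axis=0, weights=m)   # measure positions from the centre of mass

# Inertia matrix of the system: I_C = sum_i m_i (|r_i|^2 Id - r_i r_i^T)
I_C = sum(mi * (ri @ ri * np.eye(3) - np.outer(ri, ri)) for mi, ri in zip(m, r))

omega = rng.standard_normal(3)          # an arbitrary angular velocity
print(0.5 * omega @ I_C @ omega)        # rotational kinetic energy: always >= 0
```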

The only drawback to this explanation is that masses in physics are always $>0$, so we are not in the general mathematical case. But I still find this analogy interesting, especially with regard to distribution shape, principal component analysis, etc. If someone could provide a more precise answer that takes negative masses into account, I would be grateful.