Let $X$ be a random vector with covariance matrix $\Sigma$.
People often describe $\Sigma$ in terms of its components: $\Sigma_{ij}$ is the covariance of the $i$th and $j$th components of $X$.
But in linear algebra, describing a matrix entry by entry is often discouraged; it is usually more enlightening to treat the matrix as a single object and reason about it without reference to components.
So what is the "best" way to think about $\Sigma$, particularly for someone who likes linear algebra?
I know that $\Sigma = \mathbb E((X - \mu)(X - \mu)^T)$, where $\mu = \mathbb E(X)$. But I think I am still missing something, because I'm not sure what to make of that formula. Does this formula shed light on what $\Sigma$ really is and why we care about it?
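To make the formula concrete, here is the entrywise check (just expanding the definition, in case it helps frame what I'm asking):

$$\big(\mathbb E\!\left[(X - \mu)(X - \mu)^T\right]\big)_{ij} = \mathbb E\!\left[(X_i - \mu_i)(X_j - \mu_j)\right] = \operatorname{Cov}(X_i, X_j) = \Sigma_{ij}.$$

So the formula does reproduce the componentwise description above, but that still feels like a component-by-component statement rather than a genuinely coordinate-free way of understanding $\Sigma$.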