9

When calculating the log-likelihood function for some multivariate distributions, such as the multivariate normal, I see examples where a matrix expression is suddenly replaced by its trace, even when the matrix is not diagonal. I searched online for a plausible explanation of this "trace trick" without success. What is it all about?

Can someone clarify the usage of trace in this situation?

Below is a slide with an example where you can find this usage of the trace.

enter image description here

A.T
  • 426

1 Answer

20

The trace is invariant under cyclic permutations. This means $\text{Tr}(\mathbf{ABC}) = \text{Tr}(\mathbf{CAB}) = \text{Tr}(\mathbf{BCA})$ whenever the products are defined. The terms of the form $(\mathbf{x_n-\mu})^T\Sigma^{-1}(\mathbf{x_n-\mu})$ are scalars (or, if you like, $1\times1$ matrices), and the trace of a scalar is just the scalar. Note also that the trace is linear, so $\text{Tr}(\alpha\mathbf{A}+\beta\mathbf{B}) = \alpha\text{Tr}(\mathbf{A}) + \beta\text{Tr}(\mathbf{B})$, which they use right underneath where you circled. This trick is used a lot, especially when one encounters quadratic forms (i.e. $\mathbf{x}^T\mathbf{Qx}$, where $\mathbf{Q}$ is symmetric).
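To make the role of linearity concrete, here is a sketch of how the two properties combine. It assumes the log-likelihood sums the quadratic forms over $n = 1, \dots, N$; the summation limits are my assumption, not something taken from the slide.

$$
\sum_{n=1}^{N} (\mathbf{x_n-\mu})^T\Sigma^{-1}(\mathbf{x_n-\mu})
= \sum_{n=1}^{N} \text{Tr}\left(\Sigma^{-1}(\mathbf{x_n-\mu})(\mathbf{x_n-\mu})^T\right)
= \text{Tr}\left(\Sigma^{-1}\sum_{n=1}^{N}(\mathbf{x_n-\mu})(\mathbf{x_n-\mu})^T\right)
$$

The first equality uses "scalar equals its own trace" plus the cyclic property on each term; the second uses linearity to pull the sum inside the trace, with $\Sigma^{-1}$ factored out because it does not depend on $n$.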

All they do is replace "$\text{scalar}$" with "$\text{Tr}(\text{scalar})$", and then apply the cyclic permutation property. $(\mathbf{x_n -\mu})^T\mathbf{\Sigma}^{-1}(\mathbf{x_n-\mu}) = \text{Tr}((\mathbf{x_n -\mu})^T\mathbf{\Sigma}^{-1}(\mathbf{x_n-\mu}))$ because $(\mathbf{x_n -\mu})^T\mathbf{\Sigma}^{-1}(\mathbf{x_n-\mu})$ is just a scalar. Then $\text{Tr}((\mathbf{x_n -\mu})^T\mathbf{\Sigma}^{-1}(\mathbf{x_n-\mu})) = \text{Tr}(\mathbf{\Sigma}^{-1}(\mathbf{x_n-\mu})(\mathbf{x_n -\mu})^T)$ by the permutation property I mentioned, with the leading factor $(\mathbf{x_n -\mu})^T$ moved to the end of the product.
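If you want to convince yourself numerically, here is a minimal NumPy sketch of the identity. The dimensions, the random data, and every variable name (`X`, `mu`, `Sigma`, and so on) are made up for illustration; nothing here comes from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

d, N = 3, 5                       # illustrative dimension and sample count
X = rng.normal(size=(N, d))       # rows play the role of x_n
mu = X.mean(axis=0)               # any vector would do; the mean is just convenient

# Build an arbitrary symmetric positive-definite covariance and invert it.
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)
Sigma_inv = np.linalg.inv(Sigma)

diffs = X - mu                    # row n is (x_n - mu)

# Left-hand side: the quadratic form (x_n - mu)^T Sigma^{-1} (x_n - mu), a scalar.
quad_forms = np.array([v @ Sigma_inv @ v for v in diffs])

# Right-hand side: Tr(Sigma^{-1} (x_n - mu)(x_n - mu)^T), after the cyclic shift.
trace_forms = np.array([np.trace(Sigma_inv @ np.outer(v, v)) for v in diffs])

# Linearity: the sum of traces equals the trace of Sigma^{-1} times the summed outer products.
S = diffs.T @ diffs               # sum over n of (x_n - mu)(x_n - mu)^T
total_via_trace = np.trace(Sigma_inv @ S)

print(np.allclose(quad_forms, trace_forms))
print(np.isclose(quad_forms.sum(), total_via_trace))
```

Both comparisons should hold up to floating-point rounding. Note that the check never relies on `Sigma` being diagonal, which is the point of the question: the trick only needs the quadratic form to be a scalar.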

  • Still, I don't understand why we can simply change an expression that involves matrices into the trace of that expression. When can this trace trick be applied? Are there special conditions? What allows me to use the trace in this specific situation? – A.T Sep 26 '18 at 23:09
  • See the new paragraph I added. All they do is replace a scalar (written as a quadratic form $\mathbf{x}^T\mathbf{Qx}$) by the trace of a scalar. They then apply the cyclic permutation property to the argument of the trace operator. – Travis C Cuvelier Sep 27 '18 at 19:01
  • Now I got it! Since the result of $\mathbf{x}^T\mathbf{Qx}$ is a scalar, and the trace of a scalar is the scalar itself, one can use the trace trick to simplify the calculation. It all makes sense now. Thank you. – A.T Sep 28 '18 at 00:39