I know that the gradient of $X \mapsto \mbox{Tr}(XA)$ is $A^T$. How does this change if $A$ and $X$ are swapped? Is the gradient of $X \mapsto \mbox{Tr}(AX)$ the same?

Also, how does this extend if we have more matrices? We can just assume everything before our "$X$" is $A$, correct? For example, $X \mapsto\mbox{Tr}\left(U^T V X\right)$. We can assume this is similar to the above where $U^TV$ is our "$A$" matrix, right?

Jibril

1 Answer

Theorem: ${\mathrm{d} f({X})= \text{trace}(M^T \mathrm{d} {X}) \iff \frac{\partial f}{\partial {X}} = M}$


In your case,

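$$\mathrm d \ \text{trace}(AXB) = \text{trace}(\mathrm d (AX B)) = \text{trace}(A \ \mathrm d X\ B) = \text{trace}(B A \ \mathrm d X)$$ where the last step uses the cyclic property of the trace, and thus we identify $(BA)^T = A^T B^T$ as the derivative. Taking $B = I$ gives $A^T$ for $\text{trace}(AX)$, the same as for $\text{trace}(XA)$, and taking $A = U^T V$ gives $(U^T V)^T = V^T U$ for $\text{trace}(U^T V X)$.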
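If you want to sanity-check an identity like this numerically, a central finite-difference comparison is one option. The following is only a minimal sketch in plain Python: the helper names (`matmul`, `transpose`, `trace`, `numerical_gradient`, `max_abs_diff`) and the matrix sizes are made up for illustration and do not come from the cookbook or any library.

```python
import random

# Hypothetical helpers for illustration; matrices are plain lists of rows.
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(row) for row in zip(*P)]

def trace(P):
    return sum(P[i][i] for i in range(len(P)))

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def numerical_gradient(f, X, eps=1e-6):
    """Central-difference estimate of df/dX, one entry of X at a time."""
    grad = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            plus = [row[:] for row in X]
            minus = [row[:] for row in X]
            plus[i][j] += eps
            minus[i][j] -= eps
            grad[i][j] = (f(plus) - f(minus)) / (2 * eps)
    return grad

def max_abs_diff(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

rng = random.Random(0)

# trace(A X B) with A: 3x4, X: 4x5, B: 5x3 (sizes chosen arbitrarily).
A = random_matrix(3, 4, rng)
X = random_matrix(4, 5, rng)
B = random_matrix(5, 3, rng)
grad_numeric = numerical_gradient(lambda Y: trace(matmul(matmul(A, Y), B)), X)
grad_formula = matmul(transpose(A), transpose(B))  # A^T B^T
error_axb = max_abs_diff(grad_numeric, grad_formula)

# Square case from the question: trace(S Z) and trace(Z S) should both give S^T.
S = random_matrix(4, 4, rng)
Z = random_matrix(4, 4, rng)
grad_sz = numerical_gradient(lambda Y: trace(matmul(S, Y)), Z)
grad_zs = numerical_gradient(lambda Y: trace(matmul(Y, S)), Z)
error_swap = max(max_abs_diff(grad_sz, transpose(S)),
                 max_abs_diff(grad_zs, transpose(S)))

print(error_axb, error_swap)  # both should be tiny (rounding error only)
```

Because these functions are linear in the variable, the central difference has no truncation error, so any mismatch beyond rounding noise would point to a mistake in the formula.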

user365239
  • Hey! Thanks! That makes sense. I'm trying to work through some stuff I've read using the matrix cookbook but it's rough keeping some stuff straight sometimes.

    If you don't mind a follow-up then, so what if we have

    trace(X^T A^T A X) and we differentiate w.r.t. X here then?

    My intuition is that we get (X^T A^T A)^T = (X A A^T), but I could be wrong.

    – Jibril Feb 19 '18 at 03:22
  • Look at formula (108) in the cookbook. Maybe try to derive it like I did above. – user365239 Feb 19 '18 at 03:24
  • @Jibril You could also use Gateaux variation to calculate these matrix derivatives. – Jacky Chong Feb 19 '18 at 03:48
  • @JackyChong Thanks, I'd have to look into that. Not familiar. @user365239 - What do you do when you can't find a form in the cookbook, by chance? For example, if we reversed the above and wanted it with respect to A, then (108) doesn't work. trace( X^T A^T A X ) doesn't have an equivalent in the cookbook as far as I can make out. – Jibril Feb 19 '18 at 04:02
  • Set $B = A^T A$ – user365239 Feb 19 '18 at 04:04
  • But I want to differentiate w.r.t A? – Jibril Feb 19 '18 at 04:16
  • (116) . . . but you should try it by hand . . . – user365239 Feb 19 '18 at 04:28
  • Thanks. I think I will do just that, since I'm not sure where that C factors in here. My fear with doing it by hand is that, since I am so unfamiliar in this territory, that I will lead myself down the wrong path. I appreciate your answers. – Jibril Feb 19 '18 at 04:32
  • It's OK, set $C=I$ then it's the same as yours – user365239 Feb 19 '18 at 04:32
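
For reference, here is a sketch of the by-hand derivation suggested in the comments, using only the differential rule from the answer together with $\text{trace}(M^T) = \text{trace}(M)$ and the cyclic property. It assumes $A$ and $X$ are real matrices of compatible sizes.

With respect to $X$ (holding $A$ fixed):

$$\mathrm d \ \text{trace}(X^T A^T A X) = \text{trace}(\mathrm d X^T A^T A X) + \text{trace}(X^T A^T A \ \mathrm d X) = 2\ \text{trace}(X^T A^T A \ \mathrm d X)$$

so $M^T = 2 X^T A^T A$ and the derivative is $2 A^T A X$. The factor of 2 appears because $X$ occurs twice, which the guess $(X^T A^T A)^T$ accounts for only once.

With respect to $A$ (holding $X$ fixed):

$$\mathrm d \ \text{trace}(X^T A^T A X) = \text{trace}(X^T \mathrm d A^T A X) + \text{trace}(X^T A^T \mathrm d A \ X) = 2\ \text{trace}(X X^T A^T \mathrm d A)$$

so $M^T = 2 X X^T A^T$ and the derivative is $2 A X X^T$.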