Gradient of $X \mapsto \mbox{Tr}(AX)$

Question

I know that the gradient of $X \mapsto \mbox{Tr}(XA)$ is $A^T$. However, how does this change if $A$ and $X$ are swapped? Is the gradient of $X \mapsto \mbox{Tr}(AX)$ the same?

Also, how does this extend if we have more matrices? We can just assume everything before our "$X$" is $A$, correct? For example, $X \mapsto\mbox{Tr}\left(U^T V X\right)$. We can assume this is similar to the above where $U^TV$ is our "$A$" matrix, right?

Use the invariance of the trace map with respect to conjugation to show that $\mbox{Tr}(XA)=\mbox{Tr}(AX)$ and conclude. – Clément Guérin, Feb 19 '18 at 03:12
@ClémentGuérin Thanks. What about a scenario where you have something like $\mbox{Tr}(X^T A^T A X)$? How would one derive that? – Jibril, Feb 19 '18 at 03:17

user365239 · Answer 1 · 2018-02-19T03:27:12.180


Theorem: ${\mathrm{d} f({X})= \text{trace}(M^T \mathrm{d} {X}) \iff \frac{\partial f}{\partial {X}} = M}$

In your case,

$$\mathrm d \ \text{trace}(AXB) = \text{trace}(\mathrm d (AX B)) = \text{trace}(A \ \mathrm d X\ B) = \text{trace}(B A \ \mathrm d X)$$ and thus we identify $(BA)^T = A^T B^T$ as the derivative.
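Setting $B = I$ in the formula above gives $A^T$, so $\text{trace}(AX)$ and $\text{trace}(XA)$ have the same gradient. As a sanity check, here is a minimal sketch that compares the identified gradient $A^T B^T$ against a central-difference estimate. It assumes real matrices with compatible shapes; the helper names (`matmul`, `numerical_gradient`, and so on) and the chosen sizes are illustrative and not part of the answer.

```python
import random

def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(row) for row in zip(*P)]

def trace(P):
    return sum(P[i][i] for i in range(len(P)))

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def numerical_gradient(f, X, h=1e-6):
    """Central-difference estimate of the matrix of partials df/dX[i][j]."""
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            saved = X[i][j]
            X[i][j] = saved + h
            f_plus = f(X)
            X[i][j] = saved - h
            f_minus = f(X)
            X[i][j] = saved  # restore the entry before moving on
            G[i][j] = (f_plus - f_minus) / (2 * h)
    return G

def max_abs_diff(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

rng = random.Random(0)
A = rand_matrix(3, 4, rng)  # illustrative shapes: A is 3x4,
X = rand_matrix(4, 2, rng)  # X is 4x2,
B = rand_matrix(2, 3, rng)  # B is 2x3, so AXB is 3x3

# General case: gradient of trace(AXB) should be A^T B^T (same shape as X).
analytic = matmul(transpose(A), transpose(B))
numeric = numerical_gradient(lambda M: trace(matmul(matmul(A, M), B)), X)
max_err_general = max_abs_diff(numeric, analytic)

# Special case B = I: gradient of trace(A2 X) should be A2^T.
A2 = rand_matrix(2, 4, rng)
numeric_simple = numerical_gradient(lambda M: trace(matmul(A2, M)), X)
max_err_simple = max_abs_diff(numeric_simple, transpose(A2))

print(max_err_general, max_err_simple)
```

Both functions are linear in $X$, so the finite-difference estimate should match the analytic gradient up to floating-point rounding.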

edited Feb 19 '18 at 03:27

answered Feb 19 '18 at 03:19

user365239


Hey! Thanks! That makes sense. I'm trying to work through some material I've read using the Matrix Cookbook, but it's rough keeping everything straight sometimes.
If you don't mind a follow-up: what if we have

$\text{trace}(X^T A^T A X)$ and we differentiate w.r.t. $X$?

My intuition is that we get $(X^T A^T A)^T = (X A A^T)$, but I could be wrong.
– Jibril Feb 19 '18 at 03:22
Look at formula (108) in the cookbook. Maybe try to derive it like I did above. – user365239 Feb 19 '18 at 03:24
@Jibril You could also use Gateaux variation to calculate these matrix derivatives. – Jacky Chong Feb 19 '18 at 03:48
@JackyChong Thanks, I'd have to look into that; I'm not familiar with it. @user365239 What do you do when you can't find a form in the cookbook, by chance? For example, if we reversed the above and wanted the derivative with respect to $A$, then (108) doesn't work. $\text{trace}(X^T A^T A X)$ doesn't have an equivalent in the cookbook as far as I can make out. – Jibril Feb 19 '18 at 04:02
Set $B = A^T A$ – user365239 Feb 19 '18 at 04:04
But I want to differentiate w.r.t A? – Jibril Feb 19 '18 at 04:16
(116) . . . but you should try it by hand . . . – user365239 Feb 19 '18 at 04:28
Thanks. I think I will do just that, since I'm not sure where that $C$ factors in here. My fear with doing it by hand is that, since I am so unfamiliar with this territory, I will lead myself down the wrong path. I appreciate your answers. – Jibril Feb 19 '18 at 04:32
It's OK, set $C=I$ then it's the same as yours – user365239 Feb 19 '18 at 04:32
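
For the follow-up discussed in these comments, here is a sketch of the by-hand derivation using the same differential rule as the answer. It assumes real matrices with compatible shapes and uses the product rule for differentials together with $\text{trace}(M) = \text{trace}(M^T)$ and the cyclic property of the trace.

With respect to $X$, holding $A$ fixed:

$$\mathrm d \ \text{trace}(X^T A^T A X) = \text{trace}(\mathrm d X^T A^T A X) + \text{trace}(X^T A^T A \ \mathrm d X) = 2 \ \text{trace}(X^T A^T A \ \mathrm d X)$$

The two terms are equal because each is the transpose of the other inside the trace. Identifying $M^T = 2 X^T A^T A$ gives the gradient $2 A^T A X$, which is twice $(X^T A^T A)^T$.

With respect to $A$, holding $X$ fixed:

$$\mathrm d \ \text{trace}(X^T A^T A X) = \text{trace}(X^T \mathrm d A^T A X) + \text{trace}(X^T A^T \mathrm d A \ X) = 2 \ \text{trace}(X X^T A^T \ \mathrm d A)$$

Identifying $M^T = 2 X X^T A^T$ gives the gradient $2 A X X^T$.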
