5

Why is $\nabla_A \mbox{tr} (ABA^TC) = CAB + C^TAB^T$?

Here $A, B, C, D$ are all $n \times n$ matrices.

$$\nabla_A f(A) = \begin{bmatrix} \frac{\partial f}{\partial A_{11}} & \cdots & \frac{\partial f}{\partial A_{1n}}\\ \vdots & \ddots & \vdots\\ \frac{\partial f}{\partial A_{n1}} & \cdots & \frac{\partial f}{\partial A_{nn}} \end{bmatrix}$$

I tried to prove it in this way:

$$\begin{align} \nabla_A \mbox{tr} (ABA^TC) &= \nabla_A \mbox{tr} \left((BA^TC)A\right)\\ &= \nabla_A \mbox{tr} (DA) \qquad \text{(let } D=BA^TC\text{)}\\ &= \nabla_A \mbox{tr} (AD)\\ &=D^T\\ &=C^TAB^T\end{align}$$

Since $C^TAB^T \neq CAB + C^TAB^T$ in general, there must be something wrong in my derivation. How can this property be proved?

Cesc
  • 53
  • Your mistake was to neglect that $A^T$ (hidden in your $D$) also depends on $A$. You treated it as a constant. – celtschk Jul 16 '16 at 17:15
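
The comment above can be made concrete with a short sketch. Since $A$ occurs twice in $\mbox{tr}(ABA^TC)$, one can differentiate with respect to each occurrence while holding the other fixed, and add the two results. This assumes the standard identities $\nabla_A \mbox{tr}(AD) = D^T$ and $\nabla_A \mbox{tr}(A^TE) = E$ for constant $D$ and $E$, and uses the cyclic property $\mbox{tr}(ABA^TC) = \mbox{tr}(A^TCAB)$. The symbol $E$ is introduced here only for illustration.

$$\begin{align} \nabla_A \mbox{tr}(ABA^TC) &= \nabla_A \mbox{tr}(AD)\Big|_{D = BA^TC \text{ held fixed}} + \nabla_A \mbox{tr}(A^TE)\Big|_{E = CAB \text{ held fixed}}\\ &= D^T + E\\ &= C^TAB^T + CAB \end{align}$$

Treating $D$ as a constant keeps only the first of the two terms.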

3 Answers

7

$$\begin{align} \mbox{tr}(ABA^TC) &= \sum_{ijkl}A_{ij}B_{jk}(A^T)_{kl}C_{li}=\sum_{ijkl}A_{ij}B_{jk}A_{lk}C_{li}\\ (\nabla_{A}\mbox{tr}(ABA^TC))_{mn} &= \frac{\partial}{\partial A_{mn}}\sum_{ijkl}A_{ij}B_{jk}A_{lk}C_{li}\\ &= \sum_{ijkl}B_{jk}A_{lk}C_{li}\delta_{im}\delta_{jn}+\sum_{ijkl}A_{ij}B_{jk}C_{li}\delta_{lm}\delta_{kn}\\ &= \sum_{kl}B_{nk}A_{lk}C_{lm}+\sum_{ij}A_{ij}B_{jn}C_{mi}\\ &= \sum_{kl}C_{lm}A_{lk}B_{nk}+\sum_{ij}C_{mi}A_{ij}B_{jn}\\ &= \sum_{kl}(C^T)_{ml}A_{lk}(B^T)_{kn}+\sum_{ij}C_{mi}A_{ij}B_{jn}\\ &= (C^TAB^T)_{mn}+(CAB)_{mn} \end{align}$$

$$\nabla_{A}\mbox{tr}(ABA^TC)=C^TAB^T+CAB$$
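
As a sanity check, the result can be compared against a finite-difference gradient. The following is a minimal sketch in plain Python (lists of rows, no external libraries); the helper names, the $3 \times 3$ size, the random seed, and the step `h` are all illustrative assumptions, not part of the derivation.

```python
import random

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(X):
    return [list(col) for col in zip(*X)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def f(A, B, C):
    """f(A) = tr(A B A^T C)."""
    return trace(matmul(matmul(matmul(A, B), transpose(A)), C))

def analytic_gradient(A, B, C):
    """C A B + C^T A B^T, the closed form derived above."""
    first = matmul(matmul(C, A), B)
    second = matmul(matmul(transpose(C), A), transpose(B))
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(first, second)]

def numeric_gradient(A, B, C, h=1e-6):
    """Central differences: entry (r, c) approximates df/dA_rc."""
    n = len(A)
    G = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            plus = [row[:] for row in A]
            minus = [row[:] for row in A]
            plus[r][c] += h
            minus[r][c] -= h
            G[r][c] = (f(plus, B, C) - f(minus, B, C)) / (2 * h)
    return G

def random_matrix(n, rng):
    return [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

rng = random.Random(0)  # fixed seed, chosen arbitrarily
A, B, C = (random_matrix(3, rng) for _ in range(3))
gap = max_abs_diff(analytic_gradient(A, B, C), numeric_gradient(A, B, C))
print("largest entrywise discrepancy:", gap)
```

Because $f$ is quadratic in $A$, central differences are exact up to rounding, so the printed discrepancy should be tiny.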

velut luna
  • 10,162
4

Given $\mathrm A, \mathrm B, \mathrm C \in \mathbb R^{n \times n}$, define $f : \mathbb R^{n \times n} \to \mathbb R$ by

$$f (\mathrm X) := \mbox{tr} (\mathrm A \mathrm X \mathrm B \mathrm X^T \mathrm C)$$

The directional derivative of $f$ in the direction of $\mathrm V$ at $\mathrm X$ is

$$\begin{array}{rl} D_{\mathrm V} f (\mathrm X) &= \displaystyle\lim_{h \to 0} \frac{1}{h} \left( f (\mathrm X + h \mathrm V) - f (\mathrm X) \right) \\\\ &= \mbox{tr} (\mathrm A \mathrm V \mathrm B \mathrm X^T \mathrm C) + \mbox{tr} (\mathrm A \mathrm X \mathrm B \mathrm V^T \mathrm C)\\\\ &= \mbox{tr} ((\mathrm A^T \mathrm C^T \mathrm X \mathrm B^T)^T \mathrm V) + \mbox{tr} (\mathrm V^T \mathrm C \mathrm A \mathrm X \mathrm B )\\\\ &= \langle \mathrm A^T \mathrm C^T \mathrm X \mathrm B^T , \mathrm V \rangle + \langle \mathrm V, \mathrm C \mathrm A \mathrm X \mathrm B \rangle\\\\ &= \langle \mathrm A^T \mathrm C^T \mathrm X \mathrm B^T + \mathrm C \mathrm A \mathrm X \mathrm B, \mathrm V \rangle\end{array}$$
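
The step from the limit to the two trace terms can be spelled out by expanding $f (\mathrm X + h \mathrm V)$ and using linearity of the trace. This is only a sketch of the intermediate algebra and introduces no new symbols:

$$\begin{array}{rl} f (\mathrm X + h \mathrm V) &= \mbox{tr} (\mathrm A (\mathrm X + h \mathrm V) \mathrm B (\mathrm X + h \mathrm V)^T \mathrm C)\\\\ &= f (\mathrm X) + h \left( \mbox{tr} (\mathrm A \mathrm V \mathrm B \mathrm X^T \mathrm C) + \mbox{tr} (\mathrm A \mathrm X \mathrm B \mathrm V^T \mathrm C) \right) + h^2 \, \mbox{tr} (\mathrm A \mathrm V \mathrm B \mathrm V^T \mathrm C)\end{array}$$

Subtracting $f (\mathrm X)$ and dividing by $h$ leaves the two first-order terms plus $h \, \mbox{tr} (\mathrm A \mathrm V \mathrm B \mathrm V^T \mathrm C)$, which vanishes as $h \to 0$.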

Hence,

$$\nabla_{\mathrm X} f (\mathrm X) = \mathrm A^T \mathrm C^T \mathrm X \mathrm B^T + \mathrm C \mathrm A \mathrm X \mathrm B$$

If $\mathrm A = \mathrm I_n$, then

$$\nabla_{\mathrm X} f (\mathrm X) = \color{blue}{\mathrm C^T \mathrm X \mathrm B^T + \mathrm C \mathrm X \mathrm B}$$
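
The general formula can also be checked numerically by comparing $\langle \nabla_{\mathrm X} f (\mathrm X), \mathrm V \rangle$ with a central-difference estimate of the directional derivative. This is a rough sketch in plain Python; the helper names, matrix size, seed, and step `h` are assumptions made for illustration.

```python
import random
from functools import reduce

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def chain(*Ms):
    """Left-to-right product of several matrices."""
    return reduce(matmul, Ms)

def transpose(X):
    return [list(col) for col in zip(*X)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def add(P, Q, scale=1.0):
    """Return P + scale * Q entrywise."""
    return [[p + scale * q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def inner(P, Q):
    """Frobenius inner product <P, Q> = tr(P^T Q) = sum of P_ij * Q_ij."""
    return sum(p * q for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

def f(X, A, B, C):
    """f(X) = tr(A X B X^T C)."""
    return trace(chain(A, X, B, transpose(X), C))

def gradient(X, A, B, C):
    """A^T C^T X B^T + C A X B, the closed form derived above."""
    return add(chain(transpose(A), transpose(C), X, transpose(B)),
               chain(C, A, X, B))

def random_matrix(n, rng):
    return [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

rng = random.Random(1)  # fixed seed, chosen arbitrarily
A, B, C, X, V = (random_matrix(3, rng) for _ in range(5))

h = 1e-5
numeric = (f(add(X, V, h), A, B, C) - f(add(X, V, -h), A, B, C)) / (2 * h)
predicted = inner(gradient(X, A, B, C), V)
error = abs(numeric - predicted)
print("directional derivative mismatch:", error)
```

Testing a single random direction is cheaper than checking every entry of the gradient; repeating with several random $\mathrm V$ gives more confidence.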


3

If you write the function in terms of the Frobenius inner product, $A:B = {\rm tr}(A^TB)$, then finding the differential and gradient is almost trivial:

$$\eqalign{ f &= C^TAB^T:A \cr\cr df &= C^T\,dA\,B^T:A + C^TAB^T:dA \cr &= (CAB + C^TAB^T):dA \cr\cr \frac{\partial f}{\partial A} &= CAB + C^TAB^T \cr }$$

Frobenius products can be rearranged in a variety of ways,

$$\eqalign{ A:BC &= AC^T:B \cr &= B^TA:C \cr &= A^T:(BC)^T \cr &= BC:A \cr &= {\rm tr}(A^TBC) \cr }$$

all of which can be proved directly, or by using the trace equivalence and the cyclic property of the trace.
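
The rearrangement rules are easy to spot-check on random matrices. The sketch below is plain Python with illustrative helper names (`frob`, `chain`, and so on are assumptions, not standard library functions); it also confirms that $C^TAB^T:A$ agrees with ${\rm tr}(ABA^TC)$.

```python
import random
from functools import reduce

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def chain(*Ms):
    """Left-to-right product of several matrices."""
    return reduce(matmul, Ms)

def transpose(X):
    return [list(col) for col in zip(*X)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def frob(P, Q):
    """Frobenius product P:Q, computed as the sum of P_ij * Q_ij."""
    return sum(p * q for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

def random_matrix(n, rng):
    return [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

rng = random.Random(2)  # fixed seed, chosen arbitrarily
A, B, C = (random_matrix(3, rng) for _ in range(3))

reference = frob(A, matmul(B, C))  # A : BC
rearranged = {
    "AC^T : B": frob(matmul(A, transpose(C)), B),
    "B^T A : C": frob(matmul(transpose(B), A), C),
    "A^T : (BC)^T": frob(transpose(A), transpose(matmul(B, C))),
    "BC : A": frob(matmul(B, C), A),
    "tr(A^T B C)": trace(chain(transpose(A), B, C)),
}
worst = max(abs(value - reference) for value in rearranged.values())

# The function itself, written both ways.
f_trace = trace(chain(A, B, transpose(A), C))
f_frob = frob(chain(transpose(C), A, transpose(B)), A)

print("largest identity mismatch:", worst)
print("trace form vs Frobenius form:", abs(f_trace - f_frob))
```

A numerical check like this is not a proof, but it catches transposition slips quickly.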

hans
  • 1,804
  • I don't understand how $df = C^T\,dA\,B^T : A + C^TAB^T : dA$ is obtained. – aroma Dec 21 '17 at 17:57
  • 1
    @aroma The differential of a product is $$d(X\star Y) = dX\star Y+X\star dY$$ This simple rule holds for almost any type of product (e.g. matrix, Hadamard, Kronecker, Frobenius, dyadic) that you are likely to encounter. In the current problem, take $X=C^TAB^T$, $Y=A$, and $(\star) = (:)$. – hans Dec 25 '17 at 03:56