
For $n \times n$ matrices $A$, $B$, $C$, I was trying to find $\nabla_A \operatorname{trace}(ABA^TC)$.

This answer: Proof for the funky trace derivative: $d(\operatorname{trace}(ABA'C))$?

suggested: $$ \nabla_A \operatorname{trace}( ABA^{T}C ) = CAB + C^T AB^T $$

with an implication that $$\nabla_A AB = B^T$$

Can somebody show me why?

I also have my own proof, based on the clue (using the chain rule) from that link.

First, let $$ H(X,Y) = \operatorname{trace}(XY^TC) \qquad\qquad (1) \\ f(A) = AB \qquad\qquad (2) \\ g(A) = \operatorname{trace}(ABA^TC) \qquad\qquad (3) $$

Then $g(A)$ can be rewritten as $$ g(A) = H(f(A),A) $$

We know the chain rule: $$ \nabla_A g(A) = \nabla_X H(X,Y)\cdot \nabla_A f(A)+\nabla_Y H(X,Y)\cdot \nabla_A A $$

To simplify this equation, we need: $$ \nabla_A \operatorname{trace}(AB) = B^T \qquad\qquad (4)\\ \operatorname{trace}(AB) = \operatorname{trace}(BA) \qquad\qquad (5)\\ \nabla_{A^T}f(A) = [\nabla_A f(A)]^T \qquad\qquad (6) $$

With (4), the first term $$ \nabla_X H(X,Y)\cdot \nabla_A f(A) $$ can be written as $$ \nabla_X \operatorname{trace}(XY^TC) \cdot \nabla_A f(A) = C^TY \cdot \nabla_A AB = C^TA \cdot \nabla_A AB $$

With (5), the factor $\nabla_Y H(X,Y)$ in the second term can be written as $$ \nabla_Y H(X,Y) = \nabla_Y \operatorname{trace}(XY^TC) = \nabla_Y \operatorname{trace}(Y^TCX) $$

With (6): $$ \nabla_Y \operatorname{trace}(Y^TCX) = [\nabla_{Y^T} \operatorname{trace}(Y^TCX)]^T $$

With (4): $$ [\nabla_{Y^T} \operatorname{trace}(Y^TCX)]^T = CX = CAB $$

Now I get $$ \nabla_A \operatorname{trace}(ABA^{T}C) = C^T A \cdot \nabla_A AB + CAB $$
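The two partial gradients used in this derivation, $\nabla_X H = C^TY$ and $\nabla_Y H = CX$, can be sanity-checked numerically. The snippet below is only a rough sketch and not part of the linked answer: it assumes NumPy, random $4 \times 4$ test matrices, and a hypothetical helper `numerical_gradient` based on central differences.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # assumed test size; any n should work
X, Y, C = (rng.standard_normal((n, n)) for _ in range(3))

def H(X, Y):
    # H(X, Y) = trace(X Y^T C), as in equation (1)
    return np.trace(X @ Y.T @ C)

def numerical_gradient(func, M, eps=1e-6):
    # Hypothetical helper: central differences, one entry of M at a time
    G = np.zeros_like(M)
    for i in range(M.shape[0]):
        for j in range(M.shape[1]):
            E = np.zeros_like(M)
            E[i, j] = eps
            G[i, j] = (func(M + E) - func(M - E)) / (2 * eps)
    return G

grad_X = numerical_gradient(lambda M: H(M, Y), X)
grad_Y = numerical_gradient(lambda M: H(X, M), Y)

# Compare with the closed forms used in the derivation
ok_X = np.allclose(grad_X, C.T @ Y, atol=1e-5)
ok_Y = np.allclose(grad_Y, C @ X, atol=1e-5)
print(ok_X, ok_Y)
```

This only checks the two pieces of $H$; it says nothing about how the matrix-valued factor $\nabla_A AB$ should be interpreted, which is the open question here.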

But I'm not sure that $\nabla_A AB = B^T$. Can somebody show me why, or give me another proof?

Thank you for your honest suggestions!

Long
  • 279

1 Answer


The problem is much easier if you use the Frobenius Inner Product instead of the trace.

Write the objective function in terms of the Frobenius product, denoted by a colon, $X:Y={\rm tr}(X^TY)$, and find its differential $$\eqalign{ f &= {\rm tr}(ABA^TC) \cr &= I:ABA^TC \cr\cr df &= I:(dA)BA^TC + I:AB(dA^T)C \cr &= C^TAB^T:dA + B^TA^TC^T:dA^T \cr &= C^TAB^T:dA + CAB:dA \cr &= (C^TAB^T + CAB):dA \cr }$$ where some of the expressions were rearranged using these mixed product rules $$\eqalign{ {\rm tr}(A^TBC) &= A:BC \cr &= AC^T:B \cr &= B^TA:C \cr &= A^T:(BC)^T \cr }$$ which are derived from the cyclic property of the trace function.
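For readers who want to convince themselves of the mixed product rules, here is a small numerical check. It is a sketch only, assuming NumPy and random $3 \times 3$ matrices; the helper name `frob` is made up for this illustration and implements $X:Y = {\rm tr}(X^TY)$ as a sum of elementwise products.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3  # assumed test size
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))

def frob(X, Y):
    # Frobenius inner product X:Y = tr(X^T Y) = sum of elementwise products
    return np.sum(X * Y)

# The five expressions that the mixed product rules claim are equal
values = [
    np.trace(A.T @ B @ C),   # tr(A^T B C)
    frob(A, B @ C),          # A : BC
    frob(A @ C.T, B),        # A C^T : B
    frob(B.T @ A, C),        # B^T A : C
    frob(A.T, (B @ C).T),    # A^T : (BC)^T
]
all_equal = np.allclose(values, values[0])
print(all_equal)
```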

Anyway, since $df=\big(\frac{\partial f}{\partial A}:dA\big),\,$ the gradient of the function must be $$\eqalign{ \frac{\partial f}{\partial A} &= C^TAB^T + CAB \cr }$$
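The result can also be compared against a finite-difference gradient. The following is a minimal sketch, assuming NumPy and random $4 \times 4$ matrices; the variable names are illustrative, and agreement on random inputs is a sanity check rather than a proof.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4  # assumed test size
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))

def f(A):
    # f(A) = tr(A B A^T C), with B and C held fixed
    return np.trace(A @ B @ A.T @ C)

# Closed-form gradient from the differential above
analytic = C.T @ A @ B.T + C @ A @ B

# Central-difference approximation of each partial derivative df/dA_ij
eps = 1e-6
numeric = np.zeros_like(A)
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = eps
        numeric[i, j] = (f(A + E) - f(A - E)) / (2 * eps)

max_err = np.max(np.abs(numeric - analytic))
print(max_err)
```

Because $f$ is quadratic in $A$, the central difference has no truncation error here, so the remaining discrepancy is floating-point roundoff.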

frank
  • 366
  • Thank you, frank. I failed to understand this step: $$ df = I:(dA)BA^TC + I:AB(dA^T)C $$ Could you show me some derivative rules for the Frobenius inner product? Thank you! – Long Jun 02 '16 at 03:27
  • More precisely, I can't understand why we keep $BA^TC$ fixed and differentiate $A$ in the first term, and why we keep $AB$ and $C$ fixed and differentiate $A^T$ in the second term. Thank you! – Long Jun 02 '16 at 03:37
  • @Long The product rule $$d(XY)=(dX)Y+X(dY)$$ is valid for just about any type of matrix product. In particular, the rule holds for the Kronecker, Hadamard, Frobenius, and standard matrix products. – frank Jun 02 '16 at 04:03
  • Two points confused me. 1. $A$ and $A^T$ consist of the same elements. Is it OK to treat them as $X$ and $Y$ in $$d(XY) = (dX)Y + X(dY)$$ to get $$d(AA^T) = (dA)A^T + A(dA^T)?$$ 2. In the term $AB(dA^T)C$, $A^T$ is in neither the leftmost nor the rightmost position, so how can I use $$d(XY) = (dX)Y + X(dY)?$$ Thank you! – Long Jun 02 '16 at 09:09
  • To expand a product of several terms, successively apply the binary rule, e.g. $$\eqalign{d(XYZ)&=(dX)(YZ)+X\,d(YZ)\cr &=(dX)(YZ)+X\big((dY)Z+Y(dZ)\big)\cr &=(dX)YZ+X(dY)Z+XY(dZ)\cr}$$ – frank Jun 02 '16 at 13:30
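Both points raised in the comments (the three-factor expansion, and treating $A$ and $A^T$ as separate factors) can be checked to first order numerically. This is a rough sketch, assuming NumPy, random $3 \times 3$ matrices, and small perturbations of size `eps`; all names are illustrative. If the product rule holds, the difference between the exact change and the first-order expression should be of order `eps**2`.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3       # assumed test size
eps = 1e-5  # assumed perturbation scale

# Three independent factors: d(XYZ) = (dX)YZ + X(dY)Z + XY(dZ)
X, Y, Z = (rng.standard_normal((n, n)) for _ in range(3))
dX, dY, dZ = (eps * rng.standard_normal((n, n)) for _ in range(3))

exact_change = (X + dX) @ (Y + dY) @ (Z + dZ) - X @ Y @ Z
first_order = dX @ Y @ Z + X @ dY @ Z + X @ Y @ dZ
residual = np.max(np.abs(exact_change - first_order))

# Dependent factors: d(A A^T) = (dA)A^T + A(dA^T), where dA^T = (dA)^T
A = rng.standard_normal((n, n))
dA = eps * rng.standard_normal((n, n))

change_AAt = (A + dA) @ (A + dA).T - A @ A.T
first_order_AAt = dA @ A.T + A @ dA.T
residual_AAt = np.max(np.abs(change_AAt - first_order_AAt))

print(residual, residual_AAt)
```

In the second case the leftover term is exactly $(dA)(dA)^T$, which is second order in the perturbation, so sharing entries between $A$ and $A^T$ does not break the rule.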