
I'm trying to prove a common identity with a slight twist ($A$ and $A^{T}$ are switched): $$ \nabla_A \operatorname{trace}( A^{T}BAC ) = BAC + B^T AC^T.$$ I'm assuming all matrices here are square and of the same size. I have already looked at the question "Proof for the funky trace derivative: $d (\operatorname{trace} (ABA'C))$?", but in the upvoted answer I don't understand how the chain rule is supposed to work with matrices. In particular, I don't see why the derivative of $AB$ w.r.t. $A$ should be $B^{T}$ (and what kind of derivative is that supposed to be? It can't be $\nabla_A$, because that is only defined for a scalar w.r.t. a matrix).

Therefore I tried to prove this on my own. I believe I'm making a mistake, but I can't seem to find it. Let's calculate the differential: $$\begin{aligned} d\,tr(A^{T}BAC) &= tr(d(A^{T}BAC)) = tr(d(A^{T})BAC+A^{T}Bd(A)C)\\ &= tr(d(A^{T})BAC)+tr(A^{T}Bd(A)C) = tr((d(A))^{T}BAC)+tr(A^{T}Bd(A)C)\\ &= tr(C^{T}A^{T}B^{T}d(A))+tr(A^{T}Bd(A)C) = tr(C^{T}A^{T}B^{T}d(A))+tr(CA^{T}Bd(A))\\ &= tr((C^{T}A^{T}B^{T}+CA^{T}B)d(A)) \end{aligned}$$ I have found a rule to convert from differential to derivative form: $$dy=tr(AdX) \quad \Rightarrow \quad \frac{\partial y}{\partial X}=A.$$ If this rule is correct, I get my answer transposed relative to what I was trying to prove. Can you please help me spot the mistake?

ExabytE
  • 105

1 Answer


Let's use a colon (:) to denote the trace/Frobenius product, i.e. $$A:B={\rm tr}(A^TB)$$ Then the rule to convert between the differential and the gradient is $$dy=A:dX \,\,\Longleftrightarrow\,\, \frac{\partial y}{\partial X}=A$$ In addition, two simple properties of the trace $$\eqalign{ {\rm tr}(AB) &= {\rm tr}(BA) \cr {\rm tr}(A^T) &= {\rm tr}(A) \cr }$$ give rise to several rules for rearranging the Frobenius product $$\eqalign{ A:BC &= B^TA:C \cr &= AC^T:B \cr &= BC:A \cr }$$ Applying these tools to the current problem yields $$\eqalign{ y &= A:BAC \cr\cr dy &= dA:BAC + A:B\,dA\,C \cr &= BAC:dA + B^TAC^T:dA \cr &= (BAC + B^TAC^T):dA \cr\cr \frac{\partial y}{\partial A} &= BAC + B^TAC^T \cr }$$
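As a sanity check, the result can be compared against a finite-difference gradient. The following is only a minimal sketch, assuming NumPy is available; the matrix size `n`, the random seed, the step `eps`, and the helper name `f` are arbitrary choices made for illustration and are not part of the derivation.

```python
import numpy as np

# Random square matrices of the same size (size and seed are arbitrary).
rng = np.random.default_rng(0)
n = 4
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))

def f(A):
    """Scalar function y = tr(A^T B A C)."""
    return np.trace(A.T @ B @ A @ C)

# Gradient claimed by the derivation: BAC + B^T A C^T.
analytic = B @ A @ C + B.T @ A @ C.T

# Central finite differences, one entry of A at a time.
eps = 1e-6
numeric = np.zeros_like(A)
for i in range(n):
    for j in range(n):
        E = np.zeros_like(A)
        E[i, j] = eps
        numeric[i, j] = (f(A + E) - f(A - E)) / (2 * eps)

max_err = np.max(np.abs(numeric - analytic))
print("max abs difference:", max_err)
```

Since $y$ is quadratic in $A$, the central difference carries no truncation error, so the printed difference should be at the level of floating-point round-off.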

greg
  • 40,033
  • Thank you for your answer. Even though you provided a different proof than I wanted, I believe you have shown me the mistake. You used the conversion rule as follows: $dy=A:dX \,\,\Longleftrightarrow\,\, \frac{\partial y}{\partial X}=A$. But translating that to my notation, that means $dy=tr(A^{T}dX) \,\,\Longleftrightarrow\,\, \frac{\partial y}{\partial X}=A$, which solves my problem. Is this correct? – ExabytE Oct 30 '17 at 11:05
  • Yes, that's an accurate translation of my result back into trace notation. The question of $A$ vs $A^T$ is referred to as the layout convention. Pick the one that you like and stick with it. – greg Oct 30 '17 at 12:10
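
The relationship between the two conventions discussed in these comments can be illustrated numerically. This is a rough sketch, assuming NumPy; the names `G`, `M`, `dA`, and `y` are introduced here for illustration only. `M` is the coefficient from the question's differential $dy = tr(M\,dA)$, and `G` is the gradient from the answer, for which $dy = tr(G^{T}dA)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))
dA = 1e-6 * rng.standard_normal((n, n))  # small perturbation of A

def y(A):
    """Scalar function y = tr(A^T B A C)."""
    return np.trace(A.T @ B @ A @ C)

# Gradient from the answer, used as dy = tr(G^T dA).
G = B @ A @ C + B.T @ A @ C.T
# Coefficient from the question's differential, used as dy = tr(M dA).
M = C.T @ A.T @ B.T + C @ A.T @ B

dy_frobenius = np.trace(G.T @ dA)
dy_trace = np.trace(M @ dA)
dy_actual = y(A + dA) - y(A)

print("M equals G^T:", np.allclose(M, G.T))
print(dy_frobenius, dy_trace, dy_actual)
```

Both expressions describe the same first-order change in $y$; the two matrices differ only by a transpose, which is the layout-convention choice mentioned above.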