I'm trying to prove a common identity with a slight twist ($A$ and $A^{T}$ are switched):

$$\nabla_A \operatorname{trace}(A^{T}BAC) = BAC + B^{T}AC^{T}.$$

I'm assuming all matrices here are square and of the same size. I have already looked at the question Proof for the funky trace derivative: $d(\operatorname{trace}(ABA'C))$?, but I don't understand how the chain rule in the upvoted answer is supposed to work with matrices. In particular, I don't see why the derivative of $AB$ with respect to $A$ should be $B^{T}$. And what kind of derivative is that supposed to be? It can't be $\nabla_A$, because that is only defined for a scalar with respect to a matrix.
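One way to check the target identity itself is numerically, reading $\nabla_A$ entrywise as $(\nabla_A f)_{ij} = \partial f/\partial A_{ij}$. This is only a sketch assuming NumPy; the helper names `f` and `numerical_gradient` are hypothetical, and random $4\times 4$ matrices stand in for $A$, $B$, $C$:

```python
import numpy as np

def f(A, B, C):
    """Scalar function f(A) = tr(A^T B A C)."""
    return np.trace(A.T @ B @ A @ C)

def numerical_gradient(A, B, C, eps=1e-6):
    """Central finite differences: G[i, j] approximates df/dA[i, j].

    f is quadratic in A, so the central difference is exact up to rounding.
    """
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps  # perturb a single entry of A
            G[i, j] = (f(A + E, B, C) - f(A - E, B, C)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
n = 4
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))

# Right-hand side of the identity in the question.
claimed = B @ A @ C + B.T @ A @ C.T
print(np.allclose(numerical_gradient(A, B, C), claimed, atol=1e-6))
```

Random test matrices will not catch every conceivable error, but a mismatch here would point to a genuine problem with the formula rather than with the derivation.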
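Therefore I tried to prove this on my own. I believe I'm making a mistake somewhere but can't find it. Let's calculate the differential:

$$
\begin{aligned}
d\operatorname{tr}(A^{T}BAC) &= \operatorname{tr}\big(d(A^{T}BAC)\big) = \operatorname{tr}\big(d(A^{T})\,BAC + A^{T}B\,d(A)\,C\big) \\
&= \operatorname{tr}\big(d(A^{T})\,BAC\big) + \operatorname{tr}\big(A^{T}B\,d(A)\,C\big) = \operatorname{tr}\big((d(A))^{T}BAC\big) + \operatorname{tr}\big(A^{T}B\,d(A)\,C\big) \\
&= \operatorname{tr}\big(C^{T}A^{T}B^{T}\,d(A)\big) + \operatorname{tr}\big(A^{T}B\,d(A)\,C\big) = \operatorname{tr}\big(C^{T}A^{T}B^{T}\,d(A)\big) + \operatorname{tr}\big(CA^{T}B\,d(A)\big) \\
&= \operatorname{tr}\big((C^{T}A^{T}B^{T} + CA^{T}B)\,d(A)\big)
\end{aligned}
$$

I have found a rule to convert from the differential form to the derivative form, for a generic coefficient matrix $M$ (not the $A$ above):

$$dy = \operatorname{tr}(M\,dX) \quad\Rightarrow\quad \frac{\partial y}{\partial X} = M.$$

If this rule is correct, I get my answer transposed compared to what I was trying to prove. Can you please help me spot the mistake?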
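Each step of the derivation above can also be sanity-checked numerically. The sketch below assumes NumPy and uses a random matrix `D` as a hypothetical stand-in for $dA$, with $M = C^{T}A^{T}B^{T} + CA^{T}B$ taken from the last line of the derivation. The final check looks at how $\operatorname{tr}(M\,D)$ pairs the entries of $M$ with the entries of $D$, since that pairing is exactly what any differential-to-derivative rule relies on:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
# Random square matrices; D plays the role of the differential dA.
A, B, C, D = (rng.standard_normal((n, n)) for _ in range(4))

# Transposing inside the trace: tr(D^T B A C) = tr(C^T A^T B^T D).
step1 = np.isclose(np.trace(D.T @ B @ A @ C), np.trace(C.T @ A.T @ B.T @ D))

# Cyclic property of the trace: tr(A^T B D C) = tr(C A^T B D).
step2 = np.isclose(np.trace(A.T @ B @ D @ C), np.trace(C @ A.T @ B @ D))

# Last line of the derivation: both terms combine into tr(M D).
M = C.T @ A.T @ B.T + C @ A.T @ B
total = np.isclose(np.trace(D.T @ B @ A @ C) + np.trace(A.T @ B @ D @ C),
                   np.trace(M @ D))

# Entry pairing: tr(M D) = sum_ij M[j, i] * D[i, j],
# so the coefficient of D[i, j] is M[j, i], not M[i, j].
pairing = np.isclose(np.trace(M @ D), np.sum(M.T * D))

print(step1, step2, total, pairing)
```

All four checks are expected to print `True`.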