Let $f: \mathbb{R}^{m\times n} \to \mathbb{R}$ be a map from $m$-by-$n$ matrices to the real numbers, and define the derivative (gradient) of $f$ as
$$\nabla_A f(A) = \begin{pmatrix} \frac{\partial f}{\partial A_{11}} & \ldots & \frac{\partial f}{\partial A_{1n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial A_{m1}} & \ldots & \frac{\partial f}{\partial A_{mn}}\end{pmatrix},
$$
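To make this definition concrete, here is a minimal NumPy sketch that approximates each entry $\partial f / \partial A_{ij}$ with a central finite difference and places it at position $(i, j)$, so the result has the same $m\times n$ layout as the matrix above. The helper name `numerical_gradient` and the step size `h` are illustrative choices, not something from the course notes.

```python
import numpy as np

def numerical_gradient(f, A, h=1e-6):
    """Approximate the m-by-n matrix of partials df/dA_ij by central differences."""
    A = np.asarray(A, dtype=float)
    grad = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = h  # perturb only entry (i, j)
            grad[i, j] = (f(A + E) - f(A - E)) / (2 * h)
    return grad

# Example: f(A) = sum of squared entries, whose gradient is 2A.
A = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
G = numerical_gradient(lambda X: np.sum(X**2), A)
print(np.round(G, 6))
```

Because the gradient is defined entry by entry, a finite-difference check like this is a convenient way to test any claimed closed-form gradient.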
I am given that
$$\nabla_A \mathrm{tr}(ABA^T C) = CAB + C^T A B^T.
$$
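Before proving the identity, one can sanity-check it numerically by comparing a finite-difference gradient of $\mathrm{tr}(ABA^TC)$ with $CAB + C^TAB^T$ on random matrices. This sketch *assumes* $A$ is $m\times n$, $B$ is $n\times n$ and $C$ is $m\times m$. That choice of shapes is my assumption: it makes $ABA^TC$ square so that its trace is defined. The helper `numerical_gradient` is hypothetical.

```python
import numpy as np

def numerical_gradient(f, A, h=1e-6):
    # Central-difference approximation of each partial derivative df/dA_ij.
    grad = np.zeros_like(A)
    for idx in np.ndindex(A.shape):
        E = np.zeros_like(A)
        E[idx] = h
        grad[idx] = (f(A + E) - f(A - E)) / (2 * h)
    return grad

rng = np.random.default_rng(0)
m, n = 4, 3
A = rng.standard_normal((m, n))  # m x n
B = rng.standard_normal((n, n))  # n x n
C = rng.standard_normal((m, m))  # assumed m x m, so A B A^T C is square

f = lambda X: np.trace(X @ B @ X.T @ C)
lhs = numerical_gradient(f, A)      # gradient per the entrywise definition
rhs = C @ A @ B + C.T @ A @ B.T     # claimed closed form
print(np.max(np.abs(lhs - rhs)))    # should be close to zero
```

Since $f$ is quadratic in $A$, the central difference is exact up to floating-point rounding, so any real discrepancy would show up clearly.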
This identity is stated without proof in the CS229 (Machine Learning) notes. How would I prove it? Is there an easy way? The notes say it should be simple.
Edit: I seem to have found a counterexample. If I let $A = B = C = I$, the identity matrix, the left-hand side gives me $I$ and the right-hand side gives me $2I$. However, I have used this rule many times throughout the course without it failing. Have I missed something, or is there a typo in the formula?
Also, for $ABA^TC$ to be $m\times n$, $A$ must be $m\times n$, $B$ must be $n\times n$ and $C$ must be $m\times n$. Under these dimensions, the product $CAB$ on the right-hand side is not defined unless $m = n$.
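For comparison, a quick shape check under a different assumption: if $C$ is taken to be $m\times m$ rather than $m\times n$ (an assumption, chosen so that $ABA^TC$ is square and its trace exists), then both terms on the right-hand side have the same $m\times n$ shape as $A$, which is what a gradient with respect to $A$ should have. The concrete sizes $m=3$, $n=2$ are arbitrary.

```python
import numpy as np

m, n = 3, 2
A = np.ones((m, n))  # m x n
B = np.ones((n, n))  # n x n
C = np.ones((m, m))  # assumed m x m (not m x n)

print((A @ B @ A.T @ C).shape)   # matrix inside the trace: (3, 3)
print((C @ A @ B).shape)         # first right-hand-side term: (3, 2)
print((C.T @ A @ B.T).shape)     # second right-hand-side term: (3, 2)
```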