I know that the gradient of $X \mapsto \mbox{Tr}(XA)$ is $A^T$. How does this change if $A$ and $X$ are swapped? That is, is the gradient of $X \mapsto \mbox{Tr}(AX)$ the same?
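For reference, here is how I would write both traces entrywise. This is only a sketch, and the dimensions are my assumption: $X$ is $m \times n$ and $A$ is $n \times m$, so that both products are square.

$$
\mbox{Tr}(XA) = \sum_{i=1}^{m} \sum_{j=1}^{n} X_{ij} A_{ji},
\qquad
\mbox{Tr}(AX) = \sum_{j=1}^{n} \sum_{i=1}^{m} A_{ji} X_{ij}.
$$

If this is right, the two sums contain the same terms, so in both cases

$$
\frac{\partial}{\partial X_{ij}} \mbox{Tr}(XA) = \frac{\partial}{\partial X_{ij}} \mbox{Tr}(AX) = A_{ji} = \left(A^T\right)_{ij}.
$$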
Also, how does this extend to products of more matrices? Can we simply treat everything that multiplies $X$ from the left as a single matrix $A$? For example, for $X \mapsto \mbox{Tr}\left(U^T V X\right)$, can we take $A = U^T V$ and apply the result above?
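A quick numerical sanity check of these guesses, as a minimal sketch rather than a proof. It assumes square $n \times n$ real matrices with random entries and uses NumPy; the helper `numerical_gradient` is a hypothetical name of my own, implementing a central finite difference entry by entry.

```python
import numpy as np

def numerical_gradient(f, X, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at matrix X."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps  # perturb only entry (i, j)
            G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

# Assumption: square matrices of an arbitrary small size, random entries.
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))
U = rng.standard_normal((n, n))
V = rng.standard_normal((n, n))

grad_XA = numerical_gradient(lambda M: np.trace(M @ A), X)
grad_AX = numerical_gradient(lambda M: np.trace(A @ M), X)
grad_UVX = numerical_gradient(lambda M: np.trace(U.T @ V @ M), X)

# Compare against the conjectured closed forms.
print(np.allclose(grad_XA, A.T, atol=1e-6))
print(np.allclose(grad_AX, A.T, atol=1e-6))
print(np.allclose(grad_UVX, (U.T @ V).T, atol=1e-6))
```

The third comparison uses $(U^T V)^T = V^T U$, which is what treating $U^T V$ as "$A$" would predict.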