I currently face many problems that require the chain rule in matrix calculus. This is different from the scalar case, where I don't need to worry about the order of multiplication. In matrix calculus, I have to be very careful about whether it is $AB$ or $BA$, and sometimes a transpose is needed as well.
For example, let $X$ be an $m\times n$ matrix. I want to get the derivative of the Frobenius norm $\|X\|$ with respect to the matrix $X$.
Let $X^i_j$ denote the row $i$, column $j$ entry of $X$.
The Frobenius norm of a matrix $X$ is defined as:
$$\|X\| = \sqrt{\sum_{p, q} (X^p_q)^2}$$
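A quick numerical sketch of this definition, assuming NumPy (which the question does not mention): summing the squared entries and taking the square root should agree with NumPy's built-in Frobenius norm, `np.linalg.norm(X, "fro")`. The helper name `frobenius_norm` is hypothetical.

```python
import numpy as np

def frobenius_norm(X: np.ndarray) -> float:
    """Frobenius norm straight from the definition: sqrt of the sum of squared entries."""
    return float(np.sqrt(np.sum(X ** 2)))

X = np.array([[1.0, 2.0],
              [2.0, 4.0]])  # squared entries: 1 + 4 + 4 + 16 = 25

print(frobenius_norm(X))             # 5.0
print(np.linalg.norm(X, "fro"))      # 5.0, NumPy's built-in version
```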
To find $\frac{\partial \|X\|}{\partial X^i_j}$, we use the chain rule:
$$\frac{\partial \|X\|}{\partial X^i_j} = \frac{\partial \|X\|}{\partial \left(\sum_{p, q} (X^p_q)^2\right)} \cdot \frac{\partial \left(\sum_{p, q} (X^p_q)^2\right)}{\partial X^i_j}.$$
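Evaluating the two factors with $s = \sum_{p,q} (X^p_q)^2$: the first is $\frac{d\sqrt{s}}{ds} = \frac{1}{2\sqrt{s}} = \frac{1}{2\|X\|}$ and the second is $2X^i_j$, so $\frac{\partial \|X\|}{\partial X^i_j} = \frac{X^i_j}{\|X\|}$ for $X \neq 0$, i.e. the gradient matrix is $X/\|X\|$. Both factors are scalars, which is why their order cannot matter here. The sketch below (assuming NumPy; the function names are hypothetical) checks this entrywise against a central finite difference.

```python
import numpy as np

def frobenius_norm_grad(X):
    """Analytic gradient of ||X||_F: entry (i, j) is X[i, j] / ||X||_F (assumes X != 0)."""
    return X / np.linalg.norm(X, "fro")

def finite_difference_grad(f, X, h=1e-6):
    """Central-difference approximation of df/dX[i, j], perturbing one entry at a time."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))  # m = 3, n = 4
f = lambda Y: np.linalg.norm(Y, "fro")

print(np.allclose(frobenius_norm_grad(X), finite_difference_grad(f, X), atol=1e-6))  # True
```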
My question: How do I know that it is the first factor times the second factor, rather than the second factor times the first? I know it doesn't matter here, but when does the order matter, and how should I decide it?
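One common way to settle the order, sketched here on an assumed example that is not part of the original question: write every chain-rule factor entrywise, so each factor is a scalar and the order is irrelevant, then read off the matrix product by matching the summed index. Take $f(X) = \|AX\|$ for a hypothetical fixed $k\times m$ matrix $A$, set $Y = AX$, and let $G = Y/\|Y\|$ be the gradient of the Frobenius norm evaluated at $Y$. Since $\partial Y^p_q/\partial X^i_j = A^p_i\,\delta_{qj}$,
$$\frac{\partial f}{\partial X^i_j} = \sum_{p,q} G^p_q\, A^p_i\, \delta_{qj} = \sum_p A^p_i\, G^p_j = (A^\top G)^i_j.$$
The summed index $p$ is the row index of both $A$ and $G$, which forces $A^\top$ on the left. The NumPy sketch below (sizes and random data are arbitrary assumptions) confirms this numerically.

```python
import numpy as np

rng = np.random.default_rng(1)
k, m, n = 2, 3, 4
A = rng.standard_normal((k, m))  # hypothetical fixed matrix, k x m
X = rng.standard_normal((m, n))  # m x n, as in the question

def f(X):
    # f(X) = ||A X||_F: the map X -> Y = A X followed by Y -> ||Y||_F
    return np.linalg.norm(A @ X, "fro")

# Outer gradient: d||Y||_F / dY = Y / ||Y||_F, shape k x n
Y = A @ X
G = Y / np.linalg.norm(Y, "fro")

# Index form: df/dX[i, j] = sum_p G[p, j] * A[p, i]
grad_index = np.einsum("pj,pi->ij", G, A)
# The same sum written as a matrix product: A^T must come first
grad_matrix = A.T @ G

# Central-difference check of every entry
h = 1e-6
grad_fd = np.zeros_like(X)
for i in range(m):
    for j in range(n):
        E = np.zeros_like(X)
        E[i, j] = h
        grad_fd[i, j] = (f(X + E) - f(X - E)) / (2 * h)

print(np.allclose(grad_index, grad_matrix))          # True
print(np.allclose(grad_matrix, grad_fd, atol=1e-6))  # True
```

Here the reversed product `G @ A.T` is not even defined (inner dimensions $n = 4$ and $m = 3$ differ), which is one quick sanity check on the order: the result must have the same shape as $X$.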