In the derivation of the backpropagation algorithm in Neural Network Design by Hagan et al., we consider the derivative of the scalar-valued sample loss function $\hat{F}$ with respect to the net-input vector $\mathbf{n}^{m}$ of layer $m$ in a fully connected neural network; the book calls these derivatives the "sensitivities" of layer $m$. We find a recurrence that expresses $\partial \hat{F} / \partial \mathbf{n}^{m}$ in terms of $\partial \hat{F} / \partial \mathbf{n}^{m+1}$ (the sensitivities of layer $m+1$).
The authors mention an application of "the chain rule in matrix form" to obtain the following:$${\partial \hat{F} \over \partial \mathbf{n}^m} = \left({\partial \mathbf{n}^{m+1} \over \partial \mathbf{n}^{m}}\right)^{T} {\partial \hat{F} \over \partial \mathbf{n}^{m+1}}$$
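For reference, here is how I would expand the chain rule component by component (my own working, not taken from the book):$$\frac{\partial \hat{F}}{\partial n_i^{m}} = \sum_{j} \frac{\partial n_j^{m+1}}{\partial n_i^{m}}\,\frac{\partial \hat{F}}{\partial n_j^{m+1}},$$where $n_i^m$ denotes the $i$th component of $\mathbf{n}^m$ and the sum runs over the components of $\mathbf{n}^{m+1}$.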
Earlier, the authors present the Jacobian $\partial \mathbf{n}^{m+1} / \partial \mathbf{n}^m$ in the "numerator layout":$$J = {\partial \mathbf{n}^{m+1} \over \partial \mathbf{n}^m} = \left[\begin{matrix}{\partial n_1^{m+1} \over \partial n_1^m} & \cdots & {\partial n_1^{m+1} \over \partial n_{S^m}^m} \\ \vdots & \ddots & \vdots \\ {\partial n_{S^{m+1}}^{m+1} \over \partial n_1^m} & \cdots & {\partial n_{S^{m+1}}^{m+1} \over \partial n_{S^m}^m} \end{matrix}\right]$$Here, $S^i$ denotes the number of neurons in layer $i$.
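To convince myself of the shapes involved, I wrote a small numeric sketch (my own, not from the book), assuming layer $m+1$ computes $\mathbf{n}^{m+1} = \mathbf{W}^{m+1} f(\mathbf{n}^m) + \mathbf{b}^{m+1}$ with a tanh transfer function and a squared-error loss; the names `W`, `b`, `forward`, and `loss` are hypothetical. It forms the Jacobian above as $\mathbf{W}^{m+1}\operatorname{diag}(\dot f(\mathbf{n}^m))$ and compares $J^{T}\,\partial \hat{F}/\partial \mathbf{n}^{m+1}$ against a finite-difference estimate of $\partial \hat{F}/\partial \mathbf{n}^{m}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the book): layer m has 3 neurons, layer m+1 has 2.
S_m, S_m1 = 3, 2

W = rng.normal(size=(S_m1, S_m))    # weight matrix W^{m+1}
b = rng.normal(size=S_m1)           # bias b^{m+1}
target = rng.normal(size=S_m1)      # dummy target for a squared-error loss

def f(n):
    """Transfer function of layer m (tanh, as an assumption)."""
    return np.tanh(n)

def fprime(n):
    """Derivative of the transfer function."""
    return 1.0 - np.tanh(n) ** 2

def forward(n_m):
    """Net input of layer m+1 as a function of the net input of layer m."""
    return W @ f(n_m) + b

def loss(n_m):
    """Sample loss F_hat = 0.5 * ||target - n^{m+1}||^2, viewed as a function of n^m."""
    e = target - forward(n_m)
    return 0.5 * e @ e

n_m = rng.normal(size=S_m)
n_m1 = forward(n_m)

# Jacobian dn^{m+1}/dn^m in the numerator layout: rows index layer m+1, columns index layer m.
J = W * fprime(n_m)                 # equals W^{m+1} @ diag(f'(n^m)), shape (S_m1, S_m)

dF_dn_m1 = -(target - n_m1)         # dF/dn^{m+1}, shape (S_m1,)
dF_dn_m = J.T @ dF_dn_m1            # the book's recurrence, shape (S_m,)

# Finite-difference check of dF/dn^m.
eps = 1e-6
numeric = np.array([
    (loss(n_m + eps * np.eye(S_m)[i]) - loss(n_m - eps * np.eye(S_m)[i])) / (2 * eps)
    for i in range(S_m)
])
print(np.allclose(dF_dn_m, numeric, atol=1e-6))   # expect True
```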
Given this Jacobian, why do a transpose and a left-multiplication appear in the recurrence expression for $\partial \hat{F} / \partial \mathbf{n}^{m}$? Why is the expression not equivalent to $D_{\mathbf{n}^{m+1}} \hat{F} \cdot J$?