
Given an $m \times n$ matrix $\bf X$ and an $m \times p$ matrix $\bf Y$, define the loss function of the $n \times p$ matrix $\bf R$ by

$$ \operatorname{Loss} ({\bf R}) := \| \mathbf{X} \mathbf{R} - \mathbf{Y} \|_F^2 $$

where the squared Frobenius norm of an $n \times m$ matrix $\mathbf{A}$ is defined as

$$ \| \mathbf{A} \|_F^2 = \sum_{i=1}^n \sum_{j=1}^m a_{ij}^2 $$

I have to compute the gradient $\nabla_{{\bf R}} \operatorname{Loss}$. My source says:

$$ \nabla_{{\bf R}} \operatorname{Loss} ({\bf R}) = \dfrac2m \mathbf{X}^T (\mathbf{X} \mathbf{R} - \mathbf{Y} ) $$

but I am not sure how to obtain this result.
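To make the shapes and the loss concrete, here is a small NumPy sketch of the setup (the dimensions and random data are arbitrary choices for illustration):

```python
import numpy as np

# Arbitrary small dimensions, chosen only to illustrate the shapes.
m, n, p = 5, 3, 2
rng = np.random.default_rng(0)
X = rng.standard_normal((m, n))   # m x n
Y = rng.standard_normal((m, p))   # m x p
R = rng.standard_normal((n, p))   # n x p

def loss(R):
    # Squared Frobenius norm of the residual X R - Y,
    # i.e. the sum of the squares of all its entries.
    return np.sum((X @ R - Y) ** 2)

print(loss(R))
```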

  • What have you tried so far? A good tip for taking matrix derivatives, which I used when I wasn't as comfortable with matrix calculus, is to take the derivative with respect to a single element of the matrix (say $R_{ij}$) and write everything in summation form. Also, I think you might be missing a $1/m$ factor in your original loss expression. – chang_trenton Dec 27 '20 at 03:47
  • If you're still stuck, you can find the derivation here: https://math.stackexchange.com/questions/2128462/derivative-of-squared-frobenius-norm-of-a-matrix – chang_trenton Dec 27 '20 at 03:50
  • Do you agree with my edits? Which source is that? – Rodrigo de Azevedo Mar 31 '23 at 07:20

1 Answer


Use a colon to denote the trace/Frobenius product
$$\eqalign{ A:B \;=\; \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; {\rm Tr}(A^TB) \;=\; {\rm Tr}(AB^T) \\ }$$
Then the differential and gradient are straightforward to calculate:
$$\eqalign{ W &= XR-Y \\ dW &= X\,dR \\ {\rm Loss} &= \big\|W\big\|^2_F \;=\; W:W \\ d\,{\rm Loss} &= 2W:dW \;=\; 2W:X\,dR \;=\; 2X^TW\color{red}{:dR} \\ \frac{\partial\,{\rm Loss}}{\color{red}{\partial R}} &= 2X^TW \;=\; 2X^T(XR-Y) \\ }$$
The third-to-last step uses the cyclic property of the trace: $W:X\,dR = {\rm Tr}(W^TX\,dR) = X^TW:dR$. So the gradient stated by your source is off by a factor of $\frac 1m$; that factor appears only if the loss itself is defined with a $\frac 1m$ averaging term, as one of the comments suggests.
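A quick way to convince yourself of the result $\nabla_R\,{\rm Loss} = 2X^T(XR-Y)$ is a finite-difference check. The sketch below (arbitrary random data, central differences with a hypothetical step size of $10^{-6}$) compares the analytic gradient against a numerical one entry by entry:

```python
import numpy as np

# Random test data; the sizes are arbitrary.
rng = np.random.default_rng(1)
m, n, p = 5, 3, 2
X = rng.standard_normal((m, n))
Y = rng.standard_normal((m, p))
R = rng.standard_normal((n, p))

def loss(R):
    return np.sum((X @ R - Y) ** 2)

# Analytic gradient from the derivation above (no 1/m factor).
grad = 2 * X.T @ (X @ R - Y)      # shape n x p

# Central-difference approximation, one entry of R at a time.
eps = 1e-6
num = np.zeros_like(R)
for i in range(n):
    for j in range(p):
        E = np.zeros_like(R)
        E[i, j] = eps
        num[i, j] = (loss(R + E) - loss(R - E)) / (2 * eps)

# Since the loss is quadratic in R, central differences agree with
# the analytic gradient up to floating-point rounding error.
print(np.max(np.abs(grad - num)))
```

Because the loss is a quadratic function of $R$, central differences incur no truncation error here, so any discrepancy is purely floating-point rounding.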

greg