
In the deep learning book, the standard RNN updates its hidden state as $h^{(t)} = \tanh\left(b + W h^{(t-1)} + U x^{(t)}\right)$, and among the various derivatives the book calculates is the one for $W$:

$$\nabla_W L = \sum_t \operatorname{diag}\!\left(1 - \left(h^{(t)}\right)^2\right)\left(\nabla_{h^{(t)}} L\right) {h^{(t-1)}}^{\top}$$

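(To pin down the shapes, here is the recurrence as I read it, in NumPy; the sizes and random values are an arbitrary choice of mine:)

```python
import numpy as np

n, m = 4, 3                          # hidden size, input size (arbitrary)
rng = np.random.default_rng(0)
W = rng.standard_normal((n, n))      # hidden-to-hidden weights, n x n
U = rng.standard_normal((n, m))      # input-to-hidden weights, n x m
b = np.zeros(n)                      # hidden bias

h_prev = np.zeros(n)                 # h^(t-1)
x_t = rng.standard_normal(m)         # x^(t)
h_t = np.tanh(b + W @ h_prev + U @ x_t)   # h^(t), shape (n,)
```
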
I understand that:

  • $1 - (h^{(t)})^2$ comes from the derivative of $\tanh$ (quick finite-difference check below)
  • $h^{(t-1)}$ comes from the chain rule
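
(The $\tanh$ part I checked like this; the test points are arbitrary:)

```python
import numpy as np

# check tanh'(a) = 1 - tanh(a)^2 by central finite differences
a = np.array([0.3, -1.2, 0.7])   # arbitrary test points
eps = 1e-6
fd = (np.tanh(a + eps) - np.tanh(a - eps)) / (2 * eps)
print(np.max(np.abs(fd - (1 - np.tanh(a)**2))))   # should be ~1e-10 or smaller
```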

What I don't understand:

  • where does the $\operatorname{diag}$ come from?
  • where does the transpose on $h^{(t-1)}$ come from?
  • why are the factors in this particular order (apart from the fact that this way the dimensions match)? It feels as if the gradient of $L$ somehow got wedged between the two parts of the derivative of $h^{(t)}$. (I did verify the formula numerically; see the sketch after this list.)
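
For reference, here is that numerical check: a minimal NumPy sketch under my own assumptions (a single sequence, zero initial state, and a toy loss $L = \frac{1}{2}\sum_t \|h^{(t)}\|^2$, chosen only so that $\nabla_{h^{(t)}} L$ is easy to build by backpropagation through time). It confirms the book's formula against finite differences, but not *why* it has this shape:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, T = 4, 3, 5                       # hidden size, input size, sequence length (arbitrary)
W = 0.1 * rng.standard_normal((n, n))
U = 0.1 * rng.standard_normal((n, m))
b = np.zeros(n)
x = rng.standard_normal((T, m))

def hidden_states(W):
    # forward pass: h^(t) = tanh(b + W h^(t-1) + U x^(t)), with h^(0) = 0
    hs = [np.zeros(n)]
    for t in range(T):
        hs.append(np.tanh(b + W @ hs[-1] + U @ x[t]))
    return hs

def loss(W):
    # toy loss L = 1/2 sum_t ||h^(t)||^2 (my choice, just to get a scalar L)
    return 0.5 * sum(h @ h for h in hidden_states(W)[1:])

# gradient via the book's formula, with grad_{h^(t)} L built backwards in time
hs = hidden_states(W)
grad = np.zeros_like(W)
dh = np.zeros(n)                        # total dL/dh^(t)
for t in range(T, 0, -1):
    dh = dh + hs[t]                     # direct term of the toy loss
    grad += np.diag(1 - hs[t]**2) @ np.outer(dh, hs[t - 1])
    dh = W.T @ ((1 - hs[t]**2) * dh)    # chain rule back to h^(t-1)

# finite-difference reference
num = np.zeros_like(W)
eps = 1e-6
for i in range(n):
    for j in range(n):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        num[i, j] = (loss(Wp) - loss(Wm)) / (2 * eps)

print(np.max(np.abs(grad - num)))       # tiny, so the formula checks out numerically
```

So the formula itself is fine numerically; my question is purely about where the $\operatorname{diag}$, the transpose, and the ordering come from.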

Thank you

Kristof
Related to https://math.stackexchange.com/questions/2970202/a-matrix-calculus-problem-in-backpropagation-encountered-when-studying-deep-lear/2974250#2974250 – user550103 Oct 14 '19 at 09:11

0 Answers