
Suppose $$ W \in \mathbb{R}^{m\times n}, \quad h \in \mathbb{R}^{n}. $$ What is $\nabla_W (Wh)$?

In other words, how does one calculate the gradient of $ \ Wh \ $ with respect to $ \ W \ $?

BioCoder
  • Intuitively, the answer of my question should be $h$ just like what you said. But in the book Deep Learning, the answer is $h^T$. Do you know why? (See Algorithm 6.4, page 209, http://www.deeplearningbook.org/contents/mlp.html) – BioCoder Oct 24 '18 at 09:10
  • I simplified the equation for convenience. The original equation (in the pseudocode of Algorithm 6.4) in the book is: $ \nabla_{W^{(k)}}J = gh^{(k-1)T}+\lambda\nabla_{W^{(k)}}{\Omega(\theta)}$. Please note the superscript $T$. – BioCoder Oct 24 '18 at 09:54
  • Hi, Rahul. Have you figured it out? – BioCoder Oct 24 '18 at 14:04

1 Answer


One straightforward option is to use the $\textrm{vec}$ operator and the identity \begin{align} \textrm{vec}\left(AXB\right) = \left(B^T \otimes A\right) \textrm{vec}\left(X\right). \end{align}
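The identity above is easy to check numerically. A minimal sketch (not part of the original answer), using the convention that $\textrm{vec}$ stacks columns, which corresponds to column-major (Fortran) order in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))

# vec() stacks the columns of a matrix, i.e. column-major (Fortran) order.
def vec(M):
    return M.reshape(-1, order="F")

lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
print(np.allclose(lhs, rhs))  # True
```

Note that `np.kron` follows the same Kronecker-product convention as the identity, so the check holds exactly up to floating-point error.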

So, \begin{align} \textrm{vec}\left(Wh\right) &= \textrm{vec}\left(IWh\right) \\ &= \left(h^T \otimes I\right) \textrm{vec}\left(W\right). \end{align}

Then, take the derivative with respect to $\textrm{vec}\left(W\right)$, i.e., \begin{align} \frac{\partial}{\partial \textrm{vec}\left(W\right)} \left\{\left(h^T \otimes I\right) \textrm{vec}\left(W\right) \right\} &= h^T \otimes I \ , \end{align} where $I$ is an identity matrix of appropriate size.
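The claimed Jacobian $h^T \otimes I_m$ (of shape $m \times mn$) can be verified against a finite-difference approximation. A sketch under the same column-major $\textrm{vec}$ convention as above:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 4
W = rng.standard_normal((m, n))
h = rng.standard_normal(n)

# Analytic Jacobian of vec(Wh) w.r.t. vec(W): h^T kron I_m, shape (m, m*n).
J = np.kron(h[None, :], np.eye(m))

# Finite-difference check: perturb each entry of vec(W) (column-major order).
eps = 1e-6
J_num = np.zeros((m, m * n))
for k in range(m * n):
    dW = np.zeros(m * n)
    dW[k] = eps
    Wp = W + dW.reshape(m, n, order="F")
    J_num[:, k] = (Wp @ h - W @ h) / eps

print(np.allclose(J, J_num, atol=1e-4))  # True
```

Since $Wh$ is linear in $W$, the finite difference is exact up to floating-point error.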

user550103
  • Thanks for your answer, that's a professional one! Another question: do I need to unvec the result, and how? – BioCoder Oct 24 '18 at 07:02
  • No, you don't need to unvec (the result is already in matrix form). – user550103 Oct 24 '18 at 07:26
  • But is the Hadamard product of $h^T \otimes I$ defined? – BioCoder Oct 24 '18 at 07:32
  • $\otimes$ is a Kronecker product: https://en.wikipedia.org/wiki/Kronecker_product – user550103 Oct 24 '18 at 07:35
  • Oh, I see. But is $ \ \nabla_WWh \ $ the same as $ \ \nabla_{vec(W)} {vec(Wh)} \ $ ? – BioCoder Oct 24 '18 at 07:43
  • Yes; see, e.g., page 4 of http://www4.ncsu.edu/~pfackler/MatCalc.pdf – user550103 Oct 24 '18 at 08:06
  • OK, I got it. Actually, I am trying to get rid of $ \ \otimes I \ $, because the result will be used in another place. Is $ \ h^T \otimes I \ $ equivalent to $ \ h^T \ $? And how to prove it? – BioCoder Oct 24 '18 at 08:18
  • If you want to avoid Kronecker products, then I think you have to consider Rahul's comments. – user550103 Oct 24 '18 at 08:23
  • Well, it's hard to explain here because we are not talking face to face. But still thanks a lot for your answers and patience. – BioCoder Oct 24 '18 at 08:33
  • Hi, I have raised a new question to explain this problem in detail. Hope you are still interested in it and give a professional answer. https://math.stackexchange.com/questions/2970202/a-matrix-calculus-problem-in-backpropagation-encountered-when-studying-deep-lear – BioCoder Oct 25 '18 at 06:41
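One way to reconcile the two forms discussed above (this reconciliation is my sketch, not part of the thread): the book's $gh^T$ is exactly the unvec of the chain-rule gradient. With upstream gradient $g = \partial J / \partial(Wh)$, we have $\nabla_{\textrm{vec}(W)} J = (h^T \otimes I)^T g = (h \otimes I)\,g$, and by the $\textrm{vec}$ identity $(h \otimes I)\,g = \textrm{vec}(g h^T)$, so unvec gives $\nabla_W J = g h^T$. A numeric check:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 4
h = rng.standard_normal(n)
g = rng.standard_normal(m)  # upstream gradient dJ/d(Wh)

# vec() stacks columns (column-major / Fortran order).
def vec(M):
    return M.reshape(-1, order="F")

# Chain rule in vec form: dJ/dvec(W) = (h^T kron I)^T g = (h kron I) g.
grad_vec = np.kron(h[:, None], np.eye(m)) @ g

# The book's matrix form dJ/dW = g h^T has the same entries after vec().
print(np.allclose(grad_vec, vec(np.outer(g, h))))  # True
```

So the $h^T$ in the book's Algorithm 6.4 appears because the gradient is written directly in matrix (unvec'd) form.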