
In this post, thanks to greg, a solution for the gradient computations with 2 hidden layers (3 layers in total) is presented. Generalizing this to $L$ layers, however, seems formidable, at least to me.

How can one obtain the gradient of \begin{align} \mathcal{L}(W_1, W_2, \ldots, W_L) := \sum_{i=1}^N \|W_L \, g(W_{L-1} \cdots f(W_{1} x_i) ) - y_i \|_2^2 \ , \end{align} with respect to $W_{\ell}$ for all $\ell \in \{1,\ldots,L\}$?
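For reference, here is a minimal sketch of the standard backpropagation recursion, assuming each hidden layer $\ell = 1,\ldots,L-1$ applies an elementwise activation $\sigma_\ell$ (so $f = \sigma_1$, $g = \sigma_{L-1}$) and the output layer is linear, as in the loss above. Write $a_{0,i} = x_i$, $z_{\ell,i} = W_\ell a_{\ell-1,i}$, $a_{\ell,i} = \sigma_\ell(z_{\ell,i})$, and let $\delta_{\ell,i}$ be the derivative of the $i$-th squared residual $\|z_{L,i} - y_i\|_2^2$ with respect to $z_{\ell,i}$. Then \begin{align} \delta_{L,i} &= 2\,(z_{L,i} - y_i), \\ \delta_{\ell,i} &= \sigma_\ell'(z_{\ell,i}) \odot \big(W_{\ell+1}^\top \delta_{\ell+1,i}\big), \quad \ell = L-1, \ldots, 1, \end{align} and the gradient with respect to $W_\ell$ is $\sum_{i=1}^N \delta_{\ell,i}\, a_{\ell-1,i}^\top$. The Python below is a hypothetical illustration, not the method from the linked post: it uses $\tanh$ for every $\sigma_\ell$ (an assumption, since $f$ and $g$ are left generic), plain lists instead of a linear-algebra library, and checks the recursion against central finite differences.

```python
import math
import random

# Assumption: every hidden activation sigma_l is tanh; the output layer is linear.
def act(z):
    return [math.tanh(v) for v in z]

def act_prime(z):
    return [1.0 - math.tanh(v) ** 2 for v in z]

def matvec(W, v):
    # W v
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def mat_t_vec(W, v):
    # W^T v
    return [sum(W[r][c] * v[r] for r in range(len(W))) for c in range(len(W[0]))]

def outer(u, v):
    return [[a * b for b in v] for a in u]

def loss(Ws, X, Y):
    """Sum over samples of ||W_L s(W_{L-1} ... s(W_1 x)) - y||^2."""
    total = 0.0
    for x, y in zip(X, Y):
        a = x
        for W in Ws[:-1]:
            a = act(matvec(W, a))
        out = matvec(Ws[-1], a)
        total += sum((o - t) ** 2 for o, t in zip(out, y))
    return total

def gradients(Ws, X, Y):
    """Return [dLoss/dW_1, ..., dLoss/dW_L]; Ws[k] holds W_{k+1} (0-based list)."""
    num_layers = len(Ws)
    grads = [[[0.0] * len(W[0]) for _ in W] for W in Ws]
    for x, y in zip(X, Y):
        # Forward pass: a_list[k] = a_k (a_0 = x), z_list[k] = z_{k+1}.
        a_list, z_list = [x], []
        for k, W in enumerate(Ws):
            z = matvec(W, a_list[-1])
            z_list.append(z)
            if k < num_layers - 1:
                a_list.append(act(z))
        # Linear output layer: delta_L = 2 (z_L - y).
        delta = [2.0 * (o - t) for o, t in zip(z_list[-1], y)]
        for k in range(num_layers - 1, -1, -1):
            # Accumulate delta_{k+1} a_k^T into the gradient of W_{k+1}.
            g = outer(delta, a_list[k])
            for r in range(len(g)):
                for c in range(len(g[0])):
                    grads[k][r][c] += g[r][c]
            if k > 0:
                # delta_k = sigma'(z_k) * (W_{k+1}^T delta_{k+1})
                back = mat_t_vec(Ws[k], delta)
                delta = [b * d for b, d in zip(back, act_prime(z_list[k - 1]))]
    return grads

def finite_diff(Ws, X, Y, k, r, c, h=1e-6):
    # Central difference of the loss in entry (r, c) of W_{k+1}.
    Wp = [[row[:] for row in W] for W in Ws]
    Wm = [[row[:] for row in W] for W in Ws]
    Wp[k][r][c] += h
    Wm[k][r][c] -= h
    return (loss(Wp, X, Y) - loss(Wm, X, Y)) / (2 * h)

random.seed(0)
sizes = [3, 4, 5, 2]  # input dim 3, two hidden layers, output dim 2, so L = 3
Ws = [[[random.gauss(0, 0.5) for _ in range(sizes[k])] for _ in range(sizes[k + 1])]
      for k in range(len(sizes) - 1)]
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(4)]
Y = [[random.gauss(0, 1) for _ in range(2)] for _ in range(4)]

G = gradients(Ws, X, Y)
err = max(abs(G[k][r][c] - finite_diff(Ws, X, Y, k, r, c))
          for k in range(len(Ws)) for r in range(len(Ws[k])) for c in range(len(Ws[k][0])))
print(err)
```

The printed maximum discrepancy should be small (on the order of finite-difference error), which is a quick sanity check that the recursion generalizes the 2-hidden-layer case to any $L$.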

I would greatly appreciate any input or suggestions.
