This extends this question. How can one obtain the gradient of the $\ell_1$-penalized loss \begin{align} L(W_1, W_2, W_3) := \sum_{i=1}^N \| W_3 \ g\left(W_2 \ f\left(W_1 x_i \right) \right) - y_i \|_2^2 + \lambda \left( \| W_3\|_1 + \| W_2\|_1 + \| W_1\|_1\right)\ , \end{align} with respect to $W_1$, $W_2$, and $W_3$?
Here $x_i \in \mathbb{R}^n$, $W_1 \in \mathbb{R}^{m \times n}$, $W_2 \in \mathbb{R}^{p \times m}$, $W_3 \in \mathbb{R}^{q \times p}$, and $y_i \in \mathbb{R}^q$, and $f(z) = g(z) = \frac{1}{1 + \exp(-z)}$ is the sigmoid, applied elementwise.
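To make the setup concrete, here is a minimal NumPy sketch of how the loss above could be evaluated. The function names `sigmoid` and `l1_penalized_loss`, and the convention of stacking the $x_i$ and $y_i$ as rows of `X` and `Y`, are assumptions made for illustration only, and $\| W \|_1$ is read as the sum of absolute values of the entries.

```python
import numpy as np


def sigmoid(z):
    """Elementwise logistic function, f(z) = g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def l1_penalized_loss(W1, W2, W3, X, Y, lam):
    """Evaluate L(W1, W2, W3) for data stacked row-wise.

    Assumed shapes (illustrative convention, not from the question):
      X: (N, n), row i is x_i      Y: (N, q), row i is y_i
      W1: (m, n)   W2: (p, m)   W3: (q, p)
    """
    H1 = sigmoid(X @ W1.T)      # (N, m): f(W1 x_i) for every sample
    H2 = sigmoid(H1 @ W2.T)     # (N, p): g(W2 f(W1 x_i))
    residual = H2 @ W3.T - Y    # (N, q): W3 g(...) - y_i
    data_term = np.sum(residual ** 2)  # sum of squared l2 norms
    # Entrywise l1 norm of each weight matrix, scaled by lambda
    penalty = lam * (np.abs(W1).sum() + np.abs(W2).sum() + np.abs(W3).sum())
    return data_term + penalty
```

The data term is smooth in all three weight matrices, whereas the penalty is non-differentiable wherever a weight entry is exactly zero, which is the source of the difficulty.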
EDIT:
The gradient of the first term of the cost function (the squared $\ell_2$ norm) is given in the link. But how should the $\ell_1$ regularization be handled so that one can find the optimal weights?
Thank you so much in advance for your help.