Define some new vectors
$$\eqalign{
p &= W_1x &\implies dp = dW_1\,x \cr
f &= \sigma(p) &\implies df = (F-F^2)\,dp \cr
r &= W_2f &\implies dr = W_2\,df+dW_2\,f \cr
g &= \sigma(r) &\implies dg = (G-G^2)\,dr \cr
s &= W_3g-y &\implies ds = W_3\,dg+dW_3\,g \cr
}$$
where $F={\rm Diag}(f)$ and $G={\rm Diag}(g)$.
Write the loss function in terms of these new variables.
$$\eqalign{
L &= \|s\|^2_F = s:s \cr
}$$
where the colon is a convenient product notation for the trace, i.e.
$\,A:B = {\rm tr}(A^TB)$
Now calculate the differentials and desired gradients.
$$\eqalign{
dL
&= 2s:ds \cr
&= 2s:(W_3\,dg+dW_3\,g) \cr
}$$
Setting $dg=0$ yields our first gradient
$$\eqalign{
dL &= 2sg^T:dW_3 \cr
\frac{\partial L}{\partial W_3} &= 2sg^T
}$$
Now set $dW_3=0$ and continue on towards $W_2$.
$$\eqalign{
dL &= 2W_3^Ts:dg \cr
&= 2W_3^Ts:(G-G^2)\,dr \cr
&= 2(G-G^2)W_3^Ts:(W_2\,df+dW_2\,f) \cr
}$$
Setting $df=0$ yields our second gradient
$$\eqalign{
dL &= 2(G-G^2)W_3^Tsf^T:dW_2 \cr
\frac{\partial L}{\partial W_2} &= 2(G-G^2)W_3^Tsf^T
}$$
Now set $dW_2=0$ and continue on towards $W_1$.
$$\eqalign{
dL &= 2W_2^T(G-G^2)W_3^Ts:(F-F^2)\,dp \cr
&= 2(F-F^2)W_2^T(G-G^2)W_3^Ts:dW_1\,x \cr
&= 2(F-F^2)W_2^T(G-G^2)W_3^Tsx^T:dW_1 \cr
\frac{\partial L}{\partial W_1} &= 2(F-F^2)W_2^T(G-G^2)W_3^Tsx^T \cr
}$$
Actually we've only worked with the $i^{th}$ component of the loss function, i.e. $L_i$.
The full function or gradient is obtained by summing over all $N$ components.
$$\eqalign{
L_{total} &= \sum_{i=1}^N L_i \cr
\frac{\partial L_{total}}{\partial W_k}
&= \sum_{i=1}^N \frac{\partial L_i}{\partial W_k}
}$$
NB: In the derivation, $(x, y)$ were treated a single vectors, but in the summation they must be replaced by $(x_i, y_i)$