Question:
We have a function which takes a two-dimensional input $x = (x_1, x_2)$ and has two parameters $w = (w_1, w_2)$ given by
$$ f(x, w) = \sigma(\sigma(x_1 w_1) w_2 + x_2) $$
where
$$ \sigma(x) = \frac{1}{1 + e^{-x}} $$
We use backpropagation to estimate the right parameter values. We start by setting both parameters to 0. Assume that we are given a training point $x_1 = 1, x_2 = 0, y = 5$. Given this information, answer the next two questions.
What is the value of $\frac{\partial f}{\partial w_2}$?
Solution:
Write $\sigma(x_1 w_1) w_2 + x_2$ as $o_2$ and $x_1 w_1$ as $o_1$.
$$ \frac{\partial f}{\partial w_2} = \frac{\partial f}{\partial o_2} \cdot \frac{\partial o_2}{\partial w_2} $$
$$ \frac{\partial f}{\partial w_2} = \sigma(o_2)(1 − \sigma(o_2)) \cdot \sigma(o_1) \quad \text{# Need to understand here} $$
$$ \frac{\partial f}{\partial w_2} = 0.5 \cdot 0.5 \cdot 0.5 = 0.125 $$
Can someone help me understand the solution? Which expression for $f$ is being partially differentiated with respect to $o_2$ to get $\sigma(o_2)(1 - \sigma(o_2))$?
I also don't understand where the values of $0.5$ come from.
Please help.
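For what it's worth, here is a quick numerical sanity check I tried (a sketch in Python, assuming the setup above with $w_1 = w_2 = 0$, $x_1 = 1$, $x_2 = 0$). It compares the solution's analytic formula against a central finite-difference estimate of $\partial f / \partial w_2$:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def f(w1, w2, x1=1.0, x2=0.0):
    # f(x, w) = sigma(sigma(x1*w1)*w2 + x2), as defined in the question
    return sigmoid(sigmoid(x1 * w1) * w2 + x2)

w1, w2 = 0.0, 0.0  # both parameters initialized to 0
x1, x2 = 1.0, 0.0  # the given training point (input part)

# Intermediate quantities from the solution's notation
o1 = x1 * w1                 # o1 = x1 * w1
o2 = sigmoid(o1) * w2 + x2   # o2 = sigma(o1) * w2 + x2

# Analytic gradient from the solution: sigma(o2)(1 - sigma(o2)) * sigma(o1)
analytic = sigmoid(o2) * (1.0 - sigmoid(o2)) * sigmoid(o1)

# Central finite-difference approximation of df/dw2
eps = 1e-6
numeric = (f(w1, w2 + eps) - f(w1, w2 - eps)) / (2.0 * eps)

print(analytic)  # 0.125
print(numeric)   # approximately 0.125
```

Both agree on $0.125$, so the stated answer checks out numerically; I just can't follow the derivation itself.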