
Question:

We have a function which takes a two-dimensional input $x = (x_1, x_2)$ and has two parameters $w = (w_1, w_2)$ given by

$$ f(x, w) = \sigma(\sigma(x_1 w_1) w_2 + x_2) $$

where

$$ \sigma(x) = \frac{1}{1 + e^{-x}} $$
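Here is a minimal Python sketch of this forward pass, assuming plain floats. The names `sigmoid` and `f` are illustrative. The intermediates `o1` and `o2` follow the $o_1$, $o_2$ notation used in the solution.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

def f(x, w) -> float:
    """Forward pass of f(x, w) = sigma(sigma(x1*w1)*w2 + x2)."""
    x1, x2 = x
    w1, w2 = w
    o1 = x1 * w1                  # inner pre-activation
    o2 = sigmoid(o1) * w2 + x2    # outer pre-activation
    return sigmoid(o2)

# Both parameters at 0, training input x = (1, 0):
print(f((1.0, 0.0), (0.0, 0.0)))  # 0.5
```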

We use backpropagation to estimate the right parameter values. We start by setting both parameters to 0. Assume that we are given the training point $x_1 = 1, x_2 = 0, y = 5$. Given this information, answer the next two questions.

What is the value of $\frac{\partial f}{\partial w_2}$?


Solution:

Write $\sigma(x_1 w_1) w_2 + x_2$ as $o_2$ and $x_1 w_1$ as $o_1$.

$$ \frac{\partial f}{\partial w_2} = \frac{\partial f}{\partial o_2} \cdot \frac{\partial o_2}{\partial w_2} $$

$$ \frac{\partial f}{\partial w_2} = \sigma(o_2)(1 - \sigma(o_2)) \cdot \sigma(o_1) \qquad \text{(need to understand this step)} $$

$$ \frac{\partial f}{\partial w_2} = 0.5 \cdot 0.5 \cdot 0.5 = 0.125 $$
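The value $0.125$ can be sanity-checked numerically with a central finite difference in $w_2$, holding $w_1$ fixed. This is a minimal sketch that assumes plain Python floats. The step size `h = 1e-5` is an illustrative choice.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def f(x1, x2, w1, w2):
    """f(x, w) = sigma(sigma(x1*w1)*w2 + x2)."""
    return sigmoid(sigmoid(x1 * w1) * w2 + x2)

def central_diff_w2(x1, x2, w1, w2, h=1e-5):
    """Central finite-difference estimate of df/dw2 (w1 held fixed)."""
    return (f(x1, x2, w1, w2 + h) - f(x1, x2, w1, w2 - h)) / (2 * h)

estimate = central_diff_w2(1.0, 0.0, 0.0, 0.0)
print(round(estimate, 6))  # 0.125
```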


Can someone help me understand the solution? Which expression for $f$, when differentiated with respect to $o_2$, gives $\sigma(o_2)(1 - \sigma(o_2))$?

I also don't understand where the $0.5$ comes from.

Please help.

Dang Dang

Answer:

The sigmoid function $\sigma(x)=[1+\exp(-x)]^{-1}$ has the following derivative:
$$ \frac{\partial \sigma}{\partial x} = \sigma(x)[1-\sigma(x)] \tag{1} $$

Let us now define
\begin{align}
g(x,w) &= \sigma(x_1w_1)w_2 + x_2 \\
f(x,w) &= \sigma(g(x,w)) = \sigma( \sigma(x_1w_1)w_2 + x_2 )
\end{align}

Notice that $g$ is linear in $w_2$, so
$$ \frac{\partial g}{\partial w_2} = \sigma(x_1 w_1) \tag{2} $$

Using the chain rule, we get
\begin{align}
\frac{\partial f}{\partial w_2} &= \frac{\partial \sigma}{\partial g} \frac{\partial g}{\partial w_2} \\[3mm]
&= \underbrace{\sigma(g(x,w))[1 - \sigma(g(x,w))]}_{\text{from (1)}} \;\underbrace{\sigma(x_1 w_1)}_{\text{from (2)}} \\
&= \sigma(o_2)[1 - \sigma(o_2)]\, \sigma(o_1)
\end{align}
where the last step uses $o_2 := g(x,w)$ and $o_1 := x_1w_1$. In other words, the expression you are looking for is simply $f = \sigma(o_2)$. Differentiating it with respect to $o_2$ is identity (1) evaluated at $o_2$.
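The chain-rule formula should hold at every point, not just at the starting point. One way to check this is to compare it with a central finite difference at an arbitrary point. The point and step size below are illustrative choices only, and `df_dw2` is a hypothetical helper name.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def f(x1, x2, w1, w2):
    """f(x, w) = sigma(sigma(x1*w1)*w2 + x2)."""
    return sigmoid(sigmoid(x1 * w1) * w2 + x2)

def df_dw2(x1, x2, w1, w2):
    """Chain-rule result: sigma(o2) * (1 - sigma(o2)) * sigma(o1)."""
    o1 = x1 * w1
    o2 = sigmoid(o1) * w2 + x2
    s2 = sigmoid(o2)
    return s2 * (1.0 - s2) * sigmoid(o1)

# Arbitrary non-zero point, chosen only for illustration.
x1, x2, w1, w2 = 0.7, -1.2, 0.3, -0.8
h = 1e-6
numeric = (f(x1, x2, w1, w2 + h) - f(x1, x2, w1, w2 - h)) / (2 * h)
print(df_dw2(x1, x2, w1, w2), numeric)
```

The two printed numbers should agree to many decimal places. Any small difference comes from finite-difference truncation and rounding.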

The only confusing part is probably the derivative of the sigmoid function, identity (1). My favourite proof of it is the one by Hans Lundmark.
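For completeness, here is a short direct derivation of (1) using only the chain rule. It is a standard calculation and not necessarily the proof by Hans Lundmark mentioned above. The key step is that $\frac{e^{-x}}{1+e^{-x}} = 1 - \frac{1}{1+e^{-x}}$:
\begin{align}
\frac{d\sigma}{dx} &= \frac{d}{dx}\,(1+e^{-x})^{-1} = \frac{e^{-x}}{(1+e^{-x})^2} \\
&= \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}} \\
&= \sigma(x)\left[1 - \frac{1}{1+e^{-x}}\right] = \sigma(x)[1-\sigma(x)]
\end{align}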


As for where the $0.5$ comes from: since $w_1=w_2=0$, $x_1=1$, and $x_2=0$, we have
\begin{align}
o_1 &= x_1 w_1 = 0 \\
\sigma(o_1) &= \sigma(0) = [1 + \exp(0)]^{-1} = 2^{-1} = 0.5 \\
o_2 &= g(x,w) = \sigma(0)\cdot 0 + 0 = 0 \\
\sigma(o_2) &= \sigma(0) = 0.5 \\[2mm]
\therefore\;\;\; \frac{\partial f}{\partial w_2} &= \sigma(0)[1 - \sigma(0)]\, \sigma(0) = 0.5\,[1-0.5]\,0.5 = 0.5^3 = 0.125
\end{align}
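Here is the same arithmetic traced step by step in Python. This is a minimal sketch that reuses the variable names of the derivation. `s1`, `s2` and `grad_w2` are illustrative names for $\sigma(o_1)$, $\sigma(o_2)$ and $\partial f/\partial w_2$.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Training point and initial parameters from the question.
x1, x2 = 1.0, 0.0
w1, w2 = 0.0, 0.0

o1 = x1 * w1            # 1 * 0 = 0.0
s1 = sigmoid(o1)        # sigma(0) = 0.5
o2 = s1 * w2 + x2       # 0.5 * 0 + 0 = 0.0
s2 = sigmoid(o2)        # sigma(0) = 0.5
grad_w2 = s2 * (1.0 - s2) * s1

print(o1, s1, o2, s2, grad_w2)  # 0.0 0.5 0.0 0.5 0.125
```

The label $y = 5$ never appears here. $\partial f/\partial w_2$ is a derivative of the model output itself, so the target only enters once a loss built from $f$ and $y$ is differentiated.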

user3658307