
Currently I am learning linear regression, in particular the cost function. Here is the problem I am working on right now:

Suppose we have a training set with $m=3$ example points $(1,1), (2,2)$ and $(3,3)$. The hypothesis function is $h_\theta(x)=\theta_1x$ with a parameter $\theta_1$. The cost function is $J(\theta_1)=\frac{1}{2m}\sum_{i=0}^m(h_\theta(x^i)-y^i)^2$. We need to find $J(0)$, which is a relatively easy task if done manually (and I have already done it).
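
To check my manual result, here is a minimal Python sketch (just my own illustration, not part of the course material; the name `cost` is a hypothetical choice) that evaluates the cost function on these three points:

```python
# Evaluate J(theta_1) = 1/(2m) * sum_i (theta_1 * x_i - y_i)^2
# on the training set {(1, 1), (2, 2), (3, 3)}.
xs = [1, 2, 3]
ys = [1, 2, 3]
m = len(xs)

def cost(theta1):
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

print(cost(0))  # J(0) = (1 + 4 + 9) / 6 ≈ 2.33
print(cost(1))  # J(1) = 0
```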

I am interested in doing it through a derivative. If I do it this way, I get $J'(\theta_1)=\frac{1}{2m}\sum_{i=1}^m 2\left(h_{\theta}(x^i)-y^i\right)h_\theta$ (hopefully I haven't made any mistakes). Then, to find the minimum value(s) of $\theta_1$, all I need to do is solve $J'(\theta_1)=0$. That's where I have a few questions.

Can we assume that the sum will never be zero? If so, when I solve this equation I find that the only way for it to be zero is either for $h_{\theta}$ to be zero, which doesn't seem right, or for $2\left(h_{\theta}(x^i)-y^i\right)$ to be zero for every pair $(x^i,y^i)$, that is, $\theta_1=1$. Is my reasoning correct?

Alex.Kh
  • First of all, why is your cost function not $\frac{1}{\color{red}m}\sum\limits_{i=\color{red}1}^m(h_\theta(x^i)-y^i)^2$? – callculus42 Jul 22 '19 at 13:06
  • In this case, it doesn't matter that much whether it is $2m$ rather than $m$. The factor of $\frac{1}{2}$ does not matter when optimizing. You can look at a proper answer here – Alex.Kh Jul 22 '19 at 13:13
  • And thx for pointing out the mistake. I meant $\frac{1}{2m}\sum_{i=1}^m(h_\theta(x^i)-y^i)^2$ instead of $\frac{1}{2m}\sum_{i=0}^m(h_\theta(x^i)-y^i)^2$ – Alex.Kh Jul 22 '19 at 13:16
  • I've taken that into account in my answer. – callculus42 Jul 22 '19 at 13:46

1 Answer


You know that $h_\theta(x)=\theta_1x$. Thus the cost function is

$$J(\theta_1)=\frac{1}{2m}\sum_{i=1}^m(h_\theta(x^i)-y^i)^2=\frac{1}{2m}\sum_{i=1}^m(\theta_1x^i -y^i)^2$$

We set the first derivative equal to $0$. For the derivative we use the chain rule.

$$J^{'}(\theta_1)=\frac{1}{m}\sum_{i=1}^m(\theta_1x^i -y^i)\cdot x^i=0$$

I omit the factor $\frac1m$ and give each summand its own sigma sign.

$$\sum_{i=1}^m\theta_1(x^i)^2 -\sum_{i=1}^my^i\cdot x^i=0$$

$\theta_1$ can be factored out since it does not depend on the index $i$:

$$\theta_1\cdot \sum_{i=1}^m(x^i)^2 -\sum_{i=1}^my^i\cdot x^i=0$$

$$\theta_1\cdot \sum_{i=1}^m(x^i)^2 =\sum_{i=1}^my^i\cdot x^i$$

$$\hat \theta_1=\frac{\sum\limits_{i=1}^my^i\cdot x^i}{\sum\limits_{i=1}^m(x^i)^2}$$

We can insert your values.

$$\hat \theta_1=\frac{ 1\cdot 1+2\cdot 2+3\cdot 3}{ 1^2+2^2+3^2}=1$$

In your case the regression line is $h_\theta(x)=1\cdot x$.
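
If you want to verify this numerically, here is a short Python sketch (just an illustration with made-up variable names, not part of any course material) that evaluates the closed-form estimate and checks that the derivative vanishes there:

```python
# Closed-form estimate: theta_1 = sum(x_i * y_i) / sum(x_i^2)
# for the training points (1, 1), (2, 2), (3, 3).
xs = [1, 2, 3]
ys = [1, 2, 3]
m = len(xs)

theta1_hat = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
print(theta1_hat)  # 14 / 14 = 1.0

# The derivative J'(theta_1) = (1/m) * sum_i (theta_1 * x_i - y_i) * x_i
# should be zero at the estimate.
grad = sum((theta1_hat * x - y) * x for x, y in zip(xs, ys)) / m
print(grad)  # 0.0
```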

Are the steps comprehensible and do they answer your questions? If not, feel free to ask.

callculus42
  • Thank you for the answer! I just want to clarify something. Was I wrong when I didn't substitute $h_{\theta}$ with $\theta_1x$ while finding the derivative? – Alex.Kh Jul 22 '19 at 14:02
  • You're welcome. If you want to obtain an explicit formula you have to use the function. Otherwise you get a general solution for an arbitrary function. – callculus42 Jul 22 '19 at 14:08
  • You can use $h_{\theta}(x)$ for the derivative. It is $$J'(\theta_1)=\frac{1}{2m}\sum_{i=1}^m 2\left(h_{\theta}(x^i)-y^i\right)\cdot h_{\theta}^{'}(x^i).$$ Now you can replace the function and its derivative using $h_{\theta}(x^i)=\theta_1x^i$. – callculus42 Jul 22 '19 at 14:23
  • Oh, now I see. In my solution I actually had to write $h'_\theta$ instead of $h_\theta$. – Alex.Kh Jul 22 '19 at 14:32
  • Yes, that's right. – callculus42 Jul 22 '19 at 14:33