I'm implementing gradient descent (or maybe a variant of Newton's method?) for linear regression, and I'm getting wildly different solutions from the faster, more straightforward closed-form equations. After hours of searching I can't find my mistake.
First I put the numbers 0 to 99 into an array, XX. Then I pick integers m and b and fill a second array, YY, where the nth entry of YY is m times the nth entry of XX, plus b. So the data are certainly on a line. Then I compute the various averages and get the expected slope and intercept from the closed-form linear regression method.
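For reference, here is a minimal NumPy sketch of the setup described above. The integers m_true = 3 and b_true = 7 are placeholders of my choosing, since the post doesn't say which values were used; the names XX and YY follow the post.

```python
import numpy as np

# Synthetic line: y = m*x + b exactly, as described above.
m_true, b_true = 3, 7  # hypothetical integers; substitute your own
XX = np.arange(100, dtype=float)
YY = m_true * XX + b_true

# Closed-form ("formulaic") least-squares slope and intercept.
x_bar, y_bar = XX.mean(), YY.mean()
m_hat = ((XX - x_bar) * (YY - y_bar)).sum() / ((XX - x_bar) ** 2).sum()
b_hat = y_bar - m_hat * x_bar
print(m_hat, b_hat)  # recovers m_true, b_true up to floating point
```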
For gradient descent, I essentially use a variation of Newton's method: setting $f(\vec{x_{n+1}})=0$ in the linearization gives $0-f(\vec{x_n})=\nabla f(\vec{x_n})\cdot d\vec{s}=\nabla f(\vec{x_n})\cdot (\vec{x_{n+1}}-\vec{x_n})$. I want $d\vec{s}$ to be parallel to the gradient, so for its direction unit vector I use $\nabla f(\vec{x_n})/|\nabla f(\vec{x_n})|$. Substituting, I get $|d\vec{s}|=\frac{-f(\vec{x_n})}{|\nabla f(\vec{x_n})|^2/|\nabla f(\vec{x_n})|}=\frac{-f(\vec{x_n})}{|\nabla f(\vec{x_n})|}$.
Finally $\vec{x_{n+1}}=\vec{x_n}-\frac{f(\vec{x_n})}{|\nabla f(\vec{x_n})|^2}\nabla f(\vec{x_n})$.
$\vec{x_n}=(m_n,b_n)$
$f(m_n,b_n)=\sum_{i=0}^N (YY_i-m_nXX_i-b_n)^2$
$\partial f/\partial m=\sum_{i=0}^N 2(YY_i-m_nXX_i-b_n)(-XX_i)$
$\partial f/\partial b=\sum_{i=0}^N 2(YY_i-m_nXX_i-b_n)(-1)$
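The update rule and the partial derivatives above can be coded directly. Below is a sketch under the same assumptions as before (the integers 3 and 7 are placeholders, and the iteration count is arbitrary):

```python
import numpy as np

XX = np.arange(100.0)
YY = 3 * XX + 7  # hypothetical integers m, b; the post doesn't say which were used

def f(m, b):
    """Sum of squared residuals."""
    r = YY - m * XX - b
    return (r ** 2).sum()

def grad(m, b):
    """(df/dm, df/db) from the formulas above."""
    r = YY - m * XX - b
    return np.array([(-2 * r * XX).sum(), (-2 * r).sum()])

x = np.zeros(2)  # (m_0, b_0) = (0, 0)
for n in range(20):
    g = grad(*x)
    g2 = g @ g
    if g2 == 0.0:
        break  # exact convergence; avoid dividing by zero
    x = x - (f(*x) / g2) * g  # x_{n+1} = x_n - f/|grad f|^2 * grad f
print(x)
```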
I start with $(m_0,b_0)=(0,0)$. From there the slope and intercept start at high values, decay exponentially toward the expected values, and then hop away from the expected result.
Below is a printout of what's going on. The first column is $f(m_n,b_n)$, the second is $|\nabla f(m_n,b_n)|$, the third is the current $m_{n+1}$, and the fourth is $b_{n+1}$.
I've circled the spots where $f(m_n,b_n)$ starts to increase.
I've gone over the code a bunch of times and no problems stick out, so I think I got the math wrong. For some reason the slope converges well, but the intercept does not. The last two lines are the m and b output by gradient descent, followed by the output of the linear regression formula along with its $r^2$ value.
Long term I want to pass $f$ and $\nabla f$ as function arguments for general curve fitting, but I'm not sure what's wrong with this fairly easy problem yet.
EDIT: I noticed that the jumps away from the minimum tend to happen when $|\nabla f|<1$, which suggests the step is overshooting.
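One way to test this overshoot hypothesis is to log $f$, $|\nabla f|$, and the step length $|d\vec{s}| = f/|\nabla f|$ at each iteration and flag where $f$ increases. A sketch, again with placeholder integers 3 and 7 (the logging and the break guard are my additions, not part of the original code):

```python
import numpy as np

XX = np.arange(100.0)
YY = 3 * XX + 7  # hypothetical true m, b

def f_and_grad(x):
    r = YY - x[0] * XX - x[1]
    return (r ** 2).sum(), np.array([(-2 * r * XX).sum(), (-2 * r).sum()])

x = np.zeros(2)
log = []  # (f, |grad f|, |ds|) per iteration
for n in range(30):
    fx, g = f_and_grad(x)
    gnorm = np.sqrt(g @ g)
    if not np.isfinite(gnorm) or gnorm == 0.0:
        break  # converged exactly or blew up
    log.append((fx, gnorm, fx / gnorm))
    x = x - (fx / gnorm ** 2) * g

# Iterations where f increased, i.e. the step overshot the minimum.
bad = [n for n in range(1, len(log)) if log[n][0] > log[n - 1][0]]
print(bad)
```

Comparing the $|\nabla f|$ column of the log against the flagged iterations should show directly whether the hops coincide with $|\nabla f|<1$.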
