I'm implementing gradient descent (or maybe a variant of Newton's method?) for linear regression, and I'm getting wildly different solutions from the faster, more straightforward closed-form equations. After hours of searching I can't find my mistake.
First I put the numbers 0 to 99 into an array, XX. Then I pick integers m and b and fill a second array, YY, where the nth entry of YY is m times the nth entry of XX, plus b. So the data are certainly on a line. Then I compute the various averages and get the expected slope and intercept from the closed-form linear regression method.
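For reference, here is a minimal NumPy sketch of the setup described above. The integers m_true = 3 and b_true = 7 are placeholders of my choosing, since the post doesn't say which values were used; the names XX and YY follow the post.

```python
import numpy as np

# Synthetic line: y = m*x + b exactly, as described above.
m_true, b_true = 3, 7  # hypothetical integers; substitute your own
XX = np.arange(100, dtype=float)
YY = m_true * XX + b_true

# Closed-form ("formulaic") least-squares slope and intercept.
x_bar, y_bar = XX.mean(), YY.mean()
m_hat = ((XX - x_bar) * (YY - y_bar)).sum() / ((XX - x_bar) ** 2).sum()
b_hat = y_bar - m_hat * x_bar
print(m_hat, b_hat)  # recovers m_true, b_true up to floating point
```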
For gradient descent, I essentially use a variation of Newton's method: setting $f(\vec{x_{n+1}})=0$ in the linearization gives $0-f(\vec{x_n})=\nabla f(\vec{x_n})\cdot d\vec{s}=\nabla f(\vec{x_n})\cdot (\vec{x_{n+1}}-\vec{x_n})$. I want $d\vec{s}$ to be parallel to the gradient, so for its direction unit vector I use $\nabla f(\vec{x_n})/|\nabla f(\vec{x_n})|$. Substituting, I get $|d\vec{s}|=\frac{-f(\vec{x_n})}{|\nabla f(\vec{x_n})|^2/|\nabla f(\vec{x_n})|}=\frac{-f(\vec{x_n})}{|\nabla f(\vec{x_n})|}$.
Finally $\vec{x_{n+1}}=\vec{x_n}-\frac{f(\vec{x_n})}{|\nabla f(\vec{x_n})|^2}\nabla f(\vec{x_n})$.
$\vec{x_n}=(m_n,b_n)$
$f(m_n,b_n)=\sum_{i=0}^N (YY_i-m_nXX_i-b_n)^2$
$\partial f/\partial m=\sum_{i=0}^N 2(YY_i-m_nXX_i-b_n)(-XX_i)$
$\partial f/\partial b=\sum_{i=0}^N 2(YY_i-m_nXX_i-b_n)(-1)$
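The update rule and the partial derivatives above can be coded directly. Below is a sketch under the same assumptions as before (the integers 3 and 7 are placeholders, and the iteration count is arbitrary):

```python
import numpy as np

XX = np.arange(100.0)
YY = 3 * XX + 7  # hypothetical integers m, b; the post doesn't say which were used

def f(m, b):
    """Sum of squared residuals."""
    r = YY - m * XX - b
    return (r ** 2).sum()

def grad(m, b):
    """(df/dm, df/db) from the formulas above."""
    r = YY - m * XX - b
    return np.array([(-2 * r * XX).sum(), (-2 * r).sum()])

x = np.zeros(2)  # (m_0, b_0) = (0, 0)
for n in range(20):
    g = grad(*x)
    g2 = g @ g
    if g2 == 0.0:
        break  # exact convergence; avoid dividing by zero
    x = x - (f(*x) / g2) * g  # x_{n+1} = x_n - f/|grad f|^2 * grad f
print(x)
```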
I start with $(m_0,b_0)=(0,0)$. From there the slope and intercept start at high values, decay exponentially toward the expected values, and then hop away from the expected result.
Below is a printout of what's going on. The first column is $f(m_n,b_n)$, the second is $|\nabla f(m_n,b_n)|$, the third is the current $m_{n+1}$, and the fourth is $b_{n+1}$.
I've circled the spots where $f(m_n,b_n)$ starts to increase.
I've gone over the code a bunch of times and no problems stick out, so I think I got the math wrong. For some reason the slope converges well, but the intercept does not. The last two lines are the m and b output by gradient descent, followed by the output of the linear regression formula along with its $r^2$ value.
Long term I want to pass $f$ and $\nabla f$ as function arguments for general curve fitting, but I'm not sure what's wrong with this fairly easy problem yet.
EDIT: I noticed that the jumps away from the minimum tend to happen when $|\nabla f|<1$, which suggests the step is overshooting.
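One way to test this overshoot hypothesis is to log $f$, $|\nabla f|$, and the step length $|d\vec{s}| = f/|\nabla f|$ at each iteration and flag where $f$ increases. A sketch, again with placeholder integers 3 and 7 (the logging and the break guard are my additions, not part of the original code):

```python
import numpy as np

XX = np.arange(100.0)
YY = 3 * XX + 7  # hypothetical true m, b

def f_and_grad(x):
    r = YY - x[0] * XX - x[1]
    return (r ** 2).sum(), np.array([(-2 * r * XX).sum(), (-2 * r).sum()])

x = np.zeros(2)
log = []  # (f, |grad f|, |ds|) per iteration
for n in range(30):
    fx, g = f_and_grad(x)
    gnorm = np.sqrt(g @ g)
    if not np.isfinite(gnorm) or gnorm == 0.0:
        break  # converged exactly or blew up
    log.append((fx, gnorm, fx / gnorm))
    x = x - (fx / gnorm ** 2) * g

# Iterations where f increased, i.e. the step overshot the minimum.
bad = [n for n in range(1, len(log)) if log[n][0] > log[n - 1][0]]
print(bad)
```

Comparing the $|\nabla f|$ column of the log against the flagged iterations should show directly whether the hops coincide with $|\nabla f|<1$.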
