
Consider the following optimization problem $$ \min_{\mathbf{x}} f(\mathbf{x}) $$ where $f$ is twice continuously differentiable and its Hessian is positive definite. The Newton iteration is $$ \mathbf{x}_{k+1}=\mathbf{x}_k-(\nabla^2 f(\mathbf{x}_k))^{-1}\nabla f(\mathbf{x}_k), $$ so the Newton direction is $\mathbf{d}_k=-(\nabla^2 f(\mathbf{x}_k))^{-1}\nabla f(\mathbf{x}_k)$, and we have $$ \nabla^Tf(\mathbf{x}_k)\mathbf{d}_k=-\nabla^Tf(\mathbf{x}_k)(\nabla^2 f(\mathbf{x}_k))^{-1}\nabla f(\mathbf{x}_k) < 0. $$ So the Newton direction is a descent direction in theory, but my numerical experiment does not behave that way. The Matlab code is

function x=pure_newton(f,g,h,x0,epsilon)
    % Pure Newton's method: full step (alpha=1), no line search.
    % f, g, h are function handles for the objective, gradient and Hessian;
    % x0 is the starting point and epsilon the gradient-norm tolerance.
    if (nargin<5)
        epsilon=1e-5;
    end
    x=x0;
    gval=g(x);
    hval=h(x);
    iter=0;
    while ((norm(gval)>epsilon)&&(iter<10000))
        iter=iter+1;
        x=x-hval\gval;   % Newton step: solve hval*d = gval and subtract d
        fprintf('iter= %2d f(x)=%10.10f\n',iter,f(x))
        gval=g(x);
        hval=h(x);
    end
    if (iter==10000)
        fprintf('did not converge')
    end
end

The example I used is $$ \min_{x_1,x_2}\sqrt{1+x_1^2}+\sqrt{1+x_2^2}, $$ which can be found in "Introduction to Nonlinear Optimization: Theory, Algorithms, and Applications with MATLAB", page 87. If I choose $(10, 10)^T$ as the initial point, the algorithm does not converge. Here is the running result:

>> pure_newton
iter=  1 f(x)=2000.0009999997
iter=  2 f(x)=1999999999.9999990463
iter=  3 f(x)=1999999999999997277796040704.0000000000
iter=  4 f(x)=1999999999999992260078759655214123233773013273847240743173127608710117986469412864.0000000000
iter=  5 f(x)=       Inf
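
For reference, the run above can be reproduced with handles of the following form (the variable names are my own choice; the gradient and Hessian are those of the example objective):

f = @(x) sqrt(1+x(1)^2) + sqrt(1+x(2)^2);                    % objective
g = @(x) [x(1)/sqrt(1+x(1)^2); x(2)/sqrt(1+x(2)^2)];         % gradient
h = @(x) diag([1/(1+x(1)^2)^(3/2); 1/(1+x(2)^2)^(3/2)]);     % Hessian (diagonal)
x0 = [10; 10];
x = pure_newton(f,g,h,x0,1e-5);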

My question is: the Newton direction is a descent direction, so $\{f(\mathbf{x}_k)\}$ should be a decreasing sequence, but this seems not to be true in practice. Thanks in advance for helping.

2 Answers

Newton's method gives a descent direction when the inverse of the Hessian matrix is positive definite (at a point with nonzero gradient), because $$ [-\nabla f({\bf{x}}_k)]^{\text{T}}(\nabla^2 f({\bf{x}}_k))^{-1}[-\nabla f({\bf{x}}_k)]> 0, $$ where $(\nabla^2 f({\bf{x}}_k))^{-1}[-\nabla f({\bf{x}}_k)]$ is the Newton direction leading to the new iterate ${\bf{x}}_{k+1}$. The inequality says that this direction is a descent direction, just like the negative gradient $-\nabla f({\bf{x}}_k)$.

However, in order to always get a decreasing new iterate (not just a descent direction), you need to perform a line search in every iteration! With the constant step size $\alpha_k=1$ you can never be sure of that. $$ \mathbf{x}_{k+1}=\mathbf{x}_k-\alpha_k(\nabla^2 f(\mathbf{x}_k))^{-1}\nabla f(\mathbf{x}_k) $$ Newton's method with a line search in every iteration is even globally convergent (if the Hessian is known to be positive definite)!
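
As a quick check on the question's example: the objective is separable, and for $\varphi(x)=\sqrt{1+x^2}$ the pure Newton update of each coordinate is $$ x_{k+1}=x_k-\frac{\varphi'(x_k)}{\varphi''(x_k)}=x_k-\frac{x_k/\sqrt{1+x_k^2}}{(1+x_k^2)^{-3/2}}=x_k-x_k(1+x_k^2)=-x_k^3, $$ so for $|x_0|>1$ the iterates blow up cubically ($10\to-10^3\to10^9\to-10^{27}\to\dots$, exactly as in the printed values), while for $|x_0|<1$ the full step converges.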

https://pages.mtu.edu/~msgocken/ma5630spring2003/lectures/lines/lines.pdf
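
A minimal sketch of such a damped Newton iteration, obtained by adding a backtracking (Armijo) line search to the pure_newton code above; the name damped_newton and the constants beta and sigma are my own choices:

function x=damped_newton(f,g,h,x0,epsilon)
    % Newton's method with a backtracking line search (Armijo condition).
    if (nargin<5)
        epsilon=1e-5;
    end
    beta=0.5; sigma=1e-4;        % backtracking factor and sufficient-decrease constant
    x=x0;
    gval=g(x);
    iter=0;
    while ((norm(gval)>epsilon)&&(iter<10000))
        iter=iter+1;
        d=-(h(x)\gval);          % Newton direction
        alpha=1;                 % always try the full Newton step first
        while (f(x+alpha*d)>f(x)+sigma*alpha*gval'*d)
            alpha=beta*alpha;    % shrink the step until f decreases enough
        end
        x=x+alpha*d;
        fprintf('iter= %2d alpha=%g f(x)=%10.10f\n',iter,alpha,f(x))
        gval=g(x);
    end
end

With the backtracking step the function value can no longer increase, so the blow-up seen from $(10,10)^T$ should not occur, in line with the global convergence claim above.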

Vítězslav Štembera

Let me try to give some insight into your question. Consider the minimization problem $$0=g(\textbf{a})=\min_{\textbf{x}\in A}{g(\textbf{x})},\qquad {g(\textbf{x})}=\frac{1}{2}\|{\bf f}({\bf x})\|^2,$$ for some continuously differentiable function $\textbf{f}:A\to \mathbb{R}^p$, where $A$ is an open set of $\mathbb{R}^m$ containing $\textbf{a}$. Now, if you have a differentiable curve $\textbf{u}:(a,b)\to A$, you can apply the chain rule to obtain $$\frac{d\, g({\bf u}(t))}{dt}= \left\langle {\bf u}'(t), \nabla g({\bf u}(t))\right\rangle= \left\langle {\bf u}'(t),[J{\bf f}({\bf u}(t))]^*{\bf f}({\bf u}(t))\right\rangle=\left\langle J{\bf f}({\bf u}(t)){\bf u}'(t),{\bf f}({\bf u}(t))\right\rangle,$$ in which $\langle \cdot,\cdot\rangle$ denotes the inner product.

If this curve satisfies the initial value problem (IVP) $$\left\{\begin{array}{rrl}J{\bf f}({\bf u}(t)){\bf u}'(t)&=&-\alpha {\bf f}({\bf u}(t))\\ {\bf u}(0)&=&{\bf u}_0\end{array}\right.,$$ for some $\alpha>0$, then you find that $$\frac{d\, g({\bf u}(t))}{dt}= -2\alpha g({\bf u}(t))\leq 0.$$ This means that $g(\textbf{u}(t))$ is a non-increasing function. It also means that, if $g(\textbf{u}(t))> 0$, then $g(\textbf{u}(t+h))<g(\textbf{u}(t))$ for $0<h<h_t$, for some $h_t>0$ close enough to $0$. See Picard's theorem and Lyapunov stability theory.

If $m=p$ and $J{\bf f}({\bf x})$ has a bounded inverse for all $\textbf{x}\in A$, the previous IVP becomes
$$\left\{\begin{array}{lll}{\bf u}'(t)&=&-\alpha \left[J{\bf f}({\bf u}(t))\right]^{-1}{\bf f}({\bf u}(t))\\ {\bf u}(0)&=&{\bf u}_0\end{array}\right..$$

We can use the Euler method $$\left\{\begin{array}{rll}J{\bf f}({\bf u}_j) {\bf w}_j&=&-\alpha_j {\bf f}({\bf u}_j)\\ {\bf u}_{j+1}&=&{\bf u}_j+{\bf w}_j\end{array}\right.$$ to solve the previous IVP numerically, where $\textbf{u}_0=\textbf{u}(0)$, $t_{j+1}=t_j+h_j$, $h_j>0$, ${\bf u}(t_{j+1}) \approx {\bf u}_{j+1}$ and $\alpha_j=\alpha h_j$.

We call $\alpha_j$ the tuning parameter, as in the gradient descent method, and you should choose it carefully so that $g({\bf u}_{j+1})<g({\bf u}_j)$. Otherwise you can get a "bad" approximation ${\bf u}(t_{j+1}) \approx {\bf u}_{j+1}$ for which $g({\bf u}_{j+1})>g({\bf u}_j)$.

But when things work well, $\alpha_j$ can be $1$ for $j$ big enough.

If we choose $\textbf{u}_0$ close enough to $\textbf{a}$, then we can take $\alpha_j=1$ and recover Newton's method for solving the equation $\textbf{f}(\textbf{x})=\textbf{0}$ as a particular case.
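
As an illustration, here is a rough MATLAB sketch of this Euler iteration (the name euler_newton, the tolerance and the halving rule for $\alpha_j$ are my own choices). It solves $J{\bf f}({\bf u}_j){\bf w}_j=-{\bf f}({\bf u}_j)$ and then scales the step by $\alpha_j$, shrinking $\alpha_j$ until $g$ decreases:

function u=euler_newton(F,J,u0,tol)
    % Euler step for u' = -[JF(u)]^{-1} F(u): solve JF(u_j)*w = -F(u_j)
    % and take u_{j+1} = u_j + alpha_j*w (scaling after the solve is
    % equivalent to solving with -alpha_j*F(u_j), since the system is linear).
    if (nargin<4)
        tol=1e-8;
    end
    gval=@(v) 0.5*norm(F(v))^2;       % g(x) = 0.5*||f(x)||^2
    u=u0;
    iter=0;
    while (gval(u)>tol && iter<1000)
        iter=iter+1;
        w=-(J(u)\F(u));
        alpha=1;                      % try alpha_j = 1 first
        while (gval(u+alpha*w)>=gval(u) && alpha>1e-12)
            alpha=alpha/2;            % shrink the tuning parameter until g decreases
        end
        u=u+alpha*w;
        fprintf('iter=%3d alpha_j=%g g(u)=%e\n',iter,alpha,gval(u));
    end
end

For the question's minimization problem one would take F to be the gradient handle and J the Hessian handle, as in the Remark below.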

Remark: You can rewrite this text using $$g(\textbf{x})=\frac{1}{2}\|\nabla f(\textbf{x})\|^2$$ instead, when you are working with the implication $$ \min_{\textbf{x}\in A}{f(\textbf{x})}=f(\textbf{a})\Longrightarrow \nabla f(\textbf{a})=\textbf{0}.$$ And a careful choice of $\alpha_j$ makes the sequence $\left(\|\nabla f(\textbf{u}_j)\|^2\right)$ a decreasing sequence.

  • I found this thread on SearchOnMath; it can be helpful on this subject: https://math.stackexchange.com/questions/336292/graphically%2C-what-is-positive-semidefinite-ness%3F, because Newton's method on optimization problems can find saddle points or a local maximum. – José C Ferreira Mar 09 '22 at 12:44