Ref: “Foundations of Applied Mathematics” by Humpherys, Jarvis, Evans.
Let ${ f : X \longrightarrow X }$ be a differentiable function, where ${ X = \mathbb{R} ^m }$ say. The goal is to find the zeroes of ${ f . }$
Note that the best linear approximation of ${ f }$ at ${ x _n }$ is
$${ L(x) = f(x _n) + Df(x _n) (x - x _n) . }$$
Assuming ${ Df(x _n) }$ is invertible, this linear approximation has a unique zero at
$${ x _n - Df(x _n) ^{-1} f(x _n) .}$$
This suggests
$${ x _{n+1} := x _n - Df(x _n) ^{-1} f(x _n) }$$
is a better approximation of a zero of ${ f }$ than ${ x _n . }$
Note that starting at any ${ x _0 }$ and repeating this process, we get a sequence ${ x _0, x _1, \ldots . }$
We will show that this sequence often (but not always) converges to a zero of ${ f . }$ Further, if ${ x _0 }$ is well chosen, the convergence can be very fast.
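As a concrete sketch of this iteration (not from the reference; the function `newton`, its tolerance, and the toy system below are illustrative choices of my own), in Python/NumPy:
```python
import numpy as np

def newton(f, Df, x0, tol=1e-12, maxiter=50):
    """Newton iteration x_{n+1} = x_n - Df(x_n)^{-1} f(x_n)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxiter):
        # Solve Df(x) s = f(x) rather than forming the inverse explicitly.
        s = np.linalg.solve(Df(x), f(x))
        x = x - s
        if np.linalg.norm(s) < tol:
            break
    return x

# Hypothetical toy system: x^2 + y^2 = 1 and x = y^3.
f  = lambda v: np.array([v[0]**2 + v[1]**2 - 1, v[0] - v[1]**3])
Df = lambda v: np.array([[2*v[0], 2*v[1]], [1.0, -3*v[1]**2]])
print(newton(f, Df, np.array([1.0, 1.0])))
```
Note that the code solves the linear system ${ Df(x _n) s = f(x _n) }$ rather than explicitly inverting ${ Df(x _n) . }$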
We first consider the one dimensional case. Let ${ f : [a, b] \longrightarrow \mathbb{R} }$ and let ${ \overline{x} \in (a, b) }$ be a zero of ${ f . }$ We will show that ${ \phi(x) = x - \frac{f(x)}{f'(x)} }$ gives a contraction map near ${ \overline{x} . }$
Thm [Contraction map near a zero]:
Let ${ f : [a, b] \longrightarrow \mathbb{R} }$ be a ${ C ^2 }$ map. Let ${ \overline{x} \in (a, b) }$ be such that ${ f(\overline{x}) = 0 }$ and ${ f'(\overline{x}) \neq 0 . }$
Then there is a ${ \delta > 0 }$ such that
$${ \phi(x) := x - \frac{f(x)}{f'(x)} }$$
gives a contraction map on ${ [\overline{x} - \delta, \overline{x} + \delta] \subseteq [a, b] . }$
Pf: Since ${ f' }$ is continuous and ${ f'(\overline{x}) \neq 0 , }$ the map ${ \phi(x) }$ is well defined on a neighbourhood of ${ \overline{x} . }$
By the mean value theorem,
$${ {\begin{aligned} &\, \vert \phi(x) - \phi(y) \vert \\ = &\, \vert \phi'(c) \vert \vert x - y \vert \\ = &\, \left\vert 1 - \frac{f'(c) ^2 - f(c) f''(c) }{f'(c) ^2} \right\vert \vert x - y \vert \\ = &\, \left\vert \frac{f(c) f''(c) }{f'(c) ^2} \right\vert \vert x - y \vert \end{aligned}} }$$
for some ${ c = c _{x, y} }$ between ${ x }$ and ${ y . }$
Hence consider the function
$${ \frac{f(x) f''(x)}{f'(x) ^2 } . }$$
Since ${ f(\overline{x}) = 0 }$ and ${ f'(\overline{x}) \neq 0 , }$
$${ \frac{f(x) f''(x)}{f'(x) ^2 } \to 0 \quad \text{ as } x \to \overline{x} . }$$
Hence pick a ${ \delta > 0 }$ such that ${ [\overline{x} - \delta, \overline{x} + \delta] \subseteq [a, b] }$ and
$${ k := \sup _{x \in [\overline{x} - \delta, \overline{x} + \delta]} \left\vert \frac{f(x) f''(x)}{f'(x) ^2 } \right\vert < 1 .}$$
Hence for any ${ x, y \in [\overline{x} - \delta, \overline{x} + \delta] }$ we have
$${ \vert \phi(x) - \phi(y) \vert \leq k \vert x - y \vert }$$
where ${ k < 1 . }$
Further, since ${ \phi(\overline{x}) = \overline{x} , }$ for any ${ x \in [\overline{x} - \delta, \overline{x} + \delta] }$ we have ${ \vert \phi(x) - \overline{x} \vert = \vert \phi(x) - \phi(\overline{x}) \vert \leq k \vert x - \overline{x} \vert \leq k \delta < \delta , }$ so ${ \phi }$ gives a map from ${ [\overline{x} - \delta, \overline{x} + \delta] }$ to itself. Hence ${ \phi }$ is a contraction map on ${ [\overline{x} - \delta, \overline{x} + \delta] , }$ as needed. ${ \blacksquare }$
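As a sanity check (a toy example of my own, not from the reference), one can estimate the contraction constant ${ k }$ numerically for ${ f(x) = x ^2 - 2 }$ near ${ \overline{x} = \sqrt{2} }$ and watch ${ \phi }$ contract towards ${ \overline{x} }$:
```python
import numpy as np

# Toy example: f(x) = x^2 - 2, so f'(x) = 2x, f''(x) = 2, and the zero is sqrt(2).
f   = lambda x: x**2 - 2
df  = lambda x: 2*x
d2f = lambda x: 2.0
xbar, delta = np.sqrt(2), 0.5

# Estimate k = sup of |f(x) f''(x) / f'(x)^2| over [xbar - delta, xbar + delta].
xs = np.linspace(xbar - delta, xbar + delta, 10001)
k = np.max(np.abs(f(xs) * d2f(xs) / df(xs)**2))
print(k)   # about 0.70 < 1, so phi is a contraction on this interval

# Iterate phi(x) = x - f(x)/f'(x) from an endpoint and watch |x_n - xbar| shrink.
x = xbar + delta
for _ in range(5):
    x = x - f(x) / df(x)
    print(abs(x - xbar))
```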
Thm [Newton’s method]:
Let ${ f : [a, b] \longrightarrow \mathbb{R} }$ be a ${ C ^2 }$ map. Let ${ \overline{x} \in (a, b) }$ be such that ${ f(\overline{x}) = 0 }$ and ${ f'(\overline{x}) \neq 0 . }$
Then the iterative map
$${ x _{n+1} = x _n - \frac{f(x _n)}{f'(x _n)} }$$
converges quadratically to ${ \overline{x} ,}$ whenever ${ x _0 }$ is sufficiently close to ${ \overline{x} . }$
Pf: Note that ${ f' }$ is a ${ C ^1 }$ map. Hence ${ f' }$ is locally Lipschitz at ${ \overline{x} . }$ Hence there exist ${ \delta _1 > 0 }$ and ${ L > 0 }$ such that
$${ \vert f'(x) - f'(y) \vert \leq L \vert x - y \vert }$$
whenever ${ x, y \in B(\overline{x}, \delta _1) . }$
Hence pick a ${ \delta \in (0, \delta _1) }$ as in the previous theorem. Pick any ${ x _0 \in [\overline{x} - \delta, \overline{x} + \delta] }$ and iterate. Since ${ \phi(\overline{x}) = \overline{x} , }$ by the contraction mapping theorem the sequence ${ (x _n) }$ converges to ${ \overline{x} , }$ the unique fixed point of ${ \phi }$ in ${ [\overline{x} - \delta, \overline{x} + \delta] . }$
We can now study the rate of convergence of the sequence. Consider the errors
$${ \varepsilon _n := x _n - \overline{x} , }$$
so that ${ x _n = \overline{x} + \varepsilon _n . }$
By the mean value theorem,
$${ f(\overline{x} + \varepsilon _{n-1}) = \underbrace{f(\overline{x})} _{0} + f'(\overline{x} + \theta _{n-1} \varepsilon _{n-1}) \varepsilon _{n-1} }$$
for some ${ \theta _{n-1} \in [0, 1] . }$
Hence
$${ {\begin{aligned} &\, \vert \varepsilon _n \vert \\ = &\, \left\vert \varepsilon _{n-1} - \frac{f(\overline{x} + \varepsilon _{n-1})}{f'(\overline{x} + \varepsilon _{n-1}) } \right\vert \\ = &\, \left\vert 1 - \frac{f'(\overline{x} + \theta _{n-1} \varepsilon _{n-1})}{f'(\overline{x} + \varepsilon _{n-1}) } \right\vert \vert \varepsilon _{n-1} \vert \\ \leq &\, \frac{L \vert \varepsilon _{n-1} - \theta _{n-1} \varepsilon _{n-1} \vert}{\vert f'(\overline{x} + \varepsilon _{n-1}) \vert} \vert \varepsilon _{n-1} \vert \\ \leq &\, M \vert \varepsilon _{n-1} \vert ^2 \end{aligned}} }$$
where
$${ M := \sup _{x \in [\overline{x} - \delta, \overline{x} + \delta]} \frac{L}{\vert f'(x) \vert} . }$$
Hence (noting ${ M < \infty , }$ since ${ f' }$ is continuous and nonzero on ${ [\overline{x} - \delta, \overline{x} + \delta] }$) the sequence ${ (x _n) }$ converges quadratically to ${ \overline{x}, }$ as needed. ${ \blacksquare }$
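The bound ${ \vert \varepsilon _n \vert \leq M \vert \varepsilon _{n-1} \vert ^2 }$ means the number of correct digits roughly doubles at each step. A quick illustration (again with the toy example ${ f(x) = x ^2 - 2 }$, a choice of mine), using high precision decimals so the doubling is visible beyond machine precision:
```python
from decimal import Decimal, getcontext

# For f(x) = x^2 - 2 the Newton step is x <- x - (x^2 - 2)/(2x).
# Printing |x_n - sqrt(2)| shows the error roughly squaring each step,
# i.e. the number of correct digits roughly doubling.
getcontext().prec = 60              # work with 60 significant digits
xbar = Decimal(2).sqrt()
x = Decimal(2)                      # x_0 = 2, close enough to sqrt(2)
for n in range(1, 7):
    x = x - (x * x - 2) / (2 * x)
    print(n, abs(x - xbar))
```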