Ref: “Foundations of Applied Mathematics” by Humpherys, Jarvis, Evans.
Let ${ f : X \longrightarrow X }$ be a differentiable function, where ${ X = \mathbb{R} ^m }$ say. The goal is to find the zeroes of ${ f . }$
Note that the best linear approximation of ${ f }$ at ${ x _n }$ is
$${ L(x) = f(x _n) + Df(x _n) (x - x _n) . }$$
Assuming ${ Df(x _n) }$ is invertible, this linear approximation has a unique zero at
$${ x _n - Df(x _n) ^{-1} f(x _n) .}$$
This suggests
$${ x _{n+1} := x _n - Df(x _n) ^{-1} f(x _n) }$$
is a better approximation of a zero of ${ f }$ than ${ x _n . }$
Note that starting at any ${ x _0 }$ and repeating this process, we get a sequence ${ x _0, x _1, \ldots . }$
We will show that this sequence often (but not always) converges to a zero of ${ f . }$ Further, if ${ x _0 }$ is well chosen, the convergence can be very fast.
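As a concrete sketch of this iteration (not from the reference; the function `newton`, its tolerance, and the toy system below are illustrative choices of my own), in Python/NumPy:
```python
import numpy as np

def newton(f, Df, x0, tol=1e-12, maxiter=50):
    """Newton iteration x_{n+1} = x_n - Df(x_n)^{-1} f(x_n)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxiter):
        # Solve Df(x) s = f(x) rather than forming the inverse explicitly.
        s = np.linalg.solve(Df(x), f(x))
        x = x - s
        if np.linalg.norm(s) < tol:
            break
    return x

# Hypothetical toy system: x^2 + y^2 = 1 and x = y^3.
f  = lambda v: np.array([v[0]**2 + v[1]**2 - 1, v[0] - v[1]**3])
Df = lambda v: np.array([[2*v[0], 2*v[1]], [1.0, -3*v[1]**2]])
print(newton(f, Df, np.array([1.0, 1.0])))
```
Note that the code solves the linear system ${ Df(x _n) s = f(x _n) }$ rather than explicitly inverting ${ Df(x _n) . }$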
We first consider the one dimensional case. Let ${ f : [a, b] \longrightarrow \mathbb{R} }$ and let ${ \overline{x} \in (a, b) }$ be a zero of ${ f . }$ We will show that ${ \phi(x) = x - \frac{f(x)}{f'(x)} }$ gives a contraction map near ${ \overline{x} . }$
Thm [Contraction map near a zero]:
Let ${ f : [a, b] \longrightarrow \mathbb{R} }$ be a ${ C ^2 }$ map. Let ${ \overline{x} \in (a, b) }$ be such that ${ f(\overline{x}) = 0 }$ and ${ f'(\overline{x}) \neq 0 . }$
Then there is a ${ \delta > 0 }$ such that
$${ \phi(x) := x - \frac{f(x)}{f'(x)} }$$
gives a contraction map on ${ [\overline{x} - \delta, \overline{x} + \delta] \subseteq [a, b] . }$
Pf: Since ${ f' }$ is continuous and ${ f'(\overline{x}) \neq 0 , }$ the map ${ \phi(x) }$ is well defined on a neighbourhood of ${ \overline{x} . }$
By the mean value theorem,
$${ {\begin{aligned} &\, \vert \phi(x) - \phi(y) \vert \\ = &\, \vert \phi'(c) \vert \vert x - y \vert \\ = &\, \left\vert 1 - \frac{f'(c) ^2 - f(c) f''(c) }{f'(c) ^2} \right\vert \vert x - y \vert \\ = &\, \left\vert \frac{f(c) f''(c) }{f'(c) ^2} \right\vert \vert x - y \vert \end{aligned}} }$$
for some ${ c = c _{x, y} }$ between ${ x }$ and ${ y . }$
Hence consider the function
$${ \frac{f(x) f''(x)}{f'(x) ^2 } . }$$
Since ${ f(\overline{x}) = 0 }$ and ${ f'(\overline{x}) \neq 0 , }$
$${ \frac{f(x) f''(x)}{f'(x) ^2 } \to 0 \quad \text{ as } x \to \overline{x} . }$$
Hence pick a ${ \delta > 0 }$ such that ${ [\overline{x} - \delta, \overline{x} + \delta] \subseteq [a, b] }$ and
$${ k := \sup _{x \in [\overline{x} - \delta, \overline{x} + \delta]} \left\vert \frac{f(x) f''(x)}{f'(x) ^2 } \right\vert < 1 .}$$
Hence for any ${ x, y \in [\overline{x} - \delta, \overline{x} + \delta] }$ we have
$${ \vert \phi(x) - \phi(y) \vert \leq k \vert x - y \vert }$$
where ${ k < 1 . }$
Further, since ${ \phi(\overline{x}) = \overline{x} , }$ for any ${ x \in [\overline{x} - \delta, \overline{x} + \delta] }$ we have ${ \vert \phi(x) - \overline{x} \vert = \vert \phi(x) - \phi(\overline{x}) \vert \leq k \vert x - \overline{x} \vert \leq k \delta < \delta , }$ so ${ \phi }$ gives a map from ${ [\overline{x} - \delta, \overline{x} + \delta] }$ to itself. Hence ${ \phi }$ is a contraction map on ${ [\overline{x} - \delta, \overline{x} + \delta] , }$ as needed. ${ \blacksquare }$
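As a sanity check (a toy example of my own, not from the reference), one can estimate the contraction constant ${ k }$ numerically for ${ f(x) = x ^2 - 2 }$ near ${ \overline{x} = \sqrt{2} }$ and watch ${ \phi }$ contract towards ${ \overline{x} }$:
```python
import numpy as np

# Toy example: f(x) = x^2 - 2, so f'(x) = 2x, f''(x) = 2, and the zero is sqrt(2).
f   = lambda x: x**2 - 2
df  = lambda x: 2*x
d2f = lambda x: 2.0
xbar, delta = np.sqrt(2), 0.5

# Estimate k = sup of |f(x) f''(x) / f'(x)^2| over [xbar - delta, xbar + delta].
xs = np.linspace(xbar - delta, xbar + delta, 10001)
k = np.max(np.abs(f(xs) * d2f(xs) / df(xs)**2))
print(k)   # about 0.70 < 1, so phi is a contraction on this interval

# Iterate phi(x) = x - f(x)/f'(x) from an endpoint and watch |x_n - xbar| shrink.
x = xbar + delta
for _ in range(5):
    x = x - f(x) / df(x)
    print(abs(x - xbar))
```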
Thm [Newton’s method]:
Let ${ f : [a, b] \longrightarrow \mathbb{R} }$ be a ${ C ^2 }$ map. Let ${ \overline{x} \in (a, b) }$ be such that ${ f(\overline{x}) = 0 }$ and ${ f'(\overline{x}) \neq 0 . }$
Then the iterative map
$${ x _{n+1} = x _n - \frac{f(x _n)}{f'(x _n)} }$$
converges quadratically to ${ \overline{x} ,}$ whenever ${ x _0 }$ is sufficiently close to ${ \overline{x} . }$
Pf: Note that ${ f' }$ is a ${ C ^1 }$ map. Hence ${ f' }$ is locally Lipschitz at ${ \overline{x} . }$ Hence there exist ${ \delta _1 > 0 }$ and ${ L > 0 }$ such that
$${ \vert f'(x) - f'(y) \vert \leq L \vert x - y \vert }$$
whenever ${ x, y \in B(\overline{x}, \delta _1) . }$
Hence pick a ${ \delta \in (0, \delta _1) }$ as in the previous theorem. Pick any ${ x _0 \in [\overline{x} - \delta, \overline{x} + \delta] }$ and iterate. Since ${ \phi(\overline{x}) = \overline{x} , }$ by the contraction mapping theorem the sequence ${ (x _n) }$ converges to ${ \overline{x} , }$ the unique fixed point of ${ \phi }$ in ${ [\overline{x} - \delta, \overline{x} + \delta] . }$
We can now study the rate of convergence of the sequence. Consider the errors
$${ \varepsilon _n := x _n - \overline{x} , }$$
so that ${ x _n = \overline{x} + \varepsilon _n . }$
By the mean value theorem,
$${ f(\overline{x} + \varepsilon _{n-1}) = \underbrace{f(\overline{x})} _{0} + f'(\overline{x} + \theta _{n-1} \varepsilon _{n-1}) \varepsilon _{n-1} }$$
for some ${ \theta _{n-1} \in [0, 1] . }$
Hence
$${ {\begin{aligned} &\, \vert \varepsilon _n \vert \\ = &\, \left\vert \varepsilon _{n-1} - \frac{f(\overline{x} + \varepsilon _{n-1})}{f'(\overline{x} + \varepsilon _{n-1}) } \right\vert \\ = &\, \left\vert 1 - \frac{f'(\overline{x} + \theta _{n-1} \varepsilon _{n-1})}{f'(\overline{x} + \varepsilon _{n-1}) } \right\vert \vert \varepsilon _{n-1} \vert \\ \leq &\, \frac{L \vert \varepsilon _{n-1} - \theta _{n-1} \varepsilon _{n-1} \vert}{\vert f'(\overline{x} + \varepsilon _{n-1}) \vert} \vert \varepsilon _{n-1} \vert \\ \leq &\, M \vert \varepsilon _{n-1} \vert ^2 \end{aligned}} }$$
where
$${ M := \sup _{x \in [\overline{x} - \delta, \overline{x} + \delta]} \frac{L}{\vert f'(x) \vert} . }$$
Hence (noting ${ M < \infty , }$ since ${ f' }$ is continuous and nonzero on ${ [\overline{x} - \delta, \overline{x} + \delta] }$) the sequence ${ (x _n) }$ converges quadratically to ${ \overline{x}, }$ as needed. ${ \blacksquare }$
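The bound ${ \vert \varepsilon _n \vert \leq M \vert \varepsilon _{n-1} \vert ^2 }$ means the number of correct digits roughly doubles at each step. A quick illustration (again with the toy example ${ f(x) = x ^2 - 2 }$, a choice of mine), using high precision decimals so the doubling is visible beyond machine precision:
```python
from decimal import Decimal, getcontext

# For f(x) = x^2 - 2 the Newton step is x <- x - (x^2 - 2)/(2x).
# Printing |x_n - sqrt(2)| shows the error roughly squaring each step,
# i.e. the number of correct digits roughly doubling.
getcontext().prec = 60              # work with 60 significant digits
xbar = Decimal(2).sqrt()
x = Decimal(2)                      # x_0 = 2, close enough to sqrt(2)
for n in range(1, 7):
    x = x - (x * x - 2) / (2 * x)
    print(n, abs(x - xbar))
```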