
I am trying to prove the strong differentiability version of the Inverse Function Theorem for Banach spaces, but I am not sure if it is true. I am interested in this because it is a kind of pointwise version of the theorem. So my main question is:

Is the strong differentiability version of the Inverse Function Theorem true for Banach spaces?

Here is the definition of strong differentiability.

Definition Let $E$ and $E'$ be normed linear spaces, $A \subseteq E$ an open set, $a \in A$ a point and $f: A \to E'$ a function. We say $f$ is strongly differentiable at $a$ when there is a continuous linear map $D: E \to E'$ such that $$\lim_{(x,x') \to (0,0),\ x\neq x'} \frac{f(a+x')-f(a+x) - D(x'-x)}{\|x'-x\|} = 0.$$

In this case, $f$ is differentiable at $a$ and $D = Df|_a$, that is, the linear map $D$ is the differential of $f$ at $a$.

Consider the remainder function $r_a(v) = f(a+v) - f(a) - Df|_a(v)$. In finite dimensional spaces, strong differentiability at $a$ can be shown to be equivalent to this: for every $\varepsilon > 0$, there is a neighborhood of the origin in which the function $r_a$ is Lipschitz with Lipschitz constant $\varepsilon$. I believe this is also true in infinite dimensions, but have not proved it yet.
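A quick numerical illustration of the remainder formulation (a sketch only; both sample functions are assumptions chosen for illustration, not taken from the question). For $f(x)=x+x^2$ the remainder at $0$ is $r_0(v)=v^2$, whose Lipschitz constant on $[-\rho,\rho]$ is $2\rho\to0$. For $f(x)=x^2\sin(1/x)$ with $f(0)=0$, which is differentiable at $0$ with derivative $0$ but not strongly differentiable there, the estimated Lipschitz constant of $r_0$ stays close to $1$ however small $\rho$ is. The hypothetical helper `lipschitz_estimate` only produces a lower estimate, from difference quotients of consecutive grid points.

```python
import math

def lipschitz_estimate(r, rho: float, n: int = 100_001) -> float:
    """Lower estimate of the Lipschitz constant of r on [-rho, rho]:
    the largest difference quotient between consecutive grid points."""
    xs = [-rho + 2.0 * rho * k / (n - 1) for k in range(n)]
    ys = [r(x) for x in xs]
    return max(abs(ys[k + 1] - ys[k]) / (xs[k + 1] - xs[k]) for k in range(n - 1))

def r_smooth(v: float) -> float:
    # Remainder at 0 of the hypothetical f(x) = x + x^2, whose derivative at 0 is 1.
    return v * v

def r_wiggly(v: float) -> float:
    # Remainder at 0 of f(x) = x^2 sin(1/x), f(0) = 0, whose derivative at 0 is 0.
    return 0.0 if v == 0.0 else v * v * math.sin(1.0 / v)

for rho in (0.1, 0.01, 0.001):
    print(rho, lipschitz_estimate(r_smooth, rho), lipschitz_estimate(r_wiggly, rho))
```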

Inverse Function Theorem (strongly differentiable) Let $E$ and $E'$ be Banach spaces, $A \subseteq E$ an open set, $a \in A$ a point and $f: A \to E'$ a function which is strongly differentiable at $a$ and such that $Df|_a:E \to E'$ is a linear isomorphism. Then there is an open neighborhood $V \subseteq A$ of $a$ such that $f|_V: V \to f(V)$ is a homeomorphism, the inverse function $f^{-1}: f(V) \to V$ is strongly differentiable at $f(a)$ and its differential at $f(a)$ is $Df^{-1}|_{f(a)} = (Df|_{a})^{-1}$.

Ted Shifrin
  • I took the liberty of editing your definition of differentiability because I believe it contained a few typos. I've modified it so that it matches the definition of Frechet derivative which is the strongest sensible concept of differentiation, as far as I know, and it's the one that most directly generalises the derivative from finite dimensional calculus. I've written my answer based on this. – FShrike Aug 10 '23 at 20:50
  • @FShrike You seem to have changed the definition substantially. Now this is the usual definition, not having two different points involved. Yes, there was a typo, but you’ve completely changed the question. You’ve removed actual strong from the definition. – Ted Shifrin Aug 10 '23 at 20:53
  • @TedShifrin Oh dear. Would I be right in thinking their intended definition sits somewhere between Frechet differentiability and continuous differentiability in terms of strength? – FShrike Aug 10 '23 at 20:59
  • I’ve changed it back and corrected the obvious typos. Your answer may need serious repairs. Google shows this is not a novel definition, although I had not seen it before. I would guess that your guess is right. – Ted Shifrin Aug 10 '23 at 21:04
  • @TedShifrin I believe their definition of strong derivative agrees with all other definitions of derivative in the finite dimensional case but I could be wrong (the OP's comment about Lipschitz constants would support this claim). Do you know if this result is even true for the infinite dimensional cases? – FShrike Aug 10 '23 at 21:08
  • This post and this post may be helpful/relevant. – Ted Shifrin Aug 10 '23 at 21:15
  • Dieudonné gives an exercise to prove that, assuming that $f$ is differentiable, strong differentiability at $a$ is equivalent to $f$'s being continuously differentiable at $a$. @FShrike – Ted Shifrin Aug 10 '23 at 21:22
  • @TedShifrin That's good news if it's true, thanks. – FShrike Aug 10 '23 at 21:23
  • @TedShifrin This has been answered on MO. The OP annoyingly cross-posted without indicating – FShrike Aug 10 '23 at 21:33
  • @FShrike Annoyingly, with the same nonsensical typos in the definition. – Ted Shifrin Aug 10 '23 at 22:32

3 Answers


The "strong differentiability" version of the Inverse Function Theorem works for any Banach space, because the pointwise condition suffices to show the local Lipschitz property required:

As usual, one can reduce to the case of a function $f$ defined on an open set $U$ in a Banach space $X$, where $0\in U$, $f(0)=0$ and $Df_0 = I_X$ is the identity map.

The strategy of the proof of the Inverse Function Theorem is to show that a "small perturbation" of the identity map remains invertible. In this context a "small perturbation" is taken to be a contraction mapping (a Lipschitz map with Lipschitz constant less than 1) and we only get a local inverse to $f$ because the hypotheses only ensure that $f$ is a small perturbation of the identity near $0$.

Thus the proof of Inverse Function Theorem relies on establishing that $f=I_X+\varphi$ where $\varphi$ is a contraction map near $0$. If the derivative $Df$ is continuous at $0$, then this can be established by the Mean Value Inequality. If one instead assumes strong differentiability at $0$ (and $Df_0 = I_X$) then $$ \frac{\|f(x)-f(y)-I_X(x-y)\|}{\|x-y\|}\to 0\quad \text{ as } \quad x,y\to 0. \tag{$\dagger$} $$

Hence if we set $\varphi(x)=f(x)-x$, so that $f = I_X + \varphi$, then $(\dagger)$ can be rephrased as $\|\varphi(x)-\varphi(y)\|/\|x-y\|\to 0$ as $x,y\to 0$ with $x\neq y$. Thus there is an $r>0$ such that for all $x,y \in B(0,r)$ we have $\|\varphi(x)-\varphi(y)\|\leq (1/2)\|x-y\|$, establishing the Lipschitz condition.
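A numerical sanity check of this step, as a sketch under assumptions not in the answer: take the hypothetical map $f(x,y)=(x+y^2,\,y+x^2)$ on $\Bbb R^2$ with the Euclidean norm. Then $Df_0=I$ and $\varphi(x,y)=(y^2,x^2)$, whose Jacobian has operator norm $2\max(|x|,|y|)$, so on the ball $B(0,1/4)$ the ratio $\|\varphi(p)-\varphi(q)\|/\|p-q\|$ should stay below $1/2$. The helper names are illustrative; the code samples random pairs and records the worst ratio seen.

```python
import math
import random

def phi(p):
    """phi = f - id for the hypothetical f(x, y) = (x + y^2, y + x^2)."""
    x, y = p
    return (y * y, x * x)

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def random_point_in_ball(radius, rng):
    # Rejection sampling from the square [-radius, radius]^2 into the open disc.
    while True:
        p = (rng.uniform(-radius, radius), rng.uniform(-radius, radius))
        if math.hypot(p[0], p[1]) < radius:
            return p

def worst_ratio(radius, trials=20_000, seed=0):
    """Largest observed ||phi(p) - phi(q)|| / ||p - q|| over random p != q in B(0, radius)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        p = random_point_in_ball(radius, rng)
        q = random_point_in_ball(radius, rng)
        d = dist(p, q)
        if d > 0:
            worst = max(worst, dist(phi(p), phi(q)) / d)
    return worst

print(worst_ratio(0.25))  # stays below 1/2 on B(0, 1/4)
print(worst_ratio(1.0))   # on a larger ball the bound 1/2 is typically exceeded
```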

The standard proof therefore applies to any Banach space, whatever its dimension.

krm2233
  • It is important to point out that Strong Differentiability and Continuous Differentiability are equivalent concepts in normed spaces (that CD implies SD follows from MVT and that SD implies CD follows by considering Gâteaux derivatives and the usual operator norm). – William M. Jun 25 '24 at 20:03
  • @WilliamM: you need to be a little bit careful about that assertion -- to say $f$ is continuously differentiable at $a \in U$ requires $f$ to be differentiable near $a$, thus, unlike strong differentiability, it is not a pointwise property. What you mean to say is that if a function is strongly differentiable at every point in an open set $U$ then it is continuously differentiable on $U$ and conversely if $f$ is continuously differentiable on $U$ then it is strongly differentiable on $U$. – krm2233 Jun 26 '24 at 15:30
  • Yes, correct. SD can exist locally, CD only makes sense if the function is differentiable on a set and we ask about continuity at a point of that set. – William M. Jun 26 '24 at 15:52

$\newcommand{\d}{\mathrm{d}}\newcommand{\id}{\mathrm{Id}}\newcommand{\L}{\mathscr{L}}\newcommand{\M}{\mathcal{M}}\newcommand{\con}{\operatorname{Con}}\newcommand{\I}{\mathcal{I}}$For context, I had written this whole thing up when I thought the OP was asking about a theorem involving Frechet derivatives, not this different "strong" derivative concept. I then deleted it since it answered the wrong question but I've now realised that the proof of one is readily adapted to a proof of the other. So that it wasn't a complete waste of time, I'm going to remark about the adaptations and un-delete this. I realise that this is now a very verbose version of krm2233's answer.

I write $\L$ for the space of continuous linear maps (given its operator-norm topology) and $\L_\cong\subset\L$ for the subset of linear isomorphisms. I write $\d_\bullet$ for the derivative rather than $D$.


Theorem $1$:

Say $X$ and $Y$ are Banach spaces, $\Omega\subseteq X$ is a nonempty open set and $f:\Omega\to Y$ is a function whose derivative $\d_f:\Omega\to\L(X;Y)$ exists and is continuous at $x_0\in\Omega$. Suppose further that $\d_f(x_0)$ is invertible.

Then there is an open neighbourhood $U\subseteq\Omega$ of $x_0$ and an open neighbourhood $V$ of $f(x_0)$ such that $f:U\to V$ is a homeomorphism whose inverse $g:V\to U$ is differentiable with $\d_g(y)=\d_f(g(y))^{-1}$ for all $y\in V$ ($\d_g$ is necessarily continuous at $f(x_0)$ but not necessarily anywhere else).

Moreover, if $f$ is $C^k$ (on $\Omega$) for $k\in\Bbb N\cup\{\infty\}$ then so is $g$.

Theorem $2$:

Say $X$ and $Y$ are Banach spaces, $\Omega\subseteq X$ is a nonempty open set and $f:\Omega\to Y$ is a function which admits a strong derivative at $x_0\in\Omega$, $\d_f(x_0)$, which is invertible.

Then there is an open neighbourhood $U\subseteq\Omega$ of $x_0$ and an open neighbourhood $V$ of $f(x_0)$ such that $f:U\to V$ is a homeomorphism whose inverse $g:V\to U$ is strongly differentiable at $f(x_0)$ with $\d_g(f(x_0))=\d_f(x_0)^{-1}$.

Moreover, if $f$ is (strongly) differentiable everywhere on $\Omega$, then so is $g$. If $f$ is strongly $C^k$ (i.e. its $k$th derivative exists everywhere, is continuous and is strong) on $\Omega$ for $k\in\Bbb N\cup\{\infty\}$ then so is $g$ (on $V$).

It's worth noting that $\d_f$ does not need to be everywhere continuous even though this is how the theorem is usually stated. The theorem can be even further weakened for the finite dimensional case but according to Tao the proof crucially relies on local compactness, which fails in infinite dimensions, so I'm guessing this weakening is false for general Banach spaces.

My proof is inspired by ideas from this set of notes, which may give a more comprehensible treatment of the finite dimensional case. I am told the proof idea is pretty standard but I wanted to work out the details for myself.


The idea is that if $y$ is close to $f(x_0)$ and $y=f(x')$ for some $x'$, then for $x$ close to $x_0$ we should have: $$y=f(x)+f(x')-f(x)\approx f(x)+\d_f(x)(x'-x)$$So we should be able to estimate: $$x'\approx\d_f(x)^{-1}(y-f(x))+x$$But the continuity of $\d_f$ at $x_0$ suggests we can approximate $\d_f(x)$ with $\d_f(x_0)$: $$x'\approx x+\d_f(x_0)^{-1}(y-f(x))$$which lends itself to the study of the fixed point iteration: $$x\mapsto x+\d_f(x_0)^{-1}(y-f(x))$$
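This iteration can be tried out numerically. A minimal sketch, assuming the hypothetical map $f(x,y)=(2x+y^2,\,3y+x^2)$ on $\Bbb R^2$ with $x_0=0$, so that $\d_f(0)=\operatorname{diag}(2,3)$ and its inverse acts componentwise; the names `step` and `solve` are illustrative only.

```python
def f(p):
    """Hypothetical map with f(0) = 0 and derivative diag(2, 3) at 0."""
    x, y = p
    return (2 * x + y * y, 3 * y + x * x)

def step(p, target):
    """One step of x -> x + Df(0)^{-1} (target - f(x)), with Df(0)^{-1} = diag(1/2, 1/3)."""
    fx, fy = f(p)
    return (p[0] + (target[0] - fx) / 2, p[1] + (target[1] - fy) / 3)

def solve(target, iterations=60):
    """Approximate the local inverse at target by iterating from x_0 = 0."""
    p = (0.0, 0.0)
    for _ in range(iterations):
        p = step(p, target)
    return p

target = (0.05, -0.03)
x = solve(target)
print(x, f(x))  # f(x) should agree with target to many digits
```

Near $0$ the derivative of `step` is $\begin{pmatrix}0&-y\\-2x/3&0\end{pmatrix}$, which is small, so the iteration contracts quickly for targets close to $f(0)=0$.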

To keep the notation clean it's nice to make some simplifications first, though. Since $\d_f(x_0)$ is a linear isomorphism $X\cong Y$ (in particular a diffeomorphism), the theorem's conclusion holds for $f$ if and only if it holds for $\d_f(x_0)^{-1}\circ f:\Omega\to X$, whose derivative at $x_0$ is the identity. Since translations are also diffeomorphisms, we may adjust things so that $x_0=0$ and $f(x_0)=0$.

So, without loss of generality, assume $Y=X$, $0\in\Omega$, $f(0)=0$ and that $\d_f(0)=\id$.

This is a proof of theorem $1$. Remarks to switch to a proof of theorem $2$ will appear.

The linked notes use a bound obtained via integration, which is fair enough in the finite dimensional case, but we can obtain the same bound without needing this tool (and on weaker hypotheses, since the derivative of our $f$ may fail to be integrable). The full power of the following lemma is not required either, but I'd never seen the general version before so I'll fill in the details here.

Theorem (general mean value inequality):

If $X$ is a normed linear space and $F:[0,1]\to X$ is everywhere continuous and is also differentiable on $(0,1)$ then: $$\|F(1)-F(0)\|\le\sup_{0<t<1}\|F'(t)\|$$Where $F'(\bullet):=\d_F(\bullet)(1)$ under the identification $X\cong\L(\Bbb R;X)$.
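Only an inequality can hold here in general. A small sketch with two hypothetical curves in $\Bbb R^2$: for $F(t)=(\cos2\pi t,\sin2\pi t)$ we have $F(1)=F(0)$ although $\|F'(t)\|=2\pi$ for every $t$, so no $\xi$ satisfies $F(1)-F(0)=F'(\xi)$; for $F(t)=(t,t^2)$ the inequality is strict. The supremum is estimated on a grid by the illustrative helper `chord_and_speed`.

```python
import math

def chord_and_speed(F, dF, samples=10_001):
    """Return ||F(1) - F(0)|| and a grid estimate of sup_{0<t<1} ||F'(t)||."""
    start, end = F(0.0), F(1.0)
    chord = math.hypot(end[0] - start[0], end[1] - start[1])
    speed = max(math.hypot(*dF(k / (samples - 1))) for k in range(1, samples - 1))
    return chord, speed

# Full circle: zero chord, constant speed 2*pi.
circle = (lambda t: (math.cos(2 * math.pi * t), math.sin(2 * math.pi * t)),
          lambda t: (-2 * math.pi * math.sin(2 * math.pi * t),
                     2 * math.pi * math.cos(2 * math.pi * t)))
# Parabola arc: chord sqrt(2), speed sqrt(1 + 4t^2) approaching sqrt(5).
parabola = (lambda t: (t, t * t), lambda t: (1.0, 2.0 * t))

for name, (F, dF) in (("circle", circle), ("parabola", parabola)):
    chord, speed = chord_and_speed(F, dF)
    print(name, chord, speed, chord <= speed)
```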

Proof $1$ (real induction):

Note that there is nothing to prove if $C:=\sup_{0<t<1}\|F'(t)\|=+\infty$, so assume $C$ is finite. Fix $\epsilon>0$.

Put $S:=\{t\in[0,1]:\|F(t)-F(0)\|\le(\epsilon+C)\cdot t+\epsilon\}$. If $S=[0,1]$ is shown then $\|F(1)-F(0)\|\le C+2\epsilon$, and since $\epsilon>0$ was arbitrary the result follows. Note that $0\in S$ is trivially true. Since $[0,1]\ni t\mapsto\|F(t)-F(0)\|-(\epsilon+C)\cdot t\in\Bbb R$ is continuous we conclude $S$ is closed. In particular, if $t_0\in[0,1]$ has that $t\in S$ for all $0\le t<t_0$ then $t_0\in S$ follows. It remains to show that $t\in S$ implies the existence of $\delta>0$ such that $[t,t+\delta)\cap[0,1]\subseteq S$. Then, if $S\neq[0,1]$, the point $t^\ast:=\inf\,([0,1]\setminus S)$ would lie in $S$ by the previous observation, and yet $[t^\ast,t^\ast+\delta)\cap[0,1]\subseteq S$ for some $\delta>0$, contradicting the definition of $t^\ast$; so it would follow $S=[0,1]$.

So, say $t\in S$. If $t=0$, then continuity of $F$ at zero implies the existence of $\delta>0$ such that $0\le t'<\delta$ implies $\|F(t')-F(0)\|<\epsilon\le(\epsilon+C)\cdot t'+\epsilon$, so $[0,\delta)\cap[0,1]\subseteq S$. If $t=1$ the claim is trivial, so suppose $0<t<1$. Define $\epsilon'=\epsilon+C-\|F'(t)\|>0$. By definition of derivative, there is a $\delta>0$ such that if $t'\in(t-\delta,t+\delta)\cap[0,1]$ we have $\|F(t')-F(t)-F'(t)\cdot(t'-t)\|\le\epsilon'\cdot|t'-t|$, from which it follows $\|F(t')-F(t)\|\le(\|F'(t)\|+\epsilon')\cdot|t'-t|=(\epsilon+C)\cdot|t'-t|$. Thus if $t'\in[t,t+\delta)\cap[0,1]$ (so that $|t'-t|=t'-t$) we get: $$\begin{align}\|F(t')-F(0)\|&\le\|F(t')-F(t)\|+\|F(t)-F(0)\|\\&\le(\epsilon+C)\cdot(t'-t)+(\epsilon+C)\cdot t+\epsilon\\&=(\epsilon+C)\cdot t'+\epsilon\end{align}$$Therefore $[t,t+\delta)\cap[0,1]\subseteq S$ as desired. $\blacksquare$

Proof $2$: fix any $\psi\in X^\ast$. By the general chain rule, if $G:[0,1]\to\Bbb R$ is given by $t\mapsto\psi(F(t))$ then $G$ is continuous on $[0,1]$ and differentiable on $(0,1)$ with $G'\equiv\psi(F')$. The usual mean value theorem gives $|G(1)-G(0)|\le\sup_{0<t<1}|\psi(F'(t))|\le\|\psi\|\cdot\sup_{0<t<1}\|F'(t)\|$. That is, $|\psi(F(1)-F(0))|\le\|\psi\|\cdot C$. Since the canonical map $X\hookrightarrow X^{\ast\ast}$ is an isometry (by the Hahn-Banach theorem) and we have just shown that the image of $F(1)-F(0)$ has $X^{\ast\ast}$-norm no greater than $C$, it follows $\|F(1)-F(0)\|\le C$ as desired.

Back to the situation at hand:

By the continuity of $\d_f$ at zero (and since $\Omega$ is an open neighbourhood of zero) there is some $\delta>0$ such that $x\in\Omega$ and $\|\id-\d_f(x)\|\le\frac{1}{2}$ for all $x\in X$ with $\|x\|\le\delta$. Let $K$ be the closed ball of radius $\delta$ about the origin. I claim that $$T:K\to K,\,x\mapsto x-f(x)$$is (a) well-defined and (b) a contraction with $\|T(x')-T(x)\|\le\frac{1}{2}\|x'-x\|$ for all $x,x'\in K$. Since $T(0)=0$, it follows from point (b) that $T(K)$ is contained in the closed ball of radius $\delta/2$; combined with the fact $K\subseteq\Omega$, this makes (a) follow automatically.

Fix $x,x'\in K$ and define $F:[0,1]\to X$ via $t\mapsto T(x+t(x'-x))$. This is well defined since $K$ is convex. Moreover it is continuous (as $f$ is differentiable, hence continuous, on $K$) and it is differentiable on $(0,1)$ with "derivative" $F'(t)=(\id-\d_f(x+t(x'-x)))(x'-x)$ for all $0<t<1$; hence $\sup_{0<t<1}\|F'(t)\|\le\frac{1}{2}\|x'-x\|$. By our mean value inequality this means: $$\|T(x')-T(x)\|=\|F(1)-F(0)\|\le\frac{1}{2}\|x'-x\|$$as desired.

Define $V\subseteq\Omega$ as the open ball of radius $\delta/2$. This is an open neighbourhood of $0=f(0)$. Now for another theorem:

Theorem (parametric contraction mapping principle):

If $(\M;\rho)$ is a nonempty metric space, there is a topological space $\con(\M)$ (sorry, set theorists) consisting of all strong contractions on $\M$ (self-maps of $\M$ with some Lipschitz constant $c<1$), topologised by the base of sets of the form $\{T'\in\con(\M):\sup_{x\in\M}\rho(T'(x),T(x))<\epsilon\}$ for $\epsilon>0$ and $T\in\con(\M)$.

If $\Lambda$ is any topological space, $\Gamma:\Lambda\to\con(\M)$ any continuous function and if $\M$ is complete then the induced map $\Gamma^\ast:\Lambda\to\M$ which takes $\lambda$ to the unique fixed point of $\Gamma(\lambda)$ is (a) well-defined and (b) continuous. Moreover, if $\Lambda$ is a metric space and $\Gamma$ is "Lipschitz" and there is $c<1$ such that $c$ serves as a Lipschitz/contraction constant for every $\Gamma(\lambda),\,\lambda\in\Lambda$ then $\Gamma^\ast$ is also Lipschitz.

Proof:

The usual contraction mapping principle proves claim (a) and further proves that $\Gamma^\ast(\lambda)=\lim_{n\to\infty}\Gamma(\lambda)^n(x)$ for any $x\in\M,\,\lambda\in\Lambda$. Fix $\lambda,\lambda'\in\Lambda$ with the property that $\epsilon:=\sup_{x\in\M}\rho(\Gamma(\lambda)(x),\Gamma(\lambda')(x))$ is finite.

Since $\M$ is nonempty we may pick some $x_0\in\M$. Let $c$ be a contraction constant for $\Gamma(\lambda)$. Define for $n\in\Bbb N_0$ the quantity $\alpha_n:=\rho(\Gamma(\lambda)^n(x_0),\Gamma(\lambda')^n(x_0))$. We know $\lim_{n\to\infty}\alpha_n=\rho(\Gamma^\ast(\lambda),\Gamma^\ast(\lambda'))$.

I claim $\alpha_n\le\epsilon\cdot\sum_{j=0}^{n-1}c^j$ for all $n$. This is trivially true for $n=0$. If it is true for some $n\in\Bbb N_0$ then consider: $$\begin{align}\alpha_{n+1}&=\rho(\Gamma(\lambda)(\Gamma(\lambda)^n(x_0)),\Gamma(\lambda')(\Gamma(\lambda')^n(x_0)))\\&\le\rho(\Gamma(\lambda)(\Gamma(\lambda')^n(x_0)),\Gamma(\lambda')(\Gamma(\lambda')^n(x_0)))\\&\quad\quad+\rho(\Gamma(\lambda)(\Gamma(\lambda)^n(x_0)),\Gamma(\lambda)(\Gamma(\lambda')^n(x_0)))\\&\le\epsilon+c\cdot\rho(\Gamma(\lambda)^n(x_0),\Gamma(\lambda')^n(x_0))\\&=\epsilon+c\cdot\alpha_n\\&\le\epsilon+c\cdot\epsilon\cdot\sum_{j=0}^{n-1}c^j\\&=\epsilon\cdot\sum_{j=0}^nc^j\end{align}$$Thus by induction the claim is always true. Since $c<1$ we infer: $$\rho(\Gamma^{\ast}(\lambda),\Gamma^{\ast}(\lambda'))=\lim_{n\to\infty}\alpha_n\le\epsilon\cdot\sum_{j=0}^\infty c^j=\frac{\epsilon}{1-c}$$

This bound automatically implies continuity of $\Gamma^{\ast}$ given continuity of $\Gamma$. Moreover if $\Lambda$ is a metric space with metric $\rho^\ast$ and there is a global constant $K>0$ with $\sup_{x\in\M}\rho(\Gamma(\lambda)(x),\Gamma(\lambda')(x))\le K\rho^\ast(\lambda,\lambda')$ for all $\lambda,\lambda'\in\Lambda$ and if there is a global $c<1$ which is a contraction constant for each $\Gamma(\lambda),\,\lambda\in\Lambda$ then we infer that for all $\lambda,\lambda'\in\Lambda$: $$\rho(\Gamma^\ast(\lambda),\Gamma^\ast(\lambda'))\le\frac{K}{1-c}\cdot\rho^\ast(\lambda,\lambda')$$So $\Gamma^\ast$ is Lipschitz, as claimed.
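A numerical check of this bound, as a sketch assuming $\M=\Bbb R$ and the hypothetical family $\Gamma(\lambda)(x)=\tfrac12\cos x+\lambda$ for $\lambda\in[-1,1]$: each $\Gamma(\lambda)$ is a contraction with $c=\tfrac12$, and $\sup_x|\Gamma(\lambda)(x)-\Gamma(\lambda')(x)|=|\lambda-\lambda'|$, so the Lipschitz statement predicts $|\Gamma^\ast(\lambda)-\Gamma^\ast(\lambda')|\le2|\lambda-\lambda'|$. The fixed points are computed by plain Banach iteration; `gamma` and `fixed_point` are illustrative names.

```python
import math

C = 0.5  # common contraction constant of every map in the family

def gamma(lam):
    """Hypothetical family on M = R: Gamma(lam)(x) = cos(x)/2 + lam."""
    return lambda x: C * math.cos(x) + lam

def fixed_point(T, x=0.0, tol=1e-15, max_iter=10_000):
    """Banach iteration x, T(x), T(T(x)), ... stopped once successive iterates agree."""
    for _ in range(max_iter):
        nxt = T(x)
        if abs(nxt - x) <= tol:
            return nxt
        x = nxt
    return x

lams = [k / 10 for k in range(-10, 11)]
fps = {lam: fixed_point(gamma(lam)) for lam in lams}
# sup_x |Gamma(a)(x) - Gamma(b)(x)| = |a - b|, so the theorem predicts ratio <= 1/(1 - C) = 2.
worst = max(abs(fps[a] - fps[b]) / abs(a - b) for a in lams for b in lams if a != b)
print(worst, "<=", 1 / (1 - C))
```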

With that out of the way, define $\Gamma:V\to\con(K)$ via $y\mapsto(x\mapsto y+T(x))$. This is well defined because $T(K)$ is contained in the closed ball of radius $\delta/2$ so $y+T(x)$ is always in $V+\frac{1}{2}K\subseteq K$ (and because each $\Gamma(y)$ is a contraction, see below). Observe that $\Gamma(y)(x)\equiv x+y-f(x)$ which is the (analogue to the) mapping mentioned in the motivation section. Since $\|\Gamma(y)(x)-\Gamma(y')(x)\|\equiv\|y-y'\|$, $\Gamma$ is obviously "Lipschitz" (with constant $1$) and since $\|\Gamma(y)(x)-\Gamma(y)(x')\|\equiv\|T(x)-T(x')\|\le\frac{1}{2}\|x-x'\|$ we can choose $c=1/2$ as our global contraction constant. It follows from the above theorem that $\Gamma^{\ast}:V\to K$ is Lipschitz (with constant $2$).

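Write $g:=\Gamma^\ast$ for convenience and $K'$ for the open ball of radius $\delta$. Since $g(0)=0$, the Lipschitz-constant-$2$ observation gives $\|g(y)\|\le2\|y\|<\delta$ for $y\in V$, so $g(V)\subseteq K'$. Very importantly, $f\circ g=\id_V$: the fixed point condition $g(y)=g(y)+y-f(g(y))$ says exactly that $f(g(y))=y$. The set $U:=f^{-1}(V)\cap K'$ is an open subset of $X$ (as $f$ is continuous on $K'$) containing $0$. If $x\in U$ and $y:=f(x)$, then $x\in K$ is a fixed point of $\Gamma(y)$, so $x=g(y)$ by uniqueness; conversely, for every $y\in V$ we have $g(y)\in K'$ and $f(g(y))=y\in V$. Hence $U=g(V)$ and $f(U)=V$. Therefore we have established that $f$ restricts to a homeomorphism, with Lipschitz inverse, between the open sets $U$ and $V$.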


To handle theorem $2$, note that we do not have a continuous derivative on $\Omega$, or even (necessarily) a derivative at all away from $0$. However, by strong differentiability at $0$ we can choose $\delta>0$ so that $K\subseteq\Omega$ holds again and, for all $x,x'\in K$ with $x\neq x'$: $$\frac{\|f(x')-f(x)-(x'-x)\|}{\|x'-x\|}\le\frac{1}{2}$$from which it follows immediately that $T$ is a contraction with contraction constant $1/2$, and the exact same definitions of $U$, of $V$ and of $\Gamma$ all work. Observe that while continuity of $f$ was not a hypothesis in theorem $2$, $f$ must be continuous on $K$ (in particular on $U$) since $f=\id-T$ is the difference of two continuous maps ($T$, as a contraction, is forced to be continuous). So we have established that $f$ restricts to a homeomorphism of open neighbourhoods of zero, $U\cong V$. It remains to show that $\d_f(0)^{-1}$ serves as a strong derivative for $g$ at $0$, and to show the statements about $C^k$. Note however that if $f$ is $C^k$ on $\Omega$ then its derivatives must be strong (at all points of $\Omega$), since the mean value inequality is available and the same technique used to show $T$ is a contraction (in the proof of theorem $1$) applies. Then $g$ will be $C^k$ by theorem $1$ and $g$'s derivatives will all be strong, so those last conclusions follow without further effort.

Resuming the proof of theorem $1$:


We still need to check that $g$ is differentiable. From the bound $\|\id-\d_f(x)\|\le\frac{1}{2}$ for $x\in K$ we deduce $\d_f(x)$ is invertible, thus a homeomorphism by the open mapping theorem, for all $x\in K$, in particular for all $x\in U$. Then $\d_g:V\to\L(X;X)$, $y\mapsto\d_f(g(y))^{-1}$ is well-defined. We just need to check it serves as a genuine derivative for $g$.
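The invertibility can also be seen concretely through the Neumann series $\d_f(x)^{-1}=\sum_{n\ge0}(\id-\d_f(x))^n$, which converges in $\L(X;X)$ when $\|\id-\d_f(x)\|\le\frac12$ and gives $\|\d_f(x)^{-1}\|\le2$. A minimal sketch with $2\times2$ matrices as plain Python lists; the sample matrix `A` is an assumption, chosen so that $\|I-A\|=\sqrt{0.1}<\frac12$ in the Euclidean operator norm.

```python
I = [[1.0, 0.0], [0.0, 1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def matadd(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def neumann_inverse(A, terms=60):
    """Partial sum of sum_n (I - A)^n, which converges to A^{-1} when ||I - A|| < 1."""
    E = [[I[i][j] - A[i][j] for j in range(2)] for i in range(2)]
    total = [row[:] for row in I]
    power = [row[:] for row in I]
    for _ in range(terms - 1):
        power = matmul(power, E)
        total = matadd(total, power)
    return total

A = [[1.2, 0.1], [-0.2, 0.9]]  # hypothetical "derivative" close to the identity
B = neumann_inverse(A)
print(B)             # approximately [[0.9, -0.1], [0.2, 1.2]] / 1.1
print(matmul(A, B))  # approximately the identity
```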

Fix $y\in V$ and write $x=g(y)$. In what is about to follow, we make a change of variables $x'\sim y'$ where $x'\in U$ and $f(x')=y'$, $g(y')=x'$ and this will be valid since $g,f$ are homeomorphisms. It's also going to be important that $\d_f(x)$ is a homeomorphism. Understand all limit equations as: the left hand side exists and equals [-] therefore the right hand side exists and equals [-].

$$\begin{align}0&=\d_f(x)^{-1}(0)\\&=\d_f(x)^{-1}\left(\lim_{x'\to x}\frac{\d_f(x)(x'-x)-(f(x')-f(x))}{\|x'-x\|}\right)\\&=\d_f(x)^{-1}\left(\lim_{y'\to y}\frac{\d_f(x)(g(y')-g(y))-(y'-y)}{\|g(y')-g(y)\|}\right)\\&=\lim_{y'\to y}\frac{g(y')-g(y)-\d_f(g(y))^{-1}(y'-y)}{\|y'-y\|}\cdot\underset{\ge1/2}{\underbrace{\frac{\|y'-y\|}{\|g(y')-g(y)\|}}}\end{align}$$Implying that: $$\lim_{y'\to y}\frac{g(y')-g(y)-\d_f(g(y))^{-1}(y'-y)}{\|y'-y\|}=0$$As desired. Therefore $g:V\to U$ is differentiable in the expected way. $\blacksquare$


Back to theorem $2$: observe that the same limit manipulations will go through (taking $x=x_0=0$) since $g\times g$ and $f\times f$ shall be homeomorphisms $U\times U\cong V\times V$. $g$ will still be shown to be Lipschitz, with constant $2$, so the last part about bounding the quotient $\|y'-y\|/\|g(y')-g(y)\|$ carries over. Then the proof is more or less identical so long as we switch to double limit notation: $$\begin{align}0&=\d_f(0)^{-1}(0)\\&=\d_f(0)^{-1}\left(\lim_{(x',x)\to(0,0)}\frac{\d_f(0)(x'-x)-(f(x')-f(x))}{\|x'-x\|}\right)\\&=\d_f(0)^{-1}\left(\lim_{(y',y)\to(0,0)}\frac{\d_f(0)(g(y')-g(y))-(y'-y)}{\|g(y')-g(y)\|}\right)\\&=\lim_{(y',y)\to(0,0)}\frac{g(y')-g(y)-\d_f(0)^{-1}(y'-y)}{\|y'-y\|}\cdot\underset{\ge1/2}{\underbrace{\frac{\|y'-y\|}{\|g(y')-g(y)\|}}}\end{align}$$Proving $\d_f(0)^{-1}$ to be a strong derivative for $g$ at zero.


The last thing to do is check the statement about $C^k$. Denote by $\I$ the inversion $\I:\L_\cong(X;Y)\to\L_\cong(Y;X)$. This is always a smooth map ($\L_\cong(X;Y)$ being an open subset of $\L(X;Y)$), as shown below. We may calculate: $$\forall F\in\L_\cong(X;Y),\,\d_\I(F)=(\L(X;Y)\ni H\mapsto-F^{-1}\circ H\circ F^{-1}\in\L(Y;X))$$This proves that $\I$ is at least continuously differentiable.
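A finite-difference sanity check of $\d_\I(F)(H)=-F^{-1}HF^{-1}$, as a sketch for $2\times2$ matrices: the matrices `F` and `H` below are arbitrary assumptions. The gap between the difference quotient and the predicted derivative should shrink roughly in proportion to $t$, since the next term in the expansion of $(F+tH)^{-1}$ is $t^2F^{-1}HF^{-1}HF^{-1}$.

```python
def inv2(M):
    """Inverse of an invertible 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

F = [[2.0, 1.0], [0.5, 3.0]]   # hypothetical invertible F
H = [[0.3, -1.0], [2.0, 0.7]]  # hypothetical direction H
F_inv = inv2(F)
predicted = [[-v for v in row] for row in mul(mul(F_inv, H), F_inv)]  # -F^{-1} H F^{-1}

def difference_quotient_gap(t):
    """Max entrywise gap between ((F + tH)^{-1} - F^{-1}) / t and -F^{-1} H F^{-1}."""
    Ft_inv = inv2([[F[i][j] + t * H[i][j] for j in range(2)] for i in range(2)])
    return max(abs((Ft_inv[i][j] - F_inv[i][j]) / t - predicted[i][j])
               for i in range(2) for j in range(2))

for t in (1e-2, 1e-3, 1e-4):
    print(t, difference_quotient_gap(t))  # shrinks roughly in proportion to t
```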

We have: $$\d_g:V\overset{g}{\longrightarrow}U\overset{\d_f}{\longrightarrow}\L_\cong(X;Y)\overset{\I}{\longrightarrow}\L(Y;X)$$If $f$ is $C^1$ then every arrow in that diagram is continuous, hence $\d_g$ is continuous so $g$ is $C^1$. If $f$ is $C^2$ then $\d_f$ is $C^1$, $g$ is $C^1$ and $\I$ is $C^1$ so it would follow $\d_g$ is $C^1$, so that $g$ is $C^2$ as well, and the claim $g\in C^k$ if $f\in C^k$ for $k=1,2,\cdots,\infty$ follows by this induction (given that $\I$ is smooth and that the property of being of class $C^k$ is stable under composition).

For Banach spaces $U,V,W$, write $c_\ast:\L(V;W)\to\L(\L(U;V);\L(U;W))$ for the forward "currying" morphism that takes $T:V\to W$ to the map $(H:U\to V)\mapsto(TH:U\to W)$ and similarly write $c^\ast:\L(U;V)\to\L(\L(V;W);\L(U;W))$ for the precomposition "currying" map. Notice that $c^\ast$ and $c_\ast$ are always continuous linear transformations and are equal to their own derivative.

We can then express: $$\d_\I:\L_\cong(X;Y)\overset{\I}{\longrightarrow}\L(Y;X)\overset{(-1)\oplus1}{\longrightarrow}\L(Y;X)\oplus\L(Y;X)\\\overset{c_\ast\oplus c^\ast}{\longrightarrow}\L(\L(Y;Y);\L(Y;X))\oplus\L(\L(X;Y);\L(Y;Y))\overset{\circ}{\longrightarrow}\L(\L(X;Y);\L(Y;X))$$

Also write $\circ:\L(V;W)\oplus\L(U;V)\to\L(U;W)$ for the map that takes $(F,G)\mapsto FG$. It then follows by the same induction argument that if $\circ$ is smooth, so is $\I$ (notice that the middle arrows are linear maps, hence smooth).

It's easy to prove that $\circ$ is everywhere continuously differentiable with derivative $\d_\circ(F,G)=c^\ast(G)\oplus c_\ast(F):\L(V;W)\oplus\L(U;V)\to\L(U;W)$ for all $F,G$.
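For the record, the computation behind this claim, with increments $H\in\L(V;W)$ and $K\in\L(U;V)$:

$$(F+H)(G+K)-FG=\underbrace{HG+FK}_{c^\ast(G)(H)+c_\ast(F)(K)}+HK,\qquad\|HK\|\le\|H\|\,\|K\|\le\max(\|H\|,\|K\|)^2$$

so the remainder $HK$ is $o(\|(H,K)\|)$ for the max norm on $\L(V;W)\oplus\L(U;V)$ (hence for any equivalent norm), while $(F,G)\mapsto c^\ast(G)\oplus c_\ast(F)$ is itself continuous and linear.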

Exercise: use the formal chain rule and this observation to show that $\I$ is $C^2$ with: $$\d^2_\I(F)(G)\equiv(\L(X;Y)\ni T\mapsto F^{-1}GF^{-1}TF^{-1}+F^{-1}TF^{-1}GF^{-1}\in\L(Y;X))$$

Now observe that $\d_\circ$ is itself a continuous linear transformation. Thus $\d^2_\circ$ exists and is constant, and $\d^n_\circ$ is the zero map for all $n\ge 3$. $\circ$ is thus smooth, completing the proof (so long as we expect being of class $C^k$ to be stable under composition).

To see being of class $C^k$ is stable under composition, we induct on $k$. With $k=1$ this is fairly obvious from the chain rule. Let's fix $U\overset{f}{\longrightarrow}V\overset{g}{\longrightarrow}W$ which are both of class $C^k$, $k\ge1$. We can express: $$\d_{gf}:U\overset{\Delta}{\longrightarrow}U\oplus U\overset{f\oplus\d_f}{\longrightarrow}V\oplus\L(U;V)\overset{\d_g\oplus1}{\longrightarrow}\L(V;W)\oplus\L(U;V)\overset{\circ}{\longrightarrow}\L(U;W)$$The first and last arrows in the diagram are smooth. Since it's easy to see that, if $h$ is of class $C^m$, so is $1\oplus h$, and because $f,\d_f,\d_g$ are by hypothesis all of class (at least) $C^{k-1}$, we find $\d_{gf}$ a composite of class $C^{k-1}$ maps hence, by induction, itself of class $C^{k-1}$. Therefore $gf$ is of class $C^k$. Done!


If $h:U\to V$ is $C^k$ then $1\oplus h:X\oplus U\to X\oplus V$ is $C^k$ with every $n$th derivative, $1\le n\le k$, equal to the map: $$\d_{1\oplus h}^n:X\oplus U\to\L(X\oplus U;-)^n(X\oplus V)$$Defined by: $$\begin{align}\d_{1\oplus h}(x,u)(x_1,u_1)&=(x_1,\d_h(u)(u_1))\\\d_{1\oplus h}^n(x,u)(x_n,u_n)\cdots(x_1,u_1)&=(0,\d_h^n(u)(u_n)\cdots(u_1)),\quad\quad n>1\end{align}$$

FShrike

Just for the sake of completeness: each of the following theorems can be proved and may be called an "Inverse function theorem" (IFT). Assume $\mathrm{A}$ and $\mathrm{B}$ are open sets of Banach spaces and $f:\mathrm{A} \to \mathrm{B}.$

Falsehood of IFT for differentiable functions. Consider $f(x) = x^2 \sin (1/x) + \frac{x}{2}$ for $x \neq 0$ and $f(0) = 0.$ Then $f$ is differentiable everywhere, $f'(0) \neq 0$ and $f$ is not injective on any neighbourhood of the origin.
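The non-injectivity can be checked numerically. A sketch with illustrative helper names: at $x_n=1/(2\pi n)$ one has $f'(x_n)=-\tfrac12$, while at $1/((2n+1)\pi)$ one has $f'=\tfrac32$, and there are points $a<b$ in $(0,1/n)$ with $f(a)>f(b)$. Since also $f(x)\ge x/2-x^2>0=f(0)$ for $0<x<1/2$, $f$ is not monotone on any interval $[0,\varepsilon)$, and a continuous injective function on an interval must be monotone.

```python
import math

def f(x: float) -> float:
    """The counterexample f(x) = x^2 sin(1/x) + x/2, extended by f(0) = 0."""
    if x == 0.0:
        return 0.0
    return x * x * math.sin(1.0 / x) + x / 2.0

def f_prime(x: float) -> float:
    """f'(x) for x != 0; at 0 the derivative is 1/2, from the difference quotient."""
    if x == 0.0:
        return 0.5
    return 2.0 * x * math.sin(1.0 / x) - math.cos(1.0 / x) + 0.5

def decreasing_pair(n: int, theta: float = 0.1):
    """Hypothetical helper: points a < b in (0, 1/n) near 1/(2*pi*n) with f(a) > f(b)."""
    a = 1.0 / (2.0 * math.pi * n)
    b = 1.0 / (2.0 * math.pi * n - theta)
    return a, b

for n in (10, 100, 1000):
    a, b = decreasing_pair(n)
    c = 1.0 / ((2 * n + 1) * math.pi)
    # f' takes values near -1/2 and 3/2 arbitrarily close to 0 ...
    print(n, f_prime(a), f_prime(c))
    # ... and f really does decrease somewhere inside (0, 1/n).
    print("   f(a) > f(b):", f(a) > f(b))
```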

IFT for Strongly Differentiable functions at a point. Suppose $f$ is strongly differentiable at $v$. If $f'(v)$ is invertible then $f$ is locally a homeomorphism around $v$ and $f^{-1}$ is strongly differentiable at $f(v).$ (Differentiability of $f$ is not needed anywhere else on $\mathrm{A}.$)

IFT for $p$-times differentiable functions. Suppose $f$ is $p$ times differentiable on $\mathrm{A}$, where $p \geq 2$. If $f'(v)$ is invertible then $f$ is locally a homeomorphism around $v$ and $f^{-1}$ is $p$ times differentiable as well (on the neighbourhood of $f(v)$ where it is defined).

IFT for $\mathscr{C}^p$ functions. Suppose $f$ is of class $\mathscr{C}^p.$ If $f'(v)$ is invertible then $f$ is locally a $\mathscr{C}^p$-diffeomorphism.

IFT for analytic functions. Suppose $\mathrm{A}$ and $\mathrm{B}$ are subsets of $\mathbf{R}^d$ and $f$ is analytic (at every point in $\mathrm{A}$ it has a power series representation on a neighbourhood of said point). If $f'(x)$ is invertible then $f$ is locally invertible around $x$ and the inverse function is also analytic.

NOTES:

  1. "Banach's theorem": if $L$ is a continuous invertible linear function between two Banach spaces then $L$ is a homeomorphism. (This allows us to assume $f'(v)$ to be invertible, as opposed to a linear homeomorphism as is more commonly assumed.)
  2. Suppose $f$ is differentiable everywhere on $\mathrm{A}.$ For $f$ to be strongly differentiable at $v$ it is necessary and sufficient for $f$ to be continuously differentiable at $v$ (i.e., the derived function $f'$ defined on $\mathrm{A}$ is continuous at $v$).
William M.