$\newcommand{\d}{\mathrm{d}}\newcommand{\id}{\mathrm{Id}}\newcommand{\L}{\mathscr{L}}\newcommand{\M}{\mathcal{M}}\newcommand{\con}{\operatorname{Con}}\newcommand{\I}{\mathcal{I}}$For context, I had written this whole thing up when I thought the OP was asking about a theorem involving Fréchet derivatives, not this different "strong" derivative concept. I then deleted it since it answered the wrong question, but I've now realised that the proof of one is readily adapted to a proof of the other. So that it wasn't a complete waste of time, I'm going to remark on the adaptations and un-delete this. I realise that this is now a very verbose version of krm2233's answer.
I write $\L$ for the space of continuous linear maps (given its "strong" operator-norm topology) and $\L_\cong\subset\L$ for the subset of linear isomorphisms. I write $\d_\bullet$ for the derivative rather than $D$.
Theorem $1$:
Say $X$ and $Y$ are Banach spaces, $\Omega\subseteq X$ is a nonempty open set and $f:\Omega\to Y$ is a function whose derivative $\d_f:\Omega\to\L(X;Y)$ exists and is continuous at $x_0\in\Omega$. Suppose further that $\d_f(x_0)$ is invertible.
Then there is an open neighbourhood $U\subseteq\Omega$ of $x_0$ and an open neighbourhood $V$ of $f(x_0)$ such that $f:U\to V$ is a homeomorphism whose inverse $g:V\to U$ is differentiable with $\d_g(y)=\d_f(g(y))^{-1}$ for all $y\in V$ ($\d_g$ is necessarily continuous at $f(x_0)$ but not necessarily anywhere else).
Moreover, if $f$ is $C^k$ (on $\Omega$) for $k\in\Bbb N\cup\{\infty\}$ then so is $g$.
Theorem $2$:
Say $X$ and $Y$ are Banach spaces, $\Omega\subseteq X$ is a nonempty open set and $f:\Omega\to Y$ is a function which admits a strong derivative at $x_0\in\Omega$, $\d_f(x_0)$, which is invertible.
Then there is an open neighbourhood $U\subseteq\Omega$ of $x_0$ and an open neighbourhood $V$ of $f(x_0)$ such that $f:U\to V$ is a homeomorphism whose inverse $g:V\to U$ is strongly differentiable at $f(x_0)$ with $\d_g(f(x_0))=\d_f(x_0)^{-1}$.
Moreover, if $f$ is (strongly) differentiable everywhere on $\Omega$, then so is $g$. If $f$ is strongly $C^k$ (i.e. its $k$th derivative exists everywhere, is continuous and is strong) on $\Omega$ for $k\in\Bbb N\cup\{\infty\}$ then so is $g$ (on $V$).
It's worth noting that $\d_f$ does not need to be everywhere continuous even though this is how the theorem is usually stated. The theorem can be even further weakened for the finite dimensional case but according to Tao the proof crucially relies on local compactness, which fails in infinite dimensions, so I'm guessing this weakening is false for general Banach spaces.
My proof is inspired by ideas from this set of notes, which maybe will have a more comprehensible treatment of the finite dimensional case. I am told the proof idea is pretty standard but I wanted to work out the details for myself.
The idea is that if $y$ is close to $f(x_0)$ and $y=f(x')$ for some $x'$, then for $x$ close to $x_0$ we should have: $$y=f(x)+f(x')-f(x)\approx f(x)+\d_f(x)(x'-x)$$So we should be able to estimate: $$x'\approx\d_f(x)^{-1}(y-f(x))+x$$But the continuity of $\d_f$ at $x_0$ suggests we can approximate $\d_f(x)$ with $\d_f(x_0)$: $$x'\approx x+\d_f(x_0)^{-1}(y-f(x))$$Which lends itself to the study of the fixed point iteration: $$x\mapsto x+\d_f(x_0)^{-1}(y-f(x))$$
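To see the iteration in action, here is a minimal numerical sketch in one dimension. The map `f` below is a hypothetical example of my own choosing (not from the question), picked so that $f(0)=0$ and $f'(0)=1$; the step count is likewise arbitrary.

```python
def f(x):
    # Hypothetical example map with f(0) = 0 and f'(0) = 1.
    return x + x * x / 4.0

def solve(y, x=0.0, steps=60):
    # Iterate x -> x + f'(0)^{-1} (y - f(x)); here f'(0)^{-1} = 1.
    for _ in range(steps):
        x = x + (y - f(x))
    return x

y = 0.1
x_star = solve(y)
# How far f(x_star) is from the target value y.
residual = abs(f(x_star) - y)
```

For this particular `f` the equation $f(x)=y$ can be solved by hand, giving $x=2(\sqrt{1+y}-1)$, so the iterate can be compared against a closed form.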
To keep the notation clean it's nice to make some simplifications first, though. Since $\d_f(x_0)$ is a diffeomorphism $X\cong Y$, the theorem's conclusion holds for $f$ iff. it holds for $\d_f(x_0)^{-1}\circ f:\Omega\to X$, whose derivative at $x_0$ is the identity. Since translations are also diffeomorphisms, we may adjust things so that $x_0=0$ and $f(x_0)=0$.
So - without loss of generality, assume $Y=X$, $x_0=0\in\Omega$, $f(0)=0$ and that $\d_f(0)=\id$.
This is a proof of theorem $1$. Remarks to switch to a proof of theorem $2$ will appear.
The linked notes use a bound obtained via integration, which is fair enough in the finite dimensional case, but we can obtain the same bound without needing this tool (and on weaker hypotheses, since the derivative of our $f$ may fail to be integrable). The full power of the following lemma is not required either, but I'd never seen the general version before so I'll fill in the details here.
Theorem (general mean value inequality):
If $X$ is a normed linear space and $F:[0,1]\to X$ is everywhere continuous and is also differentiable on $(0,1)$ then: $$\|F(1)-F(0)\|\le\sup_{0<t<1}\|F'(t)\|$$Where $F'(\bullet):=\d_F(\bullet)(1)$ under the identification $X\cong\L(\Bbb R;X)$.
Proof $1$ (real induction):
Note that there is nothing to prove if $C:=\sup_{0<t<1}\|F'(t)\|=+\infty$, so assume $C$ is finite. Fix $\epsilon>0$.
Put $S:=\{t\in[0,1]:\|F(t)-F(0)\|\le(\epsilon+C)\cdot t+\epsilon\}$. If $S=[0,1]$ is shown then $\|F(1)-F(0)\|\le C+2\epsilon$, and the result follows since $\epsilon>0$ was arbitrary. Note that $0\in S$ is trivially true. Since $[0,1]\ni t\mapsto\|F(t)-F(0)\|-(\epsilon+C)\cdot t\in\Bbb R$ is continuous we conclude $S$ is closed. In particular, if $t_0\in(0,1]$ has that $t\in S$ for all $0\le t<t_0$ then $t_0\in S$ follows. It remains to show that $t\in S$ implies the existence of $\delta>0$ such that $[t,t+\delta)\cap[0,1]\subseteq S$. Then, if $S\neq[0,1]$, we could deduce $\inf\,([0,1]\setminus S)\in S$ and obtain a contradiction, so it would follow $S=[0,1]$.
So, say $t\in S$. If $t=0$, then note that continuity of $F$ at zero implies the existence of $\delta>0$ such that $0<t'<\delta$ implies $\|F(t')-F(0)\|<\epsilon\le(\epsilon+C)\cdot t'+\epsilon$, from which it easily follows that $[0,\delta)\cap[0,1]\subseteq S$ (this is the reason for the extra $+\epsilon$ in the definition of $S$: $F$ need not be differentiable at $0$). Suppose $0<t<1$ (if $t=1$ the claim is also trivial). Define $\epsilon'=\epsilon+C-\|F'(t)\|>0$. By definition of derivative, there is a $\delta>0$ such that if $t'\in(t-\delta,t+\delta)\cap[0,1]$ we have $\|F(t')-F(t)-F'(t)\cdot(t'-t)\|\le\epsilon'\cdot|t'-t|$, from which it follows $\|F(t')-F(t)\|\le(\|F'(t)\|+\epsilon')\cdot|t'-t|=(\epsilon+C)\cdot|t'-t|$. Thus if $t'\in[t,t+\delta)\cap[0,1]$ (so that $|t'-t|=t'-t$) we get: $$\begin{align}\|F(t')-F(0)\|&\le\|F(t')-F(t)\|+\|F(t)-F(0)\|\\&\le(\epsilon+C)\cdot(t'-t)+(\epsilon+C)\cdot t+\epsilon\\&=(\epsilon+C)\cdot t'+\epsilon\end{align}$$Therefore $[t,t+\delta)\cap[0,1]\subseteq S$ as desired. $\blacksquare$
Proof $2$: fix any $\psi\in X^\ast$. By the general chain rule, if $G:[0,1]\to\Bbb R$ is given by $t\mapsto\psi(F(t))$ then $G$ is differentiable on $(0,1)$ with $G'\equiv\psi(F')$. The usual mean value inequality finds that $|G(1)-G(0)|\le\sup_{0<t<1}|\psi(F'(t))|\le\|\psi\|\cdot\sup_{0<t<1}\|F'(t)\|$. That is, $|\psi(F(1)-F(0))|\le\|\psi\|\cdot C$. Since the canonical map $X\hookrightarrow X^{\ast\ast}$ is an isometry and we have just shown that the image of $F(1)-F(0)$ has $X^{\ast\ast}$-norm no greater than $C$, it follows $\|F(1)-F(0)\|\le C$ as desired.
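As a sanity check on why this is only an inequality, here is a small numerical sketch with a curve of my own choosing (the unit circle in $\Bbb R^2$, an assumption for illustration only). Its speed is identically $1$, yet the chord from $F(0)$ to $F(1)$ is strictly shorter than $1$, so no "mean value equality" $F(1)-F(0)=F'(t)$ can hold for it.

```python
import math

def F(t):
    # Hypothetical curve in R^2: the unit circle, with |F'(t)| = 1 for all t.
    return (math.cos(t), math.sin(t))

def norm(v):
    # Euclidean norm on R^2.
    return math.hypot(v[0], v[1])

# Length of the chord F(1) - F(0).
chord = norm((F(1.0)[0] - F(0.0)[0], F(1.0)[1] - F(0.0)[1]))

# Approximate sup of |F'(t)| over a grid in (0, 1), using F'(t) = (-sin t, cos t).
sup_speed = max(
    norm((-math.sin(k / 1000.0), math.cos(k / 1000.0))) for k in range(1, 1000)
)
```

Here the chord has length $2\sin(1/2)$, which is a bit under $0.96$.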
Back to the situation at hand:
By the continuity of $\d_f$ at zero (and openness of $\Omega$ at zero) I know there is some $\delta>0$ such that $x\in\Omega$ and $\|\id-\d_f(x)\|\le\frac{1}{2}$ for all $x\in X$ with $\|x\|\le\delta$. Put $K$ to be the closed ball of radius $\delta$ about the origin. I claim that: $$T:K\to K,\,x\mapsto x-f(x)$$Is (a) well-defined and (b) a contraction with $\|T(x')-T(x)\|\le\frac{1}{2}\|x'-x\|$ for all $x,x'\in K$. Since $T(0)=0$ it follows from point (b) that $T(K)$ is contained in the closed ball of radius $\delta/2$; combining this with the fact $K\subseteq\Omega$ makes (a) follow automatically.
Fix $x,x'\in K$. Put $F:[0,1]\to X$ via $t\mapsto T(x+t(x'-x))$. This is well defined since $K$ is convex. Moreover it is evidently continuous and it is differentiable on $(0,1)$ with "derivative" $F'(t)=(\id-\d_f(x+t(x'-x)))(x'-x)$ for all $0<t<1$; hence $\sup_{0<t<1}\|F'(t)\|\le\frac{1}{2}\|x'-x\|$ follows. Oh, but by our mean value inequality we know that means: $$\|T(x')-T(x)\|=\|F(1)-F(0)\|\le\frac{1}{2}\|x'-x\|$$As desired.
Define $V\subseteq\Omega$ as the open ball of radius $\delta/2$. This is an open neighbourhood of $0=f(0)$. Now for another theorem:
Theorem (parametric contraction mapping principle):
If $(\M;\rho)$ is a nonempty metric space, there is a topological space $\con(\M)$ (sorry, set theorists) consisting of all strong contractions on $\M$, topologised by the base of sets of the form $\{T'\in\con(\M):\sup_{x\in\M}\rho(T'(x),T(x))<\epsilon\}$ for $\epsilon>0$ and $T\in\con(\M)$.
If $\Lambda$ is any topological space, $\Gamma:\Lambda\to\con(\M)$ any continuous function and if $\M$ is complete then the induced map $\Gamma^\ast:\Lambda\to\M$ which takes $\lambda$ to the unique fixed point of $\Gamma(\lambda)$ is (a) well-defined and (b) continuous. Moreover, if $\Lambda$ is a metric space and $\Gamma$ is "Lipschitz" and there is $c<1$ such that $c$ serves as a Lipschitz/contraction constant for every $\Gamma(\lambda),\,\lambda\in\Lambda$ then $\Gamma^\ast$ is also Lipschitz.
Proof:
The usual contraction mapping principle proves claim (a) and further proves that $\Gamma^\ast(\lambda)=\lim_{n\to\infty}\Gamma(\lambda)^n(x)$ for any $x\in\M,\,\lambda\in\Lambda$. Fix $\lambda,\lambda'\in\Lambda$ with the property that $\epsilon:=\sup_{x\in\M}\rho(\Gamma(\lambda)(x),\Gamma(\lambda')(x))$ is finite.
Pick any $x_0\in\M$ (possible since $\M$ is nonempty). Let $c$ be a contraction constant for $\Gamma(\lambda)$. Define for $n\in\Bbb N_0$ the quantity $\alpha_n:=\rho(\Gamma(\lambda)^n(x_0),\Gamma(\lambda')^n(x_0))$. We know $\lim_{n\to\infty}\alpha_n=\rho(\Gamma^\ast(\lambda),\Gamma^\ast(\lambda'))$.
I claim $\alpha_n\le\epsilon\cdot\sum_{j=0}^{n-1}c^j$ for all $n$. This is trivially true for $n=0$. If it is true for some $n\in\Bbb N_0$ then consider: $$\begin{align}\alpha_{n+1}&=\rho(\Gamma(\lambda)(\Gamma(\lambda)^n(x_0)),\Gamma(\lambda')(\Gamma(\lambda')^n(x_0)))\\&\le\rho(\Gamma(\lambda)(\Gamma(\lambda')^n(x_0)),\Gamma(\lambda')(\Gamma(\lambda')^n(x_0)))\\&\quad\quad+\rho(\Gamma(\lambda)(\Gamma(\lambda)^n(x_0)),\Gamma(\lambda)(\Gamma(\lambda')^n(x_0)))\\&\le\epsilon+c\cdot\rho(\Gamma(\lambda)^n(x_0),\Gamma(\lambda')^n(x_0))\\&=\epsilon+c\cdot\alpha_n\\&\le\epsilon\cdot\sum_{j=0}^nc^j\end{align}$$Thus by induction the claim is always true. Since $c<1$ we infer: $$\rho(\Gamma^{\ast}(\lambda),\Gamma^{\ast}(\lambda'))\le\frac{\epsilon}{1-c}$$
This bound automatically implies continuity of $\Gamma^{\ast}$ given continuity of $\Gamma$. Moreover if $\Lambda$ is a metric space with metric $\rho^\ast$ and there is a global constant $K>0$ with $\sup_{x\in\M}\rho(\Gamma(\lambda)(x),\Gamma(\lambda')(x))\le K\rho^\ast(\lambda,\lambda')$ for all $\lambda,\lambda'\in\Lambda$ and if there is a global $c<1$ which is a contraction constant for each $\Gamma(\lambda),\,\lambda\in\Lambda$ then we infer that for all $\lambda,\lambda'\in\Lambda$: $$\rho(\Gamma^\ast(\lambda),\Gamma^\ast(\lambda'))\le\frac{K}{1-c}\cdot\rho^\ast(\lambda,\lambda')$$So $\Gamma^\ast$ is Lipschitz, as claimed.
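The bound $\rho(\Gamma^\ast(\lambda),\Gamma^\ast(\lambda'))\le\epsilon/(1-c)$ is easy to test numerically. Below is a rough sketch on $\M=\Bbb R$ with a made-up family of contractions `T(lam, x)` (my own assumption, chosen so that $c=1/2$ works for every parameter and $\sup_x|T_\lambda(x)-T_{\lambda'}(x)|=|\lambda-\lambda'|$).

```python
import math

C = 0.5  # common contraction constant for the whole family

def T(lam, x):
    # Hypothetical family of contractions on R: the x-derivative is bounded
    # by 0.5 in absolute value, and changing lam shifts the map by |lam - lam2|.
    return C * math.cos(x) + lam

def fixed_point(lam, x=0.0, steps=200):
    # Plain Picard iteration; converges geometrically with ratio at most C.
    for _ in range(steps):
        x = T(lam, x)
    return x

lam1, lam2 = 0.3, 0.45
gap = abs(fixed_point(lam1) - fixed_point(lam2))
# The bound epsilon / (1 - c) with epsilon = |lam1 - lam2|.
bound = abs(lam1 - lam2) / (1.0 - C)
```

In this toy family the Lipschitz constant of the fixed point map predicted by the theorem is $1/(1-c)=2$, and `gap` should come out no larger than `bound`.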
With that out of the way, define $\Gamma:V\to\con(K)$ via $y\mapsto(x\mapsto y+T(x))$. This is well defined because $T(K)$ is contained in the closed ball of radius $\delta/2$ so $y+T(x)$ is always in $V+\frac{1}{2}K\subseteq K$ (and because each $\Gamma(y)$ is a contraction, see below). Observe that $\Gamma(y)(x)\equiv x+y-f(x)$ which is the (analogue to the) mapping mentioned in the motivation section. Since $\|\Gamma(y)(x)-\Gamma(y')(x)\|\equiv\|y-y'\|$, $\Gamma$ is obviously "Lipschitz" (with constant $1$) and since $\|\Gamma(y)(x)-\Gamma(y)(x')\|\equiv\|T(x)-T(x')\|\le\frac{1}{2}\|x-x'\|$ we can choose $c=1/2$ as our global contraction constant. It follows from the above theorem that $\Gamma^{\ast}:V\to K$ is Lipschitz (with constant $2$).
Write $g:=\Gamma^\ast$ for convenience and $K'$ for the open ball of radius $\delta$. Since $g(0)=0$, the Lipschitz-constant-$2$ observation means $g(V)\subseteq K'$. Note that $f\circ g=\id_V$, very importantly, just by checking the fixed point condition. $U:=f^{-1}(V)\cap K'$ is an open subset of $X$, containing $0$ and it follows that $U=g(V)$ and that $f(U)=V$. Therefore we have established that $f$ restricts to a homeomorphism, with Lipschitz inverse, between the open sets $U$ and $V$.
To handle theorem $2$, we do not have a continuous derivative or even a derivative at all (not necessarily, anyway) on $\Omega$. However, we can choose $\delta>0$ so that $K\subseteq\Omega$ holds again and if $x,x'\in K$ are distinct then: $$\frac{\|f(x')-f(x)-(x'-x)\|}{\|x'-x\|}\le\frac{1}{2}$$From which it is very obvious that $T$ is a contraction with contraction constant $1/2$, and the exact same definitions of $U$, of $V$, of $\Gamma$ all work. Observe that while continuity of $f$ was not a hypothesis in theorem $2$, $f$ must be continuous on $U$ since $f=\id-T$ is the difference of two continuous maps ($T$, as a contraction, is forced to be continuous). Then we have established $f$ to restrict to a homeomorphism of open neighbourhoods of zero, $U\cong V$. It remains to show that $\d_f(0)^{-1}$ serves as a strong derivative for $g$ at $0$ and it remains to show the stuff about $C^k$. Note however that if $f$ is $C^k$ on $\Omega$ then the derivatives must be strong (at all points of $\Omega$) since the mean value inequality would be available and the same technique used to show $T$ is a contraction (in the proof of theorem $1$) would show us what we want. Then $g$ will be $C^k$ by theorem $1$ and $g$'s derivatives will all be strong, so, those last conclusions will actually follow without further effort.
Resume the proof of theorem $1$:
We still need to check that $g$ is differentiable. From the bound $\|\id-\d_f(x)\|\le\frac{1}{2}$ for $x\in K$ we deduce $\d_f(x)$ is invertible, thus a homeomorphism by the open mapping theorem, for all $x\in K$, in particular for all $x\in U$. Then $\d_g:V\to\L(X;X)$, $y\mapsto\d_f(g(y))^{-1}$ is well-defined. We just need to check it serves as a genuine derivative for $g$.
Fix $y\in V$ and write $x=g(y)$. In what is about to follow, we make a change of variables $x'\sim y'$ where $x'\in U$ and $f(x')=y'$, $g(y')=x'$ and this will be valid since $g,f$ are homeomorphisms. It's also going to be important that $\d_f(x)$ is a homeomorphism. Understand all limit equations as: the left hand side exists and equals [-] therefore the right hand side exists and equals [-].
$$\begin{align}0&=\d_f(x)^{-1}(0)\\&=\d_f(x)^{-1}\left(\lim_{x'\to x}\frac{\d_f(x)(x'-x)-(f(x')-f(x))}{\|x'-x\|}\right)\\&=\d_f(x)^{-1}\left(\lim_{y'\to y}\frac{\d_f(x)(g(y')-g(y))-(y'-y)}{\|g(y')-g(y)\|}\right)\\&=\lim_{y'\to y}\frac{g(y')-g(y)-\d_f(g(y))^{-1}(y'-y)}{\|y'-y\|}\cdot\underset{\ge1/2}{\underbrace{\frac{\|y'-y\|}{\|g(y')-g(y)\|}}}\end{align}$$Implying that: $$\lim_{y'\to y}\frac{g(y')-g(y)-\d_f(g(y))^{-1}(y'-y)}{\|y'-y\|}=0$$As desired. Therefore $g:V\to U$ is differentiable in the expected way. $\blacksquare$
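The formula $\d_g(y)=\d_f(g(y))^{-1}$ can be checked numerically in a one dimensional toy case. The map `f` below is a hypothetical example (my own choice, with $f(0)=0$ and $f'(0)=1$), and `g` is computed by the same fixed point iteration $x\mapsto x+y-f(x)$ used in the proof; the step size `h` is an arbitrary assumption.

```python
def f(x):
    # Hypothetical example map with f(0) = 0 and f'(0) = 1.
    return x + x * x / 4.0

def df(x):
    # Derivative of the example map.
    return 1.0 + x / 2.0

def g(y, steps=80):
    # Local inverse of f near 0, computed as the fixed point of x -> x + y - f(x).
    x = 0.0
    for _ in range(steps):
        x = x + y - f(x)
    return x

y, h = 0.1, 1e-6
# Central difference approximation to g'(y).
finite_diff = (g(y + h) - g(y - h)) / (2.0 * h)
# Value predicted by the theorem: the inverse of f' at g(y).
predicted = 1.0 / df(g(y))
```

The two numbers should agree up to the finite difference error.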
Back to theorem $2$: observe that the same limit manipulations will go through (taking $x=x_0=0$) since $g\times g$ and $f\times f$ shall be homeomorphisms $U\times U\cong V\times V$. $g$ will still be shown to be Lipschitz, with constant $2$, so the last part about bounding the quotient $\|y'-y\|/\|g(y')-g(y)\|$ carries over. Then the proof is more or less identical so long as we switch to double limit notation: $$\begin{align}0&=\d_f(0)^{-1}(0)\\&=\d_f(0)^{-1}\left(\lim_{(x',x)\to(0,0)}\frac{\d_f(0)(x'-x)-(f(x')-f(x))}{\|x'-x\|}\right)\\&=\d_f(0)^{-1}\left(\lim_{(y',y)\to(0,0)}\frac{\d_f(0)(g(y')-g(y))-(y'-y)}{\|g(y')-g(y)\|}\right)\\&=\lim_{(y',y)\to(0,0)}\frac{g(y')-g(y)-\d_f(0)^{-1}(y'-y)}{\|y'-y\|}\cdot\underset{\ge1/2}{\underbrace{\frac{\|y'-y\|}{\|g(y')-g(y)\|}}}\end{align}$$Proving $\d_f(0)^{-1}$ to be a strong derivative for $g$ at zero.
The last thing to do is check the statement about $C^k$. Denote by $\I$ the inversion: $\I:\L_\cong(X;Y)\to\L_\cong(Y;X)$. This is always a smooth map ($\L_\cong(X;Y)$ being an open subset of $\L(X;Y)$). We may calculate: $$\forall F\in\L_\cong(X;Y),\,\d_\I(F)=(\L(X;Y)\ni H\mapsto-F^{-1}\circ H\circ F^{-1}\in\L(Y;X))$$Proving $\I$ is at least continuously differentiable.
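In finite dimensions the derivative formula $\d_\I(F)(H)=-F^{-1}HF^{-1}$ is easy to test against a difference quotient. Here is a small sketch with $2\times2$ matrices; the particular matrices `F`, `H` and the step `h` are arbitrary choices of mine, and the helpers `matmul` and `inv` are hand-rolled for the illustration.

```python
def matmul(A, B):
    # Product of two 2x2 matrices stored as nested lists.
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv(A):
    # Inverse of a 2x2 matrix via the adjugate formula.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

F = [[2.0, 1.0], [0.5, 3.0]]   # an invertible matrix (hypothetical data)
H = [[0.3, -0.2], [0.1, 0.4]]  # a direction in which to differentiate
h = 1e-5

F_inv = inv(F)
shifted_inv = inv([[F[i][j] + h * H[i][j] for j in range(2)] for i in range(2)])
# Difference quotient (I(F + hH) - I(F)) / h.
quotient = [[(shifted_inv[i][j] - F_inv[i][j]) / h for j in range(2)]
            for i in range(2)]
# Predicted derivative -F^{-1} H F^{-1}.
FHF = matmul(matmul(F_inv, H), F_inv)
predicted = [[-FHF[i][j] for j in range(2)] for i in range(2)]
error = max(abs(quotient[i][j] - predicted[i][j])
            for i in range(2) for j in range(2))
```

The discrepancy `error` should be of order `h`, consistent with the second order term $F^{-1}HF^{-1}HF^{-1}$.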
We have: $$\d_g:V\overset{g}{\longrightarrow}U\overset{\d_f}{\longrightarrow}\L_\cong(X;Y)\overset{\I}{\longrightarrow}\L(Y;X)$$If $f$ is $C^1$ then every arrow in that diagram is continuous, hence $\d_g$ is continuous so $g$ is $C^1$. If $f$ is $C^2$ then $\d_f$ is $C^1$, $g$ is $C^1$ and $\I$ is $C^1$ so it would follow $\d_g$ is $C^1$, so that $g$ is $C^2$ as well, and the claim $g\in C^k$ if $f\in C^k$ for $k=1,2,\cdots,\infty$ follows by this induction (given that $\I$ is smooth and that the property of being of class $C^k$ is stable under composition).
For Banach spaces $U,V,W$, write $c_\ast:\L(V;W)\to\L(\L(U;V);\L(U;W))$ for the forward "currying" morphism that takes $T:V\to W$ to the map $(H:U\to V)\mapsto(TH:U\to W)$ and similarly write $c^\ast:\L(U;V)\to\L(\L(V;W);\L(U;W))$ for the precomposition "currying" map. Notice that $c^\ast$ and $c_\ast$ are always continuous linear transformations and are equal to their own derivative.
We can then express: $$\d_\I:\L_\cong(X;Y)\overset{\I}{\longrightarrow}\L(Y;X)\overset{(-1)\oplus1}{\longrightarrow}\L(Y;X)\oplus\L(Y;X)\\\overset{c_\ast\oplus c^\ast}{\longrightarrow}\L(\L(Y;Y);\L(Y;X))\oplus\L(\L(X;Y);\L(Y;Y))\overset{\circ}{\longrightarrow}\L(\L(X;Y);\L(Y;X))$$
Here we write $\circ:\L(V;W)\oplus\L(U;V)\to\L(U;W)$ for the map that takes $(F,G)\mapsto FG$. It then follows by the same induction argument that if $\circ$ is smooth, so is $\I$ (notice that the middle arrows are linear maps hence smooth).
It's easy to prove that $\circ$ is everywhere continuously differentiable with derivative $\d_\circ(F,G)=c^\ast(G)\oplus c_\ast(F):\L(V;W)\oplus\L(U;V)\to\L(U;W)$ for all $F,G$.
Exercise: use the formal chain rule and this observation to show that $\I$ is $C^2$ with: $$\d^2_\I(F)(G)\equiv(\L(X;Y)\ni T\mapsto F^{-1}GF^{-1}TF^{-1}+F^{-1}TF^{-1}GF^{-1}\in\L(Y;X))$$
Now observe that $\d_\circ$ is itself a continuous linear transformation. Thus $\d^2_\circ$ exists and is constant, and $\d^n_\circ$ is the zero map for all $n\ge 3$. $\circ$ is thus smooth, completing the proof (so long as we expect being of class $C^k$ to be stable under composition).
To see being of class $C^k$ is stable under composition, we induct on $k$. With $k=1$ this is fairly obvious from the chain rule. Let's fix $U\overset{f}{\longrightarrow}V\overset{g}{\longrightarrow}W$ which are both of class $C^k$, $k\ge1$. We can express: $$\d_{gf}:U\overset{\Delta}{\longrightarrow}U\oplus U\overset{f\oplus\d_f}{\longrightarrow}V\oplus\L(U;V)\overset{\d_g\oplus1}{\longrightarrow}\L(V;W)\oplus\L(U;V)\overset{\circ}{\longrightarrow}\L(U;W)$$The first and last arrows in the diagram are smooth. Since it's easy to see that, if $h$ is of class $C^m$, so is $1\oplus h$, and because $f,\d_f,\d_g$ are by hypothesis all of class (at least) $C^{k-1}$, we find $\d_{gf}$ a composite of class $C^{k-1}$ maps hence, by induction, itself of class $C^{k-1}$. Therefore $gf$ is of class $C^k$. Done!
If $h:U\to V$ is $C^k$ then $1\oplus h:X\oplus U\to X\oplus V$ is $C^k$ with every $n$th derivative, $1\le n\le k$, equal to the map: $$\d_{1\oplus h}^n:X\oplus U\to\L(X\oplus U;-)^n(X\oplus V)$$Defined by: $$\begin{align}\d_{1\oplus h}(x,u)(x_1,u_1)&=(x_1,\d_h(u)(u_1))\\\d_{1\oplus h}^n(x,u)(x_n,u_n)\cdots(x_1,u_1)&=(0,\d_h^n(u)(u_n)\cdots(u_1)),\quad\quad n>1\end{align}$$