3

Consider the inversion function $f:GL_n( \mathbb{R}) \rightarrow GL_n (\mathbb{R})$, $f(X)=X^{-1}$, where $GL_n( \mathbb{R})$ denotes the set of invertible $n \times n$ matrices over the reals.

The question wants me to show that it is a differentiable function and then to calculate its derivative. It says to think of the set as a subset of $\mathbb{R} ^{n^{2}}$.

I know that if the partial derivatives exist and are continuous then the function is differentiable, but I can't calculate the partials explicitly since that seems too difficult. Thinking about it, if I were to change one entry of the matrix while keeping all others constant (this is how I interpret a partial derivative of this function; is this correct?), I could find a neighbourhood of that entry such that the matrix is still invertible, since $\det:\mathbb{R}^{n \times n} \rightarrow \mathbb{R}$ is continuous (this has been shown in my lecture notes). Is this the correct way to go about it? I have no solutions available to me, so I'm just seeking some clarification here to make sure my understanding isn't completely wrong. Thanks!

2 Answers

5

I'd like to add a few remarks addressing the questions the OP posed as comments to Tsemo's answer. It is likely that said questions are resolved by now for the OP, but this might be useful to someone else.

Here are some further discussions on the same topic (some quite old): [1], [2], [3].


To address the first question-as-comment, inversion, in the case of finite-dimensional underlying space, ought to be rational (which means it is the ratio of two polynomials), as each entry of the inverse matrix will be a polynomial up to scaling by the determinant. (In this regard it is trivial to compare the $n=1$ case and meditative to compute the derivative of inversion of $2\times 2$ matrices.)
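For instance, in the $2\times 2$ case one can see the rational entries directly by writing the inverse via the adjugate; here is a minimal sketch in Python with NumPy (the helper name `inv_2x2` is mine, not standard):

```python
import numpy as np

# Each entry of the inverse of a 2x2 matrix is a polynomial in the
# entries divided by the determinant, so inversion is a rational
# (hence C^infinity) map on the open set where det != 0.
def inv_2x2(X):
    a, b = X[0]
    c, d = X[1]
    det = a * d - b * c                    # a polynomial in the entries
    return np.array([[d, -b], [-c, a]]) / det   # adjugate / determinant

X = np.array([[2.0, 1.0], [1.0, 3.0]])
assert np.allclose(inv_2x2(X), np.linalg.inv(X))
```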


The second question-as-comment I think is more substantial. In that, indeed, the two paragraphs of Tsemo's answer can be distinguished by realizing that the first paragraph works in the case of finite dimensions, while the second paragraph works even when $\mathbb{R}^n$ is replaced by an arbitrary Banach space (which still is not the most general case of this, see e.g. this discussion). Of course this is not a problem as the question was stated for finite dimensions. Still, I think the algebraic shortcut that is guaranteed by finite dimensionality hides something fundamental of the object we are dealing with.

Indeed, from a categorical/Lie theoretic point of view inversion, and inversion being $C^\infty$, are definitional for $GL_n(\mathbb{R})$. Out of $M_n(\mathbb{R})$ we can cut out $GL_n(\mathbb{R})$ by considering it to be the domain of inversion (which makes sense without restrictions in arbitrary dimensions, as opposed to taking determinant to be nonzero).

From this point of view, $GL_n(\mathbb{R})$ being open, inversion being continuous and inversion being $C^\infty$ can all be easily derived from the well-known power series

$$(I+H)^{-1}=\sum_{n\geq0}(-H)^n\mbox{ for }\Vert H\Vert <1, $$

$H$ being a linear operator on $\mathbb{R}^n$ and $I=\operatorname{id}_{\mathbb{R}^n}$. Given the power series, as in Tsemo's answer we can say that for a fixed $A\in GL_n(\mathbb{R})$ and for $H$ with $\Vert HA^{-1}\Vert <1$ (e.g. for $H$ with $\Vert H\Vert <\dfrac{1}{\Vert A^{-1}\Vert}=\operatorname{conorm}(A)$, which one may interpret as a byproduct of translating the derivative at $A$ to a derivative at $I$), we have that

$$(A+H)^{-1} = (A+HA^{-1}A)^{-1} =A^{-1}(I+HA^{-1})^{-1} =A^{-1}\sum_{n\geq0}(-HA^{-1})^n,$$

thus

$$\dfrac{\Vert(A+H)^{-1}- [A^{-1}- A^{-1}HA^{-1}]\Vert}{\Vert H\Vert} \leq \dfrac{\Vert H\Vert \Vert A^{-1} \Vert^2}{1- \Vert H \Vert \Vert A^{-1}\Vert} \xrightarrow{H\to 0} 0. $$
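Both ingredients above, the Neumann series and the quadratic decay of the remainder, are easy to sanity-check numerically. A minimal sketch in Python with NumPy, using small example matrices of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)

# Neumann series: partial sums of sum_{n>=0} (-H)^n converge to
# (I + H)^{-1} when the operator norm of H is < 1.
H = rng.standard_normal((3, 3))
H *= 0.4 / np.linalg.norm(H, 2)        # scale so that ||H|| = 0.4 < 1
S, term = np.zeros((3, 3)), np.eye(3)
for _ in range(60):
    S += term
    term = term @ (-H)
assert np.allclose(S, np.linalg.inv(np.eye(3) + H))

# Quadratic remainder: ||(A+K)^{-1} - [A^{-1} - A^{-1} K A^{-1}]|| / ||K||
# should shrink roughly linearly with ||K||, witnessing differentiability.
A = np.array([[2.0, 1.0], [0.0, 3.0]])
Ainv = np.linalg.inv(A)
Hdir = np.array([[0.3, -0.2], [0.1, 0.4]])
ratios = []
for t in (1e-1, 1e-2, 1e-3):
    K = t * Hdir
    rem = np.linalg.inv(A + K) - (Ainv - Ainv @ K @ Ainv)
    ratios.append(np.linalg.norm(rem, 2) / np.linalg.norm(K, 2))
assert ratios[1] < ratios[0] / 5 and ratios[2] < ratios[1] / 5
```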


In the previous paragraph I believe I made it clear how structural principles hint at the fact that a power series expansion ought to play a role in the question. Even without adhering to the structural point of view, though, one is led to the power series. Indeed, trying to take the derivative of the (nonlinear) operator that is inversion means that we're trying to find a linear $\lambda:=\left((\cdot)^{-1}\right)'(A):M_n(\mathbb{R})\to M_n(\mathbb{R}) $ such that

$$\lim_{H\to 0}\dfrac{\Vert (A+H)^{-1}-[A^{-1}+\lambda H] \Vert}{\Vert H \Vert} = 0.$$

Somehow we need to be able to write down $(A+H)^{-1}$ more explicitly so that some terms cancel out. By definition $(A+H) (A+H)^{-1} = I$, so that $(A+H)^{-1} = A^{-1}(I-H(A+H)^{-1})$. Using this recursive formula twice in a row we have:

$$(A+H)^{-1} = A^{-1}-A^{-1}HA^{-1}+A^{-1}HA^{-1}H(A+H)^{-1}.$$

This gives us what we want, provided we have already established that inversion is continuous (which guarantees that the rightmost summand on the RHS decays quadratically in $H$). So another way to differentiate inversion would go through a separate proof of inversion being continuous (for a proof of this that doesn't use power series explicitly, see Rudin's Principles of Mathematical Analysis (3e), Theorem 9.8 on p. 209). Still, I hope it's clear by now that even this argument uses the power series, albeit implicitly. (Observe that the same argument, with the recursive formula used arbitrarily many times, establishes the power series we used.)
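Note that the two-fold recursive formula is an exact algebraic identity, requiring no smallness of $H$ beyond invertibility of $A+H$; a quick numerical check with NumPy (example matrices are my own):

```python
import numpy as np

# Verify (A+H)^{-1} = A^{-1} - A^{-1} H A^{-1} + A^{-1} H A^{-1} H (A+H)^{-1}
# exactly (up to floating point), with H not small.
A = np.array([[2.0, 1.0], [0.0, 3.0]])
H = np.array([[0.5, -0.2], [0.3, 0.1]])   # only A + H invertible is needed
Ainv = np.linalg.inv(A)
AHinv = np.linalg.inv(A + H)

rhs = Ainv - Ainv @ H @ Ainv + Ainv @ H @ Ainv @ H @ AHinv
assert np.allclose(AHinv, rhs)
```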


As a final comment I'd like to add that the fact that inversion is $C^\infty$ is not irrelevant to general purpose mathematics. One straightforward application is a corollary to the Inverse Function Theorem: if $f:U\to \mathbb{R}^n$ is $C^1$ and $f'(x_0)$ is a linear isomorphism, then $f$ is a $C^1$ diffeomorphism near $x_0$, and the formula for the derivative of the inverse of $f$ is:

$$\left(f^{-1}\right)'=(\cdot)^{-1}\circ f'\circ f^{-1}. $$

The regularity of a composition is determined by its least regular factor, so that if $f$ is $C^r$, $r\in \mathbb{Z}_{\geq1}$, to begin with, we would end up with a $C^r$ diffeomorphism near any point $x_0$ where $f'(x_0)$ is a linear isomorphism. (For reference purposes this is the content of the Inverse Function Theorem in Lang's Fundamentals of Differential Geometry, pp. 15-16.)
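The formula $(f^{-1})'=(\cdot)^{-1}\circ f'\circ f^{-1}$ can be checked on a toy map with an explicit inverse; the map $f(u,v)=(e^u,\,u+v)$ below is an example of my own choosing, with inverse $g(a,b)=(\log a,\, b-\log a)$:

```python
import numpy as np

# Jacobian of f(u, v) = (e^u, u + v)
def f_jac(u, v):
    return np.array([[np.exp(u), 0.0], [1.0, 1.0]])

# Jacobian of the explicit inverse g(a, b) = (log a, b - log a)
def g_jac(a, b):
    return np.array([[1.0 / a, 0.0], [-1.0 / a, 1.0]])

u, v = 0.3, -1.2
a, b = np.exp(u), u + v           # (a, b) = f(u, v)

# (f^{-1})'(f(x)) equals the matrix inverse of f'(x)
assert np.allclose(g_jac(a, b), np.linalg.inv(f_jac(u, v)))
```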

Alp Uzman
  • 12,209
2

$GL(n,\mathbb{R})$ is an open subspace of the vector space $M(n,\mathbb{R})$, and the inverse map $X\rightarrow X^{-1}$ is a rational function of the coordinates (expressed via the cofactor matrix), so it is differentiable.

You have $(X+h)^{-1}=X^{-1}(I+hX^{-1})^{-1}$. Write $hX^{-1}=u$ with $\|u\|<1$; then $(I+u)^{-1}=\sum(-1)^nu^n$, which implies that $(X+h)^{-1}=X^{-1}-X^{-1}hX^{-1}+O(\|h\|^2)$, and the differential is $h\rightarrow -X^{-1}hX^{-1}$.

    Thanks for your answer! So I'm not sure what a rational function is, don't think we've actually defined that in class. I see how you've manipulated the increments to get what you got, and surely from what you end up with, the derivative is $h \rightarrow -X^{-1}hX^{-1}$ ? (minus sign) – Displayname Mar 07 '19 at 10:11
  • Also, how did you know how to go about this question i.e use that power series representation of $(I+u)^{-1}$? Is it just experience in dealing with these types of problems? – Displayname Mar 07 '19 at 10:13