10

Cliff Taubes wrote in his differential geometry book that:

We now calculate the directional derivatives of the map $$M\rightarrow M^{-1}.$$ Let $\alpha\in M(n,\mathbb{R})$ denote any given matrix. Then the directional derivatives of the coordinates of the map $M\rightarrow M^{-1}$ in the direction $\alpha$ are the entries of the matrix $$-M^{-1}\alpha M^{-1}.$$ Consider, for example, the coordinate given by the $(i,j)$th entry, $(M^{-1})_{ij}$. The directional derivative in the direction $\alpha$ of this function on $GL(n,\mathbb{R})$ is $$-(M^{-1}\alpha M^{-1})_{ij}.$$ In particular, the partial derivative of the function $M\rightarrow (M^{-1})_{ij}$ with respect to the coordinate $M_{rs}$ is $-(M^{-1})_{ir}(M^{-1})_{sj}$.

I am wondering why this is true. He did not give any derivation of this formula, and none of the formulas I know for the matrix inverse yields anything similar to his result. So I venture to ask.

Bombyx mori
  • 20,152
  • One must keep in mind the following: 0. You treat an $n \times n$ matrix as a vector in $\mathbb{R}^{n^2}$.
    1. How to write the inverse matrix of $M$ in terms of $\det(M)$ and the cofactors.
    2. The directional derivative is just the gradient (that is, the array of partial derivatives $\frac{\partial (M^{-1})_{ij}}{\partial M_{rs}}$) dotted with the direction matrix.
    – Teddy Sep 03 '12 at 12:28
  • I would believe that, since $(M^{-1})_{ij}=\frac{1}{\det(M)}(-1)^{i+j}A_{ji}$. But this would lead to something like $A*e_{ij}$, which is not what the formula is. – Bombyx mori Sep 03 '12 at 13:43

3 Answers

17

Not sure if this is the type of answer you want, since I'm giving another argument rather than explaining his argument. However, this is how I usually think of it.

Let $M$ be a matrix and $\delta M$ an infinitesimal perturbation (e.g. $\epsilon$ times the derivative). Now, let $N=M^{-1}$ and $\delta N$ the corresponding perturbation of the inverse, so that $N+\delta N=(M+\delta M)^{-1}$. Keeping only first-order perturbations (i.e. ignoring terms with two $\delta$s) and using $MN=I$, this gives $$ \begin{split} I=&(M+\delta M)(N+\delta N)=MN+M\,\delta N+\delta M\,N\\ &\implies M\,\delta N=-\delta M\,N=-\delta M\,M^{-1}\\ &\implies \delta N=-M^{-1}\,\delta M\,M^{-1}.\\ \end{split} $$ Written in terms of derivatives, i.e. $M'=dM/ds$ and $N'=dN/ds$ where $M=M(s)$, $N=N(s)$, and $M(s)N(s)=I$, the same would be written $$ 0=I'=(MN)'=M'N+MN'\implies N'=-M^{-1}\,M'\,M^{-1}. $$
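If you want to convince yourself numerically first, here is a quick finite-difference check of $N'=-M^{-1}M'M^{-1}$ (a minimal numpy sketch; the test matrices `M0` and `D` are arbitrary choices, not anything from the problem):

```python
import numpy as np

# Finite-difference check of N' = -M^{-1} M' M^{-1} along the curve
# M(s) = M0 + s*D.  M0 and D are arbitrary test matrices.
rng = np.random.default_rng(0)
M0 = rng.standard_normal((4, 4)) + 4 * np.eye(4)  # shifted to keep M0 invertible
D = rng.standard_normal((4, 4))                   # direction, M'(s) = D

def N(s):
    return np.linalg.inv(M0 + s * D)              # N(s) = M(s)^{-1}

h = 1e-6
N_prime_fd = (N(h) - N(-h)) / (2 * h)             # central difference approximation
N_prime = -N(0) @ D @ N(0)                        # the formula -M^{-1} M' M^{-1}
print(np.max(np.abs(N_prime_fd - N_prime)))       # tiny: the two agree
```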


To address some of the comments, although a bit belatedly:

For example, if you let $M(s)=M+s\Delta M$, the derivative is $M'(s)=\Delta M$ for all $s$. Then $N(s)=M(s)^{-1}=(M+s\Delta M)^{-1}$, and you can differentiate $M(s)\cdot N(s)=I$ to get the above expressions.

For any partial derivative, e.g. with respect to $M_{rs}$, just set $\Delta M$ to be the matrix $E^{[rs]}$ with $1$ in cell $(r,s)$ and zero elsewhere, and you get $$ \frac{\partial}{\partial M_{rs}} M^{-1} = -M^{-1}\frac{\partial M}{\partial M_{rs}} M^{-1} = -M^{-1} E^{[rs]} M^{-1} $$ which makes cell $(i,j)$ of the inverse $$ \frac{\partial (M^{-1})_{ij}}{\partial M_{rs}} = -(M^{-1})_{ir}(M^{-1})_{sj}. $$
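The entrywise formula can be sanity-checked the same way (again a numpy sketch; the matrix and the indices $i,j,r,s$ are arbitrary test choices):

```python
import numpy as np

# Check d(M^{-1})_{ij} / dM_{rs} = -(M^{-1})_{ir} (M^{-1})_{sj} by perturbing
# the single entry (r, s).  M and the indices are arbitrary test choices.
rng = np.random.default_rng(1)
n = 5
M = rng.standard_normal((n, n)) + n * np.eye(n)   # shifted to keep M invertible
Minv = np.linalg.inv(M)
i, j, r, s = 0, 3, 2, 4

E = np.zeros((n, n))
E[r, s] = 1.0                                     # the basis matrix E^{[rs]}
h = 1e-6
fd = (np.linalg.inv(M + h * E) - np.linalg.inv(M - h * E)) / (2 * h)
print(fd[i, j], -Minv[i, r] * Minv[s, j])         # the two values agree
```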

Einar Rødland
  • 11,378
  • 23
  • 39
  • How did you get the last line? $M'$ seems to have vanished. – Bombyx mori Sep 03 '12 at 14:34
  • For this to hold we have $M'M^{-1}+MN'=0\rightarrow N'=-M^{-1}M'M^{-1}$ instead. Your original method would give $I\approx MN+M\,\delta N+\delta M\,N+\delta M\,\delta N$. This feels somewhat ad hoc. – Bombyx mori Sep 03 '12 at 14:43
  • We thus have $M\delta N+\delta M N=0$, which gives $\delta N=-M^{-1}\delta M N=-M^{-1}\alpha M^{-1}$. I see. – Bombyx mori Sep 03 '12 at 15:08
  • 1
    I'll expand the equations with an extra step to make it clearer. – Einar Rødland Sep 03 '12 at 15:22
  • 1
    What are $I'$, $M'$ and $N'$ supposed to mean? – Georges Elencwajg Sep 03 '12 at 18:15
  • @GeorgesElencwajg: He probably treated this as a bilinear map from $\mathbb{R}^{n^{2}}\times \mathbb{R}^{n^{2}}\rightarrow \mathbb{R}^{n^{2}}$, and used Leibniz's rule. $I$ would be the constant map. – Bombyx mori Sep 04 '12 at 00:15
  • @Georges The $I'$, $M'$ and $N'$ are the derivatives, e.g. if $M=M(s)$ then $M'=dM/ds$. Will make that more clear in the answer. – Einar Rødland Sep 04 '12 at 02:44
  • 1
    Dear @user32240, I know what he means, but he has to decide whether $M$ is a matrix, a variable or some function of a mysterious undeclared variable. The correct way is to give a name to the bilinear multiplication map: call it, say, $b$ and write $db_{(M,N)}(X,Y)=MY+XN$. – Georges Elencwajg Sep 04 '12 at 06:58
  • @GeorgesElencwajg: This is a very helpful remark, because the reason I was confused with the problem is basically how to write the derivative in coordinates. Your suggestion clarified everything. – Bombyx mori Sep 04 '12 at 07:01
9

I have the following result. I am assuming you already proved that the inversion map (I will call it $f$) is differentiable. We will look at the total derivative $Df(A)$ at $A\in GL(n,\mathbb{R})$.

Take the identity map $Id:GL(n,\mathbb{R})\to GL(n,\mathbb{R}):A\mapsto A$ and the map $g:GL(n,\mathbb{R})\to GL(n,\mathbb{R}):A\mapsto A\cdot A^{-1}=I_n$. Note that the derivative of $Id$ is $DId(A)(H)=Id(H)=H$ for $A,H\in GL(n,\mathbb{R})$ since $Id$ is a linear map. Furthermore, note that $g=Id\cdot f$ and that since $g$ is a constant map, its derivative is the zero map. Here I use the following result that I will prove later on:

Let $h,k:GL(n,\mathbb{R})\to GL(n,\mathbb{R})$ be differentiable at $A\in GL(n,\mathbb{R})$. Then $$D(h\cdot k)(A)(H)=Dh(A)(H)k(A)+h(A)Dk(A)(H)\;\text{for}\; H\in GL(n,\mathbb{R})$$ From this follows: $$Dg(A)(H)=DId(A)(H)f(A)+Id(A)Df(A)(H)$$ $$0=H\cdot f(A)+A\cdot Df(A)(H)$$ $$-H\cdot A^{-1}=A\cdot Df(A)(H)$$ $$-A^{-1}HA^{-1}=Df(A)(H),$$ which is the desired result. Now we have to show that the result I used is true. This is a bit iffy: I will prove it for functions on Euclidean spaces, but since the space of $n\times m$ matrices is isomorphic as a normed vector space to $\mathbb{R}^{nm}$, the result also holds for matrices. Input is welcome, but here it goes:

Suppose we have two functions $f:U\to\mathbb{R}^{n_1n_2}$ and $g:U\to\mathbb{R}^{n_2n_3}$ that are differentiable at $x_0$, with $U\subset\mathbb{R}^m$ an open subset. Define $h:\mathbb{R}^{n_1n_2}\times\mathbb{R}^{n_2n_3}\to\mathbb{R}^{n_1n_3}:(x,y)\mapsto xy$. Note that $h$ is bilinear and thus is differentiable with derivative $Dh(x,y)(v,w)=h(v,y)+h(x,w)=vy+xw$ (nice exercise to prove this; see the note just below).
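To spell out that exercise, since the rest of the argument leans on it: $h(x+v,\,y+w)=(x+v)(y+w)=xy+vy+xw+vw$, so $$h(x+v,y+w)-h(x,y)-(vy+xw)=vw,$$ and $\|vw\|\le\|v\|\,\|w\|=O(\|(v,w)\|^2)$. Hence the linear map $(v,w)\mapsto vy+xw$ is indeed the derivative of $h$ at $(x,y)$.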

We define $k:U\to\mathbb{R}^{n_1n_2}\times\mathbb{R}^{n_2n_3}:x\mapsto (f(x),g(x))$. Note that $k$ is differentiable at $x_0$ if and only if its components are. But its components are $f$ and $g$ and so differentiable at $x_0$ by assumption, thus $k$ is differentiable at $x_0$. Similarly, the derivative of $k$ is the vector of derivatives of its components.

By the Chain Rule, $h\circ k$ is differentiable at $x_0$ with derivative: $$D(h\circ k)(x_0)=Dh(k(x_0))\circ Dk(x_0)$$ $$D(h\circ k)(x_0)=Dh(f(x_0),g(x_0))\circ (Df(x_0),Dg(x_0))$$ $$D(h\circ k)(x_0)=Df(x_0)g(x_0)+f(x_0)Dg(x_0)$$ The last step uses the identity for the derivative of bilinear maps given earlier.
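As a quick sanity check of this product rule, one can compare it against a finite difference for a concrete pair, say $h(A)=A^2$ (so $Dh(A)(H)=AH+HA$) and $k(A)=A$ (so $Dk(A)(H)=H$), which makes $(h\cdot k)(A)=A^3$ (a numpy sketch; the pair and the matrix size are arbitrary test choices):

```python
import numpy as np

# Product-rule check: D(h*k)(A)(H) = Dh(A)(H) k(A) + h(A) Dk(A)(H)
# for the test pair h(A) = A^2, k(A) = A, so (h*k)(A) = A^3.
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
H = rng.standard_normal((4, 4))

t = 1e-7
cube = lambda X: X @ X @ X
fd = (cube(A + t * H) - cube(A - t * H)) / (2 * t)     # D(A^3)(H), central difference
product_rule = (A @ H + H @ A) @ A + (A @ A) @ H       # Dh(A)(H) k(A) + h(A) Dk(A)(H)
print(np.max(np.abs(fd - product_rule)))               # small: the rule holds
```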

Hope this is clear and any additions to the solution are welcome!

0

There is a different (not so useful) form with the same result, but I'm not sure why. You can write Cramer's rule in the following form: $$ A_{ij} \frac{\partial |A|}{\partial A_{kj}} = |A| \delta_{ik} $$ where $\delta_{ik}$ are the entries of the identity, and the partial derivative is the $\pm$cofactor, so that $$ {A^{-1}}_{ml} = \frac{\partial\ln(|A|)}{\partial A_{lm}} = \frac{\partial |A|}{|A| \partial A_{lm}}.$$ Using the product rule, $$ \frac{\partial A^{-1}_{ml}}{\partial A_{ij}} = \frac{\partial^2|A|}{|A|\partial A_{ij}\partial A_{lm}} - {A^{-1}_{ji}}{A^{-1}_{ml}},$$ where the first term is a repeated cofactor, and the second the product of two inverse matrix elements, compared with the outer product of columns and rows of $A^{-1}$ as in the given answer.

Are they equal? Let's try the $2\times2$ matrix $A=\pmatrix{a & b \cr c & d}$ with inverse $A^{-1}=\frac{1}{ad-bc}\pmatrix{d & -b \cr -c & a}$: $$\frac{\partial A^{-1}}{\partial a} = \frac{ad-bc}{(ad-bc)^2}\pmatrix{0 & 0 \cr 0 & 1} - \frac{d}{(ad-bc)^2}\pmatrix{d & -b \cr -c & a} = \frac{-1}{(ad-bc)^2}\pmatrix{d \cr -c}\pmatrix{d & -b},$$ and so on, so yes, at least for this case. It has to be the same in general, but I don't see why.
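Short of a proof, one can at least check symbolically that the two expressions agree for a general $3\times 3$ matrix (a sympy sketch; the indices are arbitrary test choices):

```python
import sympy as sp

# Symbolic check, for a general 3x3 matrix, that
#   d^2|A| / (|A| dA_ij dA_lm)  -  (A^{-1})_ji (A^{-1})_ml
# equals  -(A^{-1})_mi (A^{-1})_jl, matching the formula in the accepted answer.
A = sp.Matrix(3, 3, lambda r, c: sp.Symbol(f'a{r}{c}'))
detA = A.det()
Ainv = A.inv()

i, j, l, m = 0, 1, 2, 0                                # arbitrary test indices
lhs = sp.diff(detA, A[i, j], A[l, m]) / detA - Ainv[j, i] * Ainv[m, l]
rhs = -Ainv[m, i] * Ainv[j, l]
print(sp.simplify(lhs - rhs))                          # prints 0
```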