4

Problem 7-4(a) of John M. Lee's Introduction to Smooth Manifolds asks us to show that for any matrix $A\in M(n,\mathbb R)$, we have $$\left.\frac{d}{dt} \det(I_n+tA)\right|_{t=0}=\operatorname{tr}(A)$$ where $\det$ is the usual determinant map. The book's hint is to use the expansion $$\det(B)=\sum_{\sigma\in S_n}\operatorname{sgn}(\sigma)\prod_{i=1}^n b_{i,\sigma(i)},\text{ }B=[b_{ij}]_{i,j=1}^n$$ but when doing calculations with even the simple case $n=3$, I'm getting values that are not equal to $\operatorname{tr}(A)$. Could someone tell me what I'm doing wrong?

My work:

Going through the full determinant expansion for $n=3$, I computed \begin{align*} \det(I_n+tA) &= (1+t a_{11})(1+ta_{22})(1+ta_{33})\\ &- {\color{red}{a_{12}a_{21}(1+ta_{33})}}\\ &- {\color{red}{(1+ta_{11})a_{23}a_{32}}}\\ &- {\color{red}{a_{13}(1+ta_{22})a_{31}}}\\ &+ a_{12}a_{23}a_{31}\\ &+ a_{13}a_{32}a_{21}\\ \end{align*}

Because they're constants, the last two terms $a_{12}a_{23}a_{31},a_{13}a_{32}a_{21}$ vanish when differentiating. The first term $(1+t a_{11})(1+ta_{22})(1+ta_{33})$ expands as $$(1+t a_{11})(1+ta_{22})(1+ta_{33})=1+\operatorname{tr}(A)t+(\text{higher powers of $t$})$$ so, because $\left.\frac{d}{dt}t^n\right|_{t=0}=0$ for $n\geq 2$, we get $$\left.\frac{d}{dt}(1+t a_{11})(1+ta_{22})(1+ta_{33})\right|_{t=0} =\operatorname{tr}(A)$$ What worries me are the terms in red. Their derivatives are \begin{align*} \left.\frac{d}{dt} a_{12}a_{21}(1+ta_{33})\right|_{t=0} &= a_{12}a_{21}a_{33}\\ \left.\frac{d}{dt} (1+ta_{11})a_{23}a_{32}\right|_{t=0} &= a_{11}a_{23}a_{32}\\ \left.\frac{d}{dt} a_{13}(1+ta_{22})a_{31}\right|_{t=0} &= a_{13}a_{22}a_{31}\\ \end{align*} so I've (seemingly) calculated $$\left.\frac{d}{dt} \det(I_n+tA)\right|_{t=0}=\operatorname{tr}(A){\color{red}{-a_{12}a_{21}a_{33}-a_{11}a_{23}a_{32}-a_{13} a_{22}a_{31}}} $$ Does the red part evaluate to $0$? If not, where have I gone wrong? Any help is appreciated.
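For what it's worth, here is a symbolic sanity check with sympy (assuming it is installed) for the $n=3$ case; it reports that the derivative does equal $\operatorname{tr}(A)$, so the red terms must somehow not survive:

```python
import sympy as sp

t = sp.symbols('t')
a = sp.Matrix(3, 3, lambda i, j: sp.symbols(f'a{i+1}{j+1}'))

# Full determinant of I + t*A, expanded symbolically.
d = (sp.eye(3) + t * a).det().expand()

# Derivative at t = 0: only the coefficient of t survives.
deriv_at_0 = sp.diff(d, t).subs(t, 0)

print(sp.simplify(deriv_at_0 - a.trace()))  # 0
```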

Mittens
Alann Rosas

3 Answers

5

Look at the terms of the determinant of $B = I+tA$ in the expansion below:

$$\det(B)=\sum_{\sigma\in S_n}\operatorname{sgn}(\sigma)\prod_{i=1}^n b_{i,\sigma(i)}$$

Ask yourself which products $\prod_{i=1}^n b_{i,\sigma(i)}$ contain a term of order $t^2$ or higher. If you take $\sigma$ to be the identity, i.e., $\sigma(i)=i$ (just the product of the diagonal entries), you get

\begin{align*} (1+ta_{11})\cdots (1+ta_{nn}) = 1+\text{tr}(A)t+O(t^2) \end{align*}

This is where the trace comes from, because the $O(t^2)$ terms vanish after differentiating and setting $t=0$. Now ask: what is the contribution of the other permutations $\sigma$? (You hope to show it is $O(t^2)$, so that they contribute nothing to the derivative.)

Suppose $\sigma$ is not the identity. Then $\sigma(i)=j$ for some $i\neq j$, and since $j$ already has preimage $i$, we must have $\sigma(j)\neq j$ as well. Both corresponding entries of $B=I+tA$ are off-diagonal, so \begin{align*} b_{i\sigma(i)}&=ta_{i\sigma(i)}\\ b_{j\sigma(j)} &= ta_{j\sigma(j)} \end{align*}

so the product

\begin{align*} \prod_{k=1}^n b_{k,\sigma(k)} = O(t^2) \end{align*}

and therefore contributes nothing to the derivative at $t=0$.
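This argument can be checked mechanically for $n=3$ with a short sympy sketch (assuming sympy is available): every non-identity permutation's product has neither a constant nor a linear term in $t$, i.e. it is $O(t^2)$.

```python
import sympy as sp
from itertools import permutations

t = sp.symbols('t')
n = 3
a = sp.Matrix(n, n, lambda i, j: sp.symbols(f'a{i+1}{j+1}'))
b = sp.eye(n) + t * a  # B = I + tA

# Every non-identity permutation moves at least two indices, so its
# product picks up at least two factors of t: it is O(t^2).
checked = 0
for s in permutations(range(n)):
    if s == tuple(range(n)):
        continue  # identity permutation: contributes 1 + tr(A)t + O(t^2)
    prod = sp.expand(sp.prod(b[i, s[i]] for i in range(n)))
    assert prod.subs(t, 0) == 0              # no constant term
    assert sp.diff(prod, t).subs(t, 0) == 0  # no linear term either
    checked += 1

print(checked)  # 5 non-identity permutations in S_3
```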

3

Another approach: for every $t\neq 0$, $$\det(I_n+tA)=(-1)^nt^n\det\left(\frac{-1}{t}I_n-A\right)=(-1)^nt^nP_A\left(-\frac{1}{t}\right)$$ where $P_A(t)=t^n-\operatorname{tr}(A)t^{n-1}+\ldots+a_0$ is the characteristic polynomial of $A$. Hence: $$\det(I_n+tA)=(-1)^nt^nP_A\left(-\frac{1}{t}\right)=(-1)^n\cdot(-1)^n-(-1)^{n}\cdot(-1)^{n-1}\operatorname{tr}(A)t+O(t^2)=1+\operatorname{tr}(A)t+O(t^2)$$ Taking the derivative with respect to $t$ at $t=0$ concludes the proof.
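A quick sympy check of this identity and its first-order coefficient (the test matrix below is an arbitrary choice of mine, not from the argument above):

```python
import sympy as sp

t, lam = sp.symbols('t lam')
A = sp.Matrix([[2, 1, 0], [0, 3, 4], [5, 0, 1]])  # arbitrary test matrix
n = A.shape[0]

# P_A(lam) = det(lam*I - A); its second coefficient is -tr(A).
P = A.charpoly(lam).as_expr()

# The identity det(I + tA) = (-1)^n t^n P_A(-1/t), valid for t != 0.
lhs = sp.expand((sp.eye(n) + t * A).det())
rhs = sp.simplify((-1)**n * t**n * P.subs(lam, -1/t))
assert sp.simplify(lhs - rhs) == 0

# Coefficient of t in det(I + tA) equals tr(A).
assert sp.Poly(lhs, t).coeff_monomial(t) == A.trace()
print(A.trace())  # 6
```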

GBA
2

If one identifies the space of real $n\times n$ matrices with $\mathbb{R}^{n^2}$ by vertically concatenating the columns of each matrix, we have the following:

Lemma: Let $\Delta:\mathbb{R}^{n^2}\longrightarrow\mathbb{R}$ be the determinant function, i.e. $$\Delta(\alpha_{11},\ldots,\alpha_{n1},\ldots,\alpha_{1n}, \ldots,\alpha_{nn})^{\top} = \det[(\alpha_{ij})]$$ where $(\alpha_{ij})$ is the $n\times n$--matrix whose $ij$--th component is $\alpha_{ij}$. Then, $$\Delta_\alpha= \frac{\partial \Delta}{\partial\alpha}= (W_{11}\ldots,W_{n1},\ldots,W_{1n},\ldots,W_{nn})$$ where $W_{ij}$ is the $ij$--th cofactor of the matrix $(\alpha_{ij})$.

The proof of this Lemma is a simple exercise in computing determinants via the cofactor expansion.

Consider the function $\phi(t)=\operatorname{det}(I+tA)=\Delta\circ g(t)$, where $g:\mathbb{R}\to \mathbb{R}^{n^2}$ is given by $$t\mapsto[\mathbf{e}_1,\ldots,\mathbf{e}_n]+t(a_{11},\ldots,a_{n1},\ldots,a_{1n},\ldots,a_{nn})^\intercal$$ Notice that $g(t)$ corresponds to the matrix $I+tA$, where for each $1\leq j\leq n$, $\mathbf{e}_j$ is the $j$-th standard basis (column) vector, with $j$-th component $1$ and zeroes everywhere else. An application of the chain rule yields $$\phi'(t)=\Delta'(g(t))g'(t)$$

In particular, at $t=0$, $$\Delta'(g(0))g'(0)=[\mathbf{e}_1,\ldots,\mathbf{e}_n]^\intercal\begin{pmatrix} a_{11} \\ \vdots\\ a_{n1}\\ \vdots\\ a_{1n}\\ \vdots\\ a_{nn} \end{pmatrix}=\operatorname{Trace}(A) $$
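Both the Lemma and the final evaluation can be verified with sympy (assuming it is installed); the $2\times 2$ matrix at the end is a small example of my own choosing.

```python
import sympy as sp

n = 3
X = sp.Matrix(n, n, lambda i, j: sp.symbols(f'x{i}{j}'))
detX = X.det()

# Lemma: the partial derivative of det with respect to x_ij
# is the (i, j) cofactor of X.
for i in range(n):
    for j in range(n):
        assert sp.expand(sp.diff(detX, X[i, j]) - X.cofactor(i, j)) == 0

# At X = I the cofactor matrix is the identity, so the chain rule
# collapses to phi'(0) = sum_ij delta_ij * a_ij = tr(A).
t = sp.symbols('t')
A = sp.Matrix([[1, 2], [3, 4]])  # small example matrix
phi = (sp.eye(2) + t * A).det()
print(sp.diff(phi, t).subs(t, 0))  # 5, which equals tr(A)
```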

Mittens
  • Thank you for your answer. A question: how do we get $\Delta'(g(0))=[\textbf{e}_1,\dots,\textbf{e}_n]$? It seems like this was used in the calculations. – Alann Rosas Mar 06 '25 at 06:59
  • @AlannRosas: $g(0)=I_n$. The $ij$-th cofactor of $I_n$ is $\delta_{ij}$, that is, $1$ when $i=j$ and $0$ otherwise. – Mittens Mar 06 '25 at 15:08
  • I see, thank you. Also, I noticed you edited your post to take the transpose of $[\textbf{e}_1, \dots, \textbf{e}_n]$ at the end — maybe you meant to do this in the definition of $g$ too? – Alann Rosas Mar 06 '25 at 17:23
  • @AlannRosas: I agree the notation may be cumbersome; I used $[x_1,\ldots, x_n]$ to denote a column vector and $(x_1,\ldots, x_n)$ a row vector. Of course $[x_1,\ldots, x_n]^\intercal=(x_1,\ldots,x_n)$. Also, $\mathbf{e}_j$, the $j$-th vector in the standard ordered basis of $\mathbb{R}^n$, is considered a column vector. $g'(0)$ is the column vector formed by "concatenating" into one column all the columns of the matrix $A$. – Mittens Mar 06 '25 at 18:19