4

Problem 7-4(a) of John M. Lee's Introduction to Smooth Manifolds asks us to show that for any matrix $A\in M(n,\mathbb R)$, we have $$\left.\frac{d}{dt} \det(I_n+tA)\right|_{t=0}=\operatorname{tr}(A)$$ where $\det$ is the usual determinant map. The book's hint is to use the expansion $$\det(B)=\sum_{\sigma\in S_n}\operatorname{sgn}(\sigma)\prod_{i=1}^n b_{i,\sigma(i)},\text{ }B=[b_{ij}]_{i,j=1}^n$$ but when doing calculations with even the simple case $n=3$, I'm getting values that are not equal to $\operatorname{tr}(A)$. Could someone tell me what I'm doing wrong?

My work:

Going through the full determinant expansion for $n=3$, I computed \begin{align*} \det(I_n+tA) &= (1+t a_{11})(1+ta_{22})(1+ta_{33})\\ &- {\color{red}{a_{12}a_{21}(1+ta_{33})}}\\ &- {\color{red}{(1+ta_{11})a_{23}a_{32}}}\\ &- {\color{red}{a_{13}(1+ta_{22})a_{31}}}\\ &+ a_{12}a_{23}a_{31}\\ &+ a_{13}a_{32}a_{21}\\ \end{align*}

Because they're constants, the last two terms $a_{12}a_{23}a_{31},a_{13}a_{32}a_{21}$ vanish when differentiating. The first term $(1+t a_{11})(1+ta_{22})(1+ta_{33})$ expands as $$(1+t a_{11})(1+ta_{22})(1+ta_{33})=1+\operatorname{tr}(A)t+(\text{higher powers of $t$})$$ so, because $\left.\frac{d}{dt}t^n\right|_{t=0}=0$ for $n\geq 2$, we get $$\left.\frac{d}{dt}(1+t a_{11})(1+ta_{22})(1+ta_{33})\right|_{t=0} =\operatorname{tr}(A)$$ What worries me are the terms in red. Their derivatives are \begin{align*} \left.\frac{d}{dt} a_{12}a_{21}(1+ta_{33})\right|_{t=0} &= a_{12}a_{21}a_{33}\\ \left.\frac{d}{dt} (1+ta_{11})a_{23}a_{32}\right|_{t=0} &= a_{11}a_{23}a_{32}\\ \left.\frac{d}{dt} a_{13}(1+ta_{22})a_{31}\right|_{t=0} &= a_{13}a_{22}a_{31}\\ \end{align*} so I've (seemingly) calculated $$\left.\frac{d}{dt} \det(I_n+tA)\right|_{t=0}=\operatorname{tr}(A){\color{red}{-a_{12}a_{21}a_{33}-a_{11}a_{23}a_{32}-a_{13} a_{22}a_{31}}} $$ Does the red part evaluate to $0$? If not, where have I gone wrong? Any help is appreciated.
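For what it's worth, here is a symbolic sanity check with sympy (assuming it is installed) for the $n=3$ case; it reports that the derivative does equal $\operatorname{tr}(A)$, so the red terms must somehow not survive:

```python
import sympy as sp

t = sp.symbols('t')
a = sp.Matrix(3, 3, lambda i, j: sp.symbols(f'a{i+1}{j+1}'))

# Full determinant of I + t*A, expanded symbolically.
d = (sp.eye(3) + t * a).det().expand()

# Derivative at t = 0: only the coefficient of t survives.
deriv_at_0 = sp.diff(d, t).subs(t, 0)

print(sp.simplify(deriv_at_0 - a.trace()))  # 0
```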

Mittens
Alann Rosas

3 Answers

5

Look at the terms of the determinant of $B = I+tA$ in the expansion below:

$$\det(B)=\sum_{\sigma\in S_n}\operatorname{sgn}(\sigma)\prod_{i=1}^n b_{i,\sigma(i)}$$

Ask yourself which products $\prod_{i=1}^n b_{i,\sigma(i)}$ contain a term of order $t^2$ or higher. If you take $\sigma$ to be the identity, i.e., $\sigma(i)=i$ (just the product of the diagonal entries), you get

\begin{align*} (1+ta_{11})\cdots (1+ta_{nn}) = 1+\text{tr}(A)t+O(t^2) \end{align*}

This is where the trace comes from, because the $O(t^2)$ terms vanish after differentiating and setting $t=0$. Now ask: what is the contribution of the other permutations $\sigma$? (You hope to show it is $O(t^2)$, so that they contribute nothing to the derivative.)

Suppose $\sigma$ is not the identity. Then $\sigma(i)=j$ for some $i\neq j$, and since $j$ already has preimage $i$, we must have $\sigma(j)\neq j$ as well. Both corresponding entries of $B=I+tA$ are off-diagonal, so \begin{align*} b_{i\sigma(i)}&=ta_{i\sigma(i)}\\ b_{j\sigma(j)} &= ta_{j\sigma(j)} \end{align*}

so the product

\begin{align*} \prod_{k=1}^n b_{k,\sigma(k)} = O(t^2) \end{align*}

and therefore contributes nothing to the derivative at $t=0$.
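This argument can be checked mechanically for $n=3$ with a short sympy sketch (assuming sympy is available): every non-identity permutation's product has neither a constant nor a linear term in $t$, i.e. it is $O(t^2)$.

```python
import sympy as sp
from itertools import permutations

t = sp.symbols('t')
n = 3
a = sp.Matrix(n, n, lambda i, j: sp.symbols(f'a{i+1}{j+1}'))
b = sp.eye(n) + t * a  # B = I + tA

# Every non-identity permutation moves at least two indices, so its
# product picks up at least two factors of t: it is O(t^2).
checked = 0
for s in permutations(range(n)):
    if s == tuple(range(n)):
        continue  # identity permutation: contributes 1 + tr(A)t + O(t^2)
    prod = sp.expand(sp.prod(b[i, s[i]] for i in range(n)))
    assert prod.subs(t, 0) == 0              # no constant term
    assert sp.diff(prod, t).subs(t, 0) == 0  # no linear term either
    checked += 1

print(checked)  # 5 non-identity permutations in S_3
```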

3

Another approach: for every $t\neq 0$, $$\det(I_n+tA)=(-1)^nt^n\det\left(\frac{-1}{t}I_n-A\right)=(-1)^nt^nP_A\left(-\frac{1}{t}\right)$$ where $P_A(t)=t^n-\operatorname{tr}(A)t^{n-1}+\ldots+a_0$ is the characteristic polynomial of $A$. Hence: $$\det(I_n+tA)=(-1)^nt^nP_A\left(-\frac{1}{t}\right)=(-1)^n\cdot(-1)^n-(-1)^{n}\cdot(-1)^{n-1}\operatorname{tr}(A)t+O(t^2)=1+\operatorname{tr}(A)t+O(t^2)$$ Taking the derivative with respect to $t$ at $t=0$ concludes the proof.
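A quick sympy check of this identity and its first-order coefficient (the test matrix below is an arbitrary choice of mine, not from the argument above):

```python
import sympy as sp

t, lam = sp.symbols('t lam')
A = sp.Matrix([[2, 1, 0], [0, 3, 4], [5, 0, 1]])  # arbitrary test matrix
n = A.shape[0]

# P_A(lam) = det(lam*I - A); its second coefficient is -tr(A).
P = A.charpoly(lam).as_expr()

# The identity det(I + tA) = (-1)^n t^n P_A(-1/t), valid for t != 0.
lhs = sp.expand((sp.eye(n) + t * A).det())
rhs = sp.simplify((-1)**n * t**n * P.subs(lam, -1/t))
assert sp.simplify(lhs - rhs) == 0

# Coefficient of t in det(I + tA) equals tr(A).
assert sp.Poly(lhs, t).coeff_monomial(t) == A.trace()
print(A.trace())  # 6
```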

GBA
2

If one identifies the space of real $n\times n$ matrices with $\mathbb{R}^{n^2}$ by vertically concatenating the columns of each matrix, we have the following:

Lemma: Let $\Delta:\mathbb{R}^{n^2}\longrightarrow\mathbb{R}$ be the determinant function, i.e. $$\Delta(\alpha_{11},\ldots,\alpha_{n1},\ldots,\alpha_{1n}, \ldots,\alpha_{nn})^{\top} = \det[(\alpha_{ij})]$$ where $(\alpha_{ij})$ is the $n\times n$--matrix whose $ij$--th component is $\alpha_{ij}$. Then, $$\Delta_\alpha= \frac{\partial \Delta}{\partial\alpha}= (W_{11}\ldots,W_{n1},\ldots,W_{1n},\ldots,W_{nn})$$ where $W_{ij}$ is the $ij$--th cofactor of the matrix $(\alpha_{ij})$.

The proof of this Lemma is a simple exercise in computing determinants via the cofactor expansion.

Consider the function $\phi(t)=\operatorname{det}(I+tA)=\Delta\circ g(t)$, where $g:\mathbb{R}\to \mathbb{R}^{n^2}$ is given by $$t\mapsto[\mathbf{e}_1,\ldots,\mathbf{e}_n]+t(a_{11},\ldots,a_{n1},\ldots,a_{1n},\ldots,a_{nn})^\intercal$$ Notice that $g(t)$ corresponds to the matrix $I+tA$, where for each $1\leq j\leq n$, $\mathbf{e}_j$ is the $j$-th standard basis (column) vector, with $j$-th component $1$ and zeroes everywhere else. An application of the chain rule yields $$\phi'(t)=\Delta'(g(t))g'(t)$$

In particular, at $t=0$, $$\Delta'(g(0))g'(0)=[\mathbf{e}_1,\ldots,\mathbf{e}_n]^\intercal\begin{pmatrix} a_{11} \\ \vdots\\ a_{n1}\\ \vdots\\ a_{1n}\\ \vdots\\ a_{nn} \end{pmatrix}=\operatorname{Trace}(A) $$
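Both the Lemma and the final evaluation can be verified with sympy (assuming it is installed); the $2\times 2$ matrix at the end is a small example of my own choosing.

```python
import sympy as sp

n = 3
X = sp.Matrix(n, n, lambda i, j: sp.symbols(f'x{i}{j}'))
detX = X.det()

# Lemma: the partial derivative of det with respect to x_ij
# is the (i, j) cofactor of X.
for i in range(n):
    for j in range(n):
        assert sp.expand(sp.diff(detX, X[i, j]) - X.cofactor(i, j)) == 0

# At X = I the cofactor matrix is the identity, so the chain rule
# collapses to phi'(0) = sum_ij delta_ij * a_ij = tr(A).
t = sp.symbols('t')
A = sp.Matrix([[1, 2], [3, 4]])  # small example matrix
phi = (sp.eye(2) + t * A).det()
print(sp.diff(phi, t).subs(t, 0))  # 5, which equals tr(A)
```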

Mittens
  • Thank you for your answer. A question: how do we get $\Delta'(g(0))=[\textbf{e}_1,\dots,\textbf{e}_n]$? It seems like this was used in the calculations. – Alann Rosas Mar 06 '25 at 06:59
  • @AlannRosas: $g(0)=I_n$. The $ij$-th cofactor of $I_n$ is $\delta_{ij}$, that is, $1$ when $i=j$ and $0$ otherwise. – Mittens Mar 06 '25 at 15:08
  • I see, thank you. Also, I noticed you edited your post to take the transpose of $[\textbf{e}_1, \dots, \textbf{e}_n]$ at the end — maybe you meant to do this in the definition of $g$ too? – Alann Rosas Mar 06 '25 at 17:23
  • @AlannRosas: I agree the notation may be cumbersome; I used $[x_1,\ldots, x_n]$ to denote a column vector and $(x_1,\ldots, x_n)$ a row vector. Of course $[x_1,\ldots, x_n]^\intercal=(x_1,\ldots,x_n)$. Also, $\mathbf{e}_j$, the $j$-th vector in the standard ordered basis of $\mathbb{R}^n$, is considered a column vector. $g'(0)$ is the column vector formed by "concatenating" into one column all the columns of the matrix $A$. – Mittens Mar 06 '25 at 18:19