2

Is there a general procedure to take the derivative of an analytic function of a matrix $f(\mathbf{X}) : \mathbb{C}^{n\times n} \rightarrow \mathbb{C}^{n\times n}$ (not the element-wise application of a function, but the matrix function defined with the Taylor series or other equivalent means), with respect to each element of the matrix argument?

Undead
  • 175

2 Answers

5

To illustrate what can happen, consider the innocent-looking function $$ F(X)=X^n, $$ where $n$ is a positive integer. Its derivative at $X$ in the direction $H$ is $$ F'(X)(H) = \frac d{dt}\Big|_{t = 0}(X+tH)^n = HX^{n-1} + XHX^{n-2} + X^2HX^{n-3} + \cdots + X^{n-2}HX + X^{n-1}H = \sum_{k=0}^{n-1} X^kHX^{n-1-k}. $$
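A quick numerical sanity check of this directional derivative may help. The sketch below is only an illustration under stated assumptions: the helper name `power_frechet`, the random test matrices, the exponent `n = 5`, and the step size `t` are all choices made here, not part of the answer. It compares the sum $\sum_{k=0}^{n-1} X^kHX^{n-1-k}$ against a central finite difference, and also against the "commutative" guess $nX^{n-1}H$.

```python
import numpy as np

def power_frechet(X, H, n):
    """Directional derivative of X -> X^n at X along H:
    sum over k = 0..n-1 of X^k H X^(n-1-k)."""
    mp = np.linalg.matrix_power
    return sum(mp(X, k) @ H @ mp(X, n - 1 - k) for k in range(n))

# Hypothetical test data: random 4x4 matrices and n = 5.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))
H = rng.standard_normal((4, 4))
n = 5

exact = power_frechet(X, H, n)

# Central finite difference of t -> (X + tH)^n at t = 0.
t = 1e-5
mp = np.linalg.matrix_power
fd = (mp(X + t * H, n) - mp(X - t * H, n)) / (2 * t)
rel_err = np.linalg.norm(fd - exact) / np.linalg.norm(exact)

# The scalar-style guess n X^(n-1) H ignores non-commutativity.
naive = n * mp(X, n - 1) @ H
naive_gap = np.linalg.norm(naive - exact) / np.linalg.norm(exact)

print("relative error vs finite difference:", rel_err)
print("relative gap of n X^(n-1) H:", naive_gap)
```

For generic $X$ and $H$ the first number should be tiny while the second is not, which is the non-commutativity effect in concrete form.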

In other words, the fact that matrix multiplication is non-commutative substantially complicates things. On the other hand, when we are differentiating a function of the form $$ F(X) = \text{tr}(f(X)), $$ things work much better since the trace makes up for the lack of commutativity. See Derivative of trace function.
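As a worked instance of the trace remark, take $f(X)=X^n$ (this particular choice is only an illustration). Because the trace is invariant under cyclic permutations of a product, every term in the derivative of $X^n$ collapses to the same expression: $$ \frac d{dt}\Big|_{t = 0}\text{tr}\big((X+tH)^n\big) = \sum_{k=0}^{n-1} \text{tr}\big(X^kHX^{n-1-k}\big) = \sum_{k=0}^{n-1} \text{tr}\big(X^{n-1}H\big) = n\,\text{tr}\big(X^{n-1}H\big), $$ which has the same shape as the scalar rule $(x^n)' = nx^{n-1}$.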

Ruy
  • 20,073
4

$\def\p#1#2{\frac{\partial #1}{\partial #2}}\def\e{\varepsilon}\def\R#1{\in{\mathbb R}^{#1}}$Coordinate-wise derivatives are a useful approach that avoids higher-order tensors, as well as transformations (e.g. vectorization) that flatten those tensors into matrices.

First, given the matrix variable $X\R{n\times n}$, its coordinate-wise derivatives are $$\eqalign{ \p{X}{X_{ij}} &= e_i e_j^T \;\doteq\; E_{ij} }$$ where $e_i$ is the $i$-th Cartesian basis vector and $E_{ij}$ is the single-entry matrix (a one in position $(i,j)$ and zeros elsewhere).

Second, given a function defined by the Taylor series $$F = \sum_{k=0}^\infty \alpha_k X^k$$ then, assuming the series converges for the given $X,\,$ its coordinate-wise derivatives are
$$\p{F}{X_{ij}} = \sum_{k=1}^\infty \alpha_k \left(\sum_{\ell=1}^{k} X^{k-\ell}E_{ij}X^{\ell-1}\right)$$
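The double sum can be evaluated directly once the series is truncated. The following is a minimal sketch, not a definitive implementation: the function names `series` and `series_partial`, the truncation at 25 terms, and the choice $\alpha_k = 1/k!$ (the matrix exponential) are assumptions made here for illustration. Indices are 0-based in the code, whereas the formula uses 1-based $i,j$.

```python
import numpy as np
from math import factorial

def series(X, alphas):
    """Truncated series F(X) = sum_k alphas[k] * X^k."""
    n = X.shape[0]
    F = np.zeros((n, n))
    P = np.eye(n)              # running power X^k, starting at X^0
    for a in alphas:
        F = F + a * P
        P = P @ X
    return F

def series_partial(X, alphas, i, j):
    """dF/dX_ij = sum_{k>=1} alphas[k] * sum_{l=1}^{k} X^(k-l) E_ij X^(l-1)."""
    n = X.shape[0]
    E = np.zeros((n, n))
    E[i, j] = 1.0              # single-entry matrix E_ij (0-based indices)
    powers = [np.eye(n)]       # powers[m] = X^m
    for _ in range(len(alphas) - 1):
        powers.append(powers[-1] @ X)
    D = np.zeros((n, n))
    for k in range(1, len(alphas)):
        for l in range(1, k + 1):
            D += alphas[k] * (powers[k - l] @ E @ powers[l - 1])
    return D

# Assumed example: coefficients of exp, truncated after 25 terms.
alphas = [1.0 / factorial(k) for k in range(25)]
rng = np.random.default_rng(1)
X = 0.5 * rng.standard_normal((3, 3))
i, j = 0, 2

D = series_partial(X, alphas, i, j)

# Central finite difference in the single entry X[i, j].
t = 1e-6
E = np.zeros((3, 3))
E[i, j] = 1.0
fd = (series(X + t * E, alphas) - series(X - t * E, alphas)) / (2 * t)
rel_err = np.linalg.norm(fd - D) / np.linalg.norm(D)
print("relative error vs finite difference:", rel_err)
```

Each call returns one $n\times n$ matrix, so the full derivative is the collection of these matrices over all $(i,j)$. The nested loop costs $O(K^2)$ matrix products for $K$ retained terms, which is fine for a check but not tuned for performance.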

greg
  • 40,033