Is there a general procedure to take the derivative of an analytic function of a matrix $f(\mathbf{X}) : \mathbb{C}^{n\times n} \rightarrow \mathbb{C}^{n\times n}$ (not the element-wise application of a function, but the matrix function defined with the Taylor series or other equivalent means), with respect to each element of the matrix argument?
-
Not really. Other than expanding $f(X+H)-f(X)$ and looking for linear terms. – copper.hat Mar 26 '21 at 17:53
-
Since the output is also a matrix, I would write $\mathbf{F} (\mathbf{X})$. – Rodrigo de Azevedo Mar 26 '21 at 21:14
-
The usual convention is to use the lowercase – Undead Mar 26 '21 at 21:15
-
Lowercase for scalar fields. – Rodrigo de Azevedo Mar 26 '21 at 21:17
-
@RodrigodeAzevedo It is conventional to use lowercase here since this is the extension of a scalar function to square matrix inputs (or more generally, linear operators in functional calculus). The wikipedia article linked in the OP is replete with references to $f(A)$, and this seems entirely reasonable to me. – Erick Wong May 26 '21 at 01:11
2 Answers
To illustrate what can happen, suppose we take the very innocent-looking function $$ F(X)=X^n, $$ where $n$ is a positive integer. Then $$ F'(X)(H) = \frac d{dt}\Big|_{t = 0}(X+tH)^n $$ $$ = HX^{n-1} + XHX^{n-2} + X^2HX^{n-3} + \cdots + X^{n-2}HX + X^{n-1}H $$ $$ = \sum_{k=0}^{n-1} X^kHX^{n-1-k}. $$
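In other words, the fact that matrix multiplication is non-commutative substantially complicates things: $H$ cannot be pulled out of the products, so the derivative does not collapse to $nX^{n-1}H$ unless $X$ and $H$ commute. On the other hand, when we are differentiating a function of the form $$ F(X) = \text{tr}(f(X)), $$ things work much better, since the trace is invariant under cyclic permutations of a product and so makes up for the lack of commutativity. For $f(X)=X^n$, for instance, every term above has the same trace and the derivative in the direction $H$ is simply $n\,\text{tr}(X^{n-1}H)$. See Derivative of trace function.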
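As a sanity check, here is a minimal NumPy sketch that compares the sum of the terms $X^kHX^{n-1-k}$ with a central finite difference of $(X+tH)^n$, and then checks that the trace of that sum equals $n\,\text{tr}(X^{n-1}H)$. The helper name `power_derivative`, the matrix size, the exponent and the step `t` are arbitrary choices for illustration, not part of the answer.

```python
import numpy as np

mp = np.linalg.matrix_power  # mp(X, 0) is the identity matrix


def power_derivative(X, H, n):
    """Directional derivative of X -> X^n at X along H.

    Adds up the n terms X^k H X^(n-1-k) for k = 0, ..., n-1.
    (Hypothetical helper name, used only in this sketch.)
    """
    return sum(mp(X, k) @ H @ mp(X, n - 1 - k) for k in range(n))


rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))
H = rng.standard_normal((4, 4))
n = 5
t = 1e-6  # finite-difference step, an arbitrary small value

# Central difference approximation of d/dt (X + tH)^n at t = 0.
fd = (mp(X + t * H, n) - mp(X - t * H, n)) / (2 * t)
exact = power_derivative(X, H, n)
rel_err_power = np.linalg.norm(fd - exact) / np.linalg.norm(exact)

# Under the trace, all n terms coincide by cyclic invariance.
trace_exact = n * np.trace(mp(X, n - 1) @ H)
err_trace = abs(np.trace(exact) - trace_exact)

print("relative error, power rule:", rel_err_power)
print("absolute error, trace rule:", err_trace)
```

Both printed errors should be small, limited by finite-difference and rounding error respectively. Replacing `exact` with the naive guess `n * mp(X, n - 1) @ H` should make the first error large for generic, non-commuting `X` and `H`.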
$\def\p#1#2{\frac{\partial #1}{\partial #2}}\def\e{\varepsilon}\def\R#1{\in{\mathbb R}^{#1}}$Coordinate-wise derivatives are a useful approach which avoids higher-order tensors or transformations (i.e. vectorization) which flatten those tensors into matrices.
First, given the matrix variable $X\R{n\times n},\,$ its coordinate-wise derivatives are $$\eqalign{ \p{X}{X_{ij}} &= e_i e_j^T \;\doteq\; E_{ij} \\ }$$ where $e_i$ is the $i$-th Cartesian basis vector and $E_{ij}$ is the single-entry matrix, with a one in position $(i,j)$ and zeros elsewhere.
Second, given a function defined by the Taylor series
$$F = \sum_{k=0}^\infty \alpha_k X^k$$
then, assuming the series converges for the given $X,\,$ its coordinate-wise derivatives are
$$\p{F}{X_{ij}} = \sum_{k=1}^\infty \alpha_k \left(\sum_{\ell=1}^{k} X^{k-\ell}E_{ij}X^{\ell-1}\right)$$
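The formula can be checked numerically. The sketch below assumes the series is truncated after finitely many terms and uses the exponential coefficients $\alpha_k = 1/k!$ purely as an example. The function names `series` and `series_partial`, the truncation length and the test matrix are illustrative assumptions, not part of the answer.

```python
import numpy as np
from math import factorial


def series(X, alpha):
    """Evaluate the truncated series sum_k alpha[k] * X^k."""
    F = np.zeros_like(X)
    P = np.eye(X.shape[0])  # running power X^k, starting at X^0
    for a in alpha:
        F = F + a * P
        P = P @ X
    return F


def series_partial(X, alpha, i, j):
    """Derivative of the truncated series with respect to X[i, j].

    Implements sum_{k>=1} alpha[k] * sum_{l=1..k} X^(k-l) E_ij X^(l-1).
    """
    n = X.shape[0]
    E = np.zeros((n, n))
    E[i, j] = 1.0  # single-entry matrix E_ij

    # Precompute X^0, X^1, ..., X^(K-1) once.
    powers = [np.eye(n)]
    for _ in range(len(alpha) - 1):
        powers.append(powers[-1] @ X)

    D = np.zeros((n, n))
    for k in range(1, len(alpha)):
        for l in range(1, k + 1):
            D += alpha[k] * (powers[k - l] @ E @ powers[l - 1])
    return D


rng = np.random.default_rng(1)
X = 0.5 * rng.standard_normal((3, 3))            # real test matrix
alpha = [1.0 / factorial(k) for k in range(25)]  # exp(X), truncated
i, j = 0, 2
t = 1e-6  # finite-difference step, an arbitrary small value

# Central difference in the single coordinate X[i, j].
Xp, Xm = X.copy(), X.copy()
Xp[i, j] += t
Xm[i, j] -= t
fd = (series(Xp, alpha) - series(Xm, alpha)) / (2 * t)

D = series_partial(X, alpha, i, j)
err = np.max(np.abs(fd - D))
print("max abs difference:", err)
```

The double loop costs $O(K^2)$ matrix products for $K$ retained terms, which is fine for a check like this but not meant as an efficient method.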