Matrix function derivative with respect to matrix elements

Question

Is there a general procedure to take the derivative of an analytic function of a matrix $f(\mathbf{X}) : \mathbb{C}^{n\times n} \rightarrow \mathbb{C}^{n\times n}$ (not the element-wise application of a function, but the matrix function defined with the Taylor series or other equivalent means), with respect to each element of the matrix argument?

Not really, other than expanding $f(X+H)-f(X)$ and looking for the terms that are linear in $H$. — copper.hat, Mar 26 '21 at 17:53
Since the output is also a matrix, I would write $\mathbf{F} (\mathbf{X})$. — Rodrigo de Azevedo, Mar 26 '21 at 21:14
@RodrigodeAzevedo It is conventional to use lowercase here since this is the extension of a scalar function to square matrix inputs (or more generally, linear operators in functional calculus). The Wikipedia article linked in the OP is replete with references to $f(A)$, and this seems entirely reasonable to me. — Erick Wong, May 26 '21 at 01:11

score 5 · Accepted Answer · answered Mar 26 '21 at 22:18

To illustrate what can happen, suppose we take the very innocent-looking function $$ F(X)=X^n, $$ where $n$ is a positive integer. Then $$ F'(X)(H) = \frac d{dt}\Big|_{t = 0}(X+tH)^n = HX^{n-1} + XHX^{n-2} + X^2HX^{n-3} + \cdots + X^{n-2}HX + X^{n-1}H = \sum_{k=0}^{n-1} X^kHX^{n-1-k}. $$
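
The expansion above can be sanity-checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `power_derivative`, the random test matrices, and the step size `t` are illustrative choices, not part of the original answer. It compares the sum $HX^{n-1} + XHX^{n-2} + \cdots + X^{n-1}H$ against a central finite difference, and against the "commutative" guess $nX^{n-1}H$, which generally does not match.

```python
import numpy as np

def power_derivative(X, H, n):
    """Directional derivative of X -> X^n at X in direction H,
    i.e. H X^(n-1) + X H X^(n-2) + ... + X^(n-1) H."""
    return sum(
        np.linalg.matrix_power(X, k) @ H @ np.linalg.matrix_power(X, n - 1 - k)
        for k in range(n)
    )

# Illustrative (hypothetical) test data: random 4x4 matrices, exponent 5.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))
H = rng.standard_normal((4, 4))
n = 5

# Central finite difference approximation of d/dt (X + tH)^n at t = 0.
t = 1e-6
fd = (np.linalg.matrix_power(X + t * H, n)
      - np.linalg.matrix_power(X - t * H, n)) / (2 * t)

exact = power_derivative(X, H, n)
rel_err = np.linalg.norm(fd - exact) / np.linalg.norm(exact)

# The scalar-calculus guess n X^(n-1) H ignores non-commutativity.
naive = n * np.linalg.matrix_power(X, n - 1) @ H
naive_err = np.linalg.norm(naive - exact) / np.linalg.norm(exact)

print("relative error of the sum formula vs finite difference:", rel_err)
print("relative error of n X^(n-1) H vs the sum formula:", naive_err)
```

If $X$ and $H$ happen to commute (for example $H = X$), the two expressions coincide, which is one way to see where the extra complexity comes from.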

In other words, the fact that matrix multiplication is non-commutative substantially complicates things. On the other hand, when we are differentiating a function of the form $$ F(X) = \text{tr}(f(X)), $$ things work much better since the trace makes up for the lack of commutativity. See Derivative of trace function.
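
As a short worked instance of this remark, take $f(X) = X^n$. Cyclic invariance of the trace, $\text{tr}(AB) = \text{tr}(BA)$, lets every term of the non-commutative sum be rearranged into the same form: $$ \frac d{dt}\Big|_{t = 0}\text{tr}\big((X+tH)^n\big) = \sum_{k=0}^{n-1} \text{tr}\big(X^kHX^{n-1-k}\big) = \sum_{k=0}^{n-1} \text{tr}\big(X^{n-1}H\big) = n\,\text{tr}\big(X^{n-1}H\big), $$ which is exactly what the scalar rule $(x^n)' = nx^{n-1}$ would suggest.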

score 4 · Answer 2 · answered Mar 27 '21 at 01:41

$\def\p#1#2{\frac{\partial #1}{\partial #2}}\def\e{\varepsilon}\def\R#1{\in{\mathbb R}^{#1}}$Coordinate-wise derivatives are a useful approach which avoids higher-order tensors or transformations (i.e. vectorization) which flatten those tensors into matrices.

First, given the matrix variable $X\R{n\times n}\,$ its coordinate-wise derivatives are $$\eqalign{ \p{X}{X_{ij}} &= e_i e_j^T \;\doteq\; E_{ij} \\ }$$ where $e_i$ is the $i$-th Cartesian basis vector and $E_{ij}$ is the single-entry matrix, with a one in position $(i,j)$ and zeros elsewhere.

Second, given a function defined by the Taylor series $$F = \sum_{k=0}^\infty \alpha_k X^k$$ then, assuming the series converges for the given $X,\,$ its coordinate-wise derivatives are
$$\p{F}{X_{ij}} = \sum_{k=1}^\infty \alpha_k \left(\sum_{\ell=1}^{k} X^{k-\ell}E_{ij}X^{\ell-1}\right)$$
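
The coordinate-wise formula can be tried out on a truncated series. The sketch below is only an illustration under stated assumptions: it uses NumPy, picks the exponential coefficients $\alpha_k = 1/k!$ as an example, truncates the series at 25 terms, and the function names `series` and `series_partial` are hypothetical helpers introduced here. It compares the double sum with a central finite difference in the single entry $X_{ij}$.

```python
import numpy as np
from math import factorial

def series(X, alphas):
    """Truncated matrix power series F = sum_k alphas[k] * X^k."""
    n = X.shape[0]
    F = np.zeros_like(X)
    P = np.eye(n, dtype=X.dtype)  # running power X^k, starting at X^0
    for a in alphas:
        F = F + a * P
        P = P @ X
    return F

def series_partial(X, i, j, alphas):
    """dF/dX_ij = sum_{k>=1} alphas[k] * sum_{l=1..k} X^(k-l) E_ij X^(l-1)."""
    n = X.shape[0]
    E = np.zeros((n, n), dtype=X.dtype)
    E[i, j] = 1.0  # single-entry matrix E_ij
    # Precompute X^0, X^1, ..., X^(K-1) once.
    powers = [np.eye(n, dtype=X.dtype)]
    for _ in range(len(alphas) - 1):
        powers.append(powers[-1] @ X)
    D = np.zeros((n, n), dtype=X.dtype)
    for k in range(1, len(alphas)):
        for l in range(1, k + 1):
            D = D + alphas[k] * (powers[k - l] @ E @ powers[l - 1])
    return D

# Illustrative (hypothetical) setup: exponential series, small random 3x3 matrix.
alphas = [1.0 / factorial(k) for k in range(25)]
rng = np.random.default_rng(1)
X = 0.5 * rng.standard_normal((3, 3))
i, j = 0, 2

# Central finite difference in the (i, j) entry only.
t = 1e-6
E = np.zeros((3, 3))
E[i, j] = 1.0
fd = (series(X + t * E, alphas) - series(X - t * E, alphas)) / (2 * t)

D = series_partial(X, i, j, alphas)
rel_err = np.linalg.norm(fd - D) / np.linalg.norm(D)
print("relative error vs finite difference:", rel_err)
```

Each call returns an $n\times n$ matrix, one per index pair $(i,j)$, so collecting all $n^2$ of them reproduces the fourth-order tensor that the coordinate-wise approach avoids writing down explicitly. The double loop costs $O(K^2)$ matrix products for $K$ terms and is meant for checking the formula, not for efficiency.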
