3

The author of this question came close to defining the derivative of a function of a dual variable by considering matrices isomorphic (algebraically and topologically) to the dual numbers: $$(a+\epsilon b) \sim \begin{bmatrix} a & 0 \\ b & a \\ \end{bmatrix}.$$

Using this fact, we can define the derivative (in the Fréchet sense) for functions $F$ whose argument and value are both matrices of this form: $$F\big(\begin{bmatrix} x+s & 0 \\ y+t & x+s \\ \end{bmatrix}\big)-F\big(\begin{bmatrix} x & 0 \\ y & x \\ \end{bmatrix}\big)=\begin{bmatrix} u' & 0 \\ v' & u' \\ \end{bmatrix}\begin{bmatrix} s & 0 \\ t & s \\ \end{bmatrix}+o\bigg(\bigg|\bigg|\begin{bmatrix} s & 0 \\ t & s \\ \end{bmatrix}\bigg|\bigg|\bigg),$$ where $\bigg|\bigg|\begin{bmatrix} s & 0 \\ t & s \\ \end{bmatrix}\bigg|\bigg|=\max\{|s|,|t|\}$ and all elements of all matrices are real.

Therefore, the existence of such a matrix $\begin{bmatrix} u' & 0 \\ v' & u' \\ \end{bmatrix}$ (which we will call the derivative at $\begin{bmatrix} x & 0 \\ y & x \\ \end{bmatrix}$) means that $F$ is differentiable at $\begin{bmatrix} x & 0 \\ y & x \\ \end{bmatrix}$.
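As a quick numerical sanity check of this definition (a minimal numpy sketch of my own, with a function of my choosing): take $F(M)=M^3$ on dual-type matrices. Since such matrices commute, the candidate derivative at $A$ is the dual-type matrix $3A^2$, and the remainder $F(A+H)-F(A)-3A^2H$ should be second order in $\|H\|$.

```python
import numpy as np

def dual(a, b):
    """2x2 matrix representing the dual number a + eps*b."""
    return np.array([[a, 0.0], [b, a]])

# F(M) = M^3 on dual-type matrices; since these matrices commute,
# the derivative should be the dual-type matrix D = 3*A^2.
A = dual(2.0, 5.0)
D = 3 * A @ A

H = dual(1e-6, -2e-6)   # small increment (s, t)
remainder = (A + H) @ (A + H) @ (A + H) - A @ A @ A - D @ H
# The remainder is o(||H||): it is in fact second order in ||H||.
print(np.max(np.abs(remainder)))
```

Note that $D=3A^2$ is again lower-triangular with equal diagonal entries, i.e. it represents a dual number, as the definition requires.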

I'm interested in how far this approach can be generalized to define the derivative of a matrix-valued function of a matrix argument, i.e. the case where the derivative is an object of the same nature as the variables (as opposed to the definition of the derivative of a function $f:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m}$, which is a (Jacobian) matrix).

Can anyone share links to material on this kind of derivative?

  • A recent paper by Jan Magnus et al. makes extensive use of triangular block matrices to analyze and evaluate the derivatives of matrix functions. This is quite analogous to the ideas in your post. – greg Apr 01 '24 at 13:36
  • @greg Having a quick look through the article, I didn't notice anything similar there. Although this is probably just my lack of knowledge in the field of tensors. In any case, thanks for the material; I'll try to figure it out – Иван Петров Apr 01 '24 at 14:46
  • Magnus's triangular matrices have a direct connection to dual numbers via $${ {\tt1}\sim\pmatrix{{\tt1}&0\\0&{\tt1}} \qquad \epsilon\sim\pmatrix{0&{\tt1}\\0&0} }$$ No knowledge of tensors is required. – greg Apr 01 '24 at 15:17
  • @greg But his derivatives are still not matrices of the same size, I'm interested exactly in such cases – Иван Петров Apr 01 '24 at 15:50

3 Answers

2

That will quickly go wrong. One step more complex than your example is the case of complex-valued functions of complex numbers. The matrix equivalent of those numbers is then: $$(a+i b) \sim \begin{bmatrix} a & b \\ -b & a \\ \end{bmatrix}.$$

And for such a function we know that it needs not one, but two derivatives: $$F\big(\begin{bmatrix} x+s & y+t \\ -y-t & x+s \\ \end{bmatrix}\big)-F\big(\begin{bmatrix} x & y \\ -y & x \\ \end{bmatrix}\big)=\qquad\qquad\qquad \\[20pt] \begin{bmatrix} c & d \\ -d & c \\ \end{bmatrix}\begin{bmatrix} s & t \\ -t & s \\ \end{bmatrix} +\begin{bmatrix} u & v \\ -v & u \\ \end{bmatrix}\begin{bmatrix} s & -t \\ t & s \\ \end{bmatrix}+o\bigg(\bigg|\bigg|\begin{bmatrix} s & t \\ -t & s \\ \end{bmatrix}\bigg|\bigg|\bigg),$$ which we usually write as: $$ f(z+\Delta) - f(z) = c \ \Delta + u\ \Delta^* + o(|\Delta|)\\[15pt] \text{or:}\quad f(z+\Delta) - f(z) = \frac{df}{dz} \ \Delta + \frac{df}{dz^*}\ \Delta^* + o(|\Delta|)\\ $$

As example take the following functions and their pair of derivatives: $$ \begin{matrix} f(z) & & df/dz & & df/dz^* \\[10pt] {\rm Re}(z) && \frac12 && \frac12 \\ {\rm Im}(z) && -\frac12\ i && \frac12\ i \\ z && 1 && 0 \\ z^2 && 2\,z && 0 \\ z^* && 0 && 1 \\ |z|^2 && z^* && z \\ |z| && \frac12 \frac{\Large z^*}{\Large |z|} && \frac12 \frac{\Large z}{\Large |z|} \\ |z|^3 && \frac32 |z| z^* && \frac32\, |z|\, z \\ && {\rm etc.} && \end{matrix}.$$

As can be seen, only functions that are analytical, like $z$ or $z^2$, have $df/dz^*=0$, so they need only $df/dz$ to describe their derivative (which we then call "the complex derivative"). Likewise, purely anti-analytical functions, like $z^*$ or $(z^*)^2$, need only $df/dz^*$. In general, however, two complex numbers are needed, or in matrix language: two matrices are needed to describe the first-order variation of these matrix-valued functions of a matrix. (See also question 2126598.)
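The two derivatives in the table can be recovered numerically from the partial derivatives in $x$ and $y$, via $df/dz=(f_x-i f_y)/2$ and $df/dz^*=(f_x+i f_y)/2$. A short Python sketch (finite differences, helper names my own) checking the $|z|^2$ row of the table, where $df/dz=z^*$ and $df/dz^*=z$:

```python
import numpy as np

def wirtinger(f, z, h=1e-6):
    """Numerical Wirtinger derivatives (df/dz, df/dz*) at z, using
    df/dz = (f_x - i f_y)/2 and df/dz* = (f_x + i f_y)/2."""
    fx = (f(z + h) - f(z - h)) / (2 * h)            # partial in x
    fy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # partial in y
    return (fx - 1j * fy) / 2, (fx + 1j * fy) / 2

z = 1.5 - 0.7j
dz, dzs = wirtinger(lambda w: abs(w) ** 2, z)
print(dz, np.conj(z))   # df/dz  should equal z*
print(dzs, z)           # df/dz* should equal z
```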

For more complex (larger) matrices the number increases further; there is simply more information required to describe the derivative in those cases than can be contained in a single matrix.

Jos Bergervoet
  • 594
  • 3
  • 8
  • Jos Bergervoet, good example. But if I define the derivative of such matrix functions as I described above for the case of dual numbers, it just means the function $F$ satisfies the Cauchy–Riemann conditions (that is, the function is holomorphic, which is equivalent to analyticity in the case of complex numbers).

    Therefore, it is clear that only limited classes of functions admit such derivatives. And so my question is: is there a well-described theory of such classes of matrices and of functions of them?

    – Иван Петров Mar 30 '24 at 09:15
  • Indeed @Иван (perhaps we should look at the quaternions as the next example and follow Cayley–Dickson from there; hopefully it will all be clear once we reach the sedenions!) But what I was thinking of is actually the opposite: allowing all functions, and then accepting that the derivative must be split into a sum with a minimum required number of terms, like the "rank" of an entangled state of two systems in QM. (And we know that the well-described theory of entanglement is very interesting, so maybe this could be related.) – Jos Bergervoet Mar 30 '24 at 09:29
  • 1
    I look at it from a different perspective @Jos. For functions $f:\mathbb{R}^n \to \mathbb{R}^m$ the derivative at a point $a \in \mathbb{R}^n$ is the linear transformation $D_a \in L(\mathbb R^n,\mathbb R^m)$. In turn, we have $L(\mathbb R^n,\mathbb R^m) \approx M_{m,n}(\mathbb R)$, generally speaking.

    So, I am interested in cases where for a certain class of matrices $M$ and functions $F:M\rightarrow M$ we would have an isomorphism $L(M,M)\approx M.$ Simply put, I'm interested in a general view of this question.

    – Иван Петров Mar 30 '24 at 10:23
  • So at least things like zero divisors have to be excluded. If you look at nice examples like O(n) or SU(n), where the matrices form a group, it might work, but even then we can define non-differentiable functions (we can even do that for the reals). So your question is an existence question, for functions that have (at least in some region of their domain) a single first-derivative description... (and still are non-constant, of course). – Jos Bergervoet Mar 30 '24 at 10:42
  • Dual numbers have zero divisors, but this does not prevent us from defining the derivative of functions of these numbers – Иван Петров Mar 30 '24 at 11:02
  • But I mean you would have to somehow exclude them from the class of functions, e.g. the function $$F\big(\begin{bmatrix} x & 0 \\ y & x \\ \end{bmatrix}\big)=\begin{bmatrix} 0 & 0 \\ y & 0 \\ \end{bmatrix} $$ would not allow you to describe the linear variation with one dual number as the proportionality constant (similar to $f(z)={\rm Im}\,z$ for complex numbers). But I agree this does not mean they have to be excluded from the class of matrices $M$. – Jos Bergervoet Mar 30 '24 at 11:47
  • You're right, the function is not differentiable, but this is a normal situation: some functions are differentiable, some are not. I don't quite understand the point of this comment. – Иван Петров Mar 30 '24 at 12:17
  • 1
    It is about what the formulation of the existence question should be. There always exist non-constant functions that are differentiable (like $f(x)=x$), and functions that are not. So what more do we want to prove? Your definition "the existence of such a matrix ... means differentiability of $F$" seems to answer the question whether a function qualifies. Isn't that enough? In what sense would that approach have to be generalized? – Jos Bergervoet Mar 30 '24 at 13:55
  • I apologize, perhaps I did not describe my question well enough at the beginning of the topic.

    In these examples that you and I gave, everything is quite obvious, there are isomorphisms to complex or dual numbers.

    But what if we consider matrices of arbitrary size, even commutative ones, but NOT isomorphic to any generalizations of complex numbers (quaternions, double numbers, and so on)?

    – Иван Петров Mar 30 '24 at 14:37
  • Are such objects described in any articles or educational materials? That was my original question.

    Everything that I found on the network regarding matrix derivatives one way or another comes down to vectorization of matrices and, again, consideration of functions of the form $f:\mathbb{R}^n \to \mathbb{R}^m$, whose derivative is again an object of a different nature.

    – Иван Петров Mar 30 '24 at 14:38
  • If no one has considered the general case, where the derivative of a matrix function is again a matrix of the same size, then my question is closed. – Иван Петров Mar 30 '24 at 14:41
2

Let $(I,N)$ be the $2\times 2$ identity and nilpotent matrices, i.e. $\,N^2=0$.

Given two scalars $(x,y)$ construct the dual matrix and its differential
NB: $\,$ I use $(dx,dy)$ instead of $(s,t)$ $$\eqalign{ A &= xI+yN \\ dA &= dx\,I+dy\,N \\ A+dA &= (x+dx)\,I + (y+dy)\,N \\ }$$

Then apply the function and its derivative to the $A$ matrix $$\eqalign{ \def\bR#1{\Big(#1\Big)} \def\BR#1{\Big[#1\Big]} \def\LR#1{\left(#1\right)} F&\doteq \;f(A) = \;f\big(xI+yN\big) \;=\; \:f(x)\,I + yf'(x)\,N \\ F'&= f'(A) = f'\big(xI+yN\big) \;=\; f'(x)\,I + yf''(x)\,N \\ }$$ Now consider differentials and gradients of $F$ wrt the parameters $(x,y)$ $$\eqalign{ \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} dF &= f(A+dA) \;-\; f(A) \\ &= \BR{f(x+dx)-f(x)}\,I \;+\; \BR{(y+dy)\,f'(x+dx)-yf'(x)}\,N \\ &= \BR{f'(x)\,dx}\,I \;+\; \BR{yf''(x)\,dx + f'(x)\,dy}\,N \\ &= f'(A)\,dx \;+\; f'(x)\,N\,dy \\ \grad{F}{x} &= f'(A), \qquad\;\; f'(x)\,N = \grad{F}{y} \\ \\ }$$
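The identity $f(xI+yN)=f(x)\,I+y f'(x)\,N$ above is easy to verify numerically; here is a small numpy sketch (my own, with $f(t)=t^4$ as an arbitrary smooth choice, and with the lower-triangular $N$ matching the question's convention):

```python
import numpy as np

I = np.eye(2)
N = np.array([[0.0, 0.0], [1.0, 0.0]])   # nilpotent: N @ N = 0

x, y = 1.3, 0.8
A = x * I + y * N                        # dual matrix A = xI + yN

# f(t) = t^4, so f'(t) = 4 t^3
F = np.linalg.matrix_power(A, 4)
expected = x**4 * I + y * (4 * x**3) * N
print(np.allclose(F, expected))          # the two matrices agree
```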


Your claim is that the differential can be written using a simple matrix product $$\eqalign{ \def\tm{\times} \def\G{{\Gamma}} dF &= G\:dA \\ }$$ where $G$ is a matrix-valued gradient.

But the gradient $\,\G=\large\grad FA\,$ is known to be a fourth-order tensor $\LR{\G\in{\mathbb R}^{2\tm 2\tm 2\tm 2}}$
and a Double-Dot Product $(:)$ is required to calculate the differential $$\eqalign{ dF &= \G:dA \quad\iff\quad \G = \grad FA \quad\iff\quad \G_{ijkl} = \grad{F_{ij}}{A_{kl}} \\ }$$ This tensor-valued gradient can be related to the parametric gradients derived above by substituting $\,dA=\LR{I\,dx+N\,dy}$ $$\eqalign{ dF &= \G:I\,dx \;+\; \G:N\,dy \\ \grad Fx &= \G:I,\qquad\;\; \G:N=\grad Fy \\ }$$ Furthermore, since $N\,{\rm and}\,I$ are orthogonal under the $(:)$ product, this leads us to a simple expression for $\G$ $$\eqalign{ \def\s{\star} \G &= \frac12f'(A)\star I \;+\; f'(x)\,N\star N \\ }$$ where $(\s)$ is the dyadic (or tensor) product.

In terms of components, dyadic products are calculated as $$\eqalign{ \def\l{\ell} \def\R{{\large\Omega}} \R = P\s a\s b \quad\iff\quad \R_{ijk\l} = P_{ij}\, a_k\, b_\l \\ \\ }$$


$^{**}$An interesting use of dual numbers is to calculate the derivative of a $\sf real$ function $$\def\e{\varepsilon}f(x+\e) = f(x) + \e f'(x)$$
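This last identity is the basis of forward-mode automatic differentiation. A minimal Python sketch (a hand-rolled `Dual` class of my own, not any library's API) that extracts $f'(x)$ from the $\varepsilon$-coefficient:

```python
import math

class Dual:
    """Dual number x + eps*dx with eps^2 = 0: a minimal sketch of
    forward-mode automatic differentiation."""
    def __init__(self, x, dx=0.0):
        self.x, self.dx = x, dx
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.x + o.x, self.dx + o.dx)
    def __mul__(self, o):      # product rule in the eps-coefficient
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.x * o.x, self.x * o.dx + self.dx * o.x)
    __radd__, __rmul__ = __add__, __mul__

def sin(d):                    # chain rule: sin(x + eps dx)
    return Dual(math.sin(d.x), math.cos(d.x) * d.dx)

# f(x) = x*sin(x) + 3x, so f'(2) = sin(2) + 2*cos(2) + 3
f = lambda d: d * sin(d) + 3 * d
print(f(Dual(2.0, 1.0)).dx)    # seed dx = 1 to get the derivative
```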

greg
  • 40,033
  • I'm sorry, but I have little understanding of tensors. Please tell me how to understand the dyadic product here? Is it the same as for vectors? – Иван Петров Apr 01 '24 at 10:20
  • You can visualize a fourth-order tensor as a matrix with matrix elements, if that helps. I also appended component-wise examples of the dyadic product to my post. – greg Apr 01 '24 at 13:29
  • But as I understand from the above, the "matrix" $\Gamma$ has size greater than $2\times 2$. Am I right? – Иван Петров Apr 01 '24 at 14:31
  • @greg And also, you have already used knowledge of the standard representation of functions of a dual variable, whereas my goal is to study the properties of this kind of function from the point of view of matrix differentiation, and not vice versa – Иван Петров Apr 01 '24 at 14:39
0

$\def\tsr#1#2#3{{{#1}^{#2}}_{#3}}$ $\def\prt{\partial}$ $\def\pdv#1#2{\frac{\prt #1}{\prt #2}}$

You are trying to show that $d{Y^j}_i = {D^k}_i d{X^j}_k$ where $D=\pdv{Y}{X}$; but in fact the only linear transformation that is consistent with the definition of the derivative is $d{Y^j}_i = {D^{jk}}_{il} d{X^l}_k$. A vector function $y(x)$ where $\def\R{\mathbb{R}}y \in \R^m$ and $x \in \R^n$ has an $m \times n$ Jacobian since we need to determine $\pdv{y_i}{ x_j}$ for all combinations of $i,j$. In the same way, the derivative of $Y \in \R^{n \times m}$ w.r.t. $X \in \R^{k \times l}$ has to be a fourth-order tensor $D$ since we need to determine $D=\pdv{\tsr{Y}{j}{i}}{\tsr{X}{q}{r}}$ for all combinations of $i,j,q,r$. Using $d{Y^j}_i = {D^k}_i d{X^j}_k$ we can only get $\pdv{\tsr{Y}{j}{i}}{\tsr{X}{j}{k}}$, so $D$ is underdetermined.
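To make the fourth-order tensor concrete, here is a finite-difference numpy sketch (helper names my own) that builds $D_{ijkl}=\partial Y_{ij}/\partial X_{kl}$ for the example $Y=X^2$ and checks that the double contraction $D:dX$ reproduces the differential to first order:

```python
import numpy as np

def jacobian_tensor(F, X, h=1e-6):
    """Fourth-order tensor D[i,j,k,l] = dF(X)[i,j] / dX[k,l],
    built entry-by-entry with central finite differences."""
    D = np.zeros(F(X).shape + X.shape)
    for k in range(X.shape[0]):
        for l in range(X.shape[1]):
            E = np.zeros_like(X)
            E[k, l] = h
            D[:, :, k, l] = (F(X + E) - F(X - E)) / (2 * h)
    return D

F = lambda X: X @ X                   # Y = X^2, so dY = dX X + X dX
X = np.array([[1.0, 2.0], [3.0, 4.0]])
D = jacobian_tensor(F, X)

dX = 1e-4 * np.array([[1.0, -2.0], [0.5, 1.5]])
lhs = F(X + dX) - F(X)
rhs = np.tensordot(D, dX, axes=([2, 3], [0, 1]))   # double-dot D : dX
print(np.max(np.abs(lhs - rhs)))      # second order in ||dX||, tiny
```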

Ted Black
  • 1,639
  • You are describing the general situation, but as the examples given by me and Jos show, in some cases the differential can be represented in this form (as I want). And I wonder whether such situations have been investigated more generally than in these two simple examples. – Иван Петров Apr 02 '24 at 20:07