Make sense of taking 'differential' on both sides of a matrix equation

Question

This is related to a question I posted earlier. Here is the statement of that question:

In control theory, the discrete Lyapunov equation is defined as \begin{align*} A^T X A + Q = X, \end{align*} where $A \in \mathcal{M}(n \times n; \mathbb R)$ and $Q \in \mathbb {S}_{++}$ (the positive definite matrices). A theorem states that if the spectral radius of $A$ satisfies $\rho(A) < 1$, then for each fixed $Q > 0$ there exists a unique $X \in \mathbb {S}_{++}$ that solves the above equation. Let $D = \{A \in \mathcal{M}(n \times n; \mathbb R): \rho(A) < 1\}$ and fix $Q$. Suppose we define a scalar-valued function $f$ on the matrices $X$ that solve the Lyapunov equation for some $A \in D$. To make this concrete, take $f(X) = \text{tr}(X)$. This function can also be viewed as a function $g$ on $D$, i.e., as the composition \begin{align*} g \colon A \xrightarrow{h} X \xrightarrow{f} \text{tr}(X). \end{align*} Now I would like to differentiate $g$ with respect to $A$.

There are some very good answers to this question. The answers posted by @greg and @lynn are very interesting. But in those answers, they rather freely take the 'differential' of both sides and apply the product rule:

$$dA^T X A + A^T d X A + A^T X dA = dX.$$

I am a little uncomfortable with using the symbols $dX, dA$ before assigning them some mathematical meaning. I know we can intuitively think of them as infinitesimal changes in the entries, but I would like to know a rigorous way to understand them. The only place I know where they have a meaning is in differential geometry, i.e., as differential forms. In this situation, how do we make sense of this step?

score 3 · Accepted Answer · 2018-05-06T21:44:22.937

The derivative in question is $D(f\circ h)_A:H\in M_n\mapsto Df_X(Dh_A(H))$, which is a linear map (this is the chain rule). It then remains to calculate the linear map $Dh_A:H\in M_n\mapsto Dh_A(H)\in S_n$.
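To spell out what "derivative as a linear map" means here, the following is a sketch assuming the standard Fréchet definition; the norm $\|\cdot\|$ and the direction symbol $K$ are notation introduced only for this illustration. $Dh_A$ is the unique linear map such that

$$h(A+H) = h(A) + Dh_A(H) + o(\|H\|) \quad \text{as } H \to 0.$$

Read this way, the symbols $dA$ and $dX$ in the differential calculus stand for a direction $H$ and its image $Dh_A(H)$. Since $f=\text{tr}$ is linear, $Df_X(K)=\text{tr}(K)$ for every $K$, and the chain rule gives

$$Dg_A(H) = D(f\circ h)_A(H) = \text{tr}\big(Dh_A(H)\big).$$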

The question is: is $h(A)=X$ differentiable on $D$? If so, we denote the linear map $Dh_A$ by $X'$; differentiating both sides of the Lyapunov equation in the direction $H$ then necessarily gives the relation:

(cf. https://en.wikipedia.org/wiki/Kronecker_product)

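$(*)$ $H^TXA+A^TXH+A^TX'(H)A=X'(H)$, that is, $(I\otimes I-A^T\otimes A^T)X'(H)=H^TXA+A^TXH$, where the matrices are stacked row by row into vectors. Let $\text{spectrum}(A)=(\lambda_i)$ with $|\lambda_i|<1$; then $\text{spectrum}(I\otimes I-A^T\otimes A^T)=((1-\lambda_i\lambda_j)_{i,j})$, a list of $n^2$ nonzero values, since $|\lambda_i\lambda_j|<1$. Hence the linear map $K\mapsto K-A^TKA$ represented by $I\otimes I-A^T\otimes A^T$ is invertible on $M_n$ and, since it maps $S_n$ into itself, also on $S_n$. Thus $X'(H)\in S_n$ is uniquely determined as the solution of the linear equation $(*)$, given $X$ and $H$. The same invertibility also settles differentiability: up to sign, this map is the partial derivative with respect to $X$ of $F(A,X)=A^TXA+Q-X$, so the implicit function theorem applies.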
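As a numerical sanity check of $(*)$, here is a minimal NumPy sketch. The function names, the random test matrices, the choice $Q=I$, and the step size `t` are all assumptions made for illustration. It solves the Lyapunov equation through the row-stacked Kronecker system, solves $(*)$ for $X'(H)$, and compares the result with a central finite difference of $h$ in the direction $H$.

```python
import numpy as np

def lyapunov_operator(A):
    """Matrix of K -> K - A^T K A acting on the row-stacked vec(K)."""
    n = A.shape[0]
    return np.eye(n * n) - np.kron(A.T, A.T)

def solve_lyapunov(A, Q):
    """Return X with A^T X A + Q = X (assumes spectral radius of A < 1)."""
    n = A.shape[0]
    # reshape(-1) stacks rows, matching the convention used in the text
    return np.linalg.solve(lyapunov_operator(A), Q.reshape(-1)).reshape(n, n)

def lyapunov_derivative(A, X, H):
    """Return X'(H), the solution of (*) for the direction H."""
    n = A.shape[0]
    rhs = H.T @ X @ A + A.T @ X @ H
    return np.linalg.solve(lyapunov_operator(A), rhs.reshape(-1)).reshape(n, n)

# Hypothetical test data: random A rescaled so that rho(A) = 0.5, and Q = I.
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
A *= 0.5 / np.max(np.abs(np.linalg.eigvals(A)))
Q = np.eye(n)
H = rng.standard_normal((n, n))

X = solve_lyapunov(A, Q)
dX = lyapunov_derivative(A, X, H)

# Central finite difference of h(A) = X in the direction H.
t = 1e-6
dX_fd = (solve_lyapunov(A + t * H, Q) - solve_lyapunov(A - t * H, Q)) / (2 * t)

residual = np.max(np.abs(A.T @ X @ A + Q - X))
fd_error = np.max(np.abs(dX - dX_fd))
print("Lyapunov residual:", residual)
print("max |X'(H) - finite difference|:", fd_error)
print("Dg_A(H) = tr(X'(H)):", np.trace(dX))
```

Both printed errors should be small compared with the entries of $X$ and $X'(H)$. The dense $n^2\times n^2$ solve is chosen only because it mirrors the Kronecker formulation; it is not meant as an efficient method for large $n$.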

EDIT. I have just seen your question dated May 3; I do not see any difference between it and this one. My answer is pretty much the same as the ones that were provided to you a few days ago. The only difference is that I use derivatives as linear maps and not as infinitesimal changes. If you do not want to make the effort to understand the Kronecker product or the notion of derivative, then you should do something other than mathematics.

It is not the Kronecker part I am confused of but I don't want to use "infinitesimal change". Nevertheless, your answer is great. It clarifies a lot of stuff for me, thanks. — user1101010, May 07 '18 at 04:05
