
Suppose $h(Q) = Q^{T} A Q$; then the Fréchet derivative is given by $D_{h} (Q) [H] = H^{T} A Q + Q^{T} A H$. I am a bit unsure about how this so-called Fréchet derivative is obtained.

I would have just said:

$(h(Q))'= \dot{Q}^{T}AQ + Q^{T}\dot{A}Q + Q^{T} A \dot{Q}$, and then tried to find an expression, but I haven't been able to come up with one.

Any help is appreciated.

Arctic Char
    Why is there $H$ in your expression? – Arctic Char Mar 22 '23 at 12:06
  • @ArcticChar from the derivative – user21369645 Mar 22 '23 at 14:11
  • Why is there no $H$ in $\dot Q^TAQ$ but an $H$ in the next two terms? – Arctic Char Mar 22 '23 at 14:46
  • @ArcticChar my own idea, but I am also a bit puzzled by this $H$. It seems to mean that in the book I am reading they are differentiating $Q$ along $H$ – user21369645 Mar 22 '23 at 20:12
  • What exactly do the dots mean to you? – Ben Grossmann Mar 22 '23 at 23:25
  • @BenGrossmann derivatives – user21369645 Mar 23 '23 at 07:38
  • @ArcticChar that was a mistake I have made the edits now. – user21369645 Mar 23 '23 at 09:03
  • The most basic example is to find the derivative of the inverse matrix. This simply applies the product rule to $XX^{-1}=I$, which gives $\frac{d}{dt}\left(X^{-1}\right) = - X^{-1} \dot{X} X^{-1}$. But the book I am following derives this in a bit more formal manner, which confuses me. They say that: $F(X)X=I \implies DF(X)[U]X + F(X)U=0 \implies DF(X)[U]=-X^{-1}UX^{-1}$, which I suppose is the same thing that is confusing me in my question – user21369645 Mar 23 '23 at 11:58
  • @user What you seem to not be understanding is that by definition, the derivative of $h$ at $Q$, i.e. $Dh(Q)$, is itself a function. In this case, it is a linear map between the domain and codomain (which you do not explicitly specify in your question). You might find this post to be helpful. – Ben Grossmann Mar 23 '23 at 13:58
  • @user21369645 You might also find this post helpful – Ben Grossmann Mar 23 '23 at 14:14
  • @user21369645 What exactly do you have in mind with $\dot{Q}$? The derivative described by the Frechet derivative formula is a derivative with respect to a matrix input, but your result in terms of $\dot{Q}$ or the result with $\dot{X}$ in your comment only makes sense with an interpretation $\dot{X} = \frac{dX}{dt}$ for some parameter $t \in \Bbb R$ – Ben Grossmann Mar 23 '23 at 14:28
  • This just follows from the definition $D_h(Q)[H]=\lim_{\epsilon\to0}(h(Q+\epsilon H)-h(Q))/\epsilon$. – KBS Mar 23 '23 at 14:57
  • @KBS which definition are you referring to? – user21369645 Mar 23 '23 at 14:58
  • The definition of the derivatives which are available all over the Internet. Did you look at the Wikipedia page before asking? – KBS Mar 23 '23 at 15:00
  • @KBS , This is my first time taking the derivative of a matrix function, which caused me some confusion. I don't see any reason to be hostile because you know something I don't. – user21369645 Mar 23 '23 at 15:03
  • @KBS That uses the definition of the Gateaux derivative, not the Frechet derivative. It can be shown that this result is indeed equal to the Frechet derivative (whenever a function is Frechet differentiable), but as far as I have seen this is not how the computation of the Frechet derivative is ever presented – Ben Grossmann Mar 23 '23 at 15:04
  • @BenGrossmann This is true, but they coincide for this function – KBS Mar 23 '23 at 16:08
  • It's worth noting that this approach does generally work: for example if $g(a, b, c)$ is a multilinear function then $$D[g(a(Q),b(Q), c(Q))][H] = g(Da(Q)[H], b(Q), c(Q)) + g(a(Q), Db(Q)[H], c(Q)) + g(a(Q), b(Q), Dc(Q)[H]).$$ On the LHS we're taking the derivative of the function $Q \mapsto g(a(Q), b(Q), c(Q))$. In your case $g(a, b, c) = a^Tbc$ and $h(Q) = g(Q, A, Q)$ so that $Da(Q) = DA(Q) = 0$ because $A$ is a constant. The obvious generalization works for multilinear functions of any arity. – Nicholas Todoroff Mar 23 '23 at 17:09
  • 1
    @KBS Not only for this function. My main point here, however, is that it is deceptive to say that "this just follows from the definition" and then use something that is not "the definition" – Ben Grossmann Mar 23 '23 at 18:00
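The multilinear product rule quoted in the comments above is easy to sanity-check numerically. Below is a minimal NumPy sketch (mine, not from the thread): it takes the trilinear example $g(a,b,c) = abc$ with $a(Q) = b(Q) = c(Q) = Q$, so the rule gives $D[Q^3][H] = HQQ + QHQ + QQH$, and compares it against a central finite difference in the direction $H$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
Q = rng.standard_normal((n, n))
H = rng.standard_normal((n, n))

def f(Q):
    # trilinear in Q: g(a, b, c) = a @ b @ c with a = b = c = Q
    return Q @ Q @ Q

def Df(Q, H):
    # product rule: replace one factor at a time by H
    return H @ Q @ Q + Q @ H @ Q + Q @ Q @ H

# Central finite difference of f at Q in direction H
eps = 1e-6
fd = (f(Q + eps * H) - f(Q - eps * H)) / (2 * eps)
err = np.linalg.norm(fd - Df(Q, H)) / np.linalg.norm(Df(Q, H))
print(err < 1e-6)  # True: the product-rule formula matches the finite difference
```

The same check works for the bilinear case $h(Q) = Q^T A Q$ from the question, where the constant factor $A$ contributes nothing ($DA(Q) = 0$) and only two of the three terms survive.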

1 Answer


As I've stated in my comments, I think your primary confusion is over the "data type" of the Frechet derivative. I've addressed this concern in the comments on your question. Here, I will talk primarily about how one computes the Frechet derivative.

There are several equivalent definitions of the Frechet derivative that can be used, but I prefer the following. For normed vector spaces $U,V$ and a function $f:U \to V$, the Frechet derivative of $f$ at the point $x \in U$ (written as $Df(x)$) is the unique linear map $Df(x) = A:U \to V$ for which $$ f(x + h) = f(x) + A(h) + o(\|h\|) $$ for all $h \in U$. Here, $o(\|h\|)$ (which is little-o notation) denotes a sum of "higher order terms". In other words, at any given $x \in U$, the remainder $R(h) = f(x+h) - f(x) - A(h)$ satisfies $$ \lim_{h \to 0}\frac{R(h)}{\|h\|} = 0. $$ Note that $A(h)$ can be written as $[Df(x)](h)$ or, as you have written it, in the form $Df(x)[h]$.
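To make the little-o condition concrete, here is a short numerical illustration (my own sketch, using the toy function $f(X) = X^2$ rather than the function from your question): the claimed derivative is $Df(X)[H] = XH + HX$, and the remainder along the direction $tH$ should vanish faster than $\|tH\|$ as $t \to 0$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
X = rng.standard_normal((n, n))
H = rng.standard_normal((n, n))

def f(X):
    return X @ X

def Df(X, H):
    # claimed Frechet derivative of f at X, applied to H
    return X @ H + H @ X

# The remainder R(tH) = f(X + tH) - f(X) - Df(X)[tH] should be o(||tH||),
# so the ratio ||R(tH)|| / ||tH|| should go to zero as t -> 0.
ratios = []
for t in [1e-1, 1e-2, 1e-3]:
    R = f(X + t * H) - f(X) - Df(X, t * H)
    ratios.append(np.linalg.norm(R) / (t * np.linalg.norm(H)))
print(ratios)  # each ratio is roughly 10x smaller than the previous one
```

Here the remainder is exactly $t^2 H^2$, so the ratio shrinks linearly in $t$, which is what the printed values show.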

With that in mind, you seem to have a function $h:\Bbb R^{n \times m} \to \Bbb R^{m \times m}$ of the form $$ h(Q) = Q^TAQ, $$ for some $n \times n$ matrix $A$. At any given $Q$, $Dh(Q)$ will be a linear map that takes an $n \times m$ input and produces an $m \times m$ output. That is, for any $n \times m$ matrix $H$, $Dh(Q)[H]$ will be an $m \times m$ matrix.

My preferred approach to finding this linear map is to "expand" the expression $h(Q + H)$. In this case, this simply amounts to expanding a product of matrices; more generally, we might use something akin to a "linearization" or "Taylor series" expansion. We have \begin{align} h(Q + H) &= (Q + H)^TA(Q + H) \\ & = \underbrace{Q^TAQ}_{h(Q)} + H^TAQ + Q^TAH + \underbrace{H^TAH}_{o(\|H\|)}. \end{align} The initial term of the sum is equal to $h(Q)$. The final term of the sum involves a multiplication of two $H$'s, which means that its norm $\|H^TAH\|$ scales with $\|H\|^2$, which means that as $H \to 0$, this term approaches zero faster than $\|H\|$. The piece that remains is $H^TAQ + Q^TAH$, which depends linearly on $H$. That is, $\mathcal L(H) = H^TAQ + Q^TAH$ is a linear map. With this linear map, we have $$ h(Q + H) = h(Q) + \mathcal L(H) + o(\|H\|). $$ So, by definition, $\mathcal L = Dh(Q)$. That is, we have $$ Dh(Q)[H] = H^TAQ + Q^TAH. $$
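As a quick numerical sanity check of the expansion above (a sketch in NumPy; the variable names are mine), one can verify that the remainder $h(Q + tH) - h(Q) - Dh(Q)[tH]$ is exactly $(tH)^T A (tH)$, the quadratic term that was absorbed into $o(\|H\|)$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 3
A = rng.standard_normal((n, n))   # A is n x n so that Q^T A Q is defined
Q = rng.standard_normal((n, m))
H = rng.standard_normal((n, m))

def h(Q):
    return Q.T @ A @ Q

def Dh(Q, H):
    # the Frechet derivative derived above
    return H.T @ A @ Q + Q.T @ A @ H

# The remainder of the first-order expansion is exactly H^T A H,
# which scales like ||H||^2, hence is o(||H||).
t = 1e-2
R = h(Q + t * H) - h(Q) - Dh(Q, t * H)
exact = (t * H).T @ A @ (t * H)
print(np.allclose(R, exact))  # True
```

Since the remainder is an exact quadratic here (no higher-order terms at all), the check holds for every $t$, not just small ones.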

Ben Grossmann
  • Thank you so much for taking your time to make such a thorough answer. Much appreciated! I will go through it slowly and try to digest it. If you have used any references for this question, may I ask which one? – user21369645 Mar 23 '23 at 15:41
  • Glad to help. Another post that comes to mind: in this post, I used the Frechet derivative to apply the chain rule in the context of back-propagation (for neural nets) – Ben Grossmann Mar 23 '23 at 15:45
  • @user21369645 I don't have a reference for you off the top of my head – Ben Grossmann Mar 23 '23 at 15:51
  • Related to this question, there's an identity which relates the Jacobian with the vectorization and Kronecker product. Do you mind if I ask it in this chat? – user21369645 Mar 27 '23 at 06:16
  • @user21369645 I don’t know what you’re referring to so it’s likely that I don’t know the answer. I recommend that you post a new question and then put a link to that question in another comment on my answer. – Ben Grossmann Mar 27 '23 at 11:40
  • Having obtained the derivative, they state that the Jacobian $J_h$ is $J_h(\operatorname{vec} Q) = ((AQ)^{T} \otimes I)\Pi + I \otimes Q^{T} A$, where $\Pi$ is the permutation matrix with $\Pi^{T}=\Pi$ s.t. $\Pi \operatorname{vec}(X) = \operatorname{vec}(X^{T}) \ \forall X$. They stress the importance of the identity $\operatorname{vec}(AXB)=(B^{T} \otimes A) \operatorname{vec}(X)$. I have looked into this: https://math.stackexchange.com/questions/1829956/differentiation-w-r-t-the-mboxvec-operator – user21369645 Mar 27 '23 at 12:08
  • @user21369645 ok, now I know what you’re referring to. I still recommend that you post a new question. – Ben Grossmann Mar 27 '23 at 12:26
  • I think I figured it out; it follows almost immediately from one of the many identities relating the Kronecker product and the vectorization operator. Thanks a lot for your time though! If I get stuck I will post a full question later. I will try to work out the details first. – user21369645 Mar 27 '23 at 12:28
  • https://math.stackexchange.com/questions/4667862/jacobian-and-vectorization – user21369645 Mar 28 '23 at 06:32
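The Jacobian formula discussed in the last few comments can also be checked numerically. The sketch below is mine, not from the thread: it assumes the column-stacking convention for $\operatorname{vec}$, builds the permutation (commutation) matrix $\Pi$ satisfying $\Pi \operatorname{vec}(X) = \operatorname{vec}(X^T)$ explicitly, and verifies that $J_h \operatorname{vec}(H) = \operatorname{vec}(Dh(Q)[H])$ with $J_h = ((AQ)^T \otimes I)\Pi + I \otimes Q^T A$, where $\otimes$ is the Kronecker product.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 3
A = rng.standard_normal((n, n))
Q = rng.standard_normal((n, m))
H = rng.standard_normal((n, m))

def vec(X):
    # column-stacking vectorization
    return X.flatten(order='F')

def commutation(n, m):
    # Pi such that Pi @ vec(X) = vec(X.T) for X of shape (n, m):
    # vec(X)[j*n + i] = X[i, j] must land at vec(X.T)[i*m + j]
    Pi = np.zeros((n * m, n * m))
    for i in range(n):
        for j in range(m):
            Pi[i * m + j, j * n + i] = 1.0
    return Pi

Pi = commutation(n, m)
I_m = np.eye(m)

# Jacobian of h(Q) = Q^T A Q with respect to vec(Q), from
# vec(H^T A Q) = ((AQ)^T kron I) vec(H^T) and vec(Q^T A H) = (I kron Q^T A) vec(H)
J = np.kron((A @ Q).T, I_m) @ Pi + np.kron(I_m, Q.T @ A)

Dh = H.T @ A @ Q + Q.T @ A @ H  # the Frechet derivative from the answer
print(np.allclose(J @ vec(H), vec(Dh)))  # True
```

The first term needs $\Pi$ because $\operatorname{vec}(AXB) = (B^T \otimes A)\operatorname{vec}(X)$ applies to $H^T$, not $H$; the commutation matrix converts $\operatorname{vec}(H)$ into $\operatorname{vec}(H^T)$ first.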