3

I am trying to calculate the following gradient

$$\nabla_{\mathbf{X}} \left( \mathbf{a}^{T} \mathbf{X} \mathbf{a} \right)$$

where I am using the convention that $\mathbf{a}$ is a column vector. I am wondering what the steps are to arrive at the solution given in the Matrix Cookbook, which is:

$$\nabla_{\mathbf{X}} \left( \mathbf{a}^{T} \mathbf{X} \mathbf{a} \right) = \mathbf{a}\cdot\mathbf{a}^{T}$$

2 Answers

5

See this question for the basics and the notation.

The derivative of the scalar function $f(X)$ with respect to $X$, where $X$ is a matrix, is the matrix $A$ with $A_{i,j}=\dfrac{df(X)}{dX_{i,j}}$.

And here,

$$f(X)=a^TXa=\sum_{i,j} X_{i,j}a_ia_j$$

So that

$$\dfrac{df(X)}{dX_{i,j}}=a_ia_j$$

And finally

$$A=\frac{df(X)}{dX}=aa^T$$
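
One way to sanity-check this elementwise derivation numerically is a short NumPy sketch along the following lines (the helper names `f`, `grad_fd`, `grad_closed` are purely illustrative): it compares finite-difference partials $\frac{df}{dX_{i,j}}$ against the closed form $aa^T$.

```python
import numpy as np

# Finite-difference check of df/dX_{i,j} = a_i a_j for f(X) = a^T X a.
# Names here are illustrative only.
rng = np.random.default_rng(0)
n = 5
a = rng.standard_normal((n, 1))   # column vector, as in the question
X = rng.standard_normal((n, n))

def f(M):
    # a^T M a is a 1x1 array; .item() extracts the scalar value
    return (a.T @ M @ a).item()

eps = 1e-6
grad_fd = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = eps             # perturb the single entry X_{i,j}
        grad_fd[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

grad_closed = a @ a.T             # the Matrix Cookbook result a a^T
print(np.allclose(grad_fd, grad_closed, atol=1e-6))   # expected: True
```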

Jean-Claude Arbaut
  • I have some basic linear algebra questions. First, if $a$ has dimensions 5x1 and $X$ has dimensions 5x5, is the result of $a^{T}Xa$ a scalar? Secondly, why can I write the first expression as $\sum_{i,j}X_{i,j}a_{i}a_{j}$? Finally, since there is no swap of the indices $i$ and $j$, I do not understand where the transpose comes from. – Jose Ramon Sep 08 '20 at 07:01
  • @JoseRamon Yes, the result is a scalar. And it is critical to understand that well in order to know what is happening! – mathcounterexamples.net Sep 08 '20 at 07:09
  • I guess this is because it is a scalar ($\sum_{i,j}X_{i,j}a_{i}a_{j}$), right? But then I do not see the transpose. – Jose Ramon Sep 08 '20 at 07:11
  • Yes, I got the scalar part, so in the example the partial derivative is $\frac{\partial f(\mathbf{X})}{\partial X_{ij}} = a_{i}a_{j}$. Then I assemble my result from all these partial derivatives, and the final matrix has size NxN. But why the transpose? – Jose Ramon Sep 08 '20 at 07:14
  • @JoseRamon It's an outer product: the $(i,j)$ element of the matrix is the product $a_ia_j$. It's exactly the same as $aa^T$ (check for yourself; the matrix product $aa^T$ is trivial here). – Jean-Claude Arbaut Sep 08 '20 at 07:21
  • @Jean-ClaudeArbaut yes I think I am getting closer :) – Jose Ramon Sep 08 '20 at 07:24
  • @JoseRamon Regarding your first question: $b^TXa=\sum_i b_i (Xa)_i=\sum_i b_i\left(\sum_j X_{i,j}a_j\right)=\sum_{i,j} X_{i,j}b_ia_j$ (where $X$ is a matrix and $a,b$ are vectors, with compatible dimensions). Note that $Xa$ is a vector, and for two vectors $u,v$, $u^Tv$ is their scalar product. – Jean-Claude Arbaut Sep 08 '20 at 07:31
  • In this case it is not a scalar, right? It is the outer product of $b$ and $a$. – Jose Ramon Sep 08 '20 at 07:33
  • The product $aa^T$ is not a scalar, but a matrix. However, the product $a^Ta$ is a scalar. Just write out the matrix product, and consider vectors to be column vectors (a matrix with one column). And in $b^TXa$, you have the scalar product of $b$ and $Xa$, which are both vectors. Hence the function you differentiate is a scalar function of the matrix $X$. – Jean-Claude Arbaut Sep 08 '20 at 07:34
  • $(df/dX)(X) = a \cdot a^T$ is a linear form that associates a scalar to a matrix $u$. How do you obtain the scalar knowing the matrices $a \cdot a^T$ and $u$? – mathcounterexamples.net Sep 08 '20 at 07:39
  • @mathcounterexamples.net See https://math.stackexchange.com/questions/2807864/derivative-of-the-trace-of-the-product-of-a-matrix-and-its-transpose/2809102#2809102, where I wrote out the detailed derivation. $df/dX$ is indeed a linear form, but it's written in compact form as a matrix, by convention: instead of writing a vector with $np$ entries, it's more compact to write an $n\times p$ matrix. See also the Wikipedia link above and the layout convention part (there are two competing conventions). – Jean-Claude Arbaut Sep 08 '20 at 07:44
  • @Jean-ClaudeArbaut Thanks, I understand now! I don't know what you think, but it seems very complex to use such results. Moreover, in terms of practical use, do you know whether those conventions are used in the usual linear programming packages? – mathcounterexamples.net Sep 08 '20 at 07:55
2

$$\begin{array}{l|rcl} f : & M_n(\mathbb R) & \longrightarrow & \mathbb R\\ & X & \longmapsto & a^T X a \end{array}$$

is a linear map.

It is critical to understand what the domain and codomain of $f$ are in order to understand what $f$ is as a function.

Hence its Fréchet derivative at each point is equal to $f$ itself: $f^\prime(X).u = a^T u a$.

Following a detailed and interesting discussion with Jean-Claude Arbaut (see the comments!), we can rewrite

$$f^\prime(X).u =a^T u a = \mathrm{tr}(a^T u a) = \mathrm{tr}(u \cdot (a \cdot a^T))= \mathrm{tr}((a \cdot a^T) \cdot u) = \mathrm{tr}(A \cdot u)$$

where $A = a \cdot a^T$ is defined as the matrix calculus derivative of $f$ with respect to $X$. This is in fact what is meant by

$$\nabla_{\mathbf{X}} \left( \mathbf{a}^{T} \mathbf{X} \mathbf{a} \right) = \frac{\partial\left( \mathbf{a}^{T} \mathbf{X} \mathbf{a} \right)}{\partial \mathbf{X}}=\mathbf{a}\cdot\mathbf{a}^{T}$$ in the Matrix Cookbook.
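
As a quick numerical illustration of the trace pairing above, one can check that the directional derivative $a^T u a$, the trace form $\mathrm{tr}((aa^T)u)$, and a difference quotient of $f$ all agree for a random direction $u$. The following NumPy sketch (variable names purely illustrative) does exactly that:

```python
import numpy as np

# Numerical illustration: for a random direction u, the directional
# derivative of f(X) = a^T X a equals a^T u a, which is the same
# scalar as tr((a a^T) u), i.e. tr(A u) with A = a a^T.
rng = np.random.default_rng(1)
n = 4
a = rng.standard_normal((n, 1))
X = rng.standard_normal((n, n))
u = rng.standard_normal((n, n))

f = lambda M: (a.T @ M @ a).item()

directional = (a.T @ u @ a).item()        # f'(X).u (f is linear in X)
trace_form = np.trace((a @ a.T) @ u)      # tr(A u) with A = a a^T
t = 1e-7
finite_diff = (f(X + t * u) - f(X)) / t   # (f(X + t u) - f(X)) / t

print(np.isclose(directional, trace_form))               # expected: True
print(np.isclose(directional, finite_diff, atol=1e-5))   # expected: True
```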