Let's assume I have a matrix $B \in \mathbb{R}^{m \times r}$, a matrix $P \in \mathbb{R}^{r \times r}$ and a diagonal matrix $A \in \mathbb{R}^{r \times r}$ defined like this:

$$A_{ii}=a_i \quad \forall i=1,\dots,r$$

I am stuck computing the derivative of $\operatorname{trace}(BAP(BA)^T)$ with respect to each $a_i$. I would say that it is $$\operatorname{sum}(B(P+P^T)*B,1)$$ where $*$ means "element-wise product" and $\operatorname{sum}(\cdot,1)$ is the sum of each column. However, I am not sure this is correct. Actually, I let $X=BA$, used this rule to differentiate $\operatorname{trace}(XPX^T)$ (it gives $X*(P+P^T)$ if I am right) and then applied the chain rule. Could someone help me justify it?

3 Answers

From your previous question we know the gradient of the function $\psi = {\rm Tr}\left(BAP(BA)^T\right)$ with respect to the matrix $A$. $$G = \frac{\partial\psi}{\partial A} = B^TBA(P+P^T)$$ Writing $X:Y = {\rm Tr}(X^TY)$ for the Frobenius product, we can expand the differential and perform a change of variables to obtain the gradient with respect to the vector $a$. $$\eqalign{ d\psi &= G:dA = G:{\rm Diag}(da) = {\rm diag}(G):da \cr g &= \frac{\partial\psi}{\partial a} = {\rm diag}(G) \cr }$$
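A quick numerical sanity check of this gradient may help. The sketch below is only an illustration under assumptions that are not in the thread: NumPy, random test matrices, and arbitrary sizes `m = 5`, `r = 3`. It compares ${\rm diag}(G)$ against central finite differences of $\psi$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, r = 5, 3                        # illustrative sizes (assumption)
B = rng.standard_normal((m, r))
P = rng.standard_normal((r, r))    # P is not assumed symmetric
a = rng.standard_normal(r)

def psi(a_vec):
    """psi(a) = trace(B A P (B A)^T) with A = Diag(a)."""
    X = B @ np.diag(a_vec)
    return np.trace(X @ P @ X.T)

# Closed form: g = diag(B^T B A (P + P^T))
G = B.T @ B @ np.diag(a) @ (P + P.T)
g_closed = np.diag(G)

# Same vector as a column sum, without forming the r x r product B^T (...)
g_colsum = np.sum(B * (B @ np.diag(a) @ (P + P.T)), axis=0)

# Central finite differences, perturbing one a_i at a time
eps = 1e-6
g_fd = np.array([
    (psi(a + eps * e) - psi(a - eps * e)) / (2 * eps)
    for e in np.eye(r)
])

print(np.allclose(g_closed, g_fd, atol=1e-6))
print(np.allclose(g_closed, g_colsum))
```

The column-sum line has the same shape as the expression proposed in the question, except that the first factor is $BA(P+P^T)$ rather than $B(P+P^T)$.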

lynn

1) As indicated by the OP, you can easily verify that

$$\frac{\partial \textrm{Tr}\,(\textbf{X} \textbf{P} \textbf{X}^T)}{\partial \textbf{X}} = \textbf{X}\textbf{P}^T+\textbf{X}\textbf{P}$$

as

$$ \textrm{Tr}\,(\textbf{X} \textbf{P} \textbf{X}^T) = \sum_{pqr}X_{pq}P_{qr}X_{pr}$$ $$\bbox[5px,border:2px solid #000000]{\frac{\partial \textrm{Tr}\,(\textbf{X} \textbf{P} \textbf{X}^T)}{\partial X_{ij}} = \sum_{r}X_{ir}P_{jr} + \sum_{q}X_{iq}P_{qj} = (\textbf{X}\textbf{P}^T)_{ij}+(\textbf{X}\textbf{P})_{ij} } \qquad\Box$$
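The boxed identity can also be checked entry by entry. The following is a minimal sketch, assuming NumPy and random test matrices (the sizes and seed are arbitrary choices, not part of the answer); it perturbs each $X_{ij}$ in turn and compares with $\textbf{X}\textbf{P}^T+\textbf{X}\textbf{P}$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, r = 4, 3                        # illustrative sizes (assumption)
X = rng.standard_normal((m, r))
P = rng.standard_normal((r, r))    # general, non-symmetric P

def f(X_mat):
    """f(X) = Tr(X P X^T)."""
    return np.trace(X_mat @ P @ X_mat.T)

# Closed-form gradient with respect to X
grad_closed = X @ P.T + X @ P

# Central finite differences, one entry X_ij at a time
eps = 1e-6
grad_fd = np.zeros_like(X)
for i in range(m):
    for j in range(r):
        E = np.zeros_like(X)
        E[i, j] = 1.0
        grad_fd[i, j] = (f(X + eps * E) - f(X - eps * E)) / (2 * eps)

print(np.allclose(grad_closed, grad_fd, atol=1e-6))
```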

In most common cases, a matrix $\textbf A$ has no special structure. That is to say, all elements of a matrix $\textbf A$ are independent of each other (e.g. not symmetric, Toeplitz, positive definite, diagonal). This means that

$$ \frac{\partial A_{pq}}{\partial A_{rs}} = \delta_{pr}\delta_{qs}\quad\textrm{or}\qquad \frac{\partial\textbf{A}}{\partial A_{pq}} = \textbf{J}^{pq},$$

where $\textbf{J}^{pq}$ is a matrix with all zeros except element $(p,q)$ is 1 and thus we can write:

$$\textbf A=\sum_{pq=1}^n A_{pq}\textbf J^{pq}$$

Generally, if a structure is imposed, we can decompose it by means of a set of linearly independent structure matrices $\textbf S_i$ such that

$$\textbf A=\sum_{i=1}^n a_i\textbf S_i\qquad\textrm{and}\qquad\frac{\partial\textbf{A}}{\partial a_p} = \textbf{S}_p$$

As indicated by the OP, we know that $\textbf{X} = \textbf{B}\textbf{A}=\sum_i a_i \textbf{B}\textbf{S}_i$

Then the derivative of $\textrm{Tr}\,(\textbf{X} \textbf{P} \textbf{X}^T)$ to $a_i$ is now straightforward using the chain rule:

$$\frac{\partial \textrm{Tr}\,(\textbf{X} \textbf{P} \textbf{X}^T)}{\partial a_i} = \sum_{pq}\frac{\partial \textrm{Tr}\,(\textbf{X} \textbf{P} \textbf{X}^T)}{\partial X_{pq}}\cdot\frac{\partial X_{pq}}{\partial a_i} = \textrm{Tr}\,\left(\frac{\partial \textrm{Tr}\,(\textbf{X} \textbf{P} \textbf{X}^T)}{\partial \textbf{X}}\cdot\frac{\partial \textbf{X}^T}{\partial a_i} \right)$$

with

$$ \frac{\partial X_{pq}}{\partial a_i} = \sum_{r} B_{pr} S_{i;rq}\qquad\textrm{or}\qquad \frac{\partial \textbf{X}}{\partial a_i} = \textbf{B}\textbf{S}_i$$
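To make the structure-matrix decomposition concrete, here is a small sketch for the diagonal case from the question. It assumes NumPy, random test data, and structure matrices $\textbf S_i$ with a single 1 at position $(i,i)$; it checks that $\textbf X=\sum_i a_i \textbf B\textbf S_i$ and that perturbing $a_i$ changes $\textbf X$ by $\textbf B\textbf S_i$.

```python
import numpy as np

rng = np.random.default_rng(2)
m, r = 4, 3                        # illustrative sizes (assumption)
B = rng.standard_normal((m, r))
a = rng.standard_normal(r)

# Assumed structure matrices for a diagonal A: S_i has a single 1 at (i, i)
S = [np.outer(e, e) for e in np.eye(r)]

# A = sum_i a_i S_i and X = B A = sum_i a_i B S_i
A = sum(a_i * S_i for a_i, S_i in zip(a, S))
X = B @ A
X_sum = sum(a_i * (B @ S_i) for a_i, S_i in zip(a, S))

# X is linear in a, so a forward difference recovers B S_i up to rounding
eps = 1e-6
dX_fd = [(B @ np.diag(a + eps * e) - X) / eps for e in np.eye(r)]

print(np.allclose(A, np.diag(a)))
print(np.allclose(X, X_sum))
print(all(np.allclose(d, B @ S_i, atol=1e-6) for d, S_i in zip(dX_fd, S)))
```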

So finally we can write down:

$$\bbox[5px,border:2px solid #00A000]{\frac{\partial \textrm{Tr}\,(\textbf{BAP}\textbf{A}^T\textbf{B}^T)}{\partial a_i} = \textrm{Tr}\,\left\{ \left(\textbf{BA}\textbf{P}^T+\textbf{BA}\textbf{P}\right) \cdot \left(\textbf{B}\textbf{S}_i\right)^T \right\} = \textrm{Tr}\,\left\{\textbf{B}^T\textbf{BA}\left(\textbf{P}^T+\textbf{P}\right)\textbf{S}_i^T\right\}}$$

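So far we did not impose any structure on $\textbf A$, but since it is diagonal we can say that $S_{i;pq}=\delta_{pq}\delta_{pi}$. The trace then picks out a single diagonal entry:

$$\frac{\partial \textrm{Tr}\,(\textbf{BAP}\textbf{A}^T\textbf{B}^T)}{\partial a_i} = \left[\textbf{B}^T\textbf{BA}\left(\textbf{P}^T+\textbf{P}\right)\right]_{ii}$$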
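As an illustration of what the diagonal structure matrices do inside the trace, the sketch below (assuming NumPy and random test matrices, with arbitrary sizes) evaluates $\textrm{Tr}\,\{\textbf{B}^T\textbf{BA}(\textbf{P}^T+\textbf{P})\textbf{S}_i^T\}$ for every $i$ and compares the result with the diagonal of $\textbf{B}^T\textbf{BA}(\textbf{P}^T+\textbf{P})$.

```python
import numpy as np

rng = np.random.default_rng(3)
m, r = 5, 3                        # illustrative sizes (assumption)
B = rng.standard_normal((m, r))
P = rng.standard_normal((r, r))
A = np.diag(rng.standard_normal(r))

# Matrix that multiplies S_i^T inside the trace
M = B.T @ B @ A @ (P.T + P)

# Diagonal structure matrices: S_i[p, q] = delta_pq * delta_pi
S = [np.outer(e, e) for e in np.eye(r)]

# One trace per parameter a_i
g_trace = np.array([np.trace(M @ S_i.T) for S_i in S])

print(np.allclose(g_trace, np.diag(M)))
```

Each trace reduces to the $(i,i)$ entry of the matrix in front of $\textbf{S}_i^T$, so in practice only the diagonal needs to be computed.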

kvantour

Great question. I found the other answers helpful, but they did not immediately help me understand the problem fully, so I will start with a key identity that you need. $$ \text{tr}(AB) = \text{diag}(A)'\text{diag}(B), $$ if either $A$ or $B$ is diagonal. Now we calculate the differential of $\phi = \text{tr}(BAPA'B')$: $$ d\phi = \text{tr}(B(dA)PA'B' + BAP(dA)'B') = \text{tr}(PA'B'B(dA) + P'A'B'B(dA)), $$ $$ d\phi = \text{tr}((P + P')A'B'B\,dA).$$ Since $dA$ is diagonal we can use the diagonal identity: $$d\phi = \text{diag}((P + P')A'B'B)'\,d(\text{diag}(A)). $$ So the derivative is $\text{diag}((P + P')A'B'B)$, which equals $\text{diag}(B'BA(P + P'))$ because a matrix and its transpose have the same diagonal.
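The diagonal identity is easy to confirm numerically. Here is a minimal sketch, assuming NumPy and random test matrices (the names `M` and `D` are illustrative and not from the answer), with `M` a full matrix and `D` a diagonal one.

```python
import numpy as np

rng = np.random.default_rng(4)
r = 4                                    # illustrative size (assumption)
M = rng.standard_normal((r, r))          # full matrix
D = np.diag(rng.standard_normal(r))      # diagonal matrix

# tr(M D) and tr(D M) both reduce to a dot product of the two diagonals
dot_of_diagonals = np.diag(M) @ np.diag(D)

print(np.isclose(np.trace(M @ D), dot_of_diagonals))
print(np.isclose(np.trace(D @ M), dot_of_diagonals))
```

The identity works because multiplying by a diagonal matrix only rescales rows or columns, so the off-diagonal entries of the full matrix never reach the diagonal of the product.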