0

I am really stuck at calculating

$$\frac{d}{dC} \operatorname{trace} \left( X C P (X C)^T \right)$$

where $P \in R^{r\times r}$, $X \in R^{m\times n}$ and $C \in R^{n\times r}$ . Do I need to recall $A=XC$ and then apply chain rule such that

$$\frac{d \operatorname{trace}(XCP(XC)^T)}{dC}=\frac{d\operatorname{trace}(APA^T)}{dA}\cdot \frac{dA}{dC}$$

where first factor equals $A (P+P^T) $ (following this reference) and the second is $X$. Am I right ?

More correctly, by a quick dimensionality check, it would make more sense to have $X^T (XC) (P+P^T)$ at the end. Is it right ? Could someone help me prove it ?

Moreover, is $\operatorname{trace}(XCP(XC)^T)$ convex w.r.t. $C$ ?

Eric Wofsey
  • 342,377

1 Answers1

1

Your answer is correct. I think a slightly easier way to get there (from the formulas you gave) is to use invariance of trace under cyclic permutations to say $\mathrm{trace}(XCP(XC)^T)=\mathrm{trace} (X^T X C P C^T)$. You can then apply the formula $$ \frac{d \ \mathrm{trace}(A C B C^T)}{dC}=A^T C B^T + A C B $$ from your link, with $A=X^T X$ and $B=P$.

The convexity of the function depends on your choices of $X$ and $P$. If you pick $P=I$ and $X=I$ then your function is $\mathrm{trace}(C C^T)$ which is convex. On the otherhand if you pick $P=-I$ and $X=I$ then your function is $\mathrm{trace} (-C C^T)$ which is not convex.

To see that $\mathrm{trace}(C C^T)$ is convex in $C$, first note that $$ \mathrm{trace}(C C^T) \geq 0 $$ for any $C$ since $C C^T$ is positive semidefinite and the trace of a matrix is equal to the sum of its eigenvalues. Now, letting $t \in [0,1]$ and $A,B \in \mathbb{R}^{n \times r}$ it is easy to check that $$ \mathrm{trace}(t A (t A)^T)+\mathrm{trace}((1-t) A ((1-t) A)^T)=t^2 \mathrm{trace}(A A^T)+(1-t)^2 \mathrm{trace} (B B^T) $$ and $$ t^2 \mathrm{trace}(A A^T)+(1-t)^2 \mathrm{trace} (B B^T) \leq t\ \mathrm{trace}(A A^T)+(1-t) \ \mathrm{trace} (B B^T) . $$

A very rough guess for convexity is that $X^T X (P+P^T)$ should be positive semidefinite.

Eric
  • 853
  • 4
  • 12
  • 1
    If you mean explain why $\mathrm{trace} (CC^T)$ is convex, I can do that, but I'd have to think more to give a complete answer, and it really is a guess that $X^T X(P+P^T)$ should be psd. There are two reasons for the guess. First, positive semidefiniteness is "good" for convexity in some sense. E.g. spectrahedra are convex sets defined by matrix inequalities. Second is that naively it looks related to the Hessian of the function. Its not the Hessian though, the second derivative would be a fourth order tensor in this case. – Eric Feb 18 '19 at 15:14
  • 1
    @MysteryGuy Updated. – Eric Feb 18 '19 at 15:34