
Let $V$ be the space of real, symmetric, non-singular matrices and denote by $\| \cdot \|_*$ the nuclear norm of a matrix. Given the function $F: V \rightarrow \mathbb{R}$ defined as

$$F(X) = \| X^{-1} \|_*,$$

I have calculated the derivative of $F$ with respect to $X$ as

$$ \frac{dF}{dX} = -X^{-2}.$$

My working is as follows.

Let $Y = X^{-1}$, then $dY = -X^{-1}dX X^{-1}$.

From greg's answer to the question "Derivative of the nuclear norm", the differential of $F$ in terms of $Y$ can be expressed as

$$dF = Y(YY^T)^{-\frac{1}{2}} : dY,$$ where the colon denotes the Frobenius inner product. Then, substituting $X^{-1}$ for $Y$ and $-X^{-1}dX X^{-1}$ for $dY$, we get

$$dF = X^{-1}(X^{-2})^{-\frac{1}{2}}: \left(-X^{-1}dX X^{-1}\right),$$ where the $X^{-2}$ term on the left of the colon comes from the fact that $X$ is symmetric. We can rearrange to

$$dF = -X^{-3}(X^{-2})^{-\frac{1}{2}}: dX,$$ which reduces to

$$dF = -X^{-2} : dX.$$

Hence,

$$\frac{dF}{dX} = -X^{-2}.$$

I'd appreciate if someone could check my working, as I'm still relatively new to the world of matrix calculus. Thanks in advance.

mrjoeybux

1 Answer


$ \def\h{1/2} \def\LR#1{\left(#1\right)} \def\sgn#1{\operatorname{sign}\LR{#1}} $Your calculations are correct, except for one very subtle point: $$\LR{X^{-2}}^{-\h} = \LR{X^2}^{\h} = X\cdot\sgn{X} \;\ne\; X$$ which is best explained in a blog post by Nick Higham.

There are many matrices for which $\,\sgn X=I.\;$ It may even be true for the matrix that you have in mind. But it is not true in general.

Anyway, this changes your final result to $$\frac{\partial F}{\partial X} = -X^{-2}\,\sgn{X}$$
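The corrected gradient can be sanity-checked numerically. The following is a minimal sketch, assuming NumPy and an arbitrary indefinite symmetric test matrix; the helper names (`nuclear_norm_of_inverse`, `matrix_sign`, `numeric_gradient`) are made up for illustration. It compares $-X^{-2}\operatorname{sign}(X)$ and the original $-X^{-2}$ against central finite differences of $F$.

```python
import numpy as np

def nuclear_norm_of_inverse(X):
    # F(X) = ||X^{-1}||_*, the sum of the singular values of X^{-1}
    return np.linalg.norm(np.linalg.inv(X), ord="nuc")

def matrix_sign(X):
    # For symmetric X = Q diag(w) Q^T, sign(X) = Q diag(sign(w)) Q^T
    w, Q = np.linalg.eigh(X)
    return (Q * np.sign(w)) @ Q.T

def numeric_gradient(f, X, h=1e-6):
    # Central differences, perturbing one entry at a time
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = 1.0
            G[i, j] = (f(X + h * E) - f(X - h * E)) / (2 * h)
    return G

# Assumed test matrix: symmetric, one positive and one negative eigenvalue
X = np.array([[2.0, 1.0],
              [1.0, -3.0]])
X_inv = np.linalg.inv(X)

G_naive = -X_inv @ X_inv                      # -X^{-2}
G_analytic = -X_inv @ X_inv @ matrix_sign(X)  # -X^{-2} sign(X)
G_numeric = numeric_gradient(nuclear_norm_of_inverse, X)

print(np.abs(G_analytic - G_numeric).max())
print(np.abs(G_naive - G_numeric).max())
```

For this indefinite $X$, the first difference should sit at the level of finite-difference error while the second should not. Replacing $X$ with a positive definite matrix should make both agree, since then $\operatorname{sign}(X)=I$.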


There is another subtle issue. In general $$ (YY^T)^{-\h}\,Y\ne Y(YY^T)^{-\h}$$ Although the two expressions are equal for symmetric matrices, it is easy to see that the RHS is not even defined for rectangular matrices. It is not so easy to see that the two sides are unequal for most square matrices.

So you should get in the habit of writing the differential in one of the two forms $$\eqalign{ dF &= (YY^T)^{-\h}\,Y:dY \\ &= Y(Y^TY)^{-\h}:dY \\ }$$ or you will shoot yourself in the foot one day.
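A small numerical illustration of the ordering issue, as a sketch assuming NumPy and an arbitrary non-symmetric invertible $Y$ (the helper `inv_sqrt_spd` is a made-up name): the two recommended forms agree with each other, while the mixed form $Y(YY^T)^{-1/2}$ does not.

```python
import numpy as np

def inv_sqrt_spd(A):
    # A^{-1/2} for a symmetric positive definite A, via its eigendecomposition
    w, Q = np.linalg.eigh(A)
    return (Q / np.sqrt(w)) @ Q.T

# Assumed test matrix: square, invertible, not symmetric
Y = np.array([[1.0, 2.0],
              [0.0, 1.0]])

left = inv_sqrt_spd(Y @ Y.T) @ Y    # (Y Y^T)^{-1/2} Y
right = Y @ inv_sqrt_spd(Y.T @ Y)   # Y (Y^T Y)^{-1/2}
mixed = Y @ inv_sqrt_spd(Y @ Y.T)   # Y (Y Y^T)^{-1/2}

print(np.allclose(left, right))
print(np.allclose(left, mixed))
```

The first comparison should hold and the second should fail, because $Y$ commutes with $YY^T$ only when $YY^T = Y^TY$, which this $Y$ does not satisfy.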


The assertion that $$X\ne\LR{X^2}^{\h}$$ is somewhat jolting the first time it is encountered.
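To see this concretely, here is a minimal sketch (assuming NumPy, symmetric inputs, and the principal square root; `sqrt_spd` and `matrix_sign` are illustrative helper names) comparing $X$ with $(X^2)^{1/2}$ for an indefinite matrix and for a positive definite one.

```python
import numpy as np

def sqrt_spd(A):
    # Principal square root of a symmetric positive definite A
    w, Q = np.linalg.eigh(A)
    return (Q * np.sqrt(w)) @ Q.T

def matrix_sign(X):
    # sign(X) for symmetric X: keep eigenvectors, replace eigenvalues by their signs
    w, Q = np.linalg.eigh(X)
    return (Q * np.sign(w)) @ Q.T

X_indef = np.array([[2.0, 1.0],
                    [1.0, -3.0]])  # one positive, one negative eigenvalue
X_posdef = np.array([[2.0, 1.0],
                     [1.0, 3.0]])  # both eigenvalues positive

root_indef = sqrt_spd(X_indef @ X_indef)
root_posdef = sqrt_spd(X_posdef @ X_posdef)

print(np.allclose(root_indef, X_indef))                         # is (X^2)^{1/2} = X ?
print(np.allclose(root_indef, X_indef @ matrix_sign(X_indef)))  # is it X sign(X) ?
print(np.allclose(root_posdef, X_posdef))                       # positive definite case
```

Only the positive definite case should recover $X$ itself. For the indefinite matrix, the square root flips the negative eigenvalue to its absolute value, which is exactly the extra $\operatorname{sign}(X)$ factor.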

Here's another jolting assertion to ponder (Higham strikes again) $$X \ne \log\LR{e^X}$$
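One standard way this can fail (not necessarily the example Higham uses) is a rotation generator. The sketch below assumes NumPy and relies on two closed forms rather than a general-purpose matrix exponential: $e^{\theta J}$ is the rotation by $\theta$, and the principal logarithm of a rotation by $\varphi \in (-\pi, \pi)$ is $\varphi J$.

```python
import numpy as np

# J generates plane rotations: exp(theta * J) is the rotation by angle theta
J = np.array([[0.0, -1.0],
              [1.0, 0.0]])

theta = 2 * np.pi
X = theta * J

# Closed form for exp(X) when X = theta * J
exp_X = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta), np.cos(theta)]])

# Principal logarithm of a rotation: recover the angle wrapped into (-pi, pi]
phi = np.arctan2(exp_X[1, 0], exp_X[0, 0])
log_exp_X = phi * J

print(np.round(exp_X, 12))
print(np.abs(log_exp_X).max(), np.abs(X).max())
```

Here $e^X$ is the identity up to rounding, so its principal logarithm is essentially the zero matrix, not $X$. The logarithm only undoes the exponential when the eigenvalues of $X$ have imaginary parts inside $(-\pi, \pi)$.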

greg