Suppose that matrices $A$ and $B$ are Hermitian and positive semidefinite. How can I obtain the gradient of the trace distance between $A$ and $B$, i.e., $$C := \Vert A - B \Vert_1 := \mbox{Tr} \left( \sqrt{(A-B)^\dagger (A-B)}\right)$$ with respect to $A$, i.e., $\nabla_A C$?

  • IMHO, you should not have chosen $C$. A norm is a nonnegative real. Thus, you should not pick the same notation for scalars and matrices. I would have picked $c$ instead. – Rodrigo de Azevedo Jan 12 '23 at 10:25
  • @RodrigodeAzevedo The Schatten 1-norm (https://en.wikipedia.org/wiki/Schatten_norm) is the same as the nuclear norm. It is also called the trace norm. All three names refer to the sum of the singular values of a matrix. Good point, sorry for the confusion over the notation $C$. – userflux9674 Jan 12 '23 at 16:49

2 Answers

Define the matrix variables $$\eqalign{ \def\LR#1{\left(#1\right)} \def\h{{\frac 12}} \def\n{{-\frac 12}} X &= (A-B) \quad\implies\quad dX = dA \\ W &= X^{\dagger}X \\ }$$ and use them to calculate the requested gradient, where a colon denotes the trace/Frobenius product, $A:B = {\rm Tr}(A^TB)$ $$\eqalign{ \def\p{\partial} C &= {\rm Tr}\!\LR{W^\h} \\ dC &= \h\LR{W^T}^\n:dW \\ &= \h\LR{W^\n}^*:{X^{\dagger}dX} \\ &= \h\LR{XW^\n}^*:dA \\ \frac{\p C}{\p A} &= \h\LR{X\LR{X^{\dagger}X}^\n}^* \\ }$$ Since $X$ is Hermitian, this result can be simplified further, and conjugated to obtain the gradient with respect to $A^*$ $$\eqalign{ S &= {\rm sign}(X) \;=\; X\LR{X^2}^\n \\ \frac{\p C}{\p A} &= \h\LR{X\LR{X^2}^\n}^* = \frac{S^*}2 \\ \frac{\p C}{\p A^*} &= \frac{S}2 \\ }$$ Here is more information about the matrix sign function.
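As a sanity check, the result $\partial C/\partial A^* = S/2$ can be compared against a finite-difference directional derivative. What follows is a minimal NumPy sketch, not part of the derivation itself. The helper names `random_psd` and `trace_norm`, the seed, and the step size are illustrative assumptions, and the sketch assumes $X$ has no zero eigenvalues, so that ${\rm sign}(X)$ and the derivative are well defined. It relies on the fact that for a real-valued $C$ the two Wirtinger gradients combine to $dC = {\rm Re}\,{\rm Tr}(S\,dA)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

def random_psd(n):
    # Hypothetical helper: G G^H is Hermitian positive semidefinite.
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return G @ G.conj().T

def trace_norm(M):
    # Sum of singular values, i.e. Tr sqrt(M^H M).
    return np.linalg.svd(M, compute_uv=False).sum()

A, B = random_psd(n), random_psd(n)
X = A - B

# Matrix sign of the Hermitian X from its eigendecomposition:
# S = Q diag(sign(lambda)) Q^H (assumes no zero eigenvalues).
lam, Q = np.linalg.eigh(X)
S = (Q * np.sign(lam)) @ Q.conj().T

# Central finite difference of C along a random complex direction E.
E = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
eps = 1e-6
fd = (trace_norm(X + eps * E) - trace_norm(X - eps * E)) / (2 * eps)

# Prediction from the gradient: dC = (S^*/2):dA + (S/2):dA^* = Re Tr(S dA).
pred = np.trace(S @ E).real

print(fd, pred)
```

If the formula is right, the two printed numbers should agree to within the finite-difference error. The agreement degrades as an eigenvalue of $X$ approaches zero, where the trace norm stops being differentiable.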

greg

Following the post recommended by Rodrigo, the distance $\phi$ is the Schatten 1-norm (or nuclear norm) of $\mathbf{X}$, i.e., the sum of its singular values $$ \phi(\mathbf{X}) = \sum_k s_k(\mathbf{X}) = \mathrm{tr}(\mathbf{S}) $$ where we consider the singular value decomposition $\mathbf{X} =\mathbf{A-B} =\mathbf{U}\mathbf{S}\mathbf{V}^T$.

The following result holds for the differential of the singular values $$ d\mathbf{S} = \mathrm{diag} \left[ \mathbf{U}^T (d\mathbf{X}) \mathbf{V} \right] $$ and therefore \begin{eqnarray*} d\phi &=& \mathrm{tr}[d\mathbf{S}] = \mathrm{tr} \left[ \mathbf{U}^T (d\mathbf{X}) \mathbf{V} \right] = \mathbf{U}\mathbf{V}^T: d\mathbf{X} \end{eqnarray*} Since $d\mathbf{X} = d\mathbf{A}$, $$ \frac{\partial \phi}{\partial \mathbf{A}} =\frac{\partial \phi}{\partial \mathbf{X}} =\mathbf{U}\mathbf{V}^T$$ is the requested gradient (in the real case). I think this can be generalized to the complex case.
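The real-case gradient $\mathbf{U}\mathbf{V}^T$ can be checked entry by entry with central finite differences. This is a minimal NumPy sketch under stated assumptions: the helper names `random_psd_real` and `nuclear_norm`, the seed, and the step size are illustrative, every entry of $\mathbf{X}$ is treated as an independent real variable, and $\mathbf{X}$ is assumed to be nonsingular so that the norm is differentiable.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

def random_psd_real(n):
    # Hypothetical helper: G G^T is real symmetric positive semidefinite.
    G = rng.standard_normal((n, n))
    return G @ G.T

def nuclear_norm(M):
    # Sum of singular values.
    return np.linalg.svd(M, compute_uv=False).sum()

A, B = random_psd_real(n), random_psd_real(n)
X = A - B

# Candidate gradient from the SVD X = U S V^T.
U, s, Vt = np.linalg.svd(X)
grad = U @ Vt

# Central finite differences, perturbing one entry of X at a time.
eps = 1e-6
fd_grad = np.zeros_like(X)
for i in range(n):
    for j in range(n):
        E = np.zeros_like(X)
        E[i, j] = eps
        fd_grad[i, j] = (nuclear_norm(X + E) - nuclear_norm(X - E)) / (2 * eps)

max_err = np.abs(fd_grad - grad).max()
print(max_err)
```

The printed maximum error should be small compared with the entries of `grad`, which have magnitude at most 1 because $\mathbf{U}\mathbf{V}^T$ is orthogonal.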

Steph
  • +1 Generalizing to the complex case (in the Wirtinger sense) yields $$ \def\g#1#2{\frac{\partial #1}{\partial #2}} \def\u{\left(UV^H\right)} \g CA = \frac{\u^*}2 \qquad\quad \g C{A^*} = \frac\u2 $$ – greg Jan 15 '23 at 21:25