I'm trying to compute the matrix Wirtinger derivative $$\frac{\partial (f\circ g)(Z)}{\partial Z}$$ where $g(Z) := B(A Z-Z A)$ and $f(W):= \mathrm{Tr}\left(\sqrt{W^* W}\right)$, with $W^*$ the conjugate transpose of $W$. Here $Z$ is a complex Hermitian (also positive) matrix and $f:\mathbb{C}^{n\times n} \to \mathbb{R}$ is the nuclear norm (the Ky Fan $n$-norm).

I'm stuck applying the chain rule: I'm not sure how to contract the tensor arising from $\frac{\partial g(Z)}{\partial Z}$ (a fourth-order tensor, according to MatrixCalculus.org) with the matrix $\frac{\partial f(W)}{\partial W}\big{\vert}_{W=g(Z)}$.

I know from this answer that $$\frac{\partial f(W)}{\partial W} = W(W^TW)^{-1/2} \, .$$
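A quick numerical sanity check of this real-matrix formula, as a minimal NumPy sketch. It assumes a square, nonsingular real $W$ (a random $4\times 4$ Gaussian matrix here); the helper name `inv_sqrt_spd`, the seed and the step size are arbitrary illustrative choices.

```python
import numpy as np

def inv_sqrt_spd(F):
    """F^{-1/2} for a symmetric positive-definite F, via its eigendecomposition."""
    w, V = np.linalg.eigh(F)
    return (V / np.sqrt(w)) @ V.T

rng = np.random.default_rng(1)
n = 4
W = rng.standard_normal((n, n))          # assumed square and nonsingular
grad = W @ inv_sqrt_spd(W.T @ W)         # claimed gradient W (W^T W)^{-1/2}

# Central finite differences of the nuclear norm, one entry at a time
t = 1e-6
fd = np.zeros_like(W)
for i in range(n):
    for j in range(n):
        E = np.zeros_like(W)
        E[i, j] = t
        fd[i, j] = (np.linalg.norm(W + E, "nuc") - np.linalg.norm(W - E, "nuc")) / (2 * t)

print("max abs difference:", np.max(np.abs(fd - grad)))
```

The difference should be at the level of finite-difference noise; this only covers the real case, not the complex $Z$ of the question.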

Could I get some help on this? Using the chain rule I can derive $$\frac{\partial \mathrm{Tr}(AZ)}{\partial Z} = A^T,$$ but I can't extend the approach to $f\circ g$.

Aritra Das


Using the Chain Rule in Matrix Calculus is difficult because it requires the calculation of higher-order tensors as intermediate quantities.

Differentials offer an alternative approach, one which doesn't need these awkward quantities. The differential of a matrix behaves just like a matrix. In particular, it obeys all of the rules of Matrix Algebra.
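For instance, the simple result $\partial\,\mathrm{Tr}(AZ)/\partial Z = A^T$ mentioned in the question follows in one line: write the differential in components and read off the coefficient of each $dZ_{ij}$.

$$d\,\mathrm{Tr}(AZ) = \mathrm{Tr}(A\,dZ) = \sum_{i,j} A_{ji}\,dZ_{ij} = \sum_{i,j}\left(A^T\right)_{ij}dZ_{ij} \quad\implies\quad \frac{\partial\,\mathrm{Tr}(AZ)}{\partial Z} = A^T$$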

$ \def\k{\otimes} \def\o{{\tt1}} \def\BR#1{\left[#1\right]} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\Unvc#1{\op{Unvec}\LR{#1}} \def\vc#1{\op{vec}\LR{#1}} \def\tr#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\q{\quad} \def\qq{\qquad} \def\qif{\q\iff\q} \def\qiq{\q\implies\q} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\red#1{\color{red}{#1}} \def\green#1{\color{green}{#1}} \def\blue#1{\color{blue}{#1}} \def\RLR#1{\red{\LR{#1}}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} $Note that the linked result for the gradient of the nuclear norm was for a $\sf Real$ matrix.
Below is the calculation for a $\sf Complex$ matrix.

Below, $X^*$ denotes the elementwise complex conjugate of $X$ and $X^H$ its conjugate transpose (written $^*$ in the question).

The double-dot $(:)$ product is extremely useful. It has the following properties: $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \tr{A^TB} \\ B^*:B &= \frob{B}^2 \qquad \{ {\rm Frobenius\;norm} \}\\ A:B &= B:A \;=\; B^T:A^T \\ \LR{XY}:B &= X:\LR{BY^T} \;=\; Y:\LR{X^TB} \\ }$$

The nuclear norm of $G=g(Z)$ is the trace of the square root of $F=G^HG$. Assuming $F$ is nonsingular (so that $F^{-1/2}$ exists), its differential is $$\eqalign{ \def\G{\|G\|_*} \G &= \tr{F^{1/2}} \\ d\,\G &= \tfrac12\LR{F^T}^{-1/2}:dF \\ &= \tfrac12\LR{G^TG^*}^{-1/2}:\LR{G^HdG+dG^HG} \\ &= {\tfrac12G^*\LR{G^TG^*}^{-1/2}}:dG \;+\; \red{\tfrac12G\LR{G^HG}^{-1/2}}:dG^* \\ &\equiv\, M^*:dG \;+\; \red{M}:dG^* \\ &=\,M^*:dG \;+\; {\rm conjugate} \\ }$$

The next step is to calculate the differential of $G$ with respect to $Z$: $$\eqalign{ G = BA\green{Z} - B\green{Z}A \qiq dG \:=\: {BA\:\green{dZ} - B\:\green{dZ}\,A} \\ }$$

Substituting this into the previous result yields $$\eqalign{ d\,\G &= M^*:\LR{BA\,dZ - B\,dZ\,A} \;&+\; {\rm conjugate} \\ &= \LR{A^TB^TM^*- B^TM^*A^T}:dZ \;&+\; {\rm conjugate} \\ &= \LR{A^HB^HM-B^HMA^H}^*:dZ \;&+\; {\rm conjugate} \\ }$$ from which the Wirtinger gradients can be identified as $$\eqalign{ \grad{\G}{Z} &= \LR{A^HB^HM-B^HMA^H}^* \\\\ \grad{\G}{Z^*} &= \LR{A^HB^HM - B^HMA^H} \\ &= \BR{\frac{A^HB^H{G\LR{G^HG}^{-1/2}} - B^H{G\LR{G^HG}^{-1/2}}A^H}2} }$$
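A minimal NumPy sketch that compares these gradients with central finite differences. It is an illustration under stated assumptions: random complex $A$, $B$, $Z$ (so $G$ is nonsingular in practice), and, like the derivation, $Z$ treated as an unconstrained complex matrix rather than a Hermitian one. Since $d\|G\|_* = P:dZ + {\rm conjugate}$ with $P=\partial\|G\|_*/\partial Z$, the directional derivative along a direction $E$ is $2\,{\rm Re}(P:E)$, which is what the code compares against. The helper names (`crandn`, `objective`, `grad_conj`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def crandn(m, k):
    """Random complex Gaussian m-by-k matrix."""
    return rng.standard_normal((m, k)) + 1j * rng.standard_normal((m, k))

def inv_sqrt_psd(F):
    """F^{-1/2} for a Hermitian positive-definite F, via eigh."""
    w, V = np.linalg.eigh(F)
    return (V / np.sqrt(w)) @ V.conj().T

def objective(Z, A, B):
    """||B(AZ - ZA)||_*, the function from the question."""
    return np.linalg.norm(B @ (A @ Z - Z @ A), ord="nuc")

def grad_conj(Z, A, B):
    """Wirtinger gradient w.r.t. Z^*: A^H B^H M - B^H M A^H,
    with M = 1/2 G (G^H G)^{-1/2}; assumes G is nonsingular."""
    G = B @ (A @ Z - Z @ A)
    M = 0.5 * G @ inv_sqrt_psd(G.conj().T @ G)
    AH, BH = A.conj().T, B.conj().T
    return AH @ BH @ M - BH @ M @ AH

n = 4
A, B, Z = crandn(n, n), crandn(n, n), crandn(n, n)
D = grad_conj(Z, A, B)              # D = d||G||_*/dZ^*, so P = d||G||_*/dZ = conj(D)

t = 1e-6
max_err = 0.0
for _ in range(5):
    E = crandn(n, n)                # random complex direction dZ = E
    fd = (objective(Z + t * E, A, B) - objective(Z - t * E, A, B)) / (2 * t)
    analytic = 2 * np.vdot(D, E).real   # 2 Re(P:E); np.vdot conjugates its first argument
    max_err = max(max_err, abs(fd - analytic) / max(1.0, abs(analytic)))

print(f"max relative error: {max_err:.1e}")
```

If the formula is right, the printed relative error should be at the level of finite-difference noise, many orders of magnitude below one.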


Although it wasn't part of this question, for a rectangular $G$ there are two different ways to write $M$: the first form applies when $G$ is tall/skinny (full column rank, so $G^HG$ is invertible) and the second when $G$ is short/fat (full row rank, so $GG^H$ is invertible): $$\eqalign{ \tfrac12G\LR{G^HG}^{-1/2} \q{\sf or}\q \tfrac12\LR{GG^H}^{-1/2}G }$$
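Both expressions equal $\tfrac12UV^H$, where $G=U\Sigma V^H$ is the thin SVD, which is also a convenient way to evaluate $M$ without forming an inverse square root. A short NumPy sketch checking this on random full-rank complex matrices; the shapes, seed and helper names are arbitrary illustrative choices.

```python
import numpy as np

def inv_sqrt_psd(F):
    """F^{-1/2} for a Hermitian positive-definite F, via eigh."""
    w, V = np.linalg.eigh(F)
    return (V / np.sqrt(w)) @ V.conj().T

def M_tall(G):   # 1/2 G (G^H G)^{-1/2}; needs full column rank
    return 0.5 * G @ inv_sqrt_psd(G.conj().T @ G)

def M_wide(G):   # 1/2 (G G^H)^{-1/2} G; needs full row rank
    return 0.5 * inv_sqrt_psd(G @ G.conj().T) @ G

def M_svd(G):    # 1/2 U V^H from the thin SVD
    U, s, Vh = np.linalg.svd(G, full_matrices=False)
    return 0.5 * U @ Vh

rng = np.random.default_rng(2)

def crandn(m, k):
    """Random complex Gaussian m-by-k matrix."""
    return rng.standard_normal((m, k)) + 1j * rng.standard_normal((m, k))

G_tall, G_wide, G_sq = crandn(6, 3), crandn(3, 6), crandn(4, 4)
print(np.allclose(M_tall(G_tall), M_svd(G_tall)))
print(np.allclose(M_wide(G_wide), M_svd(G_wide)))
print(np.allclose(M_tall(G_sq), M_wide(G_sq)))
```

For a square nonsingular $G$ the two forms coincide, which the last check confirms numerically.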

greg