
Calculate the following gradient

$$\nabla_W \left( x^TW^{-T}W^{-1}x \right)$$

where $W \in \mathbb{R}^{d\times d}$ is an invertible matrix and $x \in \mathbb{R}^d$ is a vector. The result should be a $d\times d$ matrix.

I wonder whether the result has a clean, compact form. I first tried to write it as

$$2(W^{-1}x)\frac{d(W^{-1}x)}{dW}$$

but the derivative factor is a third-order tensor. (Strictly, the expression should be $2 \sum_{i=1}^d(W^{-1}x)_i\frac{d(W^{-1}x)_i}{dW}$, but the next step gets messy.)

1 Answer


Define $y=W^{-1}x$ and calculate its differential.

$$\eqalign{ x &= Wy \cr dx &= W\,dy + dW\,y = 0 \quad (x{\rm \,is\,constant})\cr dy &= -W^{-1}\,dW\,y \cr }$$

Write the function in terms of $y$ and find its differential and gradient.

$$\eqalign{ \phi &= y:y \cr d\phi &= 2y:dy \cr &= -2y:(W^{-1}\,dW\,y) \cr &= -2(W^{-T}yy^T):dW \cr &= -2(W^{-T}W^{-1}xx^TW^{-T}):dW \cr \frac{\partial \phi}{\partial W} &= -2W^{-T}W^{-1}xx^TW^{-T} \cr }$$

where a colon denotes the trace product, i.e. $A:B = {\rm Tr}(A^TB)$.
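As a quick sanity check (an added illustration, not part of the derivation itself), consider the scalar case $d=1$, writing $w$ for the $1\times 1$ matrix $W$. Differentiating directly gives the same expression as the matrix formula with every factor reduced to a scalar:

$$\phi = \frac{x^2}{w^2}, \qquad \frac{d\phi}{dw} = -\frac{2x^2}{w^3} = -2\,w^{-1}\,w^{-1}\,x\,x\,w^{-1}$$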

NB: The use of differentials avoids the need to calculate tensors as intermediate quantities.
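The closed form can also be checked numerically. The following is a minimal sketch, assuming NumPy and a randomly generated test matrix; the function names `phi`, `grad_closed_form` and `grad_finite_diff` are made up for this illustration. It compares $-2W^{-T}yy^T$ (with $y=W^{-1}x$, which equals $-2W^{-T}W^{-1}xx^TW^{-T}$) against central finite differences taken entry by entry.

```python
import numpy as np

def phi(W, x):
    # phi(W) = x^T W^{-T} W^{-1} x = ||W^{-1} x||^2
    y = np.linalg.solve(W, x)
    return float(y @ y)

def grad_closed_form(W, x):
    # -2 W^{-T} y y^T with y = W^{-1} x, i.e. -2 W^{-T} W^{-1} x x^T W^{-T}
    y = np.linalg.solve(W, x)      # y = W^{-1} x
    z = np.linalg.solve(W.T, y)    # z = W^{-T} y
    return -2.0 * np.outer(z, y)

def grad_finite_diff(W, x, eps=1e-6):
    # Central differences, perturbing one entry of W at a time
    G = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            E = np.zeros_like(W)
            E[i, j] = eps
            G[i, j] = (phi(W + E, x) - phi(W - E, x)) / (2 * eps)
    return G

# Assumed test data: a random matrix with a diagonal shift so that
# a nearly singular W is unlikely
rng = np.random.default_rng(0)
d = 4
W = rng.standard_normal((d, d)) + d * np.eye(d)
x = rng.standard_normal(d)

max_err = np.max(np.abs(grad_closed_form(W, x) - grad_finite_diff(W, x)))
print(max_err)
```

If the formula is right, the printed maximum entrywise difference should be tiny compared with the gradient entries. Using `np.linalg.solve` instead of forming $W^{-1}$ explicitly is a design choice for numerical stability, not something the derivation requires.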

greg