
Let $$f(W) = (\mathbf{t} - W^T\mathbf{x})^TA(\mathbf{t} - W^T\mathbf{x}).$$ Expanding, I get $$f(W) = \mathbf{t}^TA\mathbf{t} - \mathbf{t}^TAW^T\mathbf{x} - \mathbf{x}^TWA\mathbf{t} + \mathbf{x}^TWAW^T\mathbf{x}.$$ The first term does not depend on $W$, so its derivative vanishes and only the remaining terms matter. Starting with the second term, $$\mathbf{t}^TAW^T\mathbf{x} = \operatorname{Tr}[W^T\mathbf{x}\mathbf{t}^TA].$$ Taking derivatives, we obtain

$$\frac{\partial \mathbf{t}^TAW^T\mathbf{x} }{\partial W} = \mathbf{x}\mathbf{t}^TA$$

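Similarly, since the third term is a scalar, $\mathbf{x}^TWA\mathbf{t} = \mathbf{t}^TA^TW^T\mathbf{x} = \operatorname{Tr}[W^T\mathbf{x}\mathbf{t}^TA^T]$, so its derivative is $\mathbf{x} \mathbf{t}^T A^T$.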
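One way to sanity-check these two results is a central finite-difference comparison. The sketch below is a minimal NumPy check, assuming shapes $\mathbf{x} \in \mathbb{R}^D$, $W \in \mathbb{R}^{D \times K}$, $\mathbf{t} \in \mathbb{R}^K$ and $A \in \mathbb{R}^{K \times K}$; the helper `numerical_grad` and the names `second`/`third` are hypothetical, chosen only for illustration. Because both terms are linear in $W$, central differences agree with the analytic gradient up to rounding.

```python
import numpy as np

def numerical_grad(f, W, eps=1e-6):
    """Hypothetical helper: central finite-difference gradient of scalar f at matrix W."""
    G = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        E = np.zeros_like(W)
        E[idx] = eps
        G[idx] = (f(W + E) - f(W - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
D, K = 4, 3  # assumed shapes: x in R^D, W in R^{D x K}, t in R^K, A in R^{K x K}
x = rng.standard_normal(D)
t = rng.standard_normal(K)
W = rng.standard_normal((D, K))
A = rng.standard_normal((K, K))  # deliberately NOT symmetric

second = lambda W: t @ A @ W.T @ x  # t^T A W^T x
third = lambda W: x @ W @ A @ t     # x^T W A t

grad_second = np.outer(x, t @ A)    # x t^T A   (a D x K matrix)
grad_third = np.outer(x, t @ A.T)   # x t^T A^T (a D x K matrix)

print(np.allclose(numerical_grad(second, W), grad_second, atol=1e-6))
print(np.allclose(numerical_grad(third, W), grad_third, atol=1e-6))
```

Using a non-symmetric $A$ here keeps the check honest: the two terms have different gradients unless $A = A^T$.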

I have looked through my textbook's appendix on derivatives with respect to matrices, and I cannot find anything that helps with the last term (maybe (A.27)?):

https://www.bishopbook.com/

Pages 620-621.

Could I have some guidance with the last term, please? Maybe there is a way to "see" the derivative without expanding. I would like to stick with the rules from my textbook.

HMPtwo

2 Answers


Since $A$ may be assumed symmetric (otherwise symmetrize it as $(A+A^T)/2$, which leaves the quadratic form unchanged), one way to approach the last term is to take the spectral decomposition of $A$ as in (A.45) $$ A=\sum_k\lambda_k u_ku_k^T. $$ Then the last term becomes a weighted sum of squares $$ x^TWAW^Tx=\sum_k\lambda_k(u_k^TW^Tx)^2 $$ and $$ \frac{\partial x^TWAW^Tx}{\partial W}=2\sum_k\lambda_ku_k^TW^Tx\cdot\frac{\partial (u_k^TW^Tx)}{\partial W}= 2\sum_k\lambda_ku_k^TW^Tx\cdot\frac{\partial\operatorname{Tr}(W^Txu_k^T)}{\partial W}, $$ where the remaining derivative can be computed with (A.25). After differentiating, $A$ is recovered by moving the scalar $u_k^TW^Tx=x^TWu_k$ between the two vectors: $$ 2\sum_k\lambda_ku_k^TW^Tx\cdot xu_k^T=2\sum_k\lambda_kx(x^TWu_k)u_k^T=2xx^TWA. $$
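The spectral argument can be checked numerically. The sketch below is a minimal NumPy check under assumed shapes ($x \in \mathbb{R}^D$, $W \in \mathbb{R}^{D \times K}$, symmetric $A \in \mathbb{R}^{K \times K}$); it uses `numpy.linalg.eigh` to obtain $\lambda_k$ and $u_k$, then compares both the sum-of-squares identity and the term-by-term gradient against the closed form $2xx^TWA$. Variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
D, K = 4, 3  # assumed shapes
x = rng.standard_normal(D)
W = rng.standard_normal((D, K))
M = rng.standard_normal((K, K))
A = (M + M.T) / 2  # symmetric, as the answer assumes

# eigh returns eigenvalues lam and orthonormal eigenvectors as the columns of U,
# so that A = sum_k lam[k] * U[:, k] U[:, k]^T
lam, U = np.linalg.eigh(A)

quad = x @ W @ A @ W.T @ x  # x^T W A W^T x
sum_sq = sum(lam[k] * (U[:, k] @ W.T @ x) ** 2 for k in range(K))

# 2 * sum_k lam_k (u_k^T W^T x) * x u_k^T, i.e. the sum before A is reassembled
grad_sum = 2 * sum(lam[k] * (U[:, k] @ W.T @ x) * np.outer(x, U[:, k]) for k in range(K))
grad_closed = 2 * np.outer(x, x) @ W @ A  # 2 x x^T W A

print(np.isclose(quad, sum_sq))
print(np.allclose(grad_sum, grad_closed))
```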

A.Γ.

$$\newcommand{\tmop}[1]{{\operatorname{#1}}}$$

We have \begin{eqnarray*} f (W) & = & (t - W^T x)^T B (t - W^T x), \end{eqnarray*} where $B = \frac{A + A^T}{2}$ (a quadratic form only depends on the symmetric part of its matrix). The principled way to compute $\nabla f (W)$ is to use differentials, which are a concise notation for manipulating Fréchet derivatives. The key rule is that if $df (W) = \tmop{Tr} (dW^T G)$ for some matrix $G$, then $\nabla f (W) = G$. We have \begin{eqnarray*} df (W) & = & (- dW^T x)^T B (t - W^T x) + (t - W^T x)^T B (- dW^T x)\\ & = & - 2 (t - W^T x)^T BdW^T x\\ & = & - 2 \tmop{Tr} ((t - W^T x)^T BdW^T x)\\ & = & - 2 \tmop{Tr} (dW^T x (t - W^T x)^T B) , \end{eqnarray*} where the second line uses the fact that the two terms are equal scalars (each is the transpose of the other, and $B$ is symmetric). Hence \begin{eqnarray*} \nabla f (W) & = & - 2 x (t - W^T x)^T B. \end{eqnarray*} As another example, take the last term $x^T WBW^T x$. We have \begin{eqnarray*} d (x^T WBW^T x) & = & x^T dWBW^T x + x^T WBdW^T x\\ & = & 2 x^T WBdW^T x\\ & = & 2 \tmop{Tr} (x^T WBdW^T x)\\ & = & 2 \tmop{Tr} (dW^T xx^T WB), \end{eqnarray*} so \begin{eqnarray*} \nabla_W (x^T WBW^T x) & = & 2 xx^T WB. \end{eqnarray*}
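A quick numerical check of the full gradient $-2x(t - W^Tx)^TB$ and of the last-term gradient $2xx^TWB$, using a deliberately non-symmetric $A$, is sketched below. This is a minimal NumPy check under assumed shapes ($x \in \mathbb{R}^D$, $W \in \mathbb{R}^{D \times K}$, $t \in \mathbb{R}^K$, $A \in \mathbb{R}^{K \times K}$); `numerical_grad` is a hypothetical central-difference helper, not part of any library. Since $f$ is quadratic in $W$, central differences match the analytic gradient up to rounding.

```python
import numpy as np

def numerical_grad(f, W, eps=1e-6):
    """Hypothetical helper: central finite-difference gradient of scalar f at matrix W."""
    G = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        E = np.zeros_like(W)
        E[idx] = eps
        G[idx] = (f(W + E) - f(W - E)) / (2 * eps)
    return G

rng = np.random.default_rng(2)
D, K = 5, 3  # assumed shapes
x = rng.standard_normal(D)
t = rng.standard_normal(K)
W = rng.standard_normal((D, K))
A = rng.standard_normal((K, K))  # general, non-symmetric A
B = (A + A.T) / 2                # symmetric part of A

def f(W):
    r = t - W.T @ x  # residual t - W^T x
    return r @ A @ r

grad_f = -2 * np.outer(x, (t - W.T @ x) @ B)  # -2 x (t - W^T x)^T B
grad_last = 2 * np.outer(x, x) @ W @ B        # 2 x x^T W B

print(np.allclose(numerical_grad(f, W), grad_f, atol=1e-6))
print(np.allclose(numerical_grad(lambda W: x @ W @ A @ W.T @ x, W), grad_last, atol=1e-6))
```

Note that the factor $2$ in $\nabla f(W)$ is essential: dropping it makes the first comparison fail.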

Mason