4

How can I find the gradient of the term $a^TXb$, where $X$ is an $n \times m$ matrix and $a$ and $b$ are column vectors? Since the gradient is taken with respect to a matrix, it should itself be a matrix, but I do not have a clue how to derive it.

Any help?

Shew
  • 1,662

3 Answers

3

Write the function in terms of the inner/Frobenius product (which I'll denote by a colon). Then finding the differential and gradient is straightforward: $$\eqalign{ f &= a^TXb = ab^T:X \cr\cr df &= ab^T:dX \cr\cr \frac{\partial f}{\partial X} &= ab^T \cr\cr }$$ Note that the inner product is really just an infix notation for the trace $$A:B = {\rm tr}(A^TB)$$

lynn
  • 3,441
3

Let

$$f ( {\bf X} ) := {\bf a}^{\top} {\bf X} \, {\bf b} = \mbox{tr} \left( {\bf a}^{\top} {\bf X} \, {\bf b} \right) = \mbox{tr} \left( {\bf b} {\bf a}^{\top} {\bf X} \right) = \left\langle {\bf a} {\bf b}^{\top}, {\bf X} \right\rangle$$

where the cyclic property of the trace was used and $\langle \cdot \,, \cdot \rangle$ denotes the Frobenius inner product. Since the scalar field $f$ is linear in $\bf X$, its gradient is simply

$$\nabla f ( {\bf X} ) = \color{blue}{{\bf a} {\bf b}^{\top}}$$
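As a quick sanity check on the chain of equalities above, here is a minimal NumPy sketch. The dimensions, the random seed, and the variable names (`G`, `f_direct`, `f_trace`, `f_frobenius`) are illustrative assumptions, not part of the answer; the script only confirms numerically that the three ways of writing $f$ agree.

```python
import numpy as np

# Illustrative sizes and random data (assumptions, not from the question).
rng = np.random.default_rng(0)
n, m = 3, 4
a = rng.standard_normal(n)       # plays the role of the column vector a (length n)
b = rng.standard_normal(m)       # plays the role of the column vector b (length m)
X = rng.standard_normal((n, m))  # the n x m matrix X

# Candidate gradient: the outer product a b^T, an n x m matrix.
G = np.outer(a, b)

f_direct = a @ X @ b             # a^T X b
f_trace = np.trace(G.T @ X)      # tr((a b^T)^T X) = tr(b a^T X)
f_frobenius = np.sum(G * X)      # <a b^T, X>, the entrywise sum of products

print(np.isclose(f_direct, f_trace), np.isclose(f_direct, f_frobenius))
```

Because `G` has the same shape as `X`, it is at least a plausible candidate for a matrix-valued gradient, which matches the expectation stated in the question.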


1

By brute force, with $a \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$: $$a^TXb = \sum_{i=1}^n\sum_{j=1}^m a_i x_{ij}b_j,$$ $$\frac{\partial\, a^TXb}{\partial x_{ij}} = a_ib_j,$$ which is the $(i,j)$ entry of $ab^T$.
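The entrywise derivative can also be checked numerically. The following is a rough sketch using central finite differences; the helper names `f` and `numerical_gradient`, the step size `eps`, and the random test data are all assumptions made for illustration.

```python
import numpy as np

def f(X, a, b):
    """Evaluate the scalar a^T X b."""
    return a @ X @ b

def numerical_gradient(X, a, b, eps=1e-6):
    """Approximate df/dx_ij for every entry with central differences."""
    grad = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps  # perturb only the (i, j) entry
            grad[i, j] = (f(X + E, a, b) - f(X - E, a, b)) / (2 * eps)
    return grad

# Illustrative random data (assumed sizes: X is 3 x 4).
rng = np.random.default_rng(1)
a = rng.standard_normal(3)
b = rng.standard_normal(4)
X = rng.standard_normal((3, 4))

# Compare against the closed form: entry (i, j) should equal a_i * b_j.
max_error = np.max(np.abs(numerical_gradient(X, a, b) - np.outer(a, b)))
print(max_error)
```

Since $a^TXb$ is linear in $X$, the central difference has no truncation error, so any remaining discrepancy should be floating-point rounding only.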