
I would like to see a detailed, step-by-step derivation of the following identity

$$\frac{\partial}{\partial A} \left( y^T A x \right) = y x^T$$

where $x, y \in \mathbb R^n$ and $A \in \mathbb R^{n \times n}$. I thought it would be easy to do using Einstein notation, but I keep getting confused by the reciprocal basis.

2 Answers

2

Let

$$f (\mathrm X) := \mathrm b^\top \mathrm X \,\mathrm a = \mbox{tr} \left( \mathrm b^\top \mathrm X \,\mathrm a \right) = \mbox{tr} \left( \mathrm a \mathrm b^\top \mathrm X \right) = \langle \mathrm b \mathrm a^\top, \mathrm X \rangle$$

where $\langle \cdot \,, \cdot \rangle$ denotes the Frobenius inner product. Since $f$ is linear in $\rm X$, its gradient is simply

$$\nabla f (\mathrm X) = \mathrm b \mathrm a^\top$$
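
To spell out the linearity step, here is a short sketch. It assumes the usual convention that the gradient of $f$ at $\mathrm X$ is the matrix $\mathrm G$ for which the directional derivative along any $\mathrm H$ equals $\langle \mathrm G, \mathrm H \rangle$; the perturbation $\mathrm H \in \mathbb R^{n \times n}$ is a symbol introduced here, not part of the original answer.

$$f (\mathrm X + \mathrm H) - f (\mathrm X) = \mathrm b^\top (\mathrm X + \mathrm H) \,\mathrm a - \mathrm b^\top \mathrm X \,\mathrm a = \mathrm b^\top \mathrm H \,\mathrm a = \langle \mathrm b \mathrm a^\top, \mathrm H \rangle$$

The increment is exactly linear in $\mathrm H$, with no higher-order remainder, so the directional derivative along every $\mathrm H$ is $\langle \mathrm b \mathrm a^\top, \mathrm H \rangle$, which identifies the gradient as $\mathrm b \mathrm a^\top$. Taking $\mathrm b = y$ and $\mathrm a = x$ gives the identity in the question.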

1

Since $y^\intercal A x = \sum_{i,j} a_{ij} y_i x_j$, where $A = (a_{ij})$, we can compute the derivative element-wise. Only the term with $i = k$ and $j = l$ depends on $a_{kl}$, so

\begin{align*} \frac{\partial y^\intercal A x}{\partial a_{kl}} = y_k x_l \end{align*}

Collecting these partials into a matrix gives

\begin{align*} \frac{\partial y^\intercal A x}{\partial A} = \left(\frac{\partial y^\intercal A x}{\partial a_{kl}}\right)_{kl} = (y_k x_l)_{kl} = y x^\intercal \end{align*}
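
As a sanity check on the element-wise computation, the following is a minimal sketch that approximates each partial $\partial/\partial a_{kl}$ by a central difference and compares the result with the outer product $y x^\intercal$. The helper names (`bilinear`, `numerical_gradient`, `outer`), the step size `h`, and the sample values are all illustrative assumptions, not part of the answer above.

```python
# Finite-difference check of d(y^T A x)/dA = y x^T, using plain Python lists.
# All names and sample values here are hypothetical, chosen for illustration.

def bilinear(y, A, x):
    """Return y^T A x = sum over i, j of y_i * a_ij * x_j."""
    return sum(y[i] * A[i][j] * x[j]
               for i in range(len(y)) for j in range(len(x)))

def numerical_gradient(y, A, x, h=1e-6):
    """Approximate each partial d f / d a_kl by a central difference."""
    rows, cols = len(A), len(A[0])
    grad = [[0.0] * cols for _ in range(rows)]
    for k in range(rows):
        for l in range(cols):
            plus = [row[:] for row in A]    # copy of A with a_kl + h
            minus = [row[:] for row in A]   # copy of A with a_kl - h
            plus[k][l] += h
            minus[k][l] -= h
            grad[k][l] = (bilinear(y, plus, x) - bilinear(y, minus, x)) / (2 * h)
    return grad

def outer(y, x):
    """Return the outer product y x^T, whose (k, l) entry is y_k * x_l."""
    return [[yk * xl for xl in x] for yk in y]

y = [1.0, -2.0, 3.0]
x = [0.5, 4.0, -1.0]
A = [[2.0, 0.0, 1.0], [-1.0, 3.0, 5.0], [4.0, 1.0, -2.0]]

numeric = numerical_gradient(y, A, x)
exact = outer(y, x)
max_error = max(abs(numeric[k][l] - exact[k][l])
                for k in range(3) for l in range(3))
print("largest entry-wise discrepancy:", max_error)
```

Because the function is linear in $A$, the central difference has no truncation error, so any discrepancy should come from floating-point rounding only. Comparing against `outer(x, y)` instead would generally fail, which matches the point that the outer product is not commutative.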

Tom Chen
  • Thank you! I don't quite get your notation $(\cdot)_{kl}$. What does it mean in terms of summations? – Mauricio Cele Lopez Belon Feb 12 '19 at 02:25
  • @MauricioCeleLopezBelon I use $(a_{ij})_{ij}$ to indicate a matrix with entries $a_{ij}$. I sometimes omit the outer indices altogether and just write $(a_{ij})$. So, in the last line of equations, taking $\partial/\partial A$ means computing $\partial/\partial a_{kl}$ for every pair $(k, l)$ and then placing each partial in its corresponding entry of the resulting matrix. – Tom Chen Feb 12 '19 at 02:30
  • Please explain $y x^\intercal = x y^\intercal$. – Rodrigo de Azevedo Feb 12 '19 at 10:06
  • Oops, yes, the outer product is not commutative. Fixed. What OP wrote is a typo. – Tom Chen Feb 12 '19 at 14:01