I'm trying to understand how the derivative of $x^\top Ax$ with respect to $x$ is obtained in this step-by-step explanation (from a previous question), which I copy here for the sake of clarity:

The only thing you need to remember/know is that

$$\dfrac{\partial (x^Ty)}{\partial x} = y$$

and the chain rule, which goes as

$$\dfrac{d(f(x,y))}{d x} = \dfrac{\partial (f(x,y))}{\partial x} + \dfrac{d( y^T(x))}{d x} \dfrac{\partial (f(x,y))}{\partial y}\quad \text{(1)}$$

Hence,

$$\dfrac{d(b^Tx)}{d x} = \dfrac{d (x^Tb)}{d x} = b$$

and

$$\dfrac{d (x^TAx)}{d x} = \dfrac{\partial (x^Ty)}{\partial x} + \dfrac{d (y(x)^T)}{d x} \dfrac{\partial (x^Ty)}{\partial y}$$

where $y = Ax$. That is,

$$\dfrac{d (x^TAx)}{d x} = \dfrac{\partial (x^Ty)}{\partial x} + \dfrac{d( y(x)^T)}{d x} \dfrac{\partial (x^Ty)}{\partial y} = y + \dfrac{d (x^TA^T)}{d x} x = y + A^Tx = (A+A^T)x$$

The product rule, as obtained from the multivariate chain rule, goes as follows. For

$$ f(u,v) = uv $$

the partials are $D_1f = v$ and $D_2f = u$. Thus,

$$ \frac{d}{dx}(g(x)h(x)) = h(x)\frac{d}{dx}g(x) + g(x)\frac{d}{dx}h(x) $$

Here is my question: how can we make the connection between the two, that is, between the derivation of $(A+A^T)x$ based on (1) and this product rule?

ecjb
  • 1,015

1 Answer

All matrix differentiation questions can be answered by expanding out the indices, in this case $$ x^TAx=\sum_{ij}x_iA_{ij}x_j $$ from which it follows that $$ \frac{\partial(x^TAx)}{\partial x_k}=\sum_{ij}\frac{\partial x_i}{\partial x_k}A_{ij}x_j+\sum_{ij}x_iA_{ij}\frac{\partial x_j}{\partial x_k}, $$ where all we have done is apply the usual product rule to each term.

Now $\partial x_i/\partial x_j$ equals $1$ if and only if $i=j$, and is zero otherwise, so $$ \frac{\partial(x^TAx)}{\partial x_k}=\sum_{j}A_{kj}x_j+\sum_{i}x_iA_{ik}=(Ax)_k+(A^Tx)_k, $$ which means that $$ \frac{\partial(x^TAx)}{\partial x}=Ax+A^Tx. $$
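If it helps to see the result numerically, here is a minimal sketch that compares $Ax+A^Tx$ with a central finite-difference gradient of $x^TAx$. It assumes NumPy is available; the helper names `quadratic_form` and `numerical_gradient`, the step size `h`, and the random $4\times 4$ test matrix are my own choices for illustration, not part of the derivation above.

```python
import numpy as np

def quadratic_form(A, x):
    """Return the scalar x^T A x."""
    return x @ A @ x

def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grad = np.zeros_like(x)
    for k in range(x.size):
        e_k = np.zeros_like(x)
        e_k[k] = 1.0  # perturb only the k-th coordinate
        grad[k] = (f(x + h * e_k) - f(x - h * e_k)) / (2 * h)
    return grad

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))  # deliberately not symmetric
x = rng.standard_normal(4)

analytic = (A + A.T) @ x  # the formula derived above
numeric = numerical_gradient(lambda v: quadratic_form(A, v), x)

# The two gradients should agree up to floating-point rounding.
max_abs_error = np.max(np.abs(analytic - numeric))
print(max_abs_error)
```

Using a non-symmetric $A$ matters here: for symmetric $A$ the result collapses to $2Ax$, which would hide the $A^Tx$ term.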

pre-kidney
  • 30,884
  • Many thanks for the answer @pre-kidney. Is the notation $\sum_{ij}$ equivalent to $\sum_{i}\sum_{j}$, so that $\sum_{ij}x_iA_{ij}x_j = \sum_{i}\sum_{j} x_iA_{ij}x_j$? Regarding the product rule $(fg)' = f'g + fg'$, how could we formally define $f$ and $g$ respectively? $f(x) = x$ and $g(x) = Ax$? – ecjb Sep 28 '19 at 09:52
  • Moreover, it is probably obvious, but how can we illustrate the fact that $\frac{\partial x_i}{\partial x_j}$ equals 1 if and only if $i = j$ and is zero otherwise? – ecjb Sep 28 '19 at 11:10
  • 1. Yes, $\sum_{ij}=\sum_i\sum_j$. 2. No, $f(x_k)=x_i$ and $g(x_k)=A_{ij}x_j$ (since the product rule is applied term by term). 3. If $i\not=j$ then $x_i$ is constant with respect to $x_j$, hence the derivative is zero. If $i=j$ the derivative is $1$ by the power rule: https://en.wikipedia.org/wiki/Power_rule – pre-kidney Sep 28 '19 at 17:28
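
To make the term-by-term product rule from the comments concrete, the case $n=2$ can be written out in full. This is only a worked illustration of the formula in the answer, with the size $n=2$ picked for convenience. Expanding the sum gives

$$ x^TAx = A_{11}x_1^2 + A_{12}x_1x_2 + A_{21}x_2x_1 + A_{22}x_2^2 . $$

Differentiating with respect to $x_1$, treating $x_2$ as a constant:

$$ \frac{\partial(x^TAx)}{\partial x_1} = 2A_{11}x_1 + A_{12}x_2 + A_{21}x_2 = (A_{11}x_1 + A_{12}x_2) + (A_{11}x_1 + A_{21}x_2) = (Ax)_1 + (A^Tx)_1 , $$

and likewise with respect to $x_2$:

$$ \frac{\partial(x^TAx)}{\partial x_2} = A_{12}x_1 + A_{21}x_1 + 2A_{22}x_2 = (A_{21}x_1 + A_{22}x_2) + (A_{12}x_1 + A_{22}x_2) = (Ax)_2 + (A^Tx)_2 . $$

Each term $x_iA_{ij}x_j$ contributes once when $i=k$ and once when $j=k$, which is where the two sums $Ax$ and $A^Tx$ come from.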