
I have a vector $x$ and I'm trying to find $\frac{dy}{dx}$, where $y=x^{T}x$.

I believe the answer should be $2x^{T}$ but I'm trying to understand why. Does the product rule apply to matrices/vectors being multiplied?

The issue I run into is the following:

If I apply the product rule blindly (I'm not sure I'm allowed to), I get:

$\frac{dy}{dx} = \frac{dx^{T}}{dx}x+x^{T}\frac{dx}{dx}$

$\frac{dx^{T}}{dx} = \frac{dx}{dx} = I$

So I get:

$\frac{dy}{dx} = x+x^{T}$

But this makes no sense, since $x$ is a column and $x^{T}$ is a row, so their dimensions don't match and they can't be added. If one of them were transposed, I would get $2x$ and everything would be fine. Any help would be greatly appreciated.

Thanks!

  • A similar question was recently asked here. Also related: https://math.stackexchange.com/questions/1897416/chain-rule-applied-to-a-scalar-result-of-a-product-between-2-vectors?rq=1 – amd Jun 20 '17 at 21:14
  • Did you try to write it down coordinate-wise, i.e. calculating $\partial y/\partial x_i$? – Roland Jun 23 '17 at 06:40
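Following the coordinate-wise suggestion in the comment above, a worked version (assuming $x = (x_1,\dots,x_n)^T$ is a real column vector) sidesteps the layout question entirely, because each partial derivative is just a scalar:

$$y = x^Tx = \sum_{j=1}^{n} x_j^2, \qquad \frac{\partial y}{\partial x_i} = \sum_{j=1}^{n} 2x_j\,\frac{\partial x_j}{\partial x_i} = \sum_{j=1}^{n} 2x_j\,\delta_{ij} = 2x_i.$$

Every partial derivative is $2x_i$; the only remaining choice is whether to collect them into a row ($2x^T$) or a column ($2x$).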

1 Answer


Your expression for $\frac{\partial y}{\partial x}$ is wrong. You're on the right track, but the Product Rule is valid for the differential, not the gradient.

The differential of a vector is just another vector, whereas the gradient is a matrix which obeys a more complicated, non-commutative algebra. As you study matrix calculus, you will come to appreciate the utility of the differential-based approach.

So first let's find the differential, and then the gradient:
$$\begin{aligned} y &= x^Tx \\ dy &= dx^T\,x + x^T\,dx \\ &= 2x^T\,dx \\ \frac{\partial y}{\partial x} &= 2x^T \end{aligned}$$
where, going from the second line to the third, I transposed the first term, since $a^Tb = b^Ta$ for real vectors such as $x$ and $dx$. The gradient is then read off as the coefficient of $dx$ in $dy$.
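As a sanity check on $\partial y/\partial x = 2x^T$, here is a small finite-difference sketch in plain Python (no libraries assumed). The function names, the test vector, and the step sizes are arbitrary choices made for illustration only; they are not part of the answer above.

```python
# Sketch: numerically check that y = x^T x has gradient 2x, i.e. dy = 2 x^T dx.
# The test vector x, perturbation dx and step h are arbitrary illustration values.

def y(x):
    """y = x^T x, the squared Euclidean norm of x."""
    return sum(xi * xi for xi in x)

def analytic_grad(x):
    """Entries of the gradient read off from dy = 2 x^T dx."""
    return [2.0 * xi for xi in x]

def central_diff_grad(f, x, h=1e-6):
    """Approximate each partial derivative of f with a central difference."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2.0 * h))
    return grad

x = [1.0, -2.0, 3.5]
print(analytic_grad(x))         # [2.0, -4.0, 7.0]
print(central_diff_grad(y, x))  # approximately the same values

# The change in y splits into the first-order part 2 x^T dx plus dx^T dx.
dx = [1e-3, 2e-3, -1e-3]
lhs = y([a + b for a, b in zip(x, dx)]) - y(x)
first_order = sum(2.0 * a * b for a, b in zip(x, dx))  # 2 x^T dx
print(lhs - first_order)  # equals dx^T dx up to rounding (second order in dx)
```

For a quadratic like $x^Tx$ the central difference is exact apart from floating-point rounding, so the two gradients agree closely; the leftover $dx^Tdx$ term is what the differential $dy = 2x^T\,dx$ drops.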

Depending on which layout convention is used, some prefer writing the gradient as the transpose of this, i.e. $\,2x$.
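To make the two conventions concrete, assume $x$ is an $n\times 1$ column. They differ only in how the partials $\partial y/\partial x_i = 2x_i$ are arranged. In numerator layout the result is a $1\times n$ row, which matches $dy = \frac{\partial y}{\partial x}\,dx$ directly:

$$\frac{\partial y}{\partial x} = \begin{bmatrix} \frac{\partial y}{\partial x_1} & \cdots & \frac{\partial y}{\partial x_n} \end{bmatrix} = 2x^T$$

In denominator layout the result is an $n\times 1$ column, so $dy = \left(\frac{\partial y}{\partial x}\right)^T dx$ instead:

$$\frac{\partial y}{\partial x} = \begin{bmatrix} \frac{\partial y}{\partial x_1} \\ \vdots \\ \frac{\partial y}{\partial x_n} \end{bmatrix} = 2x$$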

greg