
Hello, I am trying to compute the derivative of the function

$f = x^TAx $

I can rewrite it as

$f = \sum_{i,j} A_{ij}x_ix_j $

and then find its derivative:

$\frac{\partial f}{\partial x_k} = \sum_j A_{kj}x_j + \sum_i A_{ik}x_i $
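The index formula can be sanity-checked numerically. Below is a minimal sketch in plain Python; the particular $A$ and $x$ are arbitrary example values, and the step size `h` is an assumption. Because $f$ is quadratic, a central finite difference should match the formula up to rounding error.

```python
# Numerically check df/dx_k = sum_j A_kj x_j + sum_i A_ik x_i.
# A, x and the step size h are arbitrary illustrative choices.

def quad_form(A, x):
    """f = sum_{i,j} A_ij x_i x_j, i.e. x^T A x."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def partial_k(A, x, k):
    """The index-form derivative from the question."""
    n = len(x)
    first = sum(A[k][j] * x[j] for j in range(n))   # sum over second index of A
    second = sum(A[i][k] * x[i] for i in range(n))  # sum over first index of A
    return first + second

def central_diff(A, x, k, h=1e-6):
    """Central finite difference of f along coordinate k."""
    xp = list(x)
    xp[k] += h
    xm = list(x)
    xm[k] -= h
    return (quad_form(A, xp) - quad_form(A, xm)) / (2 * h)

A = [[1.0, 2.0, 0.5],
     [3.0, 4.0, -1.0],
     [0.0, 2.5, 6.0]]
x = [0.7, -1.2, 2.0]

for k in range(len(x)):
    print(k, partial_k(A, x, k), central_diff(A, x, k))
```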

How do I rewrite this in matrix form?

Kong

2 Answers


A simple way to get this result is to use matrix calculus. First, you can prove that if $M$ is a constant matrix and $v$ is a column vector, then, as you'd expect, $\frac{\partial }{\partial v}(Mv) = M$. Second, you can prove as a special case that if $a$ and $b$ are column vectors (with $a$ not depending on $b$), then $a^Tb$ and $b^Ta$ are equal as scalars, and: $$\frac{\partial}{\partial b} (b^T a) = \frac{\partial}{\partial b} (a^T b) = a^T$$

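From there, all you need is the observation that $Ax$ is a column vector, so put $w(x) \equiv Ax$. Applying the product rule together with the two facts above (the first gives $\partial w/\partial x = A$; the second gives the $w^T$ term from differentiating $x^T w$ with $w$ held fixed), we find: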

$$\frac{\partial }{\partial x} x^T A x = \frac{\partial }{\partial x} x^T w = x^T \frac{\partial w}{\partial x} + w^T = x^TA + (Ax)^T = x^T A + x^T A^T = x^T(A+A^T)$$
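As a concrete check of this formula, here is a worked $2\times 2$ case; the matrix is an arbitrary illustrative choice, not taken from the question. Take $A = \begin{bmatrix}1&2\\3&4\end{bmatrix}$ and $x = \begin{bmatrix}x_1\\x_2\end{bmatrix}$. Expanding directly:

$$f = x^TAx = x_1(x_1 + 2x_2) + x_2(3x_1 + 4x_2) = x_1^2 + 5x_1x_2 + 4x_2^2, \qquad \begin{bmatrix}\partial_1 f & \partial_2 f\end{bmatrix} = \begin{bmatrix}2x_1 + 5x_2 & 5x_1 + 8x_2\end{bmatrix}$$

and from the formula:

$$x^T(A + A^T) = \begin{bmatrix}x_1 & x_2\end{bmatrix}\begin{bmatrix}2&5\\5&8\end{bmatrix} = \begin{bmatrix}2x_1 + 5x_2 & 5x_1 + 8x_2\end{bmatrix}$$

so the two agree.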


You can also get the same result just with arithmetic, though it requires more elaborate calculation. Recall that for any compatible matrix $M$ and vector $v$, you can expand the product $Mv$ as follows:

$$Mv = \left[\sum_{j} M_{ij}\cdot v_j\right]_i$$
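The expansion rule translates directly into code. This is a minimal sketch using plain nested lists (an assumption made so the example needs no third-party libraries); `matvec` is a hypothetical helper name.

```python
# Matrix-vector product written directly from (Mv)_i = sum_j M_ij v_j.

def matvec(M, v):
    """Return the list whose i-th entry is sum_j M[i][j] * v[j]."""
    if any(len(row) != len(v) for row in M):
        raise ValueError("incompatible shapes")
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

M = [[1, 2],
     [3, 4],
     [5, 6]]
v = [10, -1]
print(matvec(M, v))  # [8, 26, 44]
```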

We'll use this rule to convert your answer from a summation to a matrix multiplication.


In your case, the derivative is the row vector

$$\begin{bmatrix}\partial_1 f& \partial_2 f & \cdots & \partial_k f & \cdots &\partial_n f\end{bmatrix} = \left[\sum_{j} A_{kj}\cdot x_j\right]_k + \left[\sum_{i} A_{ik}\cdot x_i\right]_k$$

There are two terms in this equation. The first term matches our rule for matrix multiplication: we can rewrite $$\left[\sum_{j} A_{kj}\cdot x_j\right]_k \Rightarrow A\cdot x$$

The second term doesn't match exactly, because the summation runs over the first index of $A$ (the row index $i$) rather than the second. To fix this, we'll take a transpose, using $A_{ik} = A^T_{ki}$.

$$ \left[\sum_{i} A_{ik}\cdot x_i\right]_k = \left[\sum_{i} A^T_{ki}\cdot x_i\right]_k \Rightarrow A^T \cdot x$$
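The transpose step can be checked numerically: summing over the first index of $A$ gives the same vector as multiplying by $A^T$. A minimal sketch in plain Python, with arbitrary example values for $A$ and $x$; `transpose` and `matvec` are hypothetical helper names.

```python
# Check: sum_i A[i][k] * x[i] == (A^T x)_k, and that Ax + A^T x gives (A + A^T) x.
# A and x are arbitrary illustrative values.

def transpose(M):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    """(Mv)_i = sum_j M_ij v_j."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

A = [[1, 2, 0],
     [3, 4, -1],
     [0, 5, 6]]
x = [2, -1, 3]
n = len(x)

# Second term of the derivative, computed exactly as the summation is written.
second_term = [sum(A[i][k] * x[i] for i in range(n)) for k in range(n)]

print(second_term)                # [-1, 15, 19]
print(matvec(transpose(A), x))    # [-1, 15, 19]

# Full gradient: first term (A x) plus second term (A^T x).
grad = [a + b for a, b in zip(matvec(A, x), second_term)]
print(grad)                       # [-1, 14, 32]
```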

In short,

$$\frac{\partial f}{\partial x} = \frac{\partial(x^T A x)}{\partial x} = (A + A^T)\cdot x$$

However, since the derivative is conventionally written as a row vector, it may be more appropriate to write the transpose of this:

$$\frac{\partial f}{\partial x} = \frac{\partial(x^T A x)}{\partial x} = x^T\cdot (A^T + A)$$

user326210

In index notation, you can write $$\frac{\partial f}{\partial x_k}=A_{kj}x_j + A_{ik}x_i$$ where summation over a repeated index is implied (the Einstein summation convention).

You can also rename a summation index (a.k.a. a dummy index) without altering the result, e.g. $x_iy_i = x_ky_k$.

So let's rename both dummy indices to $p$, yielding $$\eqalign{ \frac{\partial f}{\partial x_k} &= A_{kp}x_p + A_{pk}x_p \cr &= (A_{kp}+A_{pk})\,x_p \cr }$$ which in matrix notation is written as $\,(A+A^T)\,x$.
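The combined form $(A_{kp}+A_{pk})\,x_p$ maps onto a single loop over the shared dummy index. A minimal sketch in plain Python; `grad_quad_form` is a hypothetical helper name and the $A$, $x$ values are arbitrary examples.

```python
# Gradient of f = x^T A x using the renamed dummy index p:
#   df/dx_k = (A_kp + A_pk) x_p, summed over p.
# The implied sum over the repeated index p is written out explicitly.

def grad_quad_form(A, x):
    n = len(x)
    return [sum((A[k][p] + A[p][k]) * x[p] for p in range(n)) for k in range(n)]

A = [[1, 2],
     [3, 4]]
x = [1, 1]
print(grad_quad_form(A, x))  # [7, 13]
```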

lynn