How do you calculate the gradient of a scalar product?

Question

I am trying to follow the calculation that yields

$$\nabla\langle x, Ax \rangle=2Ax$$

for a symmetric, real matrix $A$.

Do I use the bilinearity of the scalar product? The product rule for the gradient? I do not know where to start or what is legitimate, as I am inexperienced with these symbols.

I assume $\langle \cdot,\cdot\rangle$ denotes the standard inner product in $\mathbb{R}^n$? — TheGeekGreek, Jun 20 '17 at 21:34
Sometimes a bit of context is useful, i.e. where you encountered this problem, etc. In any case, I would write out the left-hand side and just calculate the partial derivatives. — TheGeekGreek, Jun 20 '17 at 21:36
Did you try to see $\langle\cdot,\cdot\rangle$ as a function? $f:x\mapsto f(x)=\langle x, Ax\rangle$ — Sefi, Jun 20 '17 at 21:37
I am trying to find the critical points of the Rayleigh quotient, which is $\frac{\langle x, Ax \rangle}{|x|^2}$. I split the function into a product of the numerator and the reciprocal of the denominator. To determine the gradient of the Rayleigh quotient, I need both gradients. — B.Swan, Jun 20 '17 at 21:38
Related: https://math.stackexchange.com/questions/1897416/chain-rule-applied-to-a-scalar-result-of-a-product-between-2-vectors?rq=1 and https://math.stackexchange.com/q/2330020/265466 — amd, Jun 20 '17 at 21:42

score 3 · Accepted Answer · answered Jun 20 '17 at 21:44

Try expanding the term $\langle x, Ax\rangle$:
$$\langle x, Ax\rangle = x^TAx = \sum_{i,j=1}^n x_ix_jA_{ij},$$
and so
$$\frac{\partial}{\partial x_k}\langle x,Ax\rangle = \frac{\partial}{\partial x_k}\sum_{i,j=1}^n x_ix_jA_{ij} = \sum_{i,j=1}^n \frac{\partial}{\partial x_k}\left(x_ix_jA_{ij}\right).$$
On the RHS, only terms where $i=k$ or $j=k$ survive, so the sum equals
$$2x_kA_{kk}+\sum_{i\neq k} x_i(A_{ik}+A_{ki}) = \sum_{i=1}^n x_i (A_{ik}+A_{ki}).$$
On the LHS of this last equation, the first term is the derivative of $x_k^2A_{kk}$, while for $i\neq k$, $x_iA_{ik}$ is the derivative of $x_ix_kA_{ik}$ and $x_iA_{ki}$ is the derivative of $x_kx_iA_{ki}$.

Notice that the RHS is the $k^{\mathrm{th}}$ component of $(A+A^T)x$. Therefore, we have $$\nabla \langle x,Ax\rangle = (A+A^T)x.$$ For symmetric $A$ we have $A^T=A$, so this reduces to $\nabla \langle x,Ax\rangle = 2Ax$.
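The formula $(A+A^T)x$ can be sanity-checked numerically. The following is a minimal sketch in plain Python, not part of the original answer: the helper names (`quad_form`, `grad_formula`, `grad_numeric`), the test matrices and the step size `h` are all illustrative assumptions. It compares the closed-form gradient with central finite differences for a non-symmetric matrix, and then checks that the formula agrees with $2Ax$ when $A$ is symmetric.

```python
def quad_form(A, x):
    """Return <x, Ax> = sum over i, j of x_i * A_ij * x_j."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def grad_formula(A, x):
    """Return (A + A^T) x, the closed-form gradient derived above."""
    n = len(x)
    return [sum((A[k][i] + A[i][k]) * x[i] for i in range(n)) for k in range(n)]


def grad_numeric(A, x, h=1e-6):
    """Approximate each partial derivative by a central finite difference."""
    grad = []
    for k in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[k] += h
        x_minus[k] -= h
        grad.append((quad_form(A, x_plus) - quad_form(A, x_minus)) / (2 * h))
    return grad


def max_abs_diff(u, v):
    """Largest componentwise absolute difference between two vectors."""
    return max(abs(a - b) for a, b in zip(u, v))


# Illustrative data (assumed for this sketch): one general, one symmetric matrix.
A_general = [[1.0, 2.0, 0.0], [-1.0, 3.0, 4.0], [5.0, 0.5, -2.0]]
A_symmetric = [[2.0, 1.0, 0.0], [1.0, 3.0, -1.0], [0.0, -1.0, 4.0]]
x = [0.5, -1.5, 2.0]

# General case: the gradient is (A + A^T) x.
diff_general = max_abs_diff(grad_formula(A_general, x), grad_numeric(A_general, x))

# Symmetric case: (A + A^T) x coincides with 2 A x.
two_Ax = [2 * sum(A_symmetric[k][j] * x[j] for j in range(len(x)))
          for k in range(len(x))]
diff_symmetric = max_abs_diff(grad_formula(A_symmetric, x), two_Ax)

print("finite differences vs (A + A^T) x:", diff_general)
print("(A + A^T) x vs 2 A x, symmetric A:", diff_symmetric)
```

Because the function is quadratic, the central difference has no truncation error, so the first discrepancy should be at the level of floating-point rounding only.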

score 1 · Answer 2 · answered Jun 20 '17 at 21:45

For $k = 1,\dots,n$ we have $$\begin{align*} \partial_k \langle x,Ax \rangle &= \partial_k \sum_{i = 1}^n \sum_{j = 1}^n x_i a_{ij}x_j\\ &= \sum_{i = 1}^n \sum_{j = 1}^n \partial_k (x_i a_{ij}x_j) \\ &= \sum_{i = 1}^n \sum_{j = 1}^n (\delta_{ik} a_{ij}x_j + x_ia_{ij}\delta_{jk})\\ &= \sum_{j = 1}^n a_{kj}x_j + \sum_{i = 1}^n x_ia_{ik}\\ &= \sum_{j = 1}^n a_{kj}x_j + \sum_{i = 1}^n x_ia_{ki}\\ &= 2\sum_{j = 1}^n a_{kj}x_j\\ &= 2(Ax)_k.\end{align*}$$ The fifth line uses the symmetry $a_{ik}=a_{ki}$. Since $\partial_k \langle x,Ax \rangle$ is the $k$-th component of the gradient, this gives $\nabla\langle x,Ax\rangle = 2Ax$.
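
As a concrete instance of the index computation above, here is a small worked example. The matrix is chosen only for illustration and is not from the original answer. Take the symmetric matrix $A=\begin{pmatrix}2&1\\1&3\end{pmatrix}$ and $x=(x_1,x_2)$. Then $$\langle x,Ax\rangle = 2x_1^2+2x_1x_2+3x_2^2,$$ so $$\partial_1\langle x,Ax\rangle = 4x_1+2x_2,\qquad \partial_2\langle x,Ax\rangle = 2x_1+6x_2,$$ which agrees with $$2Ax = 2\begin{pmatrix}2x_1+x_2\\x_1+3x_2\end{pmatrix}=\begin{pmatrix}4x_1+2x_2\\2x_1+6x_2\end{pmatrix}.$$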

score 1 · Answer 3 · answered Jun 20 '17 at 21:51

I will expand on Salsifis's comment. Recall that the gradient $\nabla f(x)$ of $f:\mathbb R^n\to\mathbb R$ at $x$ is the linear transformation $T:\mathbb R^n\to\mathbb R$ that satisfies $$ f(x+\delta)=f(x)+T(\delta)+o(\|\delta\|), $$ where $\delta\in\mathbb R^n$ is a small vector. For $f(x)=(x,Ax)$, we have $$ f(x+\delta)=(x+\delta,A(x+\delta))=(x,Ax)+(\delta,Ax)+(x,A\delta)+(\delta,A\delta). $$ The last term is $O(\|\delta\|^2)=o(\|\delta\|)$, so using $(\delta,Ax)=(Ax,\delta)$ and $(x,A\delta)=(A^Tx,\delta)$, we obtain: $$ f(x+\delta)=f(x)+(Ax+A^Tx,\delta)+o(\|\delta\|). $$ Letting $\delta\to 0$, we conclude that $\nabla f(x)$ is the transformation $\delta\mapsto ((A+A^T)x,\delta)$, which can be identified with the vector $(A+A^T)x$. Since $A^T=A$, this yields the result.
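
The expansion above says that, once the linear term $((A+A^T)x,\delta)$ is subtracted, what remains of $f(x+\delta)-f(x)$ is exactly the quadratic term $(\delta,A\delta)$. Here is a minimal sketch that checks this identity in exact integer arithmetic. The helper names and the sample values of `A`, `x` and `delta` are assumptions made for illustration, and `A` is deliberately non-symmetric.

```python
def inner(u, v):
    """Standard inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """Matrix-vector product, with A given as a list of rows."""
    return [inner(row, x) for row in A]


def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def f(A, x):
    """The quadratic form f(x) = (x, Ax)."""
    return inner(x, matvec(A, x))


# Illustrative integer data (assumed), so every operation is exact.
A = [[1, 2], [-3, 4]]   # not symmetric on purpose
x = [5, -1]
delta = [2, 3]

x_plus_delta = [a + b for a, b in zip(x, delta)]

# Candidate gradient (A + A^T) x and the linear term ((A + A^T) x, delta).
gradient = [a + b for a, b in zip(matvec(A, x), matvec(transpose(A), x))]
linear_term = inner(gradient, delta)

# What is left after removing f(x) and the linear term.
remainder = f(A, x_plus_delta) - f(A, x) - linear_term

print("remainder:       ", remainder)
print("(delta, A delta):", f(A, delta))
```

Since the remainder equals $(\delta,A\delta)$, replacing $\delta$ by $t\delta$ scales it by $t^2$, which is why it is negligible compared with the linear term as $\delta\to 0$.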

score 1 · Answer 4 · answered Jun 20 '17 at 22:11

Fix a unit vector $v$. Let $f(x) = \langle x, Ax \rangle$.

Then $D_v(f) = \displaystyle \lim_{t \to 0} \frac{\langle x+tv, A(x+tv)\rangle - \langle x, Ax \rangle}{t} = \lim_{t \to 0} \frac{\langle x, Ax \rangle + t \langle v, Ax\rangle + t\langle x, Av \rangle + t^2 \langle v, Av \rangle - \langle x, Ax\rangle}{t}$.

After cancelling $\langle x, Ax\rangle$ and letting $t \to 0$, the limit is $\langle v, Ax \rangle + \langle x, Av \rangle$. Since $A$ is real and self-adjoint, this equals $2\langle v, Ax\rangle = \langle v, 2Ax\rangle$.

We know that $D_v(f) = \langle v, \nabla f \rangle$, but this doesn't right away say that $\nabla f = 2Ax$. Can you see how to show that? You'll want to use the fact that $\|x\|^2 = 0 \iff x =0$.

score 1 · Answer 5 · answered Jun 21 '17 at 14:07

Without any coordinates, using just the Leibniz rule, and valid for any inner product: $$d_p(\langle x,Ax\rangle)=\langle p,Ax\rangle+\langle x,Ap\rangle =\langle Ax,p\rangle+\langle A^\star x,p\rangle=\langle(A+A^\star)x,p\rangle,$$ where $A^\star$ denotes the adjoint (see https://en.wikipedia.org/wiki/Hermitian_adjoint) of $A$.
