0

I am referring to @copper.hat's response to *Derivative of Quadratic Form*. I do not have the reputation to reply directly. My goal is to find a way to better differentiate and understand these functions for the purpose of learning the gradient. I will try to write everything here to make this question self-contained.

Let $Q(x) = x^TAx$, where $x\in\mathbb{R}^n$ and $A\in\mathbb{R}^{n\times n}$.

To form $Q(x+h)-Q(x)$: $Q(x+h)$ expands to $(x+h)^TA(x+h)=x^TAx+x^TAh+h^TAx+h^TAh$, so subtracting $Q(x)=x^TAx$ leaves $x^TAh+h^TAx+h^TAh$.

How is it that this changes to $x^TAh+h^TAx$ in the reply on the linked post, dropping the $h^TAh$?

I see the reference to $|h^TAh|\leq\|A\|\|h\|^2$; however, Googling "Cauchy–Schwarz" gives very broad results and I'm having trouble understanding.

I've looked up many ways to find the gradient of matrix functions (e.g. $b^TX^TXb$, $\tfrac12 x^TAx+b^Tx$), and this process seems the most intuitive to me.
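To convince myself the algebra above is right, here is a quick NumPy check (the matrix and vectors are just random examples, not from any of the linked posts):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))   # a generic (not necessarily symmetric) matrix
x = rng.standard_normal(n)
h = rng.standard_normal(n)

Q = lambda v: v @ A @ v           # Q(v) = v^T A v

# Q(x+h) - Q(x) should equal x^T A h + h^T A x + h^T A h exactly
lhs = Q(x + h) - Q(x)
rhs = x @ A @ h + h @ A @ x + h @ A @ h
print(np.isclose(lhs, rhs))      # True
```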

Ryan
    I also want to link: https://math.stackexchange.com/questions/222894/how-to-take-the-gradient-of-the-quadratic-form for maybe any future reference, which is another post that helped me a lot with solving $∂(x^Ty)/∂x$. However it doesn't seem to be easily applicable to some of the equations I gave as an example at the bottom in the post. – Ryan Feb 24 '21 at 20:28
  • $h^T Ah=O(|h|^2),$ so this term will not contribute to the derivative. – WoolierThanThou Feb 24 '21 at 20:29
  • @WoolierThanThou What does $O(\|h\|^2)$ mean? I saw it as an explanation for dropping the term, but nothing about what it means. – Ryan Feb 24 '21 at 20:42
  • It means that $|h^T Ah|\leq C|h|^2$ for some appropriate constant $C$. One bound would be $|A|$, where $|\cdot|$ denotes the operator norm. However, you can get away with the more trivial bound $C=n\max_{i,j} |a_{i,j}|$. – WoolierThanThou Feb 24 '21 at 20:58

2 Answers

1

The derivative along a vector $v$ is the Gateaux derivative in the direction $v$: we take an extra variable $t$, write out the function at $x_0 + t v$ near a point $x_0$, and see what happens. Your $$ x^T A x = (x_0 + tv)^T A (x_0 + t v) = x_0^T A x_0 + t \; (x_0^T Av + v^T A x_0) + t^2 \; v^T A v. $$ As a 1 by 1 matrix equals its own transpose, $v^T A x_0 = x_0^T A^T v$, which for symmetric $A$ is $x_0^T A v$, so $$ x^T A x = x_0^T A x_0 + 2t \; x_0^T Av + t^2 \; v^T A v, $$ and its derivative at $t=0$ is the scalar $$2 x_0^T Av \;.$$ This is the dot product of $v$ with $2Ax_0,$ so the gradient is $2Ax_0$ when written as a column vector. (For non-symmetric $A$ the same computation gives $(A+A^T)x_0$.)
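A quick finite-difference sanity check of this, sketched in NumPy with a random symmetric $A$ (the symmetrization and tolerances are my choices, not part of the derivation):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
S = rng.standard_normal((n, n))
A = (S + S.T) / 2                 # symmetric A, as assumed above
x0 = rng.standard_normal(n)

Q = lambda v: v @ A @ v

# central finite differences approximate each partial derivative of Q at x0
eps = 1e-6
grad_fd = np.array([(Q(x0 + eps * e) - Q(x0 - eps * e)) / (2 * eps)
                    for e in np.eye(n)])

print(np.allclose(grad_fd, 2 * A @ x0, atol=1e-6))  # True
```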

Will Jagy
  • +1 But you have a typo ($x^T$ instead of $v^T$) in the line that starts: As a 1 by 1 matrix $\ldots;$ – greg Mar 18 '21 at 18:10
0

By definition, the derivative at a point $x$ is a linear map $h\mapsto Dh$ such that $$f(x+h)=f(x)+Dh+g(h),\qquad g(h)\in o(||h||),$$ where $g(h)\in o(||h||)$ means that $\lim_{h\to 0}\frac{|g(h)|}{||h||}=0$.

In the case of the quadratic form $f(x)=x^TAx$ (with $A$ symmetric, so that $h^TAx=x^TAh$) we have $$f(x+h)=\dots=x^TAx+2x^TAh+h^TAh,$$ and this is exactly in the form of the first formula with $$f(x)=x^TAx,\quad Dh=2x^TAh,\quad g(h)=h^TAh\in o(||h||).$$ The last statement follows from the Cauchy–Schwarz inequality $|a^Tb|\leq||a||\,||b||$ for vectors $a,b$ together with the matrix norm inequality $||Ax||\leq||A||\,||x||$: $$\lim_{h\to 0}\frac{|h^TAh|}{||h||}\leq\lim_{h\to 0}\frac{||h||\,||Ah||}{||h||}\leq \lim_{h\to 0}\frac{||h||\,||A||\,||h||}{||h||}=\lim_{h\to 0}||A||\,||h||=0.$$
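The limit can also be illustrated numerically; a short NumPy sketch (with a random $A$ and a fixed unit direction $u$ of my own choosing) shows the ratio $|h^TAh|/||h||$ shrinking linearly in $||h||$ along $h=tu$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))
u = rng.standard_normal(n)
u /= np.linalg.norm(u)            # fixed unit direction

# along h = t*u, the ratio |h^T A h| / ||h|| equals t * |u^T A u|,
# so it shrinks by the same factor as t
ratios = []
for t in [1e-1, 1e-3, 1e-5]:
    h = t * u
    ratios.append(abs(h @ A @ h) / np.linalg.norm(h))

print(ratios)   # each entry is 100x smaller than the previous one
```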

Koncopd