I am referring to @copper.hat's answer to "Derivative of Quadratic Form". I do not have the reputation to comment there directly. My goal is to better understand how to differentiate functions like this, so that I can learn to compute gradients. I will try to write everything out here so that this question is self-contained.
Let $Q(x) = x^TAx$, where $x \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n \times n}$.
Starting from $Q(x+h)-Q(x)$, the term $Q(x+h)$ is $(x+h)^TA(x+h)$, so the difference expands to $x^TAx+x^TAh+h^TAx+h^TAh-x^TAx$, which simplifies to $x^TAh+h^TAx+h^TAh$.
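As a sanity check on the expansion $Q(x+h)-Q(x) = x^TAh+h^TAx+h^TAh$, here is a minimal pure-Python sketch. The helper names `quad_form` and `bilinear`, the dimension $n=4$, and the random test data are my own choices for illustration, not anything from the linked answer.

```python
import random

def bilinear(A, u, v):
    """Return u^T A v for a square matrix A (list of rows) and vectors u, v."""
    n = len(u)
    return sum(u[i] * A[i][j] * v[j] for i in range(n) for j in range(n))

def quad_form(A, x):
    """Return Q(x) = x^T A x."""
    return bilinear(A, x, x)

# Arbitrary test data (assumption: any A, x, h would do; A need not be symmetric).
random.seed(0)
n = 4
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
x = [random.uniform(-1, 1) for _ in range(n)]
h = [random.uniform(-1, 1) for _ in range(n)]

x_plus_h = [xi + hi for xi, hi in zip(x, h)]

# Left side: Q(x+h) - Q(x). Right side: the three terms of the expansion.
lhs = quad_form(A, x_plus_h) - quad_form(A, x)
rhs = bilinear(A, x, h) + bilinear(A, h, x) + bilinear(A, h, h)

# Should be zero up to floating-point rounding.
expansion_error = abs(lhs - rhs)
print(expansion_error)
```

The two sides agree up to rounding, so the algebra itself seems fine; my question is only about the last term.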
How is it that this changes to $x^TAh+h^TAx$ in the reply on the linked post, dropping the $h^TAh$?
I see the references to $|h^TAh| \le \|A\|\,\|h\|^2$ and to the Cauchy-Schwarz inequality, but Googling "Cauchy-Schwarz" gives very broad results, and I'm having trouble understanding how this bound justifies dropping the $h^TAh$ term.
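To see what the bound says numerically, here is a small sketch that shrinks $h$ along a fixed direction and tracks $|h^TAh|/\|h\|$. Two assumptions of mine: I use the Frobenius norm for $\|A\|$ because it is easy to compute by hand (it is at least as large as the operator norm, so the inequality should still hold with it), and the direction `d` and step sizes are arbitrary.

```python
import math
import random

def bilinear(A, u, v):
    """Return u^T A v for a square matrix A (list of rows) and vectors u, v."""
    n = len(u)
    return sum(u[i] * A[i][j] * v[j] for i in range(n) for j in range(n))

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(vi * vi for vi in v))

def frobenius(A):
    """Frobenius norm of A (assumption: used here as an upper bound on ||A||)."""
    return math.sqrt(sum(a * a for row in A for a in row))

random.seed(1)
n = 4
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
d = [random.uniform(-1, 1) for _ in range(n)]  # fixed direction for h

ratios = []
bound_holds = True
for k in range(1, 6):
    t = 10.0 ** (-k)              # step size 1e-1, 1e-2, ..., 1e-5
    h = [t * di for di in d]
    remainder = abs(bilinear(A, h, h))
    bound = frobenius(A) * norm(h) ** 2
    # Small relative slack to absorb floating-point rounding.
    bound_holds = bound_holds and remainder <= bound * (1 + 1e-12)
    ratios.append(remainder / norm(h))

print(bound_holds)
print(ratios)
```

In this experiment the ratio $|h^TAh|/\|h\|$ shrinks by roughly a factor of 10 each time $\|h\|$ does, which I assume is the sense in which the term is negligible compared with the terms linear in $h$, but I would like to understand the argument properly.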
I've looked up many ways to find the gradient of matrix functions (e.g. $b^TX^TXb$, $\tfrac{1}{2}x^TAx+b^Tx$), and this process seems to be the most intuitive for me.