
Given a positive definite symmetric matrix $A$ and a function

$$f(x) = \frac{1}{2}x^T A x + x^T b + c$$

we know that the Hessian matrix of $f$ is $A$, which is constant and positive definite. Hence the stationary point of $f$ is a global minimum; since $A$ is symmetric it is attained at $x^{\ast} = -A^{-1}b$, with value $f(-A^{-1}b)$. Suppose the eigenvalues of $A$ are $\lambda_1 > \dots > \lambda_n$. Define the squared distance from the optimum
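For concreteness, here is the gradient computation I am relying on (using the symmetry of $A$):

$$\nabla f(x) = \tfrac{1}{2}(A + A^T)x + b = Ax + b, \qquad \nabla f(x^{\ast}) = 0 \;\Longrightarrow\; x^{\ast} = -A^{-1}b.$$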

$$d(x_k) = \| x^{\ast} - x_k \|_2^2$$

where $x^{\ast}$ is the value of $x$ that minimizes $f(x)$ and $x_k$ are the iterates computed by gradient descent, i.e.

$$x_k = x_{k-1} - \alpha \nabla f(x_{k-1})$$

for some learning rate $\alpha$. How do we represent $d(x_k)$ in terms of the eigenvalues, eigenvectors, $x_0$ (the initial guess), and $k$?
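To make the setup concrete, here is a minimal numerical sketch of the iteration and of $d(x_k)$ (the matrix, vector, step size, and names below are my own arbitrary choices, not given in the problem):

```python
import numpy as np

# Minimal numerical sketch; the data below are arbitrary, made-up inputs.
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                # symmetric positive definite
b = rng.standard_normal(n)
c = 1.0

f = lambda x: 0.5 * x @ A @ x + x @ b + c
x_star = -np.linalg.solve(A, b)            # minimizer x* = -A^{-1} b

alpha = 1.0 / np.linalg.eigvalsh(A).max()  # step size (assumed small enough)
x = rng.standard_normal(n)                 # x_0, the initial guess
for k in range(50):
    x = x - alpha * (A @ x + b)            # gradient step: grad f(x) = A x + b

d_k = np.sum((x_star - x) ** 2)            # d(x_k) = ||x* - x_k||_2^2
print(d_k, f(x) - f(x_star))               # both should be close to 0
```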

I got this recursive formula:

\begin{align*}
x_{n+1} &= x_n - \alpha\nabla f(x_n) \\
&= x_n - \alpha(Ax_n + b) \\
&= (I - \alpha A)x_n - \alpha b \\
&= (I - \alpha A)\big((I - \alpha A)x_{n-1} - \alpha b\big) - \alpha b \\
&= (I - \alpha A)^2 x_{n-1} - (I - \alpha A)\alpha b - \alpha b \\
&= \dots \\
&= (I - \alpha A)^{n+1} x_0 - \big((I - \alpha A)^{n} + (I - \alpha A)^{n-1} + \dots + I\big)\alpha b
\end{align*}
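As a sanity check, here is a short numerical comparison of this closed form against the plain iteration (again with arbitrary made-up data):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 4, 10
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)           # symmetric positive definite
b = rng.standard_normal(n)
alpha = 0.05
x0 = rng.standard_normal(n)

# Plain iteration: x_{j+1} = (I - alpha*A) x_j - alpha*b
x = x0.copy()
for _ in range(k):
    x = x - alpha * (A @ x + b)

# Closed form: x_k = (I - alpha*A)^k x_0 - sum_{j=0}^{k-1} (I - alpha*A)^j alpha*b
B = np.eye(n) - alpha * A
geom = sum(np.linalg.matrix_power(B, j) for j in range(k))
x_closed = np.linalg.matrix_power(B, k) @ x0 - geom @ (alpha * b)

print(np.allclose(x, x_closed))       # expected: True
```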

Furthermore, is there a sufficient condition that guarantees that the sequence $d(x_k)$ converges to $0$ as $k$ goes to infinity?

Here is what I tried: the minimum value, using $x^{\ast} = -A^{-1}b$ and the symmetry of $A$, is
\begin{align*}
f(x^{\ast}) &= \tfrac{1}{2}b^T A^{-1} A A^{-1} b - (A^{-1}b)^T b + c \\
&= \tfrac{1}{2}b^T A^{-1} b - b^T A^{-1} b + c \\
&= -\tfrac{1}{2}b^T A^{-1} b + c.
\end{align*}
The distance, however, is measured between the minimizer and the iterate, so we have to compute $d(x_k) = (-A^{-1}b - x_k)^T(-A^{-1}b - x_k)$,

which, writing $A = PDP^T$ (so $A^{-1} = PD^{-1}P^T$), is

$$d(x_k) = \left( -PD^{-1}P^Tb - \Big( (I-\alpha PDP^T)^{k}x_0 - \big((I-\alpha PDP^T)^{k-1} + \dots + I\big)\alpha b \Big) \right)^T \left( -PD^{-1}P^Tb - \Big( (I-\alpha PDP^T)^{k}x_0 - \big((I-\alpha PDP^T)^{k-1} + \dots + I\big)\alpha b \Big) \right).$$

Can we simplify it?
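I suspect the cleanest route is to work in the eigenbasis. Here is the simplification I think this leads to (writing $v_i$ for the columns of $P$, i.e. the eigenvectors of $A$), though I am not sure it is correct:

$$x_k - x^{\ast} = (I - \alpha A)^k (x_0 - x^{\ast}), \qquad d(x_k) = \sum_{i=1}^{n} (1 - \alpha\lambda_i)^{2k}\,\big(v_i^T(x_0 - x^{\ast})\big)^2,$$

which, if right, would also suggest that $d(x_k) \to 0$ whenever $|1 - \alpha\lambda_i| < 1$ for every $i$, i.e. $0 < \alpha < 2/\lambda_1$.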
