
The MINRES algorithm for solving $Ax = b$ for symmetric $A$ can be described as follows: the $k$-th iterate of the algorithm is $$x_k = \arg\min_{x\in K_k(A)} \lVert Ax-b \rVert_2,$$ where $K_k(A)=\operatorname{span}\{A^ib\mid i < k\}$ is the $k$-th Krylov subspace of $A$ and $b$.
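For concreteness, this definition can be translated literally into code: a naive sketch (not the actual MINRES recurrence) that builds a Krylov basis and solves the small least-squares problem directly, for example

import numpy as np

def minres_by_definition(A, b, k):
    # Orthonormal basis of K_k(A) = span{b, Ab, ..., A^(k-1) b}
    V = np.column_stack([np.linalg.matrix_power(A, i) @ b for i in range(k)])
    Q, _ = np.linalg.qr(V)
    # Minimize ||A Q c - b||_2 over c, then map back: x_k = Q c
    c, *_ = np.linalg.lstsq(A @ Q, b, rcond=None)
    return Q @ c

The real algorithm computes the same minimizer with a short recurrence (the Lanczos process) instead of forming powers of $A$ explicitly.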

By this definition, it is clear that it converges to the exact solution in at most $n$ iterations, if $A$ is an $n\times n$ matrix. By using this solver on matrices with a very specific structure, I noticed that in that case the solver converges in just 3 iterations. The matrices are of the form $$ A = \begin{bmatrix} I_{n-1} & v\\ v^T & 0 \end{bmatrix} $$ where $v$ is a column vector and $I_{n-1}$ is the $(n-1) \times (n-1)$ identity matrix. How can this early convergence be explained?

My thoughts

The matrix $A$ can in this case be seen as an Identity matrix plus 2 rank-one corrections: $$A = I_n + \begin{bmatrix} 0 & v\\ 0 & -\frac{1}{2} \end{bmatrix} + \begin{bmatrix} 0 & 0\\ v^T & -\frac{1}{2} \end{bmatrix}$$

This means that we can invert the matrix exactly by applying the Sherman-Morrison formula twice. I currently avoid doing this explicitly because it leads to instabilities, while 3 MINRES iterations do not. Maybe the MINRES algorithm implicitly exploits the fact that $A$ is just a rank-two correction away from the identity?
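For reference, here is a small sketch of that explicit inversion (the helper sm_update_inv just applies the Sherman-Morrison formula twice; it is meant as an illustration, not as stable code):

import numpy as np

def sm_update_inv(B_inv, u, w):
    # Sherman-Morrison: (B + u w^T)^{-1} given B^{-1}
    Bu = B_inv @ u
    wB = w @ B_inv
    return B_inv - np.outer(Bu, wB) / (1.0 + w @ Bu)

n = 6
v = np.random.randn(n - 1)
A = np.block([[np.eye(n - 1), v[:, None]], [v[None, :], np.zeros((1, 1))]])

c = np.concatenate([v, [-0.5]])    # the column (v, -1/2) from the decomposition above
e_n = np.zeros(n); e_n[-1] = 1.0

# A = I + c e_n^T + e_n c^T, so two Sherman-Morrison updates give A^{-1}
inv1 = sm_update_inv(np.eye(n), c, e_n)
A_inv = sm_update_inv(inv1, e_n, c)
print(np.allclose(A_inv @ A, np.eye(n)))   # True, up to rounding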

You can verify this behaviour with a short Python snippet.
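A minimal sketch of such a snippet (using scipy.sparse.linalg.minres; the callback just records the residual norm of each iterate):

import numpy as np
from scipy.sparse.linalg import minres

n = 100
v = np.random.randn(n - 1, 1)
A = np.block([[np.eye(n - 1), v], [v.T, 0]])
b = np.random.randn(n)

# record the residual norm after every MINRES iteration
residuals = []
x, info = minres(A, b, tol=1e-12,
                 callback=lambda xk: residuals.append(np.linalg.norm(A @ xk - b)))
print(len(residuals), residuals)   # typically 3 entries, the last one near machine precision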

Jens Renders
  • That's a great description of MINRES. Do you know an equally simple description for GMRES? What about the conjugate gradient method? – littleO Jul 21 '20 at 21:29
  • @littleO Yes, GMRES can be defined in exactly the same way, but each iteration is more difficult to compute because $A$ is no longer symmetric (if $A$ is symmetric, then you should use MINRES). The conjugate gradient algorithm can also be expressed in this form: the $k$-th iterate minimizes $\lVert x-A^{-1}b \rVert_A$ over the $k$-th Krylov subspace. This is a bit less natural, but good for comparison. Note that this minimizer is even easier to compute than the one from MINRES, but we need $A$ to be symmetric positive definite so that the $A$-norm exists. – Jens Renders Jul 22 '20 at 10:53
  • Thanks. Do you know a good reference that takes these descriptions as a starting point and then derives / discovers the iterations for each of these methods? The books I've seen tend to state the iterations without motivation, and then prove that certain properties are satisfied. I want to know a thought process by which you could discover the iterations, given the simple descriptions above as a starting point. – littleO Jul 26 '20 at 04:40

1 Answer


The reason is that there is a large eigenspace corresponding to the eigenvalue $1$. If you solve the eigenvalue problem

$$ \begin{bmatrix} I_{n-1} & v \\ v^T & 0\end{bmatrix} \cdot \begin{bmatrix} x \\ \alpha \end{bmatrix} = \lambda \begin{bmatrix} x \\ \alpha \end{bmatrix} $$

you'll find that there are $n-2$ eigenvectors with $\alpha=0$ and $x\perp v$ corresponding to the eigenvalue $1$, and 2 eigenvectors with $\alpha\neq 0$ and $x\parallel v$. More specifically, in the latter case the eigenvectors are $e_\pm=(x_\pm, \alpha_\pm)$ with $x_\pm=v$ and $\alpha_\pm = \frac{1}{2}\left(-1\pm\sqrt{1+4 v^T v}\right)$, corresponding to the eigenvalues $\lambda_\pm = 1+\alpha_\pm = \frac{1\pm\sqrt{1+4v^T v}}{2}$. Note that since $A$ is symmetric, these eigenvectors are orthogonal.
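A quick numerical sanity check of these formulas (illustrative sketch):

import numpy as np

n = 7
v = np.random.randn(n - 1)
A = np.block([[np.eye(n - 1), v[:, None]], [v[None, :], np.zeros((1, 1))]])

# predicted spectrum: n-2 eigenvalues equal to 1, plus lambda_+ and lambda_-
lam_pm = (1 + np.array([-1.0, 1.0]) * np.sqrt(1 + 4 * v @ v)) / 2
predicted = np.sort(np.concatenate([np.ones(n - 2), lam_pm]))
computed = np.sort(np.linalg.eigvalsh(A))
print(np.allclose(predicted, computed))   # True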

Consequently, given a target vector $y=\begin{bmatrix}b\\ \beta\end{bmatrix}$ we can split it up as a linear combination

$$ y = \underbrace{\mu_1\begin{bmatrix}v\\ \alpha_+\end{bmatrix}}_{=:\, z_1} + \underbrace{\mu_2\begin{bmatrix}v\\ \alpha_-\end{bmatrix}}_{=:\, z_2} + \underbrace{\mu_3\begin{bmatrix}v^\perp\\ 0\end{bmatrix}}_{=:\, r} $$

where $v^\perp$ denotes some vector orthogonal to $v$.

Consequently, $ A y = \lambda_+ z_1 + \lambda_- z_2 + r $, $A^2y =\lambda_+^2 z_1 + \lambda_-^2 z_2 + r$, etc. Now it should become clear why we only need 3 iterations: we only need to determine 3 coefficients, namely the coefficient of $z_1$, the coefficient of $z_2$, and the coefficient of $r$.
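Equivalently: $y$ lies in the span of three directions that $A$ only rescales, so the Krylov subspace stops growing after three steps. A small illustrative check:

import numpy as np

n = 50
v = np.random.randn(n - 1)
A = np.block([[np.eye(n - 1), v[:, None]], [v[None, :], np.zeros((1, 1))]])
y = np.random.randn(n)

# Krylov vectors y, Ay, A^2 y, A^3 y: the fourth is already a combination of the first three
K = np.column_stack([np.linalg.matrix_power(A, i) @ y for i in range(4)])
print(np.linalg.matrix_rank(K))   # 3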

Lemma: Assume $A$ is orthogonally similar to the $n\times n$ block matrix $A' = \begin{bmatrix}I_r & 0 \\0 & D\end{bmatrix}$, where $D$ is a diagonal matrix whose diagonal entries are pairwise distinct and all different from $1$. Then MINRES terminates in exactly $n-r+1$ iterations (for a generic right-hand side).

Proof: Given an orthogonal $U$ with $A' = UAU^T$, we have $$ \|Ax - y\|_2 = \|U(Ax - y)\|_2 = \|UAU^T Ux - Uy\|_2 = \|A'x' - y'\|_2 $$ with $x' = Ux$ and $y' = Uy$. So both optimization problems are equivalent, in the sense that if $x_k' = \arg\min_{\xi\in K_k(A')}\|A'\xi-y'\|$ then $x_k = U^T x_k' = \arg\min_{\xi\in K_k(A)}\|A\xi-y\|$. The early termination in the block-diagonal case then follows by induction over the size of $D$.

Corollary: If $A$ is orthogonally diagonalizable and has exactly $r$ distinct eigenvalues, then MINRES terminates in exactly $r$ iterations for a generic right-hand side. (In practice this is not guaranteed exactly, due to rounding errors.)
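The snippet below checks this numerically: first for the matrix from the question, then for an orthogonal similarity transform of it, and finally for a matrix with 4 distinct eigenvalues.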

import numpy as np
from scipy.sparse.linalg import minres
from scipy.stats import ortho_group

# create a random matrix of the specific form
N = 100
v = np.random.randn(N-1, 1)
b = np.random.random(N)
A = np.block([[np.eye(N-1), v], [v.T, 0]])

# run MINRES (it converges after 3 iterations); the callback prints the residual norm
callback = lambda x: print(np.linalg.norm(A.dot(x) - b))
x, info = minres(A, b, tol=1e-15, callback=callback)
print("MINRES returned:", info)
print("The return value 0 means that it has converged.")

# orthogonal similarity transform
U = ortho_group.rvs(N)
A_dash = U.T @ A @ U
b_dash = U @ b
callback = lambda x: print(np.linalg.norm(A_dash.dot(x) - b_dash))
x, info = minres(A_dash, b_dash, tol=1e-15, callback=callback)
print("MINRES returned:", info)
print("The return value 0 means that it has converged.")

# 4 distinct eigenvalues
U = ortho_group.rvs(N)
A = np.diag(np.concatenate([2*np.ones(N//4), 3*np.ones(N//4), -1*np.ones(N//4), 10*np.ones(N//4)]))
A_dash = U.T @ A @ U
b_dash = U @ b
callback = lambda x: print(np.linalg.norm(A_dash.dot(x) - b_dash))
x, info = minres(A_dash, b_dash, tol=1e-15, callback=callback)
print("MINRES returned:", info)
print("The return value 0 means that it has converged.")

Hyperplane
  • Your formula for the two eigenvalues (not equal to 1 if $v$ is nonzero) is wrong, it should be $\lambda_{\pm} = \frac{1 \pm \sqrt{1+4 v^*v}}{2}$. Also, for their corresponding eigenvectors, the $x$ need not be fixed at $v$, it only needs to be parallel to $v$; $\alpha$ can be scaled accordingly. – water stone Mar 21 '25 at 20:30