The reason is that there is a large eigenspace corresponding to the eigenvalue $1$. If you solve the eigenvalue problem
$$
\begin{bmatrix} I_{n-1} & v \\ v^T & 0\end{bmatrix}
\cdot
\begin{bmatrix} x \\ \alpha \end{bmatrix}
= \lambda \begin{bmatrix} x \\ \alpha \end{bmatrix}
$$
you'll find that there are $n-2$ eigenvectors with $\alpha=0$ and $x\perp v$ corresponding to the eigenvalue $1$, and two eigenvectors with $\alpha\neq 0$ and $x\parallel v$. More specifically, in the latter case the eigenvectors are $e_\pm=(x_\pm, \alpha_\pm)$ with $x_\pm=v$ and $\alpha_\pm = \frac{1}{2}\bigl(-1\pm\sqrt{1+4 v^T v}\bigr)$, corresponding to the eigenvalues $\lambda_\pm = 1+\alpha_\pm = \frac{1\pm\sqrt{1+4v^T v}}{2}$. Note that since $A$ is symmetric, these eigenvectors are orthogonal.
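For a quick numerical sanity check of these formulas (a minimal illustrative sketch; the size $n$ and the random $v$ are arbitrary choices):

import numpy as np

# check: n-2 eigenvalues equal to 1, plus the two outliers lambda_-, lambda_+
n = 6
v = np.random.randn(n - 1, 1)
A = np.block([[np.eye(n - 1), v], [v.T, 0.0]])

vtv = float(v.T @ v)
lam_plus = (1 + np.sqrt(1 + 4 * vtv)) / 2   # predicted lambda_+
lam_minus = (1 - np.sqrt(1 + 4 * vtv)) / 2  # predicted lambda_-

print(np.sort(np.linalg.eigvalsh(A)))
print(lam_minus, lam_plus)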
Now, given a target vector $y=\begin{bmatrix}b \\ \beta\end{bmatrix}$, we can split it up as a linear combination
$$
y =
\underbrace{\mu_1\begin{bmatrix}v \\ \alpha_+\end{bmatrix}}_{=: z_1}
+ \underbrace{\mu_2\begin{bmatrix}v \\ \alpha_-\end{bmatrix}}_{=: z_2}
+ \underbrace{\mu_3\begin{bmatrix}v^\perp \\ 0\end{bmatrix}}_{=: r}
$$
Since $r$ lies in the eigenspace for the eigenvalue $1$, we have $Ar = r$ and therefore $Ay = \lambda_+ z_1 + \lambda_- z_2 + r$, $A^2y = \lambda_+^2 z_1 + \lambda_-^2 z_2 + r$, etc. Now it should become clear why we only need 3 iterations: we only need to figure out 3 coefficients, namely the coefficients of $z_1$, $z_2$ and $r$.
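One way to make this concrete is to check that the Krylov matrix $[\,y,\ Ay,\ A^2y,\ \dots]$ has rank 3, so the Krylov space stops growing after three steps (again a small illustrative sketch with arbitrary random data):

import numpy as np

n = 6
v = np.random.randn(n - 1, 1)
A = np.block([[np.eye(n - 1), v], [v.T, 0.0]])
y = np.random.randn(n)

# columns y, A y, A^2 y, A^3 y, A^4 y
K = np.column_stack([np.linalg.matrix_power(A, k) @ y for k in range(5)])
print(np.linalg.matrix_rank(K, tol=1e-8))  # 3 (generically), hence 3 MINRES iterations suffice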
Lemma: Assume $A$ is orthogonally similar to the $n\times n$ block matrix $A' = \begin{bmatrix}I_r & 0 \\ 0 & D\end{bmatrix}$, where $D$ is an $(n-r)\times(n-r)$ diagonal matrix whose diagonal entries are pairwise distinct and all different from $1$. Then MINRES terminates in exactly $n-r+1$ iterations.
Proof: Given an orthogonal $U$ with $A' = UAU^T$, we have
$$
\|Ax - y\|_2
=\| A U^T Ux - y\|_2
= \|UA U^T U x - U y\|_2
= \|A' x' - y'\|_2
$$
where $x' := Ux$ and $y' := Uy$. So both optimization problems are equivalent, in the sense that if $x_k' = \arg\min_{\xi\in K_k(A',\,y')}\|A'\xi-y'\|$ then $x_k = U^T x_k' = \arg\min_{\xi\in K_k(A,\,y)}\|A\xi-y\|$. The early termination in the block-diagonal case then follows by induction over the size of $D$.
Corollary: If $A$ is orthogonally diagonalizable and has exactly $r$ distinct eigenvalues, then MINRES terminates in exactly $r$ iterations. (In practice this is not guaranteed exactly due to floating-point error.)
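Another way to see the corollary is via the minimal polynomial: if $A$ is diagonalizable with exactly $r$ distinct eigenvalues $\lambda_1,\dots,\lambda_r$, then
$$
\prod_{i=1}^{r}(A-\lambda_i I) = 0,
$$
so $A^r y$ is a linear combination of $y, Ay, \dots, A^{r-1}y$, and the Krylov space $K_k(A,y)$ stops growing after at most $r$ steps.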
import numpy as np
from scipy.sparse.linalg import minres
from scipy.stats import ortho_group
# create a random matrix of the specific form
N = 100
v = np.random.randn(N-1,1)
b = np.random.random(N)
A = np.block([[np.eye(N-1), v], [v.T, 0]])
# run MINRES; the callback prints the residual after every iteration (it converges after 3 iterations)
callback = lambda x: print(np.linalg.norm(A.dot(x)-b))
x, info = minres(A, b, tol=1e-15, callback=callback)
print("MINRES returned:", info)
print("The returnvalue 0 means that it has converged.")
# orthogonal similarity transform
U = ortho_group.rvs(N)
A_dash = U.T @ A @ U
b_dash = U @ b
callback = lambda x: print(np.linalg.norm(A_dash.dot(x)-b_dash))
x, info = minres(A_dash, b_dash, tol=1e-15, callback=callback)
print("MINRES returned:", info)
print("The returnvalue 0 means that it has converged.")
# the case of 4 distinct eigenvalues
U = ortho_group.rvs(N)
A = np.diag(np.concatenate([2*np.ones(N//4), 3*np.ones(N//4), -1*np.ones(N//4), 10*np.ones(N//4)]))
A_dash = U.T @ A @ U
b_dash = U @ b
callback = lambda x: print(np.linalg.norm(A_dash.dot(x)-b_dash))
x, info = minres(A_dash, b_dash, tol=1e-15, callback=callback)
print("MINRES returned:", info)
print("The returnvalue 0 means that it has converged.")