
I'm working through the book Numerical Linear Algebra by Trefethen and Bau. In Lecture 27 (and exercise 27.5), the following claim is made about the inverse iteration algorithm:

Let $ A $ be a real, symmetric matrix. Solving the system $ (A - \mu I) w = v^{(k-1)} $ at step $ k $ is poorly conditioned if $ \mu $ is approximately an eigenvalue of $ A $. However, this does not cause an issue for the inverse iteration algorithm if the system is solved with a backward stable algorithm, i.e. one which outputs $ \tilde{w} $ such that $ (M + \delta M) \tilde{w} = v^{(k-1)}$, where $ M = A - \mu I $ and $ \frac{\|\delta M\|}{\|M\|} = O(\epsilon_\text{machine}) $. The reason is that even though $ w $ and $ \tilde{w} $ are not close, $ \frac{w}{\|w\|} $ and $ \frac{\tilde{w}}{\|\tilde{w}\|} $ are.

The same issue occurs in the Rayleigh quotient iteration where at each step $ \mu $ is updated with a more accurate estimate of an eigenvalue of $ A $.

I completely understand why the system is poorly conditioned when $ \mu $ is approximately an eigenvalue of $ A $. I am attempting to prove the remainder of the claim, or at least understand why it should be true. Applying the definitions of backward stability and of the conditioning of the problem doesn't lead anywhere beyond the usual accuracy bound: $ \frac{\|w - \tilde{w} \|}{\|w\|} = O(\kappa(A - \mu I)\, \epsilon_\text{machine}) = O(1) $ for $ \mu $ near an eigenvalue of $ A $. I suspect that I need to use the fact that $ A $ is normal to move forward, but I don't see how.
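
For what it's worth, here is a minimal NumPy sketch of the phenomenon (my own experiment, not from the book): the shifted system is solved twice, once as given and once with an $O(\epsilon_\text{machine})$ perturbation of the matrix standing in for the backward error of a stable solver. The relative error between the two solutions should come out large, while their normalized versions should agree to many digits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random real symmetric matrix; pick a shift mu extremely close to an eigenvalue,
# so that M = A - mu*I is very ill-conditioned.
n = 50
B = rng.standard_normal((n, n))
A = (B + B.T) / 2
mu = np.linalg.eigvalsh(A)[0] + 1e-12
M = A - mu * np.eye(n)

v = rng.standard_normal(n)
v /= np.linalg.norm(v)

# Solve the system as given, and solve a slightly perturbed system that stands in
# for the backward error of a backward stable solver: (M + dM) w_tilde = v.
dM = 1e-16 * np.linalg.norm(M) * rng.standard_normal((n, n))
w = np.linalg.solve(M, v)
w_tilde = np.linalg.solve(M + dM, v)

u = w / np.linalg.norm(w)
u_tilde = w_tilde / np.linalg.norm(w_tilde)

print("kappa(M)             :", np.linalg.cond(M))                        # huge
print("relative error in w  :", np.linalg.norm(w - w_tilde) / np.linalg.norm(w))
print("error in direction   :", np.linalg.norm(u - u_tilde))              # tiny
```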

Any help is appreciated. Thanks!

Related Wikipedia articles:

1. Inverse iteration
2. Rayleigh quotient iteration

2 Answers


I will give a rough idea. Consider $A=\begin{bmatrix}1 & 0\\0 & 10^{-15}\end{bmatrix}$ and $v=(1,1)^T$. Then $A^{-1}v=(1,10^{15})^T$. Suppose $\delta A=\begin{bmatrix}0 & 0\\0 & 10^{-15}\end{bmatrix}$. Then $(A+\delta A)^{-1}v=(1,0.5\cdot 10^{15})^T$.

The idea is that the component along the eigenvector with the small eigenvalue gets amplified so much that it dominates the solution; even a large relative change to that component barely changes the direction of the normalized vector.

This holds as long as the component of $v$ along the eigenvector of the smallest eigenvalue is not itself $O(\epsilon)$. If it is, the perturbation can cause a large deviation in direction.
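
To make this concrete, here is a quick NumPy check of the $2\times 2$ example above (my own sketch), including the failure case where the component of $v$ along the small-eigenvalue eigenvector is itself only $O(\epsilon)$:

```python
import numpy as np

A  = np.diag([1.0, 1e-15])
dA = np.diag([0.0, 1e-15])

def direction(x):
    return x / np.linalg.norm(x)

# Good case: v has an O(1) component along the small-eigenvalue eigenvector e2.
v = np.array([1.0, 1.0])
w, w_tilde = np.linalg.solve(A, v), np.linalg.solve(A + dA, v)
print(np.linalg.norm(direction(w) - direction(w_tilde)))   # tiny: both are ~e2

# Bad case: that component is only O(eps), so the amplified part no longer dominates.
v = np.array([1.0, 1e-15])
w, w_tilde = np.linalg.solve(A, v), np.linalg.solve(A + dA, v)
print(np.linalg.norm(direction(w) - direction(w_tilde)))   # O(1): directions differ
```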

Qi Yu

I'm working on this exercise too, so I'll share my answer here, although I'm not 100% sure it's correct.

Let $A$ have eigenvalues $|\lambda_m|\ge \ldots \ge |\lambda_2| \gg |\lambda_1|$, with corresponding (orthonormal) eigenvectors $q_m,\ldots,q_1$. From the problem statement we have $w=A^{-1}v$ and $\tilde{w}=(A+\delta A)^{-1}v$ (here $A$ plays the role of the shifted matrix $A-\mu I$ and $v$ of $v^{(k-1)}$).

It is easy to show that $\frac{w}{\|w\|}=q_1\left(1+O\left(\frac{\lambda_1}{\lambda_2}\right)\right)+q\cdot O\left(\frac{\lambda_1}{\lambda_2}\right)$, where $q$ is some vector.
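
Filling in that step (assuming, as we may since $A$ is symmetric, that the $q_i$ are orthonormal, and that $v$ has a nonzero component $c_1$ along $q_1$): write $v=\sum_i c_i q_i$, so that
$$w=A^{-1}v=\sum_i \frac{c_i}{\lambda_i}\,q_i=\frac{c_1}{\lambda_1}\left(q_1+\sum_{i\ge 2}\frac{c_i}{c_1}\,\frac{\lambda_1}{\lambda_i}\,q_i\right),$$
and every coefficient in the inner sum is $O\left(\frac{\lambda_1}{\lambda_2}\right)$, so normalizing gives the form above for $\frac{w}{\|w\|}$.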

For $\tilde{w}$, write $(A+\delta A)^{-1} = (A(I+A^{-1}\delta A))^{-1} = (I+A^{-1}\delta A)^{-1}A^{-1}$ and expand $$(I+A^{-1}\delta A)^{-1}=I-A^{-1}\delta A + (A^{-1}\delta A)^2-\ldots$$ It is tempting to drop the terms from $(A^{-1}\delta A)^2$ onward as higher-order $O(\epsilon^2)$ terms, but I'm not sure such an approximation is justified, since the most straightforward upper bound on $\|A^{-1}\delta A\|$ is $O(\|A^{-1}\|\,\|A\|\,\epsilon)=O(\kappa(A)\epsilon)$, which may not be small (this is the whole point of the problem...?).

However, even if we don't discard the "higher-order" terms, every term of the expansion of $(A(I+A^{-1}\delta A))^{-1}$ has $A^{-1}$ as its leftmost factor, so $\tilde{w}=(A+\delta A)^{-1}v$ is again $A^{-1}$ applied to some vector. Assuming that vector also has components along all the eigenvectors (especially $q_1$), it follows that $\frac{\tilde{w}}{\|\tilde{w}\|}=q_1\left(1+O\left(\frac{\lambda_1}{\lambda_2}\right)\right)+q'\cdot O\left(\frac{\lambda_1}{\lambda_2}\right)$, where $q'$ is some vector. Thus the difference between $\frac{w}{\|w\|}$ and $\frac{\tilde{w}}{\|\tilde{w}\|}$ is $O\left(\frac{\lambda_1}{\lambda_2}\right)$, both relative and absolute, since both vectors have norm 1. This may not technically be $O(\epsilon)$, but it is much better than a bound of $O(\kappa(A)\epsilon)$.
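
As a rough numerical sanity check of this (my own sketch, not part of the argument), one can build a symmetric $A$ with a prescribed tiny eigenvalue and compare the observed change in direction with both $\lambda_1/\lambda_2$ and the generic bound $\kappa(A)\,\epsilon_\text{machine}$; the direction error should come out at or below the former and far below the latter:

```python
import numpy as np

rng = np.random.default_rng(1)
eps = np.finfo(float).eps

# Symmetric A with eigenvalues 1e-12, 1, 2, ..., 9, so lambda_1/lambda_2 = 1e-12.
n = 10
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.array([1e-12] + list(range(1, n)), dtype=float)
A = Q @ np.diag(lam) @ Q.T

v = rng.standard_normal(n)
dA = eps * np.linalg.norm(A) * rng.standard_normal((n, n))   # backward-error-sized

w = np.linalg.solve(A, v)
w_tilde = np.linalg.solve(A + dA, v)
u = w / np.linalg.norm(w)
u_tilde = w_tilde / np.linalg.norm(w_tilde)

print("direction error     :", np.linalg.norm(u - u_tilde))   # at or below lam1/lam2
print("lambda_1 / lambda_2 :", lam[0] / lam[1])
print("kappa(A) * eps      :", np.linalg.cond(A) * eps)       # far more pessimistic
```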

Roger18