Is there some computationally efficient way of computing $(A^\top A)^{-1}$? I would like to know this because I want to compute the projection matrix $$ P = A(A^\top A)^{-1} A^\top $$ efficiently.
1 Answer
I'm going to suggest an indirect method which, while slightly longer, is more accurate.
Suppose $A$ is $n\times m$ with $n > m$ and has full column rank. As a rule of thumb, you can usually get away without forming the matrix $A^\top A$ in matrix computations, and doing so is advisable whenever possible: forming $A^\top A$ squares the condition number of $A$.
You can reformulate your goal as finding an orthonormal basis $Q\in\mathbb{R}^{n\times m}$ for the column space of $A$. One can then compute matrix-vector products with the projection $\Pi_A = A(A^\top A)^{-1}A^\top = QQ^\top$ in about $4mn$ operations by evaluating $\Pi_A x = Q(Q^\top x)$.
The most efficient and accurate way to do this is by QR factorization, which under the hood is implemented with Householder reflectors and is very accurate. You want the economy QR factorization $A=QR$, which computes $Q\in \mathbb{R}^{n\times m}$ with orthonormal columns, rather than the full QR factorization $A = Q'R'$, which computes a full $n\times n$ matrix $Q'\in\mathbb{R}^{n\times n}$. In MATLAB, this is done by the call `[Q,~] = qr(A,0)`.
Let's see how this works. Note that $R$ is nonsingular. Thus,
$$ \Pi_A = A(A^\top A)^{-1}A^\top = QR(R^\top R)^{-1}R^\top Q^\top = QQ^\top, $$
as promised.
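Concretely, the whole recipe is only a couple of lines. Here is a minimal sketch (the dimensions and random test matrix are placeholders of mine, not part of the answer):

n = 1000; m = 50;              % placeholder sizes
A = randn(n, m);               % placeholder matrix; full column rank with probability 1
x = randn(n, 1);               % vector to project
[Q, ~] = qr(A, 0);             % economy QR: Q is n x m with orthonormal columns
proj_x = Q * (Q' * x);         % Pi_A * x in about 4*m*n operations

Note the parenthesization: `Q * (Q' * x)` costs two matrix-vector products, whereas `(Q * Q') * x` would form an $n\times n$ matrix at far greater cost.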
If $A$ is rank-deficient (or approximately rank-deficient), you will want to use an SVD and threshold out small singular values to compute an orthonormal basis for the (approximate) column space. I can elaborate on this point in the comments if desired.
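For instance, a minimal sketch of the SVD route (the tolerance below is a standard heuristic of my choosing, not prescribed here):

[U, S, ~] = svd(A, 'econ');          % economy SVD: U is n x m
s = diag(S);                         % singular values, largest first
tol = max(size(A)) * eps(max(s));    % heuristic threshold for "numerically zero"
r = sum(s > tol);                    % numerical rank
Q = U(:, 1:r);                       % orthonormal basis for the approximate column space
proj_x = Q * (Q' * x);               % projection onto that subspace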
Why is this worth the hassle? Consider the following MATLAB example:
>> Q_true = randn(10000,100); [Q_true,~] = qr(Q_true, 0);
>> A = Q_true * gallery('randsvd',100,1e10);
>> x = randn(10000,1);
>> tic; disp(norm(A*((A'*A)\(A'*x)) - Q_true*(Q_true'*x))); toc
Warning: Matrix is close to singular or badly scaled. Results may be inaccurate. RCOND = 1.095993e-20.
2.299635478974520e+01
Elapsed time is 0.003516 seconds.
>> tic; [Q,~] = qr(A,0); disp(norm(Q*(Q'*x) - Q_true*(Q_true'*x))); toc
1.177844914088478e-06
Elapsed time is 0.021228 seconds.
The method based on QR factorization is about six times slower, which isn't great. But the error for the QR-based method is over ten million times smaller than for the $(A^\top A)^{-1}$ method!
If you can afford it and $A$ is even moderately ill-conditioned, the QR factorization method pays for its modestly increased cost with dramatically increased accuracy.
As Ben Grossmann suggests, iterative methods may be appropriate if $A$ is well-conditioned and $n$ and $m$ are both large.
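For completeness, here is one way the iterative route can look, sketched with MATLAB's lsqr (the tolerance and iteration cap are placeholders): lsqr computes the $y$ minimizing $\|Ay - x\|_2$, i.e. $y = (A^\top A)^{-1}A^\top x$, so $Ay = \Pi_A x$.

tol = 1e-10; maxit = 200;            % placeholder parameters
y = lsqr(A, x, tol, maxit);          % iterative least-squares solution of A*y ≈ x
proj_x = A * y;                      % Pi_A * x, no factorization required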
Thank you! This is very helpful! Do you have any resources to learn about this area of linear algebra? I took 2 linear algebra courses at university, but honestly we only skimmed matrix decompositions very briefly and basically didn't cover projections in depth. – Euler_Salter Jun 23 '21 at 22:33
@Euler_Salter The books by Golub and Van Loan, Trefethen and Bau, and Demmel are all excellent. My personal philosophy about matrix computations is that they're kind of like cooking: there are a number of fundamental techniques (analogous to knife skills, boiling, braising, etc.) from which you can solve most problems. You get a lot of mileage out of LU, QR, eigenvalue problems, SVD, and conjugate gradient. – eepperly16 Jun 24 '21 at 01:39
Let $A'$ be $A$ with an extra vector $a$ appended. Is there an efficient way to compute $\Pi_{A'}$ from $\Pi_A$? Something like the matrix determinant lemma or Sherman–Morrison formula? – user76284 Oct 15 '22 at 04:58
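One standard approach (a sketch of mine, not a reply from the thread): orthogonalize the new column $a$ against $Q$ with a Gram-Schmidt step, giving $\Pi_{A'} = \Pi_A + qq^\top$; in practice a reorthogonalization pass improves accuracy, and the threshold below is a placeholder.

r = a - Q * (Q' * a);               % component of a orthogonal to range(A)
if norm(r) > 1e-12 * norm(a)        % placeholder test: is a numerically outside range(A)?
    q = r / norm(r);
    Q = [Q, q];                     % now Pi_{A'} = Q*Q' = Pi_A + q*q'
end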
`A*((A'*A)\(A'*x))` leads to a solution using the Cholesky decomposition for your case, so I suspect that for most usage cases that'll be "good enough". – Ben Grossmann Jun 22 '21 at 15:02
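Spelled out, that comment's approach amounts to the following sketch (the explicit chol call is my illustration; backslash dispatches to Cholesky automatically when it detects a symmetric positive definite matrix):

R = chol(A' * A);                    % A'*A = R'*R with R upper triangular
y = R \ (R' \ (A' * x));             % solve the normal equations (A'*A)*y = A'*x
proj_x = A * y;                      % Pi_A * x via the normal equations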