
Is there a computationally efficient way of computing $(A^\top A)^{-1}$? I ask because I want to compute the projection matrix $$ P = A(A^\top A)^{-1} A^\top $$ efficiently.

Euler_Salter
  • @MorganRodgers Not sure what your point is - here $A$ must be invertible because $A^\top A$ is invertible. – David C. Ullrich Jun 22 '21 at 11:14
  • @MorganRodgers Right. I was assuming for no reason that $A$ is square (in which case what I said does follow). – David C. Ullrich Jun 22 '21 at 11:16
  • @Euler_Salter I would bet that the downvotes are because you have asked a question with no context (e.g. motivation or attempts). If you need a more comprehensive list of what constitutes helpful context for a question, see this post. – Ben Grossmann Jun 22 '21 at 14:07
  • @Euler_Salter Is there a reason that you need to be particularly "efficient" in your computation of $(A^TA)^{-1}$? Is this a computation on the computer or by hand? If this is a computation on the computer, then how large is $A$? With all that said, $A^TA$ will be symmetric and positive definite, so you might find the discussion on this post to be interesting. – Ben Grossmann Jun 22 '21 at 14:19
  • @BenGrossmann I will be using a computer. Essentially, at a point on a manifold, I need to compute the projection matrix that will project a point onto the tangent space. In my particular case $A$ is basically a matrix whose columns span the tangent space. I will need to do this projection many thousands of times, so I would like this to be very efficient – Euler_Salter Jun 22 '21 at 14:22
  • @Euler_Salter How large is $A$? That is, how many rows/columns? – Ben Grossmann Jun 22 '21 at 14:33
  • Honestly it could be any size. But in practice it would have $n$ rows and $m$ columns with $n > m$ – Euler_Salter Jun 22 '21 at 14:39
  • @Euler If you care about efficiency, then presumably you have some kind of "numerical experiment" in mind; otherwise, why would it matter how long it takes to find the inverse? For the purposes of this experiment, what is $m$? What is $n$? Is $n$ approximately $10$? $100$? $1000000$? – Ben Grossmann Jun 22 '21 at 14:42
  • @Euler Also, if the purpose of finding the projection matrix is to project a single vector (i.e. to compute $Px$ for a specific vector $x$), then you might want to avoid computing the inverse at all. For instance, you could use the conjugate gradient method. – Ben Grossmann Jun 22 '21 at 14:44
  • @BenGrossmann good point. At the moment I am working on toy examples with both $n$ and $m$ less than $10$, but this is just because they are toy examples. In practice it will depend on the problem at hand, but I would be happy for this to be efficient for hundreds of dimensions – Euler_Salter Jun 22 '21 at 14:46
  • @BenGrossmann Thank you. So the conjugate gradient method in this case would solve this for $y$: $$ A (A^\top A)^{-1} A^\top x = y \implies A^\top x = A^\top y? $$ – Euler_Salter Jun 22 '21 at 14:49
  • @Euler_Salter I'm not quite sure what you're saying there; here's what I had in mind. Compute $y = A^Tx$. Then use the CG method to solve $(A^TA)z = y$ (so that $z = (A^TA)^{-1}A^Tx$). Then compute $Az$ to get $Px$. – Ben Grossmann Jun 22 '21 at 14:52
  • Ooh that's smart – Euler_Salter Jun 22 '21 at 14:55
  • @Euler_Salter That said, Matlab is generally as good as it gets when it comes to matrix computations. It seems that entering A*((A'*A)\(A'*x)) leads to a solution using the Cholesky decomposition for your case, so I suspect that for most usage cases that'll be "good enough". – Ben Grossmann Jun 22 '21 at 15:02

1 Answer


I'm going to suggest an indirect method which, while slightly slower, is more accurate.

Suppose $A$ is $n\times m$ with $n > m$ and has full column rank. As a rule of thumb, you can usually avoid forming the matrix $A^\top A$ explicitly in matrix computations, and it is advisable to do so whenever possible.

You can reformulate your goal as finding a matrix $Q\in\mathbb{R}^{n\times m}$ whose columns form an orthonormal basis for the column space of $A$. One can then compute matrix-vector products with the projection $\Pi_A = A(A^\top A)^{-1}A^\top = QQ^\top$ in roughly $4mn$ operations by evaluating $\Pi_A x = Q(Q^\top x)$.

The most efficient and accurate way to do this is by QR factorization, which under the hood is implemented with Householder reflectors and is very accurate. You want the economy QR factorization $A=QR$, which computes $Q\in \mathbb{R}^{n\times m}$ with orthonormal columns, rather than the full QR factorization $A = Q'R'$, which computes a full $n\times n$ matrix $Q'\in\mathbb{R}^{n\times n}$. In MATLAB, this is done by the call [Q,~] = qr(A,0).
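
For concreteness, here is a minimal sketch of that recipe in MATLAB; it assumes $A$ and a vector x are already in the workspace, and the variable names are my own:

[Q, ~] = qr(A, 0);     % economy QR: Q is n-by-m with orthonormal columns
Px = Q * (Q' * x);     % Pi_A * x in about 4*m*n operations, without forming A'*A

The factorization is computed once; every subsequent projection then costs only the two matrix-vector products.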

Let's see how this works. Since $A$ has full column rank, $R$ is nonsingular. Thus,

$$ \Pi_A = A(A^\top A)^{-1}A^\top = QR(R^\top R)^{-1}R^\top Q^\top = QQ^\top, $$

as promised.

If $A$ is rank-deficient (or approximately rank-deficient), you will want to use an SVD and threshold out small singular values to compute an orthonormal basis for the (approximate) column space. I can elaborate on this point in the comments if desired.
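
A rough sketch of that variant in MATLAB might look like the following; the tolerance mirrors the one MATLAB's rank uses, but it is my own illustrative choice:

[U, S, ~] = svd(A, 'econ');           % economy SVD: U is n-by-m
s = diag(S);
tol = max(size(A)) * eps(max(s));     % threshold for "numerically zero" singular values
r = sum(s > tol);                     % numerical rank
Q = U(:, 1:r);                        % orthonormal basis for the approximate column space
Px = Q * (Q' * x);                    % projection of x onto that subspace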

Why is this worth the hassle? Consider the following MATLAB example:

>> Q_true = randn(10000,100); [Q_true,~] = qr(Q_true, 0);
>> A = Q_true * gallery('randsvd',100,1e10);
>> x = randn(10000,1);
>> tic; disp(norm(A*((A'*A)\(A'*x)) - Q_true*(Q_true'*x))); toc
Warning: Matrix is close to singular or badly scaled. Results may be inaccurate. RCOND =
1.095993e-20.
 2.299635478974520e+01

Elapsed time is 0.003516 seconds.
>> tic; [Q,~] = qr(A,0); disp(norm(Q*(Q'*x) - Q_true*(Q_true'*x))); toc
 1.177844914088478e-06

Elapsed time is 0.021228 seconds.

The method based on QR factorization is about six times slower in this example, which isn't great. But the error for the QR-based method is over ten million times smaller than for the $(A^\top A)^{-1}$ method!

If you can afford it and $A$ is even moderately ill-conditioned, the QR factorization method pays for its modestly increased cost with dramatically increased accuracy.

As Ben Grossmann suggests, iterative methods may be appropriate if $A$ is well-conditioned and $n$ and $m$ are both large.
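
As a rough MATLAB sketch of that iterative route, following the steps outlined in the comments (the tolerance and iteration cap are arbitrary placeholder choices, and full column rank of $A$ is assumed so that $A^\top A$ is symmetric positive definite):

y = A' * x;                          % y = A^T x
applyAtA = @(z) A' * (A * z);        % apply A^T A without forming it explicitly
z = pcg(applyAtA, y, 1e-10, 1000);   % conjugate gradient solve of (A^T A) z = y
Px = A * z;                          % Px = A (A^T A)^{-1} A^T x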

eepperly16