
Let $A \in M_{n \times n}(\mathbb{R})$ be invertible. If we apply Gram-Schmidt to the columns of $A$ we get a matrix $A' \in O_n(\mathbb{R})$. We could also try to solve the optimization problem $\min_{X \in O_n} d(A,X)$, where $d$ is some reasonable metric.
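(As a concrete reference point, here is a minimal numpy sketch, not part of the original question, of the map $A \mapsto A'$: the Gram-Schmidt output is the $Q$ factor of a QR decomposition, with column signs flipped so that $R$ has a non-negative diagonal. The helper name is mine.)

```python
import numpy as np

def gram_schmidt(A):
    """Gram-Schmidt output A': the Q factor of A = QR with diag(R) >= 0."""
    Q, R = np.linalg.qr(A)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0          # guard against zero diagonal entries
    return Q * signs                 # flip the corresponding columns of Q

A = np.random.randn(4, 4)
A_prime = gram_schmidt(A)
print(np.allclose(A_prime.T @ A_prime, np.eye(4)))  # True: A' is orthogonal
```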

Question: Is there a choice of metric (or divergence) $d$ such that these two notions coincide?

Notes:

  • For the Frobenius norm, I know that if $A = U \Sigma V^T$ is the SVD, then the solution to the minimization problem is $UV^T$ ( https://en.wikipedia.org/wiki/Orthogonal_Procrustes_problem ); a numerical sketch of this appears after these notes.

  • For the Frobenius distance minimization, the natural connection is to the polar decomposition: if $S = U \Sigma U^T$ and $P = U V^T$, then $SP = U \Sigma U^T U V^T = U \Sigma V^T = A$. Alternatively, we could have used $L = V \Sigma V^T$ and $W = UV^T$ to write $A = WL$. So the orthogonal (rotation) part of the polar decomposition of $A$ is the Frobenius-norm nearest point in $O(n)$.

  • If we write $A = QR$, where $Q$ is orthogonal and $R$ is upper triangular with non-negative diagonal (the $QR$-decomposition), then this is unique and $A' = Q$. Based on this, we would like the metric to satisfy that $d(X, QR)$ is minimized over $O(n)$ by $X = Q$. Assuming $d$ is invariant under orthogonal transformations, $d(X, QR) = d(Q^{-1}X, R)$, so we want $d(X,R)$ to be minimized by $X = I$ whenever $R$ is upper triangular with non-negative diagonal and $X$ is orthogonal. If $\mathcal{R}_n$ denotes the set of such triangular matrices, then $\mathcal{R}_n \cap O(n) = \{I\}$ (otherwise there would obviously be no such metric). It seems like we'd want a metric $d$ where $O(n)$ touches $\mathcal{R}_n$ "$d$-orthogonally" at $I$. The Frobenius norm does not work: if $R = \begin{pmatrix} 1 & 10 \\ 0 & 1 \end{pmatrix}$ then $d_{Frob}(R, I) = 10$ but $d_{Frob}\left(R, \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\right) = \sqrt{84}$ (this is checked numerically after these notes). The calculation is not surprising, because it is rare for the $QR$ and polar decompositions to coincide -- that only happens if the PSD part (or the upper triangular part) is diagonal.

  • A reasonable thing to do next is to numerically try other common matrix norms. There is some discussion of rotation invariant norms here: "Characterizing orthogonally-invariant norms on the space of matrices".

  • This paper, https://arxiv.org/pdf/1601.01875.pdf , seems somewhat related.
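As a quick numerical check of the first two notes (a numpy sketch, not part of the original post): the Frobenius-nearest orthogonal matrix $UV^T$ from the SVD is exactly the orthogonal factor of the polar decomposition $A = WL$.

```python
import numpy as np

A = np.random.randn(4, 4)
U, s, Vt = np.linalg.svd(A)              # A = U @ diag(s) @ Vt
nearest = U @ Vt                         # argmin over O(n) of ||A - X||_F
L = Vt.T @ np.diag(s) @ Vt               # symmetric PSD polar factor
print(np.allclose(A, nearest @ L))       # True: A = W L with W = U V^T
print(np.allclose(nearest.T @ nearest, np.eye(4)))  # True: W is orthogonal
```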
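And a check of the $2 \times 2$ example from the QR note (again a numpy sketch): the identity is not the Frobenius-nearest orthogonal matrix to $R$, since the swap matrix is closer and $UV^T$ is slightly closer still.

```python
import numpy as np

R = np.array([[1.0, 10.0],
              [0.0,  1.0]])
swap = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
U, s, Vt = np.linalg.svd(R)
for name, X in [("I", np.eye(2)), ("swap", swap), ("U V^T", U @ Vt)]:
    print(name, np.linalg.norm(R - X, "fro"))
# I: 10.0, swap: sqrt(84) ~ 9.17, U V^T: ~ 9.14
```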

Elle Najt

1 Answer


By applying GS to $A$ we get a matrix $A'$ with orthonormal columns such that, for each $i$, the span of the first $i$ columns of $A$ and of $A'$ is the same.

Naively, you could do something like checking that the $i$-th column of $A$ is in the span of the first $i$ columns of $X$. That is, $$ d(A,X) = \sum_{i=1}^{n} \| (I - (X_{1:i})(X_{1:i})^T)A_i \| $$ where $A_i$ is the $i$th column of $A$ and $X_{1:i}$ is the matrix of columns $1,2,\ldots,i$ of $X$.
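A direct implementation might look like the following (a numpy sketch under the assumption that $X$ has orthonormal columns, so that $X_{1:i}X_{1:i}^T$ is the projector onto their span; the function name is mine, not from the answer).

```python
import numpy as np

def span_divergence(A, X):
    """Sum over i of the distance from A's i-th column to span(X[:, :i])."""
    n = A.shape[1]
    total = 0.0
    for i in range(1, n + 1):
        Xi = X[:, :i]                        # first i columns of X
        proj = Xi @ Xi.T                     # projector, since X is orthonormal
        total += np.linalg.norm(A[:, i - 1] - proj @ A[:, i - 1])
    return total
```

In particular, $d(A, A') = 0$ when $A'$ is the Gram-Schmidt output of $A$, since each column of $A$ lies in the span of the corresponding leading columns of $A'$.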

This isn't a metric in the "metric space" sense because it isn't symmetric, but you could symmetrize it. Then you would have to see if it is subadditive (maybe with the right norm it is). I'm not really sure this counts as "nice" though.

If you assume that $d$ is invariant under unitary transformations, then you're just looking for an orthogonal matrix with the same column span as $A$, so you can minimize,

$$ d(A,X) = \| AA^+ - XX^+\| = \|A(A^TA)^{-1}A^T - X(X^TX)^{-1}X^T\| $$ where $A^+$ is the pseudoinverse. If $X$ is unitary, then $XX^+ = XX^T$.
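For completeness, a sketch of this projection-based distance (numpy assumed; the function name is mine):

```python
import numpy as np

def projector_distance(A, X):
    """Frobenius distance between the projectors onto range(A) and range(X)."""
    PA = A @ np.linalg.pinv(A)       # A A^+
    PX = X @ np.linalg.pinv(X)       # X X^+ (= X X^T for orthonormal columns)
    return np.linalg.norm(PA - PX, "fro")
```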

  • That's an interesting idea -- but $A'$ is more than an orthonormal matrix where the spans of the first $i$ columns agree, since there are also the orientations to take into account somehow. I'm fine with it not being a metric, though -- anything like KL-divergence would make me happy (a Bregman divergence). – Elle Najt Dec 06 '19 at 18:57
  • What do you mean orientations? – overfull hbox Dec 06 '19 at 19:36
  • For example I mean that there are two choices of orthonormal basis of a line. The one given by GS goes in the direction of the original vector. – Elle Najt Dec 07 '19 at 03:51
  • I see, that is true. – overfull hbox Dec 07 '19 at 16:47