
One column vector $a_k$ of a full rank matrix $A \in \mathbb{R}^{m\times n}$ is replaced by some other vector $a'_k$ leading to $A'$. This can increase/decrease the smallest/largest singular value of $A'$ compared to $A$.

  • If replacement increases the smallest singular value, how does $a'_k$ relate geometrically to the columns of $A$?
  • If replacement decreases the largest singular value, how does $a'_k$ relate geometrically to the columns of $A$?

Background

Let $A \in \mathbb{R}^{m\times n}$ with $m\leq n$ be a matrix of full rank. Then its smallest singular value satisfies $\sigma_{min}(A) > 0$, since the number of non-zero singular values of $A$ equals $\operatorname{rank}(A)$. Moreover, the column vectors $a_i \in \mathbb{R}^m$ of $A=[a_1, \dots, a_n]$ span $\mathbb{R}^m$.

If I change one column $a_k$ of $A$, with $k \in \{1, \dots, n\}$, to some other vector $a'_k$, I get a new matrix $$A' = [a_1, \dots, a_{k-1}, a'_k, a_{k+1},\dots, a_n].$$ This replacement can of course be repeated, and $a'_k$ or any other column of $A'$ can in turn be replaced by some new vector $a''_i$.
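Just to fix ideas, here is a small numerical sketch of this setup (numpy; the matrix and the replacement column are arbitrary examples), comparing the smallest and largest singular values before and after replacing one column:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 5                      # m <= n; a random A has full rank (a.s.)
A = rng.standard_normal((m, n))

k = 2                            # index of the column to replace (0-based)
a_new = rng.standard_normal(m)   # some other vector a'_k

A_prime = A.copy()
A_prime[:, k] = a_new

s_old = np.linalg.svd(A, compute_uv=False)       # sorted descending
s_new = np.linalg.svd(A_prime, compute_uv=False)

print("sigma_min:", s_old[-1], "->", s_new[-1])  # may increase or decrease
print("sigma_max:", s_old[0],  "->", s_new[0])   # may increase or decrease
```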

If one or more columns of $A$ are changed such that the resulting matrix $A'^{\cdots}{}'$ becomes rank-deficient and its smallest singular value drops to zero, the columns of the changed matrix no longer span $\mathbb{R}^m$, while its kernel "gains one dimension" compared to that of $A$.

It appears to me that this implies that the column vectors of $A'^{\cdots}{}'$ all lie in an $(m-1)$-dimensional plane in $\mathbb{R}^m$ that contains the origin. So as the smallest singular value of the matrix decreases, I assume the point cloud of its column vectors becomes "flatter".
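To convince myself of this "flattening" picture numerically, here is a small sketch (numpy, with a made-up example). It uses the standard fact that $\sigma_{min}(A) = \min_{\|u\|=1} \|A^T u\|$, where $\|A^T u\|^2 = \sum_i (a_i^T u)^2$ is the sum of squared projections of the columns onto the direction $u$; so if the columns nearly lie in a hyperplane through the origin, the normal direction of that hyperplane forces $\sigma_{min}$ to be tiny.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 6

# Columns chosen close to the plane z = 0, a hyperplane through the origin:
A_flat = rng.standard_normal((m, n))
A_flat[2, :] *= 1e-3                 # squash the point cloud in the z-direction

U, s, Vt = np.linalg.svd(A_flat)
print("singular values:", s)         # sigma_min is on the order of 1e-3

# sigma_min(A) = min over unit u of ||A^T u||; the minimising u is the left
# singular vector for sigma_min, i.e. (roughly) the normal of the plane.
u = U[:, -1]
print(np.linalg.norm(A_flat.T @ u), "~", s[-1])
```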

Questions

  • So it appears to me that if the smallest singular value increases when $a_k$ is replaced by $a'_k$, the point cloud of the column vectors becomes "less flat" and the columns of $A'$ (excuse my expression) "span $\mathbb{R}^m$ better". Is that true? And if so, how do you express "span better" mathematically? After all, the columns either span $\mathbb{R}^m$ or they don't; there is no "spanning better or worse"... So the question is: if the smallest singular value increases when one column is replaced, what kind of move/change in the point cloud is associated with it?

  • Related to that: the condition number of solving a linear system of equations $Ax=b$ is $\kappa = \frac{\sigma_{max}(A)}{\sigma_{min}(A)}$. For a well-conditioned problem, $\kappa$ should be small. Does that somehow mean that the point cloud needs to "become less flat" in order to decrease the maximum singular value of the matrix by replacing a column? (A small numerical sketch of $\kappa$ follows below.)
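Here is the small sketch referred to above (numpy; the matrices are arbitrary examples), comparing $\kappa$ before and after replacing one column:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 4                              # square, so Ax = b is an ordinary system
A = rng.standard_normal((m, n))

A_prime = A.copy()
A_prime[:, 1] = rng.standard_normal(m)   # replace one column

for name, M in [("A ", A), ("A'", A_prime)]:
    s = np.linalg.svd(M, compute_uv=False)
    # kappa = sigma_max / sigma_min; np.linalg.cond uses the same definition
    print(name, "kappa =", s[0] / s[-1], "=", np.linalg.cond(M))
```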

Research

The change of singular values when adding vectors has been discussed e.g. here and here. However, I am not asking about appending a vector to $A$, but about replacing a vector in $A$ while maintaining the original size of $A$.


1 Answer


In general, we can factor the linear map $f$ associated to the matrix $A$ as $jg\pi$, where $k = \operatorname{rank}(A)$ and:

  • $\pi : \mathbb{R}^n \to \mathbb{R}^k$ is a projection, that is the quotient by the kernel;
  • $g: \mathbb{R}^k\to \mathbb{R}^k$ is associated to a square matrix $B$;
  • $j: \mathbb{R}^k \to \mathbb{R}^m$ is conjugate to the canonical injection.
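Here is a quick numerical sketch of this factorization (numpy; the rank-$k$ matrix is an arbitrary example): it builds orthonormal bases of the column space and of $(\ker A)^\perp$, forms the $k\times k$ middle matrix $B$, and compares its singular values with those of $A$, illustrating the claim stated next.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 5, 7, 3                          # target rank k < min(m, n)

# An arbitrary rank-k example matrix:
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))

# Orthonormal bases: Qc for the column space (the image, used by j) and
# Qr for the row space, i.e. (ker A)^perp (used by pi).  For this random
# example the first k columns / rows of A already span those spaces.
Qc, _ = np.linalg.qr(A[:, :k])             # m x k, orthonormal columns
Qr, _ = np.linalg.qr(A[:k, :].T)           # n x k, orthonormal columns

B = Qc.T @ A @ Qr                          # k x k matrix of the middle map g

print(np.linalg.svd(A, compute_uv=False))  # k non-zero values, the rest ~ 0
print(np.linalg.svd(B, compute_uv=False))  # the same k non-zero values
```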

Now the singular values of $A$ are equal to the singular values of $B$, plus $n-k$ zeros coming from the kernel. The good news is that $B$ is now an ordinary invertible matrix, and we can explore its geometric properties. Let us call $\phi(B)$ the flattening parameter for a matrix $B \in GL_k$. We would like this parameter to satisfy:

  • Invariance under conjugation, since we want the flattening to depend only on the abstract map $g$;

  • Continuity, so that two matrices which are very close have very close flattening numbers;

  • If we multiply by an orthogonal matrix on the left, the flattening number remains the same; this is because we are applying an orthogonal transformation to all the vectors in the point cloud, without changing its geometry;

  • If the matrix is diagonal, then $\phi(B) = (\sigma_{max}(B) /\sigma_{min}(B) )^{1/(n-1)}$. Since the point cloud here represents a "cubical" block, this is a good measure of how flat the block is: if it is big, the block will be very flat. Indeed, the geometric mean of the $n-1$ quotients (here $\sigma_k$ is the $k$-th smallest singular value, so $\sigma_1$ is the smallest and $\sigma_n$ the largest):

$$ \frac{\sigma_n(B) }{\sigma_{n-1}(B) } , \ldots, \frac{\sigma_2(B) }{\sigma_1(B) }$$

is exactly $\phi(B)$, so at least one of them, say $\sigma_{r+1}(B)/\sigma_r(B)$, is at least as big as $\phi(B)$. This means that the face with edges $\sigma_{r+1}(B)$ and $\sigma_r(B)$ has one big edge and one small edge.
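A quick numerical check of this telescoping identity, with arbitrary example values:

```python
import numpy as np

sigma = np.array([0.5, 1.3, 2.0, 7.0])          # sigma_1 <= ... <= sigma_n
ratios = sigma[1:] / sigma[:-1]                 # the n-1 successive quotients

n = len(sigma)
geo_mean = ratios.prod() ** (1.0 / (n - 1))     # telescopes to (max/min)^(1/(n-1))
phi = (sigma[-1] / sigma[0]) ** (1.0 / (n - 1))
print(geo_mean, phi)                            # identical
```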

Let us consider the general case. Consider the Jordan form $J$ of the matrix $B$. By the first hypothesis, $\phi(J) = \phi(B)$. If $J$ were diagonal, we could conclude by the fourth hypothesis that $\phi(J) = (\sigma_{max}(J) /\sigma_{min}(J) )^{1/(n-1)}$. Unfortunately, it need not be. By multiplying by suitable orthogonal transformations, we can assume that $J$ is triangular with real entries, where complex values are replaced by their absolute values. For example, a real Jordan block corresponding to $\pm i$ is just a rotation by 90 degrees.

Now we do the last trick. Consider the matrix $D_{\varepsilon}= \text{diag}(1, \varepsilon, \ldots, \varepsilon^{n-1})$. You can show that $J_{\varepsilon}:= D^{-1}_{\varepsilon} J D_{\varepsilon}$ looks like the Jordan form, but with $\varepsilon$'s above the diagonal instead of ones. By the first hypothesis this has the same flattening number as $J$. By continuity, we have that

$$ \phi(J) = \lim_{\varepsilon \to 0} \phi(J_{\varepsilon}) = \phi( \lim_{\varepsilon \to 0} J_{\varepsilon} ) = \phi(D) $$

where $D$ is the diagonal part of the Jordan form. We conclude that $$\phi(B) = \phi(D) = (\sigma_{max}(D) /\sigma_{min}(D))^{1/(n-1)} = (\sigma_{max}(B) /\sigma_{min}(B))^{1/(n-1)}, $$

that is, $\kappa(B) = \phi(B)^{n-1}$. I think this answers your question on how to formalize flatness and why flat matrices behave badly from the conditioning point of view.
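To make the two concrete ingredients above easy to check, here is a small numpy sketch (the Jordan block and the test matrix are arbitrary examples): conjugation by $D_\varepsilon$ indeed puts $\varepsilon$'s above the diagonal, and $\phi(B) = \kappa(B)^{1/(n-1)}$ is indeed unchanged when $B$ is multiplied by an orthogonal matrix (the third hypothesis).

```python
import numpy as np

# (1) Conjugating a Jordan block by D_eps scales the superdiagonal ones to eps.
n, lam, eps = 4, 2.0, 1e-2
J = lam * np.eye(n) + np.eye(n, k=1)             # Jordan block J(lam)
D = np.diag(eps ** np.arange(n))                 # diag(1, eps, ..., eps^(n-1))
J_eps = np.linalg.inv(D) @ J @ D
print(np.round(J_eps, 6))                        # eps above the diagonal

# (2) phi(B) = kappa(B)^(1/(n-1)) is unchanged by orthogonal transformations.
rng = np.random.default_rng(4)
B = rng.standard_normal((n, n))
Q, _ = np.linalg.qr(rng.standard_normal((n, n))) # a random orthogonal matrix

phi = lambda M: np.linalg.cond(M) ** (1.0 / (n - 1))
print(phi(B), phi(Q @ B))                        # equal up to rounding
```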