Does a solution with a minimal norm mean it is a solution that minimizes the residuals?

Question

I have an under-determined system of linear equations, which is not full rank (i.e. one or more column vectors are linearly dependent) and has more equations than unknowns.
Here is just one example of this:

$$ {\begin{bmatrix} 6 & 40 \\ 6 & 40 \\ 3 & 20 \\ \end{bmatrix}} \begin{bmatrix} \beta_{0} \\ \beta_{1} \end{bmatrix} \approx \begin{bmatrix} 0.5 \\ 0.2 \\ 0.6 \\ \end{bmatrix} $$

If I solve the above example for the least squares solution (i.e. solve $A\vec{x}\approx\vec{b}$), the solution produced will come from a set of solutions $S$ that is infinite in length (i.e. $|S| = \infty$). However, is it possible to minimize the residuals (i.e. get the best model that minimizes the error) from this infinite set of solutions $S$?

This question is sparked from a response I got on StackOverflow (SO) and from what I read on this forum. For example, on my SO post, user @AGN Gazer says the following:

"A solution with a minimal norm does not mean a solution that minimizes residuals. [...] Having a solution with a minimal norm means nothing to me[.] [...] I want the "best" solution - the one that minimizes the residuals but I cannot get it with an underdetermined system."

However, on the accepted answer on this Math Exchange post, it appears that @Brian Borchers says otherwise:

"[We're] often interested in the minimum norm least squares solution. That is, among the infinitely many least squares solutions, pick out the least squares solution with the smallest $\|x\|_{2}$ [(i.e. euclidean norm: $\| x \|_{2} =\sqrt{\sum_{i=1}^{n}x_{i}^{2}}$)]. The minimum norm least squares solution is always unique."

I must be misunderstanding something! In relation to my question above, what even is the distinction between minimizing the residuals and minimizing the norm?

@RodrigodeAzevedo I thought that too, but was told that is incorrect. If you have more equations than unknowns, but one or more of the column vectors is linearly dependent, then it is no longer full rank and is thereby underdetermined. — Dannon, Jun 22 '18 at 19:45
@RodrigodeAzevedo Take a look at my SO post above, which gives an example of this! I am now extremely confused by all the conflicting information I am getting. — Dannon, Jun 22 '18 at 19:48
Least-squares minimizes $\| A x - b \|_2^2$. Least-norm minimizes $\|x\|_2^2$ subject to $A x = b$. Combining the two, minimize $\| A x - b \|_2^2 + \gamma \|x\|_2^2$ with $\gamma > 0$. What exactly do you want? — Rodrigo de Azevedo, Jun 22 '18 at 19:57
If I use ordinary least squares (OLS) on the system I described above, can I minimize the residuals? Are you saying that minimizing the norm and minimizing the residuals are two separate operations, with minimizing the norms meaning finding the smallest norm for the parameters and minimizing the residuals meaning finding the smallest error of the output? — Dannon, Jun 22 '18 at 20:04
Does $A x = b$ have zero or infinitely many solutions? If zero solutions, use least-squares. If infinitely many, use least-norm. However, the fact that $A$ does not have full column rank means that there will be infinitely many $x$'s that produce minimal $\| A x - b \|_2^2$. — Rodrigo de Azevedo, Jun 22 '18 at 20:05
So you are saying that I can minimize the residuals with under-determined systems? I want the model that produces the highest $R^2$ and thereby minimum residuals. — Dannon, Jun 22 '18 at 20:06
Least-norm is about finding the shortest input $x$ that solves $A x = b$ (zero error). Least-squares is about finding the input $x$ that minimizes the output error $\| A x - b \|_2^2$. — Rodrigo de Azevedo, Jun 22 '18 at 20:07
@RodrigodeAzevedo So the user on SO that said the above comment is grossly incorrect? — Dannon, Jun 22 '18 at 20:07
What solution? Does $A x = b$ even have a solution? Note that in the context of least-squares, one has the original linear system $A x = b$ and the normal equations $A^\top A x = A^\top b$ (a linear system that always has at least one solution). Please edit your question and specify what exactly you want. Quoting other people only adds noise to the discussion. — Rodrigo de Azevedo, Jun 22 '18 at 20:12
@RodrigodeAzevedo I quoted people because I believe they are two contradicting answers to the same question. I have stated my question multiple times in the question above and even in the comments. In my question I said "I will be pulling from a set of solutions that is infinite in length." So yes, it has many solutions. Did you read my entire question instead of just skimming it? — Dannon, Jun 22 '18 at 20:16
Which set is that? The solution set of $A x = b$ or the solution set of $A^\top A x = A^\top b$ ? — Rodrigo de Azevedo, Jun 22 '18 at 20:22
@RodrigodeAzevedo Take a look at my question above. I have re-edited it. — Dannon, Jun 22 '18 at 20:41
The example you gave ($3 \times 2$) is infeasible. The set of solutions is empty. — Rodrigo de Azevedo, Jun 22 '18 at 20:56
@RodrigodeAzevedo What? I am finding a plane that goes through this data. To say no solutions exists is impossible because I can plot these equations and draw a plane at random. There is a solution! If you use any least squares solver, you WILL get a solution. I heavily disagree with you. — Dannon, Jun 22 '18 at 21:02
Subtract the first two equations and you get $0 = 0.3$. Infeasible. The good news is that least-norm is no longer an option. All least-squares solutions (and there are infinitely many of them) produce the same error norm and none of them solve the original system, of course. — Rodrigo de Azevedo, Jun 22 '18 at 21:04
You have a severe misunderstanding. A solution is an assignment of values that satisfies the equations. There is no assignment of values to $\beta_0$ and $\beta_1$ that satisfies all three equations, so there is no solution. You cannot take an arbitrary assignment of values at random and say it is a solution. Otherwise, I could solve any complicated equation just by picking some random numbers. — , Jun 22 '18 at 23:07

score 2 · Answer 1 · answered Jun 22 '18 at 22:58

In your example, there is no exact solution of $Ax = b$. Because $A$ is not full rank, the least squares solution is not unique.
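Both claims can be checked by hand on the $A$ and $b$ from the question (a worked check; the vector $(20, -3)^T$ is simply one convenient choice of null-space direction):

$$ \left. \begin{aligned} 6\beta_0 + 40\beta_1 &= 0.5 \\ 6\beta_0 + 40\beta_1 &= 0.2 \end{aligned} \right\} \implies 0 = 0.3, \qquad A \begin{bmatrix} 20 \\ -3 \end{bmatrix} = \begin{bmatrix} 6 \cdot 20 - 40 \cdot 3 \\ 6 \cdot 20 - 40 \cdot 3 \\ 3 \cdot 20 - 20 \cdot 3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} $$

The first relation shows that the equations contradict each other, so no exact solution exists. The second shows that adding any multiple of $(20, -3)^T$ to a least squares solution leaves $Ax$, and hence the residual, unchanged.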

The minimum 2-norm least squares solution is the least squares solution for which $\|x\|_2$ is smallest. That is, consider all least squares solutions, which all achieve the same minimum sum of squared residuals, $(Ax-b)^T(Ax-b)$; minimizing the 2-norm of $x$ among them serves as a tie-breaker for choosing one. The minimum 2-norm solution can be found using the pseudoinverse (pinv in MATLAB) of $A$, as shown below. Note that if $A$ has full column rank, then the least squares solution is unique and is therefore the minimum 2-norm least squares solution.
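For this particular example the tie-breaking can be worked out by hand. The sketch below relies on $A$ having rank 1, so it factors as $A = u v^T$; the names $u$, $v$ and $s$ are notation introduced here for illustration and do not appear in the question.

$$ A = u v^T, \quad u = \begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix}, \quad v = \begin{bmatrix} 3 \\ 20 \end{bmatrix}, \qquad Ax = u\,(v^T x) = u\,s $$

$$ \min_s \|u s - b\|_2^2 \implies s = \frac{u^T b}{u^T u} = \frac{1.0 + 0.4 + 0.6}{9} = \frac{2}{9} $$

So every $x$ on the line $3\beta_0 + 20\beta_1 = 2/9$ is a least squares solution, and all of them give the same residual. The point on that line closest to the origin is the one parallel to $v$:

$$ x_{\min} = \frac{s}{\|v\|_2^2}\, v = \frac{2/9}{409} \begin{bmatrix} 3 \\ 20 \end{bmatrix} = \begin{bmatrix} 6/3681 \\ 40/3681 \end{bmatrix} \approx \begin{bmatrix} 0.00163 \\ 0.01087 \end{bmatrix} $$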

Here is an illustration in MATLAB on your example. As can be seen, in this example, the QR and SVD (minimum 2-norm) solutions have the same residuals, but $\|x\|_2$ is smaller for the SVD solution than the QR solution.

```
>> disp(A)
     6    40
     6    40
     3    20
>> disp(b)
   0.500000000000000
   0.200000000000000
   0.600000000000000
>> QR_solution = A\b
Warning: Rank deficient, rank = 1, tol =  2.307555e-14.
QR_solution =
                   0
   0.011111111111111
>> A*QR_solution - b
ans =
  -0.055555555555556
   0.244444444444444
  -0.377777777777778
>> norm(QR_solution)
ans =
   0.011111111111111
>> SVD_solution = pinv(A)*b
SVD_solution =
   0.001629991850041
   0.010866612333605
>> A*SVD_solution - b
ans =
  -0.055555555555556
   0.244444444444444
  -0.377777777777778
>> norm(SVD_solution)
ans =
   0.010988181698537
```
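
The same behaviour can be cross-checked without MATLAB. The following is a minimal sketch in plain Python (standard library only), not a general solver: it hard-codes the closed-form minimum-norm solution $(6/3681,\ 40/3681)$ for this rank-1 example and the null-space direction $(20, -3)$, both of which are assumptions specific to this $A$. The helper names `norm`, `residual_norm` and `shifted`, and the parameter `t`, are hypothetical names chosen for illustration.

```python
import math

# A and b are the example from the question.
A = [[6.0, 40.0], [6.0, 40.0], [3.0, 20.0]]
b = [0.5, 0.2, 0.6]


def norm(v):
    """Euclidean (2-) norm of a vector given as a list."""
    return math.sqrt(sum(v_i * v_i for v_i in v))


def residual_norm(x):
    """2-norm of the residual A x - b."""
    r = [sum(a * x_j for a, x_j in zip(row, x)) - b_i for row, b_i in zip(A, b)]
    return norm(r)


# Minimum-norm least squares solution for this example (assumed closed form);
# it agrees with pinv(A)*b in the MATLAB session.
x_min = [6.0 / 3681.0, 40.0 / 3681.0]

# Null-space direction of A: 6*20 + 40*(-3) = 0 and 3*20 + 20*(-3) = 0.
null_dir = [20.0, -3.0]


def shifted(t):
    """Least squares solution x_min + t * null_dir (t is an illustrative parameter)."""
    return [x_min[0] + t * null_dir[0], x_min[1] + t * null_dir[1]]


# t = -1/12270 reproduces the A\b solution [0, 1/90] from the MATLAB session.
for t in (0.0, -1.0 / 12270.0, 1.0):
    x = shifted(t)
    print(f"t={t:+.6f}  residual norm={residual_norm(x):.12f}  norm(x)={norm(x):.12f}")

# Moving along the row-space direction (3, 20) instead does change the residual.
x_off = [x_min[0] + 0.001 * 3.0, x_min[1] + 0.001 * 20.0]
print(f"row-space shift: residual norm={residual_norm(x_off):.12f}")
```

Every point `shifted(t)` should report the same residual norm while `norm(x)` is smallest at `t = 0`. That is the distinction the question asks about: the residual is minimized by the whole family, and the minimum norm only selects one member of it.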
