I have a rank-deficient system of linear equations: it is not full rank (i.e. one or more column vectors are linearly dependent), so it is effectively under-determined even though it has more equations than unknowns.
Here is just one example of this:
$$ {\begin{bmatrix} 6 & 40 \\ 6 & 40 \\ 3 & 20 \\ \end{bmatrix}} \begin{bmatrix} \beta_{0} \\ \beta_{1} \end{bmatrix} \approx \begin{bmatrix} 0.5 \\ 0.2 \\ 0.6 \\ \end{bmatrix} $$
If I solve the above example for the least squares solution (i.e. solve $A\vec{x}\approx\vec{b}$), the solution produced will come from a set of solutions $S$ that is infinite (i.e. $|S| = \infty$). However, is it possible to find, within this infinite set $S$, the solution that minimizes the residuals (i.e. the best model, the one that minimizes the error)?
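As a quick numerical check on the example above, here is a minimal NumPy sketch. It is my own illustration, not something taken from either quoted post. The names `null_vec` and `x_other` are made up for this sketch, and `rcond=1e-12` is an assumed cutoff chosen so that the numerically tiny second singular value is treated as zero.

```python
import numpy as np

# The rank-deficient example: the second column is (20/3) times the first.
A = np.array([[6.0, 40.0],
              [6.0, 40.0],
              [3.0, 20.0]])
b = np.array([0.5, 0.2, 0.6])

# For a rank-deficient A, lstsq returns the least squares solution
# with the smallest Euclidean norm.
x_min, _, rank, _ = np.linalg.lstsq(A, b, rcond=1e-12)

# Any vector in the null space of A can be added without changing A @ x.
# Here A @ [40, -6] = 0 for every row of A.
null_vec = np.array([40.0, -6.0])
x_other = x_min + 0.01 * null_vec

res_min = np.linalg.norm(A @ x_min - b)
res_other = np.linalg.norm(A @ x_other - b)

print("rank of A:        ", rank)
print("residual (x_min): ", res_min)
print("residual (x_other):", res_other)
print("norm of x_min:    ", np.linalg.norm(x_min))
print("norm of x_other:  ", np.linalg.norm(x_other))
```

If this sketch is right, the two residuals should agree up to rounding while the two solution norms differ.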
This question was sparked by a response I got on StackOverflow (SO) and by what I read on this forum. For example, on my SO post, user @AGN Gazer says the following:
"A solution with a minimal norm does not mean a solution that minimizes residuals. [...] Having a solution with a minimal norm means nothing to me[.] [...] I want the "best" solution - the one that minimizes the residuals but I cannot get it with an underdetermined system."
However, in the accepted answer to this Math Stack Exchange post, @Brian Borchers appears to say otherwise:
"[We're] often interested in the minimum norm least squares solution. That is, among the infinitely many least squares solutions, pick out the least squares solution with the smallest $\| x \|_{2}$ [(i.e. Euclidean norm: $\| x \|_{2} =\sqrt{\sum_{i=1}^{n}x_{i}^{2}}$)]. The minimum norm least squares solution is always unique."
I must be misunderstanding something! In relation to my question above, what even is the distinction between minimizing the residuals and minimizing the norm?
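If I read the two quotes literally, they seem to describe two different optimization problems. This is only my own attempt at writing them down, and the symbol $\vec{x}^{*}$ is my own notation:

$$ S = \underset{\vec{x}}{\arg\min} \, \| A\vec{x} - \vec{b} \|_{2}, \qquad \vec{x}^{*} = \underset{\vec{x} \in S}{\arg\min} \, \| \vec{x} \|_{2} $$

Under this reading, every element of $S$ attains the same residual norm by construction, and the norm of $\vec{x}$ is only used to pick one element out of $S$.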
ordinary least squares (OLS) on the system I described above, can I minimize the residuals? Are you saying that minimizing the norm and minimizing the residuals are two separate operations, with minimizing the norm meaning finding the smallest norm of the parameters and minimizing the residuals meaning finding the smallest error of the output? – Dannon Jun 22 '18 at 20:04