
I have been studying linear observation models and least squares estimation, and I came across a problem that requires some knowledge of linear algebra and vector spaces.

Let the observation model be $y=A\theta+v$ and $\hat{\theta}$ be some least squares solution. Then $$S = \left\{ \theta = \hat{\theta} + \omega : \omega \in N(A) \right \}$$ is the set of all LS-solutions.

  1. Prove that $\hat{\theta} \perp N(A) \Rightarrow \hat{\theta}$ is the minimum norm solution.
  2. Let the observation model be $y_{i}=\theta_{1}x_{i}+\theta_{2}+v_{i}$, $1\leq i\leq n$, and let two observations be $(1,1)$ and $(2,1)$. Write the observation model in matrix form and prove that $S=\left\{ (\theta_{1},\theta_{2})\in\mathbb{R}^2 : \theta_{1}+\theta_{2}=\frac{3}{2} \right \}$ is the set of all LS-solutions.
  3. Find $N(A)$ and draw $S$ and $N(A)$ in $(\theta_{1},\theta_{2})$ coordinates. Use the picture to determine the minimum norm estimate in this case.

I) In this case the number of unknown parameters is $p=2$ and $A\in \mathbb{R}^{n\times2}$. I think $N(A)$ refers to the null space of $A$ (i.e. the set $\{\tilde{x}\in\mathbb{R}^{2} : A\tilde{x}=0\}$) and $\omega$ represents the 'error term' in some sense. I was able to prove that $\hat{\theta}=(A^{T}A)^{-1}A^Ty$ is the minimum norm solution by proving that $\hat{\theta}$ minimizes the norm of the residual $\left \|r \right \|=\left \|y-A\hat{\theta} \right \|$. However, I don't think that's what the person who posed the question meant, as I haven't actually used the fact that $\hat{\theta} \perp N(A)$. Basically, $A\hat{\theta}$ is the projection of $y$ onto the range of $A$.
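My best guess at how the orthogonality is supposed to enter is a Pythagorean argument over $S$: for any LS-solution $\theta = \hat{\theta} + \omega$ with $\omega \in N(A)$, if $\hat{\theta} \perp N(A)$ then
$$\left\|\theta\right\|^2 = \left\|\hat{\theta}+\omega\right\|^2 = \left\|\hat{\theta}\right\|^2 + 2\,\hat{\theta}^T\omega + \left\|\omega\right\|^2 = \left\|\hat{\theta}\right\|^2 + \left\|\omega\right\|^2 \geq \left\|\hat{\theta}\right\|^2,$$
with equality only when $\omega = 0$. Is this the intended reasoning?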

II) This is a piece of cake. In matrix form

$$\begin{bmatrix} y_{1}\\ y_{2}\\ \vdots \\ y_{n} \end{bmatrix} = \begin{bmatrix} x_{1}& 1\\ x_{2}&1\\ \vdots&\vdots\\x_{n}&1 \end{bmatrix} \begin{bmatrix} \theta_{1}\\\theta_{2} \end{bmatrix}+\begin{bmatrix} v_{1}\\ \vdots \\ v_{n} \end{bmatrix}$$

Clearly we have an overdetermined system of equations (more rows than columns). How would one prove that $S$ is the set of all LS-solutions? I am really bad at linear algebra, so this problem is giving me trouble. Now $A=\begin{bmatrix} 1& 1\\ 1&1\end{bmatrix}$, $y=\begin{bmatrix} 2\\ 1\end{bmatrix}$, and $A^TA$ is not invertible.
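Not a proof, but as a quick numerical sanity check I convinced myself with a small numpy script (the test points are arbitrary) that every $\theta$ on the line $\theta_{1}+\theta_{2}=\frac{3}{2}$ gives the same, minimal residual norm, and that the pseudoinverse solution also lies on that line:

```python
import numpy as np

# The system from part 2 with the two given observations:
A = np.array([[1.0, 1.0],
              [1.0, 1.0]])
y = np.array([2.0, 1.0])

# Every theta with theta_1 + theta_2 = 3/2 gives the same residual norm.
for t2 in [-1.0, 0.0, 0.75, 5.0]:
    theta = np.array([1.5 - t2, t2])
    print(t2, np.linalg.norm(y - A @ theta))       # always sqrt(1/2) ~ 0.7071

# A point off the line has a strictly larger residual.
print(np.linalg.norm(y - A @ np.array([1.0, 1.0])))  # 1.0 > 0.7071

# The minimum norm LS solution (pseudoinverse) lies on the same line.
theta_mn = np.linalg.pinv(A) @ y
print(theta_mn, theta_mn.sum())                     # [0.75 0.75], sum = 1.5
```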

1 Answer


Let's consider a more general case where the columns of $A$ are not independent. We can then partition $A$ as $$A=\begin{bmatrix} A_1 & A_2 \end{bmatrix}$$ where $A_1$ has full column rank and the columns of $A_2$ are linear combinations of the columns of $A_1$, so we can write $$A_2=A_1X.$$ (Here $\theta_1$ and $\theta_2$ denote the corresponding blocks of $\theta$, not the scalar parameters from part 2.) The least squares problem can then be expressed as $$\min_{\theta_1,\theta_2} \left\| \begin{bmatrix}A_1 & A_2 \end{bmatrix}\begin{bmatrix}\theta_1 \\ \theta_2\end{bmatrix}-y \right\|_2^2 = \min_{\theta_1,\theta_2} \left\| A_1(\theta_1+X\theta_2)-y \right\|_2^2.$$

We introduce an auxiliary variable $\zeta=\theta_1+X\theta_2$. Because $A_1$ has full column rank, the optimal value of $\zeta$ is $$\hat\zeta=\hat\theta_1+X\hat\theta_2 =(A_1^TA_1)^{-1}A_1^Ty.$$ Any value of $\theta_2$ fixes the value of $\theta_1$ at $\theta_1=\hat\zeta-X\theta_2$, showing that there are infinitely many solutions. So in part 3 they impose another constraint on $\theta$ to make it unique, i.e. $\min\left\|\theta\right\|^2_2$.
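To make this concrete for your system in II (just applying the formula above to your $A$ and $y$): take $A_1=\begin{bmatrix}1\\1\end{bmatrix}$ (the first column), so that $A_2=A_1$ and $X=1$. Then
$$\hat\zeta=(A_1^TA_1)^{-1}A_1^Ty=\tfrac{1}{2}(2+1)=\tfrac{3}{2},$$
and since here $\zeta=\theta_1+\theta_2$, every LS-solution satisfies $\theta_1+\theta_2=\tfrac{3}{2}$, which is exactly your set $S$.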

Oh yeah, for part 1 I think it could be useful to use the SVD of $A$ to prove that $\hat\theta_{opt}\perp N(A)$, because the SVD naturally exposes the null space. If you need help with it, let me know.
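If you want to see this numerically before writing the proof, here is a rough sketch (using numpy; the tolerance and variable names are just my choices) that extracts a basis for $N(A)$ from the SVD and checks the orthogonality for your $A$ and $y$:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])
y = np.array([2.0, 1.0])

# SVD: A = U diag(s) V^T. The right singular vectors belonging to
# (numerically) zero singular values span the null space N(A).
U, s, Vt = np.linalg.svd(A)
null_mask = s < 1e-12 * s.max()
N = Vt[null_mask].T              # basis for N(A); here one vector ~ [1, -1]
print(N.ravel())

# Minimum norm LS solution via the pseudoinverse (built from the same SVD):
theta_hat = np.linalg.pinv(A) @ y
print(theta_hat)                 # [0.75 0.75]

# Check the orthogonality theta_hat ⊥ N(A):
print(N.T @ theta_hat)           # ~ [0.]
```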

wiegert