I am taking a Theory of Linear Models class and got the following assignment:

Solve the Least Squares Problem using Lagrange Multipliers.

We are in the 3rd week of this class, so we have only covered the basics. I don't have much experience in this field, and the problem statement is not very detailed, so I need some perspective. Everything I have found in my research is about constrained linear models, some of which have what they call "linear constraints", like here:


enter image description here


Then again, it says "When we compare different models". Am I supposed to solve a problem with a linear constraint or is that something completely different? Any tips to get me started would be highly appreciated.

1 Answer

Given

$$\begin{array}{ll} \text{minimize} & \| \mathrm A \mathrm x - \mathrm b \|_2^2\\ \text{subject to} & \mathrm C \mathrm x = \mathrm d\end{array}$$

we define the Lagrangian

$$\mathcal L (\mathrm x, \lambda) := \frac 12 \| \mathrm A \mathrm x - \mathrm b \|_2^2 + \lambda^{\top} \left( \mathrm C \mathrm x - \mathrm d \right)$$

Taking the partial derivatives and finding where they vanish, we obtain the following linear system

$$\begin{bmatrix} \mathrm A^{\top} \mathrm A & \mathrm C^{\top}\\ \mathrm C & \mathrm O\end{bmatrix} \begin{bmatrix} \mathrm x\\ \lambda\end{bmatrix} = \begin{bmatrix} \mathrm A^{\top} \mathrm b\\ \mathrm d\end{bmatrix}$$
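For reference, the two stationarity conditions behind this system are

$$\partial_{\mathrm x} \mathcal L = \mathrm A^{\top} \left( \mathrm A \mathrm x - \mathrm b \right) + \mathrm C^{\top} \lambda = 0, \qquad \partial_{\lambda} \mathcal L = \mathrm C \mathrm x - \mathrm d = 0$$

A minimal NumPy sketch of solving the block system numerically might look as follows. The function name `constrained_lstsq_kkt` and the small example data are assumptions made for illustration, and the sketch assumes the block matrix is nonsingular (for instance, $\mathrm A$ with full column rank and $\mathrm C$ with full row rank).

```python
import numpy as np

def constrained_lstsq_kkt(A, b, C, d):
    """Minimize ||A x - b||_2^2 subject to C x = d via the block linear system.

    Hypothetical helper for illustration; assumes the block matrix is nonsingular.
    """
    n = A.shape[1]  # number of unknowns in x
    p = C.shape[0]  # number of constraints (length of lambda)
    # Assemble [[A^T A, C^T], [C, O]] and the right-hand side [A^T b; d].
    K = np.block([[A.T @ A, C.T],
                  [C, np.zeros((p, p))]])
    rhs = np.concatenate([A.T @ b, d])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]  # x, lambda

# Made-up example data: three observations, two unknowns, one constraint x1 + x2 = 1.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])
C = np.array([[1.0, 1.0]])
d = np.array([1.0])

x, lam = constrained_lstsq_kkt(A, b, C, d)
print("x =", x)
print("lambda =", lam)
```

For this example data, working the problem by hand gives $\mathrm x = (0, 1)$ and $\lambda = 3$, which the sketch should reproduce up to rounding.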

  • Is there a version of it without forming the normal equation system? – pinpon Sep 11 '19 at 12:48
  • @pinpon One possibility would be to solve $\rm C x = d$, whose solution space is parameterized by $\mathrm x_{\ln} + \mathrm V \eta$, and then rewrite as an unconstrained least-squares problem in $\eta$. – Rodrigo de Azevedo Sep 12 '19 at 22:16
  • What do you mean by "is parameterized by $\mathrm x_{\ln} + \mathrm V \eta$"? What does the suffix "ln" stand for? – pinpon Sep 13 '19 at 11:48
  • @pinpon Yes, clarification is in order. The subscript "ln" stands for "least-norm". In other words, the solution space is a hyperplane, and the point on this hyperplane closest to the origin (in the Euclidean norm) is a special particular solution called the least-norm solution. The columns of $\rm V$ span the hyperplane when it is shifted so that it passes through the origin. Take a look at this. – Rodrigo de Azevedo Sep 13 '19 at 20:27
  • Thank you for your explanation. I am still not sure I understood the whole procedure. First I solve $\rm C x = d$, taking the least-norm solution. Then I solve the system $A V \eta = b - A x_{\ln}$ for $\eta$, with $V$ being the row space of $C$. Is that correct? – pinpon Sep 17 '19 at 01:59
  • Sorry, with $V$ being the null space of $C$. – pinpon Sep 17 '19 at 02:06
  • @pinpon Using SVD is the way to go. Once one has the least-norm solution and the null space of $\rm C$, then one can solve the unconstrained least-squares problem $$\text{minimize} \quad \| \mathrm A \mathrm V \eta - \left( \mathrm b - \mathrm A \mathrm x_{\text{LN}} \right) \|_2^2$$ Taking the gradient and finding where it vanishes, one obtains the following normal equations $$\mathrm V^\top \mathrm A^\top \mathrm A \mathrm V \eta = \mathrm V^\top \mathrm A^\top \left( \mathrm b - \mathrm A \mathrm x_{\text{LN}} \right)$$ – Rodrigo de Azevedo Sep 17 '19 at 06:29
  • Thank you again, now it is clear! – pinpon Sep 24 '19 at 09:26
  • Why did you put $\frac{1}{2}$ in the Lagrangian? There is no $\frac{1}{2}$ in the function being minimized. Thank you in advance for the answer. – Laura Dec 06 '20 at 16:21
  • @Laura Because it is convenient and legal. Once differentiated, the quadratic term is left alone — no pesky $2$ multiplier cluttering things. – Rodrigo de Azevedo Dec 06 '20 at 16:30
  • OK, but if I leave the fraction out and solve the problem without $\frac{1}{2}$, is that OK as well? – Laura Dec 06 '20 at 16:33
  • @Laura It's OK, but your Lagrange multiplier will be scaled (relative to mine). – Rodrigo de Azevedo Dec 06 '20 at 16:35
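
The null-space approach discussed in the comments above (least-norm solution of $\rm C x = d$ plus an unconstrained least-squares problem in $\eta$) can be sketched in NumPy roughly as follows. The function name `constrained_lstsq_nullspace`, the rank tolerance `tol`, and the example data are assumptions made for illustration. The sketch assumes the constraints are consistent and that $\mathrm C$ is not the zero matrix, and it hands the reduced problem to `np.linalg.lstsq` instead of forming the normal equations explicitly.

```python
import numpy as np

def constrained_lstsq_nullspace(A, b, C, d, tol=1e-12):
    """Minimize ||A x - b||_2^2 subject to C x = d by eliminating the constraint.

    Hypothetical helper for illustration; assumes C x = d is consistent and C != 0.
    """
    # Full SVD of C: the first r rows of Vt span the row space of C,
    # the remaining rows span its null space.
    U, s, Vt = np.linalg.svd(C)
    r = int(np.sum(s > tol * s.max()))  # numerical rank of C
    # Least-norm solution of C x = d (pseudoinverse applied to d).
    x_ln = Vt[:r].T @ ((U[:, :r].T @ d) / s[:r])
    # Columns of V form an orthonormal basis of the null space of C.
    V = Vt[r:].T
    # Unconstrained least squares in eta: minimize ||A V eta - (b - A x_ln)||_2^2.
    eta, *_ = np.linalg.lstsq(A @ V, b - A @ x_ln, rcond=None)
    return x_ln + V @ eta

# Made-up example data: three observations, two unknowns, one constraint x1 + x2 = 1.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])
C = np.array([[1.0, 1.0]])
d = np.array([1.0])

x = constrained_lstsq_nullspace(A, b, C, d)
print("x =", x)
print("constraint residual =", C @ x - d)
```

For this example data the least-norm solution is $(0.5, 0.5)$ and the constrained minimizer is $(0, 1)$, so the printed constraint residual should be zero up to rounding.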