
Consider the simple optimization problem $$\min_x \|b-Ax\|_2 \ \ \ \ \ \ \text{subject to} \ \ \ \ \ \ \|x\| = 1$$ where $b\in\mathbb{R}^{N}$, $A\in\mathbb{R}^{N\times 3}$, $x\in\mathbb{R}^{3}$ with $N > 3$ (overdetermined system). I need to solve this millions of times, so computational efficiency is important.

I am aware of the following aspects:

  • The norm equality constraint $\|x\| = 1$ makes the problem non-convex.
  • A relaxation to $\|x\| \leq 1$ leads to a convex optimization problem.
  • I can employ a two-dimensional parametrization of the unit vector $x\in\mathbb{R}^{3}$, e.g. azimuth angle $\phi$ and polar angle $\theta$, and use a gradient search (or similar) to solve $\min_{\phi,\theta} \|b-Ax(\phi,\theta)\|_2$ instead.
  • The problem can be written in quadratic fashion $$\min_x \ x^\text{T}A^\text{T}Ax - 2b^\text{T}Ax \ \ \ \ \ \ \text{s.t.} \ \ \ \ \ \ x^\text{T}x = 1 \ .$$

My questions are:

  1. Is there a super smart and super efficient way to solve my problem? If yes, a minimal Matlab code example that does so would be amazing.
  2. Is "quadratic programming" eligible?
  3. Does a relaxation to $\|x\| \geq 1$ also lead to a convex problem? Would it be meaningful to first solve s.t. $\|x\| \leq 1$ and then s.t. $\|x\| \geq 1$ in hopes that one of the two solutions fulfills $\|x\| = 1$?
  4. A minimal Matlab example for the relaxed problem s.t. $\|x\| \leq 1$ would be amazing (this should be easy, but I never used CVX or similar).
  5. I found that a normalized pseudo-inverse solution, i.e. $x \approx y\,/\,\|y\|$ with $y = (A^\text{T}A)^{-1}A^\text{T} b$, is a decent approximation quite often. This can be the case even if $\|y\|$ is far from $1$. Can you identify a condition for this approximation being accurate/meaningful? (A short code sketch of what I mean follows this list.)
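
In MATLAB terms, item 5 is just a couple of lines (a minimal sketch, assuming $A$ and $b$ are given as above):

```matlab
% Normalized pseudo-inverse (least squares) approximation from item 5.
y = (A' * A) \ (A' * b);   % unconstrained least squares solution
x = y / norm(y);           % scale it onto the unit sphere
```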

Info: I never really studied the fine arts of convex optimization and linear/quadratic programming. So, when writing your response, please do not omit details that might be obvious to experts.

Thank you very much!

GDumphart
  • Have you considered doing this as a Lagrange multiplier problem? It would be a little easier to minimize the square of the 2 norm. An analytic solution should be possible I should think. – Paul Jan 06 '17 at 10:48
  • The problem with $\|x\| \geq 1$ is not convex. For $\|x\| \leq 1$ you can find $x$ by solving a linear system (write down the KKT conditions of the quadratic form). – LinAlg Jan 06 '17 at 12:08
  • @Paul I just did and indeed got an analytical solution for the components of $\tilde{x}$ which is the representation of $x$ in the eigenspace of $A^\text{T}A$. However, it's two equations with multivariate fourth-order polynomials in $\tilde{x}_1^2$ and $\tilde{x}_2^2$ (losing sign information), so it's quite hopeless. The ugliness emerges because the constraint dictates $x_3 = \pm \sqrt{1 - x_1^2 - x_2^2}$, which has to be incorporated at some point. – GDumphart Jan 06 '17 at 14:18
  • Even though this is convex, it is well known that it has a tractable solution. This is basically the so-called trust-region subproblem. – Michael Grant Jan 06 '17 at 15:05
  • @MichaelGrant Thank you, that's very helpful. Seems like I opened an old can of worms, I am reading a thesis about it right now. Do you mean "Even though this is non-convex"? – GDumphart Jan 06 '17 at 15:37
  • It seems like my problem is covered thoroughly here: http://www.maths.tcd.ie/~mnl/store/ForsytheGolub1965a.pdf – GDumphart Jan 06 '17 at 16:08
  • Glad you found a good reference. Sorry about the previous typo! – Michael Grant Jan 06 '17 at 16:46
  • In the end, my question was 100% addressed by the following paper. Walter Gander, "Least squares with a quadratic constraint", Numerische Mathematik, 1980. – GDumphart Apr 03 '24 at 21:44

1 Answer


If you relax your constraint to $ {x}^{T} x \leq 1 $, the problem has an easy solution.
I solved it for the least squares form, but it is the same for your problem (just adjust the matrices / vectors accordingly).

If you look into the solution you can infer how to deal with the cases where the solution has norm less than $ 1 $.

The solution below uses a matrix inversion.
But you could easily replace this step with any iterative solver for linear equations.

$$ \begin{alignat*}{3} \text{minimize} & \quad & \frac{1}{2} \left\| A x - b \right\|_{2}^{2} \\ \text{subject to} & \quad & {x}^{T} x \leq 1 \end{alignat*} $$

The Lagrangian is given by:

$$ L \left( x, \lambda \right) = \frac{1}{2} \left\| A x - b \right\|_{2}^{2} + \lambda \left( {x}^{T} x - 1 \right) $$

The KKT Conditions are given by:

$$ \begin{align*} {\nabla}_{x} L \left( x, \lambda \right) = {A}^{T} \left( A x - b \right) + 2 \lambda x & = 0 && \text{(1) Stationarity} \\ \lambda \left( {x}^{T} x - 1 \right) & = 0 && \text{(2) Complementary Slackness} \\ {x}^{T} x & \leq 1 && \text{(3) Primal Feasibility} \\ \lambda & \geq 0 && \text{(4) Dual Feasibility} \end{align*} $$

From (1) one could see that the optimal solution is given by:

$$ \hat{x} = {\left( {A}^{T} A + 2 \lambda I \right)}^{-1} {A}^{T} b $$

Which is basically the solution for Tikhonov Regularization of the Least Squares problem.
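
For a fixed $ \lambda \geq 0 $, a minimal MATLAB sketch of this step (a rough sketch only; it solves the regularized normal equations with the backslash operator instead of forming the inverse explicitly, and assumes $ A $, $ b $ and $ \lambda $ are given):

```matlab
% Tikhonov / ridge regularized least squares for a fixed lambda >= 0.
% Solves (A'*A + 2*lambda*eye(n)) * x = A'*b without forming the inverse.
n    = size(A, 2);
xHat = (A' * A + 2 * lambda * eye(n)) \ (A' * b);
```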

Now, from (2), either $ \lambda = 0 $, in which case $ \hat{x} = {\left( {A}^{T} A \right)}^{-1} {A}^{T} b $ is just the plain least squares solution (valid only if it is feasible), or $ \lambda > 0 $, in which case the constraint is active, namely $ {x}^{T} x = 1 $.

So one needs to check the least squares solution first.
If $ \left\| {\left( {A}^{T} A \right)}^{-1} {A}^{T} b \right\|_{2} \leq 1 $ then $ \hat{x} = {\left( {A}^{T} A \right)}^{-1} {A}^{T} b $.

Otherwise, one needs to find the optimal $ \hat{\lambda} > 0 $ such that $ \left\| {\left( {A}^{T} A + 2 \hat{\lambda} I \right)}^{-1} {A}^{T} b \right\|_{2} = 1 $.

For $ \lambda \geq 0 $ the function:

$$ f \left( \lambda \right) = \left\| {\left( {A}^{T} A + 2 \lambda I \right)}^{-1} {A}^{T} b \right\| $$

Is monotonically decreasing and bounded below by $ 0 $.

(Figure: plot of $ f \left( \lambda \right) $, decreasing monotonically toward $ 0 $ as $ \lambda $ increases.)

Hence, all that is needed is to find the optimal value of $ \lambda $ with any one dimensional root finding method, starting at $ 0 $.

Basically the method amounts to iteratively solving a Tikhonov regularized least squares problem.

Remark: The above was taken from my answer to Solution for $ \arg \min_{ {x}^{T} x = 1} { x}^{T} A x - {c}^{T} x $ - Quadratic Function with Non Linear Equality Constraint.
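
Before the more efficient SVD based evaluation below, here is a rough MATLAB sketch of this iteration for the case where the least squares solution has norm larger than $ 1 $ (so the optimal $ \lambda $ is positive). It uses `fzero` as the one dimensional root finder with a simple bracket growing loop; any other root finder would work just as well:

```matlab
% Find lambda >= 0 such that the Tikhonov solution has unit norm,
% assuming norm(A \ b) > 1 (otherwise the plain LS solution is optimal).
n    = size(A, 2);
xTik = @(lambda) (A' * A + 2 * lambda * eye(n)) \ (A' * b);
g    = @(lambda) norm(xTik(lambda)) - 1;   % monotonically decreasing in lambda

lambdaHi = 1;
while g(lambdaHi) > 0                      % grow the bracket until g changes sign
    lambdaHi = 2 * lambdaHi;
end
lambdaOpt = fzero(g, [0, lambdaHi]);       % g(0) > 0 >= g(lambdaHi)
xHat      = xTik(lambdaOpt);
```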

Solve the Path of Ridge Regression Efficiently

The specific case of regularization above is basically Ridge Regression.
The objective is to calculate $ {\left( {A}^{T} A + 2 \lambda I \right)}^{-1} {A}^{T} b $ efficiently.

In order to do so, one can use the Singular Value Decomposition (SVD) of $ A = U \Sigma {V}^{T} $ which implies:

$$ A = U \Sigma {V}^{T} \implies {A}^{T} A + 2 \lambda I = V {\Sigma}^{2} {V}^{T} + 2 \lambda V {V}^{T} \implies {\left( {A}^{T} A + 2 \lambda I \right)}^{-1} {A}^{T} b = V D {U}^{T} b $$

Where $ D $ is diagonal with $ {D}_{ii} = \frac{ {\sigma}_{i} }{ {\sigma}_{i}^{2} + 2 \lambda } $ and $ {\sigma}_{i} $ are the singular values of $ A $.

The above can be written as a recipe:

  1. Calculate the SVD of $ A $.
  2. Calculate $ c = {U}^{T} b $.
  3. Generate a single variable function $ f \left( \lambda \right) = V D \left( \lambda \right) c $.
  4. Use a single variable optimizer to find the $ \lambda $ which yields $ \left\| f \left( \lambda \right) \right\| = 1 $.

There are 2 cases:

  1. $ \left\| f \left( 0 \right) \right\| < 1 $: Then $ \lambda $ must be negative (this only arises when enforcing the equality constraint $ {x}^{T} x = 1 $; for the relaxation $ {x}^{T} x \leq 1 $ the least squares solution is already optimal). The optimizer search should be over $ \left( -\frac{{\sigma}_{\min}^2}{2}, 0 \right) $, where $ {\sigma}_{\min} $ is the smallest singular value of $ A $.
  2. $ \left\| f \left( 0 \right) \right\| > 1 $: Then $ \lambda $ should be positive. So the optimizer search should be $ \left( 0, \infty \right) $.

Remark: If $ A \in \mathbb{R}^{m \times n} $, the above works perfectly for the overdetermined case ($ m > n $, like in this question). For the underdetermined case it works whenever $ \left\| f \left( 0 \right) \right\| > 1 $. When $ m < n $ and $ \left\| f \left( 0 \right) \right\| < 1 $, it converges to a local minimum, which might not be the least squares solution.
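
Putting the whole recipe together, here is a minimal MATLAB sketch for the overdetermined case, handling both signs of $ \lambda $ via the SVD. The helper name `unitNormLs` is just for illustration, `fzero` does the one dimensional search, and the degenerate ("hard") case where $ b $ has no component along the smallest singular direction is not handled, so treat this as a template rather than production code:

```matlab
function x = unitNormLs(A, b)
% Sketch: solve min ||A*x - b||_2 subject to ||x||_2 = 1 for overdetermined A,
% using x(lambda) = V * diag(s ./ (s.^2 + 2*lambda)) * U' * b from the SVD recipe.
[U, S, V] = svd(A, 'econ');
s = diag(S);                                   % singular values of A
c = U' * b;
xOfLambda = @(lambda) V * ((s ./ (s.^2 + 2 * lambda)) .* c);
g         = @(lambda) norm(xOfLambda(lambda)) - 1;

if abs(g(0)) < 1e-12
    lambdaOpt = 0;                             % LS solution already has unit norm
elseif g(0) > 0
    % Case ||f(0)|| > 1: lambda is positive, grow an upper bracket.
    hi = 1;
    while g(hi) > 0, hi = 2 * hi; end
    lambdaOpt = fzero(g, [0, hi]);
else
    % Case ||f(0)|| < 1: lambda is negative, inside (-sigma_min^2 / 2, 0).
    lo = -0.5 * min(s)^2 * (1 - 1e-9);         % stay just inside the pole
    lambdaOpt = fzero(g, [lo, 0]);
end
x = xOfLambda(lambdaOpt);
end
```

If the same $ A $ is reused for many right hand sides $ b $, the SVD can of course be computed once and only the cheap one dimensional search repeated.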

Royi