
Problem: Consider the optimization problems $$\min_\beta \|y-X\beta\|^2+\alpha\|\beta\|^2 \tag 1$$ and $$\min_\beta \|\beta\|^2 \text{ subject to } \|y-X\beta\|^2 \le c \tag 2$$ where $\|x\|$ is the $2$-norm. Fix $\alpha$, and suppose $\beta^*$ is the solution to ($1$), and let $c=\|y-X\beta^*\|^2$. Is it true that the solution to ($2$) is also $\beta^*$?

Attempt: I believe this is true. The argument should be very similar to the one in Why are additional constraint and penalty term equivalent in ridge regression?. However, when I ran some numerical experiments, the two problems turned out to have different solutions. Hence my question: do the two problems really yield the same solution? Are there exceptions I should be careful of?
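For reference, here is a minimal sketch of the kind of experiment I mean (the random data and SciPy's SLSQP solver below are arbitrary choices for illustration, not necessarily my exact setup):

```python
# Sketch: compare the solutions of (1) and (2) on random data.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p, alpha = 50, 5, 2.0
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Problem (1): closed-form ridge solution.
beta_star = np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)
c = np.sum((y - X @ beta_star) ** 2)

# Problem (2): minimize ||beta||^2 subject to ||y - X beta||^2 <= c.
constraint = {"type": "ineq", "fun": lambda b: c - np.sum((y - X @ b) ** 2)}
beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]  # feasible starting point
res = minimize(lambda b: b @ b, x0=beta_ls, constraints=[constraint], tol=1e-12)

print(np.max(np.abs(res.x - beta_star)))  # close to 0 if the two solutions agree
```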

Rócherz
Longti

2 Answers


It is true for $\alpha>0$. Since $\beta^*$ is a solution of (1), for every $\beta$ we have: $$\|y-X\beta^*\|^2 + \alpha\|\beta^*\|^2 \le \|y-X\beta\|^2 + \alpha\|\beta\|^2.$$ Rearranging: $$\|\beta^*\|^2 \le \frac{1}{\alpha}\left(\|y-X\beta\|^2 - \|y-X\beta^*\|^2\right) + \|\beta\|^2.$$ Now, any $\beta$ that is feasible for (2) satisfies $\|y-X\beta\|^2 \le c = \|y-X\beta^*\|^2$, so the first term on the right is nonpositive and we conclude that $$\|\beta^*\|^2 \le \|\beta\|^2,$$ which shows that $\beta^*$ is a minimizer of (2).

Analogously, for $\alpha<0$ you can check that a solution $\beta^*$ of (1) also solves (2), BUT with maximization instead of minimization: $$\alpha(\|\beta^*\|^2 - \|\beta\|^2) \le \|y-X\beta\|^2 - \|y-X\beta^*\|^2 \le 0 \quad\Longrightarrow\quad \|\beta^*\|^2 \ge \|\beta\|^2.$$ In any case, note that you cannot guarantee equivalence, since the problem becomes non-convex for $\alpha<0$.
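A quick numerical illustration of the $\alpha>0$ case (only a sketch with random data; the sizes and perturbation scheme are arbitrary choices): sample points that are feasible for (2) and check that none of them has a smaller norm than $\beta^*$.

```python
# Sketch: for alpha > 0, no feasible beta of (2) should beat the norm of beta*.
import numpy as np

rng = np.random.default_rng(1)
n, p, alpha = 50, 5, 2.0
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Ridge solution of (1) and the corresponding c of (2).
beta_star = np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)
c = np.sum((y - X @ beta_star) ** 2)

violations = 0
for _ in range(10_000):
    beta = beta_star + 0.1 * rng.standard_normal(p)  # random perturbation
    feasible = np.sum((y - X @ beta) ** 2) <= c
    if feasible and beta @ beta < beta_star @ beta_star - 1e-12:
        violations += 1

print(violations)  # expected to remain 0
```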

AugSB
  • I'm not convinced of your final statement. For one thing setting $\alpha<0$ may make this problem non-convex. But I see no reason to believe it would give equivalent solutions if I were "maximizing instead of minimizing". After all, you're not negating the entire objective, just the regularizer. – Michael Grant Dec 26 '18 at 22:24
  • @MichaelGrant I did not say it is equivalent ;) Anyway, I have added a note to make clear that solution of (1) would be solution of (2) (maximizing instead of minimizing), but not the other way around. – AugSB Dec 27 '18 at 08:39
  • I'm afraid I still don't see it for the non-convex case at all. Fortunately I believe the OP is likely concerning himself with the convex case. – Michael Grant Dec 27 '18 at 13:38
  • Please correct me if I'm wrong, because maybe I'm missing something. For any $\alpha<0$, $\beta^*$ being a solution of (1) implies $\|y-X\beta^*\|^2 + \alpha\|\beta^*\|^2 \le \|y-X\beta\|^2 + \alpha\|\beta\|^2$ for all $\beta$. Therefore, $\alpha(\|\beta^*\|^2 - \|\beta\|^2) \le \|y-X\beta\|^2 - \|y-X\beta^*\|^2$ for all $\beta$. Assuming that $\|y-X\beta\|^2 \le \|y-X\beta^*\|^2$, we conclude that $\|\beta^*\|^2 \ge \|\beta\|^2$ for all such $\beta$. So $\beta^* = \arg\max_\beta \|\beta\|^2$ subject to $\|y-X\beta\|^2 \le \|y-X\beta^*\|^2$. – AugSB Dec 27 '18 at 18:31
  • "Assuming that $|y-X\beta|^2\leq |y-X\beta^*|$" but that's just it, you can't make that assumption. The point of adding a regularizer like $|beta|^2$ is that you're willing to trade on the optimal value of error in exchange for other criteria. – Michael Grant Jan 07 '19 at 16:51
  • Can't we assume that from "subject to $\|y-X\beta\|^2 \le c$", knowing that $c=\|y-X\beta^*\|^2$ (which is given by the OP)? – AugSB Jan 07 '19 at 18:30
  • Aha, that's the missing piece I wasn't seeing. Thanks for your patience! – Michael Grant Jan 08 '19 at 03:47
  • @MichaelGrant You are welcome. You made a good observation, though. Non-equivalence needs to be mentioned, in order to avoid misunderstandings. – AugSB Jan 09 '19 at 09:53

From (1) we have

$$ f(\beta) = y'\cdot y-2\beta'\cdot X'y+\beta'\cdot X'\cdot X\cdot\beta+\alpha \beta'\cdot\beta $$

so the minimum condition gives

$$ -X'\cdot y+X'\cdot X\beta+\alpha\beta = 0 $$

and then

$$ \beta^* = (I\alpha +X'\cdot X)^{-1}X'\cdot y $$

and from (2), writing the constraint with a slack variable $\epsilon$ as $\|y-X\beta\|^2 - c + \epsilon^2 = 0$, the Lagrangian is

$$ L(\beta,\lambda,\epsilon)=\beta'\cdot\beta + \lambda(y'\cdot y-2\beta'\cdot X'y+\beta'\cdot X'\cdot X\cdot\beta-c+\epsilon^2) $$

the stationarity condition with respect to $\beta$ is

$$ L_{\beta} = 2\beta-2\lambda X'\cdot y + 2\lambda X'\cdot X\cdot\beta = 0 $$

then

$$ (I+\lambda X'\cdot X)\cdot\beta^* = \lambda X'\cdot y $$

or

$$ \beta^* = (I\frac{1}{\lambda}+X'\cdot X)^{-1}\cdot X'\cdot y $$

but, multiplying the stationarity condition on the left by $\beta'^*$,

$$ \beta'^*\cdot\beta^*-\lambda\beta'^*\cdot X'\cdot y+\lambda \beta'^*\cdot X'\cdot X\cdot\beta^* = 0 $$

and, eliminating $\beta'\cdot X'\cdot X\cdot\beta$ using the constraint $y'\cdot y-2\beta'\cdot X'\cdot y+\beta'\cdot X'\cdot X\cdot\beta-c+\epsilon^2=0$,

$$ \lambda = \frac{\beta'\cdot\beta}{y'\cdot y-\beta'\cdot X'\cdot y-c+\epsilon^2} $$

so the equivalence between (1) and (2) needs

$$ \lambda = \frac{1}{\alpha} = \frac{\beta'^*\cdot\beta^*}{y'\cdot y-\beta'^*\cdot X'\cdot y-c+\epsilon^2} $$

or

$$ \alpha = \frac{(y-X\cdot \beta^*)'\cdot y-c+\epsilon^2}{\beta'^*\cdot\beta^*} $$

which in general is quite unlikely
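For completeness, a small numerical evaluation of the multiplier formula above at the ridge solution of (1), with $c=\|y-X\beta^*\|^2$ as in the question (a sketch with random data; the slack is taken as $\epsilon=0$, i.e. the constraint is assumed active):

```python
# Sketch: evaluate lambda = beta'beta / (y'y - beta'X'y - c + eps^2) at the
# ridge solution beta* of (1), with c = ||y - X beta*||^2 and eps = 0.
import numpy as np

rng = np.random.default_rng(2)
n, p, alpha = 50, 5, 2.0
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

beta_star = np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)
c = np.sum((y - X @ beta_star) ** 2)

lam = (beta_star @ beta_star) / (y @ y - beta_star @ X.T @ y - c)
print(lam, 1.0 / alpha)  # compare the multiplier with 1/alpha
```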

Cesareo