
Problem: Consider the optimization problems $$\min_\beta \|y-X\beta\|^2+\alpha\|\beta\|^2 \tag 1$$ and $$\min_\beta \|\beta\|^2 \text{ subject to } \|y-X\beta\|^2 \le c \tag 2$$ where $\|x\|$ is the $2$-norm. Fix $\alpha$, and suppose $\beta^*$ is the solution to ($1$), and let $c=\|y-X\beta^*\|^2$. Is it true that the solution to ($2$) is also $\beta^*$?

Attempt: I believe this is true. The argument should be very similar to the one in Why are additional constraint and penalty term equivalent in ridge regression?. However, when I ran some numerical experiments, the two problems turned out to have different solutions. Hence my question: do the two problems really yield the same solution? Are there exceptions I should be careful of?
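For reference, here is a minimal sketch of the kind of experiment I mean (the random data and SciPy's SLSQP solver below are arbitrary choices for illustration, not necessarily my exact setup):

```python
# Sketch: compare the solutions of (1) and (2) on random data.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p, alpha = 50, 5, 2.0
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Problem (1): closed-form ridge solution.
beta_star = np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)
c = np.sum((y - X @ beta_star) ** 2)

# Problem (2): minimize ||beta||^2 subject to ||y - X beta||^2 <= c.
constraint = {"type": "ineq", "fun": lambda b: c - np.sum((y - X @ b) ** 2)}
beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]  # feasible starting point
res = minimize(lambda b: b @ b, x0=beta_ls, constraints=[constraint], tol=1e-12)

print(np.max(np.abs(res.x - beta_star)))  # close to 0 if the two solutions agree
```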

Rócherz
Longti

2 Answers


It is true for $\alpha>0$. Since $\beta^*$ is a solution of (1), for every $\beta$ we have: $$\|y-X\beta^*\|^2 + \alpha\|\beta^*\|^2 \le \|y-X\beta\|^2 + \alpha\|\beta\|^2.$$ Rearranging: $$\|\beta^*\|^2 \le \frac{1}{\alpha}\left(\|y-X\beta\|^2 - \|y-X\beta^*\|^2\right) + \|\beta\|^2.$$ Now, any $\beta$ that is feasible for (2) satisfies $\|y-X\beta\|^2 \le c = \|y-X\beta^*\|^2$, so the first term on the right is nonpositive and we conclude that $$\|\beta^*\|^2 \le \|\beta\|^2,$$ which shows that $\beta^*$ is a minimizer of (2).

Analogously, for $\alpha<0$ you can check that a solution $\beta^*$ of (1) also solves (2), BUT with maximization instead of minimization: $$\alpha(\|\beta^*\|^2 - \|\beta\|^2) \le \|y-X\beta\|^2 - \|y-X\beta^*\|^2 \le 0 \quad\Longrightarrow\quad \|\beta^*\|^2 \ge \|\beta\|^2.$$ In any case, note that you cannot guarantee equivalence, since the problem becomes non-convex for $\alpha<0$.
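A quick numerical illustration of the $\alpha>0$ case (only a sketch with random data; the sizes and perturbation scheme are arbitrary choices): sample points that are feasible for (2) and check that none of them has a smaller norm than $\beta^*$.

```python
# Sketch: for alpha > 0, no feasible beta of (2) should beat the norm of beta*.
import numpy as np

rng = np.random.default_rng(1)
n, p, alpha = 50, 5, 2.0
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Ridge solution of (1) and the corresponding c of (2).
beta_star = np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)
c = np.sum((y - X @ beta_star) ** 2)

violations = 0
for _ in range(10_000):
    beta = beta_star + 0.1 * rng.standard_normal(p)  # random perturbation
    feasible = np.sum((y - X @ beta) ** 2) <= c
    if feasible and beta @ beta < beta_star @ beta_star - 1e-12:
        violations += 1

print(violations)  # expected to remain 0
```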

AugSB
  • I'm not convinced of your final statement. For one thing setting $\alpha<0$ may make this problem non-convex. But I see no reason to believe it would give equivalent solutions if I were "maximizing instead of minimizing". After all, you're not negating the entire objective, just the regularizer. – Michael Grant Dec 26 '18 at 22:24
  • @MichaelGrant I did not say it is equivalent ;) Anyway, I have added a note to make clear that solution of (1) would be solution of (2) (maximizing instead of minimizing), but not the other way around. – AugSB Dec 27 '18 at 08:39
  • I'm afraid I still don't see it for the non-convex case at all. Fortunately I believe the OP is likely concerning himself with the convex case. – Michael Grant Dec 27 '18 at 13:38
  • Please correct me if I'm wrong, because maybe I'm missing something. For any $\alpha<0$, $\beta^*$ being a solution of (1) implies $\|y-X\beta^*\|^2 + \alpha\|\beta^*\|^2 \le \|y-X\beta\|^2 + \alpha\|\beta\|^2$ for all $\beta$. Therefore, $\alpha(\|\beta^*\|^2 - \|\beta\|^2) \le \|y-X\beta\|^2 - \|y-X\beta^*\|^2$ for all $\beta$. Assuming that $\|y-X\beta\|^2 \le \|y-X\beta^*\|^2$, we conclude that $\|\beta^*\|^2 \ge \|\beta\|^2$ for all such $\beta$. So $\beta^* = \arg\max_\beta \|\beta\|^2$ subject to $\|y-X\beta\|^2 \le \|y-X\beta^*\|^2$. – AugSB Dec 27 '18 at 18:31
  • "Assuming that $|y-X\beta|^2\leq |y-X\beta^*|$" but that's just it, you can't make that assumption. The point of adding a regularizer like $|beta|^2$ is that you're willing to trade on the optimal value of error in exchange for other criteria. – Michael Grant Jan 07 '19 at 16:51
  • Can't we assume that from "subject to $\|y-X\beta\|^2 \le c$", knowing that $c=\|y-X\beta^*\|^2$ (which is given by the OP)? – AugSB Jan 07 '19 at 18:30
  • Aha, that's the missing piece I wasn't seeing. Thanks for your patience! – Michael Grant Jan 08 '19 at 03:47
  • @MichaelGrant You are welcome. You made a good observation, though. Non-equivalence needs to be mentioned, in order to avoid misunderstandings. – AugSB Jan 09 '19 at 09:53

From (1) we have

$$ f(\beta) = y'\cdot y-2\beta'\cdot X'y+\beta'\cdot X'\cdot X\cdot\beta+\alpha \beta'\cdot\beta $$

so the minimum condition gives

$$ -X'\cdot y+X'\cdot X\beta+\alpha\beta = 0 $$

and then

$$ \beta^* = (I\alpha +X'\cdot X)^{-1}X'\cdot y $$

and from (2), writing the constraint with a slack variable $\epsilon$ as $\|y-X\beta\|^2 - c + \epsilon^2 = 0$, the Lagrangian is

$$ L(\beta,\lambda,\epsilon)=\beta'\cdot\beta + \lambda(y'\cdot y-2\beta'\cdot X'y+\beta'\cdot X'\cdot X\cdot\beta-c+\epsilon^2) $$

the stationarity condition with respect to $\beta$ is

$$ L_{\beta} = 2\beta-2\lambda X'\cdot y + 2\lambda X'\cdot X\cdot\beta = 0 $$

then

$$ (I+\lambda X'\cdot X)\cdot\beta^* = \lambda X'\cdot y $$

or

$$ \beta^* = (I\frac{1}{\lambda}+X'\cdot X)^{-1}\cdot X'\cdot y $$

but, multiplying the stationarity condition on the left by $\beta'^*$,

$$ \beta'^*\cdot\beta^*-\lambda\beta'^*\cdot X'\cdot y+\lambda \beta'^*\cdot X'\cdot X\cdot\beta^* = 0 $$

and, eliminating $\beta'\cdot X'\cdot X\cdot\beta$ using the constraint $y'\cdot y-2\beta'\cdot X'\cdot y+\beta'\cdot X'\cdot X\cdot\beta-c+\epsilon^2=0$,

$$ \lambda = \frac{\beta'\cdot\beta}{y'\cdot y-\beta'\cdot X'\cdot y-c+\epsilon^2} $$

so the equivalence between (1) and (2) needs

$$ \lambda = \frac{1}{\alpha} = \frac{\beta'^*\cdot\beta^*}{y'\cdot y-\beta'^*\cdot X'\cdot y-c+\epsilon^2} $$

or

$$ \alpha = \frac{(y-X\cdot \beta^*)'\cdot y-c+\epsilon^2}{\beta'^*\cdot\beta^*} $$

which in general is quite unlikely
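For completeness, a small numerical evaluation of the multiplier formula above at the ridge solution of (1), with $c=\|y-X\beta^*\|^2$ as in the question (a sketch with random data; the slack is taken as $\epsilon=0$, i.e. the constraint is assumed active):

```python
# Sketch: evaluate lambda = beta'beta / (y'y - beta'X'y - c + eps^2) at the
# ridge solution beta* of (1), with c = ||y - X beta*||^2 and eps = 0.
import numpy as np

rng = np.random.default_rng(2)
n, p, alpha = 50, 5, 2.0
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

beta_star = np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)
c = np.sum((y - X @ beta_star) ** 2)

lam = (beta_star @ beta_star) / (y @ y - beta_star @ X.T @ y - c)
print(lam, 1.0 / alpha)  # compare the multiplier with 1/alpha
```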

Cesareo