
In the paper "Square Root Lasso: Pivotal Recovery of Sparse Signals via Conic Programming" the authors discuss the Sqrt-LASSO, which simply minimizes $\|Ax - b\|_2 + \lambda \|x\|_1$ rather than the regular LASSO objective $\|Ax - b\|_2^2 + \lambda \|x\|_1$.

Can anyone point out the theoretical differences between the two: is one more robust to outliers, do we still get sparsity, etc.? And in practice, is there much of a difference between the implementations?

Royi

1 Answer


The sparsity property of the LASSO is a result of the $L_1$ regularization term, so both variants retain it.

For the case $\lambda = 0$ both variations have the same minimizing argument, and more generally the two objectives trace the same solution path; they only scale the regularization parameter $\lambda$ differently. Concretely, if $x^*$ minimizes the squared objective with parameter $\lambda$ and $A x^* \neq b$, then $x^*$ also minimizes the non-squared objective with $\lambda' = \lambda / (2 \|A x^* - b\|_2)$ (match the first-order optimality conditions of the two problems). Hence the same numeric value of $\lambda$ acts like $2 \|A x^* - b\|_2 \, \lambda$ in the squared problem, i.e., it has a larger effect (a stronger tendency toward sparsity) in the non-squared case whenever the residual norm exceeds $1/2$. A small numerical check of this mapping is sketched below.
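A minimal numerical check of this $\lambda$ mapping, sketched with cvxpy (the problem sizes, data, and $\lambda$ value are arbitrary choices of mine, not taken from the paper):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n = 40, 80
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[:5] = 3.0
b = A @ x_true + 0.5 * rng.standard_normal(m)

lam = 5.0
x = cp.Variable(n)

# Squared (regular LASSO) form: ||Ax - b||_2^2 + lam * ||x||_1
cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b) + lam * cp.norm1(x))).solve()
x_sq = x.value.copy()

# Map lam to the non-squared (sqrt-LASSO) form; valid while the residual is nonzero
lam_sqrt = lam / (2.0 * np.linalg.norm(A @ x_sq - b))
cp.Problem(cp.Minimize(cp.norm2(A @ x - b) + lam_sqrt * cp.norm1(x))).solve()

print(np.max(np.abs(x.value - x_sq)))  # ~0: the two problems share a minimizer
```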

The paper is mainly about the result that the scale-free gradient of the non-squared fidelity term makes the theoretically optimal choice of $\lambda$ independent of the noise level (the "pivotal" property), whereas the regular LASSO's optimal $\lambda$ must scale with the noise standard deviation. Yet the analysis is done under the assumption of a specific choice of $\lambda$ for each case. In practice, when one tunes $\lambda$ by cross validation anyway, the significance of the analysis is slim.
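A rough illustration of the pivotal property (the value $\lambda \approx 2 \sqrt{\log(n) / m}$ below is a back-of-the-envelope stand-in for the paper's exact recommendation, and the setup is my own): the same $\lambda$ is reused at two very different noise levels and the support is still recovered.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
m, n, k = 100, 200, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)  # roughly unit-norm columns
x_true = np.zeros(n)
x_true[:k] = 5.0

lam = 2.0 * np.sqrt(np.log(n) / m)  # noise-level-free choice (ballpark only)
for sigma in (0.1, 1.0):
    b = A @ x_true + sigma * rng.standard_normal(m)
    x = cp.Variable(n)
    cp.Problem(cp.Minimize(cp.norm2(A @ x - b) + lam * cp.norm1(x))).solve()
    support = np.flatnonzero(np.abs(x.value) > 0.1)
    print(sigma, support)  # expect indices 0..4 for both noise levels
```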

Yet in practice the squared case is much easier to solve (it admits fast dedicated solvers such as coordinate descent, while the non-squared case is usually handled as a conic program), hence for any practical purpose I wouldn't use the non-squared version; see the sketch below.
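A minimal sketch with scikit-learn, whose Lasso minimizes $\frac{1}{2m} \|Ax - b\|_2^2 + \alpha \|x\|_1$, so setting $\alpha = \lambda / (2m)$ matches the $\|Ax - b\|_2^2 + \lambda \|x\|_1$ form above (the data here is arbitrary):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
m, n = 50, 100
A = rng.standard_normal((m, n))
b = A @ (np.arange(n) < 5).astype(float) + 0.1 * rng.standard_normal(m)

lam = 1.0
# alpha = lam / (2m) maps sklearn's objective onto ||Ax - b||_2^2 + lam * ||x||_1
model = Lasso(alpha=lam / (2 * m), fit_intercept=False, max_iter=100000)
model.fit(A, b)
print(np.count_nonzero(model.coef_))  # sparse solution via coordinate descent
```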

Remark: There is a great analysis of the effect of squaring the regularization term in The Equivalence of L2 and Squared L2 Norm Regularization in LS Regression.
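For intuition, here is my own one-line version of that argument (not taken from the linked analysis): if $x^* \neq 0$ minimizes $\|Ax - b\|_2^2 + \lambda \|x\|_2$, the first-order condition

$$0 = 2 A^T (A x^* - b) + \lambda \frac{x^*}{\|x^*\|_2} = 2 A^T (A x^* - b) + 2 \lambda' x^*, \qquad \lambda' = \frac{\lambda}{2 \|x^*\|_2},$$

is exactly the stationarity condition of $\|Ax - b\|_2^2 + \lambda' \|x\|_2^2$. So squaring the regularizer, like squaring the fidelity term above, only re-scales $\lambda$ along the solution path.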

Royi