Are greedy methods such as orthogonal matching pursuit considered obsolete for finding sparse solutions?

Question

When researchers first began seeking sparse solutions to $Ax = b$, they used greedy methods such as orthogonal matching pursuit (OMP). In OMP, we activate components of $x$ one by one, and at each stage we select the component $i$ such that the $i$th column of $A$ is most correlated with the residual $Ax - b$.

Researchers then developed methods such as Basis Pursuit and Lasso, which are based on solving optimization problems with sparsity-inducing regularizers. The Basis Pursuit problem is \begin{align} \underset{x}{\text{minimize}} & \quad \| x \|_1 \\ \text{subject to} & \quad Ax = b. \end{align} The Lasso problem is $$ \underset{x}{\text{minimize}} \quad \frac12 \| Ax - b \|_2^2 + \lambda \| x \|_1. $$ This new strategy was made possible by new optimization algorithms (interior point methods) which were able to solve these large scale optimization problems efficiently.

Question: Are greedy methods such as orthogonal matching pursuit and its variants now considered to be obsolete? Is there a consensus that they do not work as well as the approaches based on optimization with sparsity-inducing regularizers? Has OMP been abandoned?

Here is a 1994 paper by Chen and Donoho that gives a brief overview of early attempts to find sparse solutions to $Ax = b$, leading up to Basis Pursuit and Lasso: Atomic Decomposition by Basis Pursuit

for me greedy methods are still usefully if I am greedy (if i do not want to select many variables, there can be many reasons). — , Jul 23 '17 at 21:17
@MartijnWeterings But if you do not want to select too many variables, you can achieve this using Basis Pursuit or Lasso, and in fact I believe you will get better results. Note that the paper I linked to makes the following statement about Matching Pursuit: "One can give examples of dictionaries and signals where the method gives a solution which is badly sub-optimal in terms of sparsity." — littleO, Jul 23 '17 at 21:21
$o$, the exhaustibe search method mentioned after the matching pursuit does provide a global optimum solution to the same matching conditions (OLS and sparsity). The basis pursuit and lasso, optimises a different objective. I do not think it should be considered providing generally better results. — Sextus Empiricus, Jul 24 '17 at 10:23
But what reason is there to expect the greedy methods to be preferable in the case where you do not want to select many variables? By the way, the exhaustive search method is not practical, because there are a vast number of subproblems to be solved, so I don't consider it to be one of the greedy methods under consideration in this question. Perhaps you meant that if you want a very small number of variables to be active, like three or four, then exhaustive search may be tractable. — littleO, Jul 29 '17 at 20:26

score 10 · Accepted Answer · answered Jul 24 '17 at 16:34

If by ''obsolete'' you mean ''researchers are not working on them anymore'', the answer is no - or, at least, I shall hope not! Here are few papers that have been published among the iterative / greedy approaches, that are used to analyze various methods:

for some new results on orthogonal matching pursuit 2015
Some results on the efficacy of Graded Hard Thresholding Pursuit (yes, this is shameless advertisement) 2016
IHT for subexponential measurements 2017
Some results on hard thresholding 2016

This list is clearly not exhaustive, but represents 4 main papers (in my opinion) that treat of iterative and greedy approaches for sparse approximation. Should be included: all resources related to co-sparse model, dictionary-sparse approaches, one bit approaches & co.

If by ''obsolete'' you mean ''has not practical applications'', well allow me to retort! I have personally used such greedy approaches in the following contexts:

Approximation of numerical solutions to high dimensional PDEs via sparse polynomial chaos expansions
Analysis of metagenomics data survey by greedy methods and information theory (this is quite old, needs to be refreshed, and is still pretty much work in progress)

Bottom line, yes! These methods are alive, and are being researched from both a theoretical and a practical point of view!

Thanks. What is your opinion or impression about when the greedy approaches are more useful than approaches like Basis Pursuit or Lasso? Is there a rule of thumb about when one method is preferred over the other? For example, perhaps the greedy methods are preferred in cases where the Lasso problem is too huge to be solved efficiently. Are the greedy methods useful only for computational efficiency, or are there cases where they simply yield better results than Lasso (disregarding computational efficiency)? — littleO, Jul 24 '17 at 16:41
This goes beyond my knowledge. To the question, are greedy methods obsolete? My answer is no.
To the question Are they better than optimization based methods? The numerics show that using (G)HTP or OMP on the parametric PDE problems is orders of magnitudes faster than using CVX -- knowing that CVX is not known to be efficient.

To the question "When is one better than the other?" I haven't got a clue. My feeling is, as long as the number of active atoms is reasonably small, I expect iterative / greedy approaches to perform better. Whatever "small" means in this context. — Jean-Luc Bouchot, Jul 24 '17 at 16:48
Thanks for your input, it's very helpful! These paper references are also really helpful. — littleO, Jul 24 '17 at 16:57

Sextus Empiricus · Answer 2 · 2017-08-01T12:34:19.187

Below is an example that illustrates a difference between matching pursuit and lasso and demonstrates that they have a different order in which the variables are selected into the active sets.

I believe that this is relevant in certain fields that wish to minimize the size of the active set. For instance

When developing a model that relates skin-fold measurements with body/fat composition measurements, one may wish to develop a model that reduces the need for the number of skin-fold measurements.
Or when trying to find a mixture that approximates some original sample, and there is a certain limit to the number of ingredients in the mixture.

I realize that these cases are a bit different than the case in the cited article of the original question, which is about finding in a combination of functions (the question was also initially placed in stats.stackexchange.com and the cases that I provide are more "statistic"). Nevertheless, they may be those cases that place most pressure on the requirement of a small active set and relate to many other practical examples in which matching pursuit or variants (in the example stepAIC) work well (and possibly better, if not in most cases then at least in some cases).

The example:

Requirements are R with the 'lars' package. The data set used for the example are biomedical data related to diabetes.

setup:

> # setting up libraries and data
> library(lars)
> data(diabetes)
> lmdata <- as.data.frame(cbind(diabetes$y, diabetes$x))

matching pursuit:

which will activate variables in the order bmi, ltg, map, tc, sex, ldl

> # matching pursuit
> # (or actually stepAIC but in this case it does do the stepwise addition of the most correlating variable)
> base_model <- lm(V1 ~ 1, data = lmdata)
> stepAIC(base_model, scope = paste0(c("~ 1", colnames(diabetes$x)), collapse=" + "), trace = 0)[13]
$anova
Stepwise Model Path
Analysis of Deviance Table

Initial Model:
  V1 ~ 1

Final Model:
  V1 ~ bmi + ltg + map + tc + sex + ldl


Step Df  Deviance Resid. Df Resid. Dev      AIC
1                          441    2621009 3841.990
2 + bmi  1 901427.31       440    1719582 3657.697
3 + ltg  1 302887.70       439    1416694 3574.057
4 + map  1  53986.43       438    1362708 3558.884
5  + tc  1  31277.49       437    1331430 3550.621
6 + sex  1  20561.32       436    1310869 3545.742
7 + ldl  1  39377.57       435    1271491 3534.261

lasso:

which will activate variables in the order bmi, ltg, map, hdl, sex, glu...

> # lasso
> lars(diabetes$x, diabetes$y, type="lasso")

Call:
  lars(x = diabetes$x, y = diabetes$y, type = "lasso")
R-squared: 0.518
Sequence of LASSO moves:
  bmi ltg map hdl sex glu tc tch ldl age hdl hdl
Var    3   9   4   7   2  10  5   8   6   1  -7   7
Step   1   2   3   4   5   6  7   8   9  10  11  12

comparison:

> #comparison of R^2 for a model with 6 variables
> #
> #matching pursuit
> summary(lm(V1 ~ bmi + ltg + map + tc + sex + ldl,  data = lmdata))$r.squared
[1] 0.5148848
> #lasso
> summary(lm(V1 ~ bmi + ltg + map + hdl + sex + glu, data = lmdata))$r.squared
[1] 0.5094151

The example clearly shows that lasso has a different order in adding the variables.

This difference may be related to the different priorities and lasso prefers to add variables if this can reduce the sum of the coefficients.

In certain practical cases this behavior of lasso may not be necessary, and in some cases it may even be detrimental.

Are greedy methods such as orthogonal matching pursuit considered obsolete for finding sparse solutions?

2 Answers2

The example: