Below is an example that illustrates a difference between matching pursuit and lasso: the two methods select variables into the active set in a different order.
I believe that this is relevant in fields where one wishes to minimize the size of the active set. For instance:
- When developing a model that relates skin-fold measurements to body/fat composition measurements, one may wish for a model that needs as few skin-fold measurements as possible.
- Or when trying to find a mixture that approximates some original sample, and there is a limit on the number of ingredients in the mixture.
I realize that these cases are a bit different from the case in the article cited in the original question, which is about finding a combination of functions (the question was also initially posted on stats.stackexchange.com, and the cases that I provide are more "statistical"). Nevertheless, these may be the cases that put the most pressure on the requirement of a small active set, and they relate to many other practical examples in which matching pursuit or its variants (stepAIC in the example) work well, and possibly better than lasso, if not in most cases then at least in some.
The example:
This requires R with the 'lars' package (for the data and the lasso path) and the 'MASS' package (for stepAIC). The data set used for the example is biomedical data related to diabetes.
setup:
> # setting up libraries and data
> library(lars)
> library(MASS)  # provides stepAIC
> data(diabetes)
> lmdata <- as.data.frame(cbind(diabetes$y, diabetes$x))
matching pursuit:
which will activate variables in the order bmi, ltg, map, tc, sex, ldl
> # matching pursuit
> # (or actually stepAIC but in this case it does do the stepwise addition of the most correlating variable)
> base_model <- lm(V1 ~ 1, data = lmdata)
> stepAIC(base_model, scope = paste0(c("~ 1", colnames(diabetes$x)), collapse=" + "), trace = 0)["anova"]
$anova
Stepwise Model Path
Analysis of Deviance Table
Initial Model:
V1 ~ 1
Final Model:
V1 ~ bmi + ltg + map + tc + sex + ldl
   Step Df  Deviance Resid. Df Resid. Dev      AIC
1                          441    2621009 3841.990
2 + bmi  1 901427.31       440    1719582 3657.697
3 + ltg  1 302887.70       439    1416694 3574.057
4 + map  1  53986.43       438    1362708 3558.884
5  + tc  1  31277.49       437    1331430 3550.621
6 + sex  1  20561.32       436    1310869 3545.742
7 + ldl  1  39377.57       435    1271491 3534.261
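The "most correlating variable" rule mentioned in the comment above can also be written out directly. The following is only a minimal sketch of orthogonal matching pursuit and not the code that produced the output above: `omp_order` is a hypothetical helper name, and standardising the columns with `scale` is an assumption. Its selection order need not coincide with the stepAIC path, because stepAIC ranks candidates by the AIC of the refitted model rather than by correlation with the residual.

```r
# Hypothetical helper (not part of lars or MASS): orthogonal matching pursuit.
# Returns the names of the first k columns of x that get activated.
omp_order <- function(x, y, k) {
  x <- scale(x)            # assumption: centre and scale so correlations are comparable
  r <- y - mean(y)         # residual of the intercept-only model
  active <- integer(0)
  for (step in seq_len(k)) {
    score <- abs(drop(crossprod(x, r)))  # |x_j' r|, proportional to |cor(x_j, r)|
    score[active] <- -Inf                # never pick the same variable twice
    active <- c(active, which.max(score))
    # refit least squares on the active set and update the residual
    fit <- lm.fit(cbind(1, x[, active, drop = FALSE]), y)
    r <- fit$residuals
  }
  unname(colnames(x)[active])
}
```

For the data above it could be called as `omp_order(diabetes$x, diabetes$y, 6)`.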
lasso:
which will activate variables in the order bmi, ltg, map, hdl, sex, glu...
> # lasso
> lars(diabetes$x, diabetes$y, type="lasso")
Call:
lars(x = diabetes$x, y = diabetes$y, type = "lasso")
R-squared: 0.518
Sequence of LASSO moves:
     bmi ltg map hdl sex glu tc tch ldl age hdl hdl
Var    3   9   4   7   2  10  5   8   6   1  -7   7
Step   1   2   3   4   5   6  7   8   9  10  11  12
comparison:
> #comparison of R^2 for a model with 6 variables
> #
> #matching pursuit
> summary(lm(V1 ~ bmi + ltg + map + tc + sex + ldl, data = lmdata))$r.squared
[1] 0.5148848
> #lasso
> summary(lm(V1 ~ bmi + ltg + map + hdl + sex + glu, data = lmdata))$r.squared
[1] 0.5094151
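The comparison above looks at six variables only. As a sketch, the same least-squares R^2 can be computed for every active-set size along a given order; `r2_along_order` is a hypothetical helper name introduced here for illustration, not something from the lars or MASS packages.

```r
# Hypothetical helper: R^2 of the ordinary least-squares fit after each
# variable in `vars` has been added, in the given order.
r2_along_order <- function(data, response, vars) {
  sapply(seq_along(vars), function(k) {
    # build e.g. V1 ~ bmi + ltg from the first k variable names
    f <- reformulate(vars[seq_len(k)], response = response)
    summary(lm(f, data = data))$r.squared
  })
}
```

With the objects defined above, `r2_along_order(lmdata, "V1", c("bmi", "ltg", "map", "tc", "sex", "ldl"))` and `r2_along_order(lmdata, "V1", c("bmi", "ltg", "map", "hdl", "sex", "glu"))` would give the two sequences; their sixth elements correspond to the two fits compared above.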
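The example shows that lasso adds the variables in a different order: the first three (bmi, ltg, map) coincide, but lasso then activates hdl and glu where the stepwise procedure activates tc and ldl. With six variables, the stepwise set gives a slightly higher R^2 (0.5149 versus 0.5094).
This difference is probably related to the different priorities of the two methods: lasso penalizes the sum of the absolute values of the coefficients, so it prefers to add a variable if doing so reduces that sum.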
In certain practical cases this behavior of lasso may not be necessary, and in some cases it may even be detrimental.
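The different priorities can be written down as follows. This is a sketch in standard notation that does not appear in the example itself: $X$ is the matrix of predictors, $y$ the response, $\beta$ the coefficient vector, $\lambda$ the lasso penalty and $k$ the allowed size of the active set (all symbols are assumptions introduced here). Lasso trades the residual sum of squares against the sum of absolute coefficients, whereas a limit on the number of active variables is a cardinality constraint, which stepwise methods such as matching pursuit only approximate greedily.

```latex
\hat\beta_{\text{lasso}} = \arg\min_{\beta} \; \tfrac{1}{2}\,\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1
\qquad\text{versus}\qquad
\hat\beta_{k} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 \quad \text{subject to} \quad \lVert \beta \rVert_0 \le k
```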