
I have read in some papers that the subset of features chosen for a boosting tree algorithm can make a big difference in performance, so I have been trying several selection methods: RFE, Boruta, variable clustering, correlation filtering, WOE & IV, and Chi-square.
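
For reference, here is roughly how I ran the Boruta step (a minimal sketch; `df` and `target` are placeholder names for my modeling data frame and binary label column):

```r
# Rough sketch of the Boruta run; `df` and `target` are placeholder names.
library(Boruta)

set.seed(42)
# Boruta compares each real feature's importance against permuted
# "shadow" copies and confirms features that beat them consistently.
bor <- Boruta(target ~ ., data = df, doTrace = 0)

# Force a decision on tentative features, then extract the confirmed set
bor <- TentativeRoughFix(bor)
selected <- getSelectedAttributes(bor, withTentative = FALSE)
print(selected)
```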

Let's say I have a classification problem with over 40 variables. My best results after a long, long time of testing:

  • all variables for LightGBM (except one highly collinear variable)
  • for XGBoost, I removed correlated variables (around 8 of them)
  • for CatBoost, I removed variables based on an elastic net model (around 7 of them); a sketch of both filtering steps follows this list
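
Here is a minimal sketch of those two filters (`X` is a numeric feature matrix and `y` a 0/1 label, both placeholder names; the 0.9 cutoff and alpha = 0.5 are illustrative, not my tuned values):

```r
# Sketch of the correlation and elastic net filters; `X` and `y` are
# placeholder names, and the cutoff/alpha values are illustrative.
library(caret)
library(glmnet)

# Correlation filter (the XGBoost set): drop one variable from each
# highly correlated pair.
cor_mat <- cor(X)
drop_idx <- findCorrelation(cor_mat, cutoff = 0.9)
X_reduced <- if (length(drop_idx) > 0) X[, -drop_idx] else X

# Elastic net filter (the CatBoost set): keep features with nonzero
# coefficients at the cross-validated lambda.
fit <- cv.glmnet(as.matrix(X_reduced), y, family = "binomial", alpha = 0.5)
coefs <- as.matrix(coef(fit, s = "lambda.1se"))
keep <- setdiff(rownames(coefs)[coefs[, 1] != 0], "(Intercept)")
print(keep)
```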

My question is: what is the proper way to choose candidate variables when modeling a boosting tree (especially LightGBM)?

I'm using R, in case there are any suggestions for packages.

