Gradient Boosting Machine
Questions tagged [gbm]
58 questions
83 votes · 5 answers
GBM vs XGBOOST? Key differences?
I am trying to understand the key differences between GBM and XGBOOST. I tried to google it, but could not find any good answers explaining the differences between the two algorithms and why xgboost almost always performs better than GBM. What makes…
Aman · 997
61 votes · 6 answers
Does XGBoost handle multicollinearity by itself?
I'm currently using XGBoost on a dataset with 21 features (selected from a list of some 150 features), then one-hot encoded them to obtain ~98 features. A few of these 98 features are somewhat redundant; for example, a variable (feature) $A$ also…
neural-nut · 1,803
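Not from the question itself, but a quick way to see the effect it asks about: duplicate a column and watch XGBoost split the gain-based importance between the redundant copies. A minimal sketch; the data and names are illustrative.

```python
# Sketch: duplicate/correlated features in XGBoost. The model still fits,
# but the importance a single column would get is split between the copies.
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, n_features=5, random_state=0)
X_dup = np.hstack([X, X[:, [0]]])  # column 5 is an exact copy of column 0

model = xgb.XGBClassifier(n_estimators=50, max_depth=3)
model.fit(X_dup, y)

# The two copies share the importance one column would otherwise receive.
print(dict(zip(["f0", "f1", "f2", "f3", "f4", "f0_copy"],
               model.feature_importances_)))
```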
36 votes · 1 answer
Why is xgboost so much faster than sklearn GradientBoostingClassifier?
I'm trying to train a gradient boosting model over 50k examples with 100 numeric features. XGBClassifier handles 500 trees within 43 seconds on my machine, while GradientBoostingClassifier handles only 10 trees(!) in 1 minute and 2 seconds :( I…
ihadanny · 1,357
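A rough benchmark sketch along the lines of the question's setup (50k rows, 100 numeric features, same tree count for both libraries). Exact timings depend on hardware, versions, and xgboost's tree_method; the histogram-binned "hist" method used here is one of the usual explanations for the speed gap.

```python
# Wall-clock comparison of the two libraries on identical synthetic data.
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from xgboost import XGBClassifier

X, y = make_classification(n_samples=50_000, n_features=100, random_state=0)

for name, clf in [("sklearn", GradientBoostingClassifier(n_estimators=10)),
                  ("xgboost", XGBClassifier(n_estimators=10, tree_method="hist"))]:
    t0 = time.time()
    clf.fit(X, y)
    print(f"{name}: {time.time() - t0:.1f}s for 10 trees")
```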
26 votes · 2 answers
How to fit pairwise ranking models in XGBoost?
As far as I know, to train learning to rank models, you need to have three things in the dataset:
label or relevance
group or query id
feature vector
For example, the Microsoft Learning to Rank dataset uses this format (label, group id, and…
tokestermw · 418
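A minimal sketch of that three-part setup (label/relevance, group or query id, feature vector) using xgboost's sklearn-style XGBRanker. Recent xgboost versions accept qid= directly; older ones take group= with per-query sizes instead. The data here is illustrative, and rows must be sorted by query id.

```python
# Pairwise learning-to-rank with XGBRanker on a tiny two-query dataset.
import numpy as np
from xgboost import XGBRanker

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                # feature vectors
y = np.array([2, 1, 0, 0, 3, 2, 1, 0])    # relevance labels
qid = np.array([1, 1, 1, 1, 2, 2, 2, 2])  # query/group ids (sorted)

ranker = XGBRanker(objective="rank:pairwise", n_estimators=20)
ranker.fit(X, y, qid=qid)      # qid groups rows belonging to one query
print(ranker.predict(X))       # higher score = ranked higher within a query
```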
16 votes · 4 answers
Need help understanding xgboost's approximate split points proposal
Background:
In xgboost, the $t$-th iteration tries to fit a tree $f_t$ over all $n$ examples, minimizing the following objective:
$$\sum_{i=1}^n[g_if_t(x_i) + \frac{1}{2}h_if_t^2(x_i)]$$
where $g_i, h_i$ are the first- and second-order derivatives…
ihadanny · 1,357
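For context, the step the excerpt truncates (this is the standard derivation from the XGBoost paper, not part of the question): adding the paper's regularization term $\Omega(f_t) = \gamma T + \frac{1}{2}\lambda\sum_{j=1}^T w_j^2$ and fixing the tree structure, let $I_j$ be the instances in leaf $j$ with

$$G_j = \sum_{i\in I_j} g_i, \qquad H_j = \sum_{i\in I_j} h_i.$$

The objective is then quadratic in each leaf weight $w_j$, giving the closed-form optimum

$$w_j^* = -\frac{G_j}{H_j+\lambda}, \qquad \text{obj}^* = -\frac{1}{2}\sum_{j=1}^T \frac{G_j^2}{H_j+\lambda} + \gamma T,$$

where $T$ is the number of leaves. The approximate split-point proposal is about optimizing this quantity without enumerating every candidate split.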
13 votes · 1 answer
AdaBoost implementation and tuning for a high-dimensional feature space in R
I am trying to apply the AdaBoost.M1 algorithm (trees as base learners) to a data set with a large feature space (~20,000 features) and ~100 samples in R. There exists a variety of packages for this purpose: adabag, ada, and gbm…
AfBM · 131
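The question is about R packages, but the same experiment is easy to sketch in Python with scikit-learn's AdaBoost (stumps as base learners, ~100 samples, ~20,000 features). Note the base-learner argument is named base_estimator in older scikit-learn versions; the data here is synthetic.

```python
# AdaBoost with decision stumps on a wide, small-sample dataset,
# scored with 5-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# ~100 samples, ~20,000 features, as in the question
X, y = make_classification(n_samples=100, n_features=20_000,
                           n_informative=20, random_state=0)

ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                         n_estimators=100)
print(cross_val_score(ada, X, y, cv=5).mean())
```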
11 votes · 3 answers
LightGBM - Why Exclusive Feature Bundling (EFB)?
I'm currently studying GBDT and started reading LightGBM's research paper.
In Section 4 they explain the Exclusive Feature Bundling algorithm, which aims at reducing the number of features by grouping mutually exclusive features into bundles,…
Tom · 113
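A toy illustration of the EFB idea, not LightGBM's internal code: two features that are never nonzero on the same row (common after one-hot encoding) can be packed into one column by offsetting the values of one of them, so no split information is lost while the feature count drops.

```python
# Bundling two mutually exclusive features into a single column.
import numpy as np

a = np.array([1, 2, 0, 0, 3, 0])   # nonzero only where b is zero
b = np.array([0, 0, 4, 5, 0, 0])
assert not np.any((a != 0) & (b != 0))  # check mutual exclusivity

offset = a.max()                    # shift b's values past a's range
bundle = np.where(b != 0, b + offset, a)
print(bundle)                       # [1 2 7 8 3 0] -- one column encodes both
```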
10 votes · 2 answers
Adding feature leads to worse results
I have a dataset with 20 variables and ~50K observations, I created several new features using those 20 variables.
I compared the results of a GBM model (using Python xgboost and LightGBM) and found that it doesn't matter what the…
Yaron · 201
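A sketch of how that comparison can be made systematic: cross-validated scores with and without the extra feature, so any degradation is measured rather than eyeballed. The data and the added noise column here are illustrative, not the question's.

```python
# Compare CV scores of the same model with and without an extra feature.
import numpy as np
import lightgbm as lgb
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=50_000, n_features=20, noise=10, random_state=0)
X_extra = np.hstack([X, np.random.default_rng(0).normal(size=(len(X), 1))])

for name, data in [("baseline", X), ("with extra feature", X_extra)]:
    score = cross_val_score(lgb.LGBMRegressor(n_estimators=200), data, y,
                            cv=3, scoring="r2").mean()
    print(f"{name}: R^2 = {score:.4f}")
```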
8 votes · 2 answers
Why isn't dimension sampling used with gradient boosting machines (GBM)?
GBMs, like random forests, build each tree on a different sample of the dataset and hence, going by the spirit of ensemble models, produce higher accuracies. However, I have not seen GBM being used with dimension sampling at every split of the tree…
Nitesh · 1,625
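Worth noting that modern boosting libraries do expose exactly this kind of dimension sampling; in xgboost, for instance, the colsample_* parameters control it at the tree, level, and node granularity.

```python
# Random-forest-style column subsampling in xgboost.
from xgboost import XGBClassifier

model = XGBClassifier(
    n_estimators=200,
    colsample_bytree=0.8,   # sample 80% of features for each tree
    colsample_bylevel=0.8,  # ...and of those, 80% at each depth level
    colsample_bynode=0.8,   # ...and of those, 80% at each split
)
```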
8 votes · 2 answers
What is init_score in lightGBM?
In the tutorial on boosting from an existing prediction in lightGBM R, there is an init_score parameter in the setinfo function. I am wondering what init_score means. In the help page, it says:
init_score: initial score is the base prediction lightgbm will…
WCMC · 465
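For comparison, a minimal Python-API sketch (the question is about the R setinfo route): init_score is attached to the Dataset, boosting then fits on top of that base prediction, and the base prediction must be added back at predict time.

```python
# Continue boosting from an existing base prediction via init_score.
import numpy as np
import lightgbm as lgb

X = np.random.default_rng(0).normal(size=(1000, 5))
y = X[:, 0] * 2 + np.random.default_rng(1).normal(size=1000)

base_pred = np.full(len(y), y.mean())        # e.g. an existing model's output
train = lgb.Dataset(X, label=y, init_score=base_pred)
booster = lgb.train({"objective": "regression"}, train, num_boost_round=50)

# booster.predict() returns only the boosted part; add init_score back
# to obtain the full prediction.
final = base_pred + booster.predict(X)
```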
7 votes · 2 answers
Decision Trees Nodes vs Leaves Definition
I am having a little trouble understanding the difference between a "Node" of a tree and a "Leaf" of a tree.
Suppose I am trying to decide the size of coffee a person may like. There are three categories: small, medium, and large, based on the…
RandomSam · 71
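A toy version of the coffee example, assuming scikit-learn's decision tree: every split point is an internal node, every terminal bucket (small/medium/large) is a leaf, and leaves are counted among the nodes.

```python
# Count nodes vs leaves in a tiny fitted tree.
from sklearn.tree import DecisionTreeClassifier

X = [[0], [1], [2], [3], [4], [5]]   # e.g. a single numeric attribute
y = ["small", "small", "medium", "medium", "large", "large"]

tree = DecisionTreeClassifier().fit(X, y)
print("total nodes:", tree.tree_.node_count)   # internal nodes + leaves
print("leaves:", tree.get_n_leaves())
print("internal nodes:", tree.tree_.node_count - tree.get_n_leaves())
```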
6 votes · 3 answers
How to determine if my GBM model is overfitting?
Below is a simplified example of an h2o gradient boosting machine model using R's iris dataset. The model is trained to predict sepal length.
The example yields an $R^2$ value of 0.93, which seems unrealistically high. How can I assess if these are indeed…
Borealis · 347
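The check itself is library-agnostic even though the question uses h2o: compare held-out performance with training performance. A sketch with scikit-learn on the same iris task (predicting sepal length from the other measurements):

```python
# Overfitting check: train/test R^2 gap on the iris regression task.
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

iris = load_iris()
X = iris.data[:, 1:]          # sepal width, petal length, petal width...
y = iris.data[:, 0]           # ...predicting sepal length

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
gbm = GradientBoostingRegressor(n_estimators=100).fit(X_tr, y_tr)

# A large gap between the two R^2 values is the overfitting signal.
print("train R^2:", gbm.score(X_tr, y_tr))
print("test  R^2:", gbm.score(X_te, y_te))
```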
6 votes · 3 answers
Xgboost quantile regression via custom objective
I am new to GBM and xgboost, and am currently using xgboost_0.6-2 in R. The modeling runs well with the standard objective function "objective" = "reg:linear", and after reading this NIH paper I wanted to run a quantile regression using a custom…
chris · 61
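One common shape such a custom objective takes in the Python API (the R interface is analogous) is the pinball loss with a constant surrogate Hessian, since the loss's true second derivative is zero. A sketch, with alpha as the target quantile; the data is illustrative.

```python
# Quantile (pinball-loss) objective for xgboost as a custom obj callback.
import numpy as np
import xgboost as xgb

def quantile_obj(alpha):
    def obj(preds, dtrain):
        err = dtrain.get_label() - preds
        grad = np.where(err > 0, -alpha, 1.0 - alpha)  # d(loss)/d(pred)
        hess = np.full_like(preds, 1.0)                # surrogate Hessian
        return grad, hess
    return obj

X = np.random.default_rng(0).normal(size=(1000, 3))
y = X @ np.array([1.0, 2.0, 0.5]) + np.random.default_rng(1).normal(size=1000)

dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"max_depth": 3}, dtrain, num_boost_round=100,
                    obj=quantile_obj(alpha=0.9))  # fit the 90th percentile
```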
6 votes · 2 answers
Kappa near 60% on an unbalanced (1:10) data set
As mentioned before, I have a classification problem and an unbalanced data set. The majority class contains 88% of all samples.
I have trained a Generalized Boosted Regression model using gbm() from the gbm package in R and get the following output:
…
IgorS · 5,474
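For intuition on why kappa is the metric of choice here: on a roughly 1:10 class ratio, a majority-class predictor already scores ~88% accuracy, yet its kappa is 0 because the metric corrects for chance agreement. A small illustration:

```python
# Accuracy vs Cohen's kappa for a degenerate majority-class predictor.
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score

rng = np.random.default_rng(0)
y_true = (rng.random(1000) < 0.12).astype(int)   # ~12% minority class
y_majority = np.zeros(1000, dtype=int)           # always predict majority

print("accuracy:", accuracy_score(y_true, y_majority))     # looks good...
print("kappa:   ", cohen_kappa_score(y_true, y_majority))  # ...but is 0
```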
5 votes · 1 answer
Residuals in a gradient boosted classification
I know that we iteratively model the residuals in the case of a gradient boosted regression problem. The intuition is very well explained on Kaggle.
Can someone explain what residuals are modeled in the case of a classification scenario?
Arc · 151
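Short answer in sketch form: for binary log loss, the quantity each new tree is fit to is the negative gradient of the loss with respect to the raw (log-odds) score, which works out to $y - p$:

```python
# Pseudo-residuals in gradient boosted binary classification.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

y = np.array([1, 0, 1, 1, 0])               # true labels
F = np.array([0.5, -1.2, 2.0, -0.3, 0.1])   # current raw score (log-odds)

p = sigmoid(F)                    # predicted probabilities
pseudo_residuals = y - p          # negative gradient of log loss w.r.t. F
print(pseudo_residuals)           # the next tree is fit to these values
```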