A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm.
Questions tagged [decision-trees]
746 questions
113 votes · 9 answers
When should I use Gini Impurity as opposed to Information Gain (Entropy)?
Can someone explain, in practical terms, the rationale behind Gini impurity vs. information gain (based on entropy)?
Which metric is better to use in which scenarios when building decision trees?
asked by Krish Mahajan (1,231)
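A quick way to get a feel for how close the two criteria are (an illustrative sketch; the probe values are made up, and in scikit-learn the choice is just `criterion="gini"` vs `criterion="entropy"`):

```python
# For a binary node with positive-class fraction p:
#   Gini(p)    = 2 * p * (1 - p)
#   Entropy(p) = -p*log2(p) - (1 - p)*log2(1 - p)
# Both peak at p = 0.5 and usually rank candidate splits the same way;
# Gini skips the logarithm, so it is marginally cheaper to compute.
import numpy as np

def gini(p):
    return 2 * p * (1 - p)

def entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard log2(0)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

for p in (0.1, 0.3, 0.5):
    print(f"p={p}: gini={gini(p):.3f} entropy={entropy(p):.3f}")
```

The two criteria disagree on the chosen split only rarely, which is why most answers recommend treating the choice as a minor hyperparameter.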
88 votes · 6 answers
Strings as features in decision trees/random forests
I am working through some problems applying decision trees/random forests. I am trying to fit a problem that has numbers as well as strings (such as country names) as features. Now scikit-learn accepts only numbers as inputs, but I…
asked by user3001408 (1,015)
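A common workaround, sketched below with invented column names (`income`, `country` are placeholders): one-hot encode the string columns so scikit-learn sees only numbers.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy frame with a numeric and a string feature (values are made up).
df = pd.DataFrame({"income": [40, 55, 23, 61],
                   "country": ["US", "FR", "US", "IN"],
                   "label": [0, 1, 0, 1]})

# One-hot encode the string column; each country becomes a 0/1 column.
X = pd.get_dummies(df[["income", "country"]], columns=["country"])
clf = RandomForestClassifier(random_state=0).fit(X, df["label"])
```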
41 votes · 4 answers
Why do we need XGBoost and Random Forest?
I wasn't clear on a couple of concepts:
XGBoost converts weak learners to strong learners. What's the advantage of doing this? Combining many weak learners instead of just using a single tree?
Random Forest uses various samples from the data to create…
asked by John Constantine (707)
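A toy sketch of the "weak learners to strong learner" mechanism the question asks about (this is plain gradient boosting on squared error, not XGBoost's exact algorithm): each depth-1 tree is fit to the residuals of the ensemble so far.

```python
# Many weak learners (stumps), each correcting the previous ones'
# residuals, combine into a strong learner.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

pred, trees, lr = np.zeros_like(y), [], 0.1
for _ in range(100):
    stump = DecisionTreeRegressor(max_depth=1).fit(X, y - pred)  # fit residuals
    pred += lr * stump.predict(X)
    trees.append(stump)

print(np.mean((y - pred) ** 2))  # error shrinks as weak learners accumulate
```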
40 votes · 5 answers
Are decision tree algorithms linear or nonlinear?
Recently a friend of mine was asked whether decision tree algorithms are linear or nonlinear algorithms in an interview. I tried to look for answers to this question but couldn't find any satisfactory explanation. Can anyone answer and explain the…
asked by user2966197 (541)
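A small experiment that settles the question empirically (data invented for illustration): a tree fits a clearly nonlinear target because its prediction is a piecewise-constant step function of the inputs, not a linear combination of them.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=(200, 1))
y = np.sin(X[:, 0])  # no linear model in x can fit this well

tree = DecisionTreeRegressor(max_depth=6).fit(X, y)
residual = y - tree.predict(X)
print(np.abs(residual).max())  # small: the step function tracks the sine curve
```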
38 votes · 3 answers
Is it necessary to normalize data for XGBoost?
MinMaxScaler() in scikit-learn is used for data normalization (a.k.a. feature scaling). Data normalization is not necessary for decision trees. Since XGBoost is based on decision trees, is it necessary to do data normalization using MinMaxScaler()…
asked by user781486 (1,455)
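A sketch that probes the claim directly (synthetic data; small tie-breaking or histogram-binning effects can cause tiny differences, hence "near" rather than "exactly"): tree splits depend only on the ordering of feature values, and min-max scaling preserves that order.

```python
import numpy as np
import xgboost as xgb
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
# Features on wildly different scales.
X = rng.normal(size=(500, 5)) * [1, 10, 100, 1000, 10000]
y = (X[:, 0] + X[:, 1] / 10 > 0).astype(int)

raw = xgb.XGBClassifier(n_estimators=50).fit(X, y).predict(X)
scaled = xgb.XGBClassifier(n_estimators=50).fit(
    MinMaxScaler().fit_transform(X), y).predict(X)
print(np.mean(raw == scaled))  # expect (near) 1.0
```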
33 votes · 1 answer
How is a splitting point chosen for continuous variables in decision trees?
I have two questions related to decision trees:
If we have a continuous attribute, how do we choose the splitting value?
Example: Age = (20, 29, 50, 40, …)
Imagine that we have a continuous attribute $f$ that has values in $\mathbb{R}$. How can I write an…
asked by WALID BELRHALMIA (441)
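The standard recipe for a continuous attribute: sort the observed values, take midpoints between consecutive distinct values as candidate thresholds, and keep the candidate with the lowest weighted impurity. A minimal sketch using the question's Age example (labels invented):

```python
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    order = np.argsort(x)
    x, y = x[order], y[order]
    vals = np.unique(x)
    candidates = (vals[:-1] + vals[1:]) / 2  # midpoints between distinct values

    def weighted(t):
        left, right = y[x <= t], y[x > t]
        return (len(left) * gini(left) + len(right) * gini(right)) / len(y)

    return min(candidates, key=weighted)

age = np.array([20, 29, 50, 40, 35, 61])
label = np.array([0, 0, 1, 1, 0, 1])
print(best_split(age, label))  # 37.5: a perfect split for these labels
```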
26 votes · 4 answers
How to predict probabilities in xgboost using R?
The predict function below is returning negative values as well, so the outputs cannot be probabilities.
param <- list(max.depth = 5, eta = 0.01, objective = "binary:logistic", subsample = 0.9)
bst <- xgboost(param, data = x_mat, label = y_mat, nround = 3000)
pred_s <-…
asked by GeorgeOfTheRF (2,078)
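The question uses R, but the same check can be sketched in Python, which the rest of this page's examples use (synthetic data): when a logistic objective is actually applied, predictions land in (0, 1) and can be read as probabilities; negative outputs usually mean a regression objective, or a raw-margin output, was used instead.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)

dtrain = xgb.DMatrix(X, label=y)
params = {"max_depth": 5, "eta": 0.01,
          "objective": "binary:logistic", "subsample": 0.9}
bst = xgb.train(params, dtrain, num_boost_round=300)

pred = bst.predict(dtrain)
print(pred.min(), pred.max())  # both inside (0, 1): these are probabilities
```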
26 votes · 4 answers
How to make a decision tree with both continuous and categorical variables in the dataset?
Let's say I have 3 categorical and 2 continuous attributes in a dataset. How do I build a decision tree using these 5 variables?
Edit:
For categorical variables, it is easy to say that we will split them just by {yes/no} and calculate the total Gini…
asked by Sahil Chaturvedi (435)
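One standard recipe, sketched with invented column names (three categorical, two continuous, matching the question): one-hot encode the categorical columns, pass the continuous ones through, and let the tree choose thresholds for continuous features and 0/1 splits for the encoded ones.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({"color": ["red", "blue", "red", "green"],
                   "city": ["NY", "SF", "NY", "LA"],
                   "size": ["S", "M", "L", "M"],
                   "age": [23.0, 41.0, 35.0, 29.0],
                   "income": [40.0, 80.0, 55.0, 61.0],
                   "label": [0, 1, 0, 1]})

pre = ColumnTransformer(
    [("cat", OneHotEncoder(), ["color", "city", "size"])],
    remainder="passthrough")  # continuous columns pass through unchanged
model = make_pipeline(pre, DecisionTreeClassifier()).fit(
    df.drop(columns="label"), df["label"])
```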
24 votes · 1 answer
Decision trees: leaf-wise (best-first) and level-wise tree traversal
Issue 1:
I am confused by LightGBM's description of the way the tree is expanded.
They state:
Most decision tree learning algorithms grow trees by level
(depth)-wise, like the following image:
Question 1: Which "most" algorithms…
asked by kkk (463)
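A sketch of how the two strategies surface as parameters (LightGBM grows leaf-wise by default; the "most algorithms" in the quote plausibly refers to depth-constrained learners like scikit-learn's trees or classic XGBoost):

```python
import lightgbm as lgb
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# Leaf-wise: complexity is bounded by the number of leaves, not depth.
leafwise = lgb.LGBMClassifier(num_leaves=31, max_depth=-1).fit(X, y)

# Emulating level-wise behaviour: cap the depth and let leaves fill it.
levelwise = lgb.LGBMClassifier(num_leaves=2 ** 6, max_depth=6).fit(X, y)
```

Leaf-wise growth always expands the leaf with the largest loss reduction, so for a given number of leaves it tends to fit faster but overfit more easily; capping num_leaves is the usual control.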
22 votes · 1 answer
XGBRegressor vs. xgboost.train huge speed difference?
If I train my model using the following code:
import xgboost as xg
params = {'max_depth': 3,
          'min_child_weight': 10,
          'learning_rate': 0.3,
          'subsample': 0.5,
          'colsample_bytree': 0.6,
          'obj': 'reg:linear',
          'n_estimators': 1000,
          'eta': 0.3}
features =…
asked by user1566200 (315)
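A plausible explanation for the speed gap, hedged because the excerpt is truncated: `xgb.train` takes its round count only from `num_boost_round` (default 10), so an `'n_estimators'` key buried inside the params dict is not interpreted as a round count, and the two runs may be training 10 vs. 1000 trees. A sketch of matched calls on synthetic data:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 5)), rng.normal(size=200)

params = {"max_depth": 3, "eta": 0.3, "subsample": 0.5}

# Low-level API: 1000 boosting rounds must be requested explicitly here.
bst = xgb.train(params, xgb.DMatrix(X, label=y), num_boost_round=1000)

# Scikit-learn wrapper: the equivalent knob is n_estimators.
reg = xgb.XGBRegressor(n_estimators=1000, max_depth=3, learning_rate=0.3,
                       subsample=0.5).fit(X, y)
```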
18 votes · 5 answers
Should I use a decision tree or logistic regression for classification?
I am working on a classification problem. I have a dataset containing equal numbers of categorical variables and continuous variables. How do I decide which technique to use, between a decision tree and logistic regression?
Is it right to assume…
asked by Arun (727)
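There is rarely an a-priori answer; the usual advice is to cross-validate both candidates on the actual data. A minimal sketch with a synthetic stand-in dataset:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("decision tree", DecisionTreeClassifier(max_depth=5))]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} ± {scores.std():.3f}")
```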
17 votes · 5 answers
Decision tree vs. KNN
In which cases is it better to use a decision tree, and in which a KNN?
Why use one of them in certain cases and the other in different cases? (Looking at their functionality, not at the algorithm.)
Does anyone have explanations or references…
asked by gchavez1 (173)
17 votes · 2 answers
When to choose linear regression or Decision Tree or Random Forest regression?
I am working on a project and I am having difficulty deciding which algorithm to choose for regression. I want to know under what conditions one should choose linear regression, decision tree regression, or random forest regression. Are there…
asked by Jason Donnald (331)
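Beyond cross-validating the candidates as in the previous sketch, one concrete condition worth knowing (illustrated below on invented data) is extrapolation: a linear model keeps following its fitted trend outside the training range, while tree and forest predictions go flat there.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=1.0, size=200)

X_out = np.array([[20.0]])  # well outside the training range
for model in (LinearRegression(), DecisionTreeRegressor(), RandomForestRegressor()):
    print(type(model).__name__, model.fit(X, y).predict(X_out))
# LinearRegression ≈ 60; the tree models stay near the largest training target (~30).
```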
15 votes · 1 answer
Can gradient boosted trees fit any function?
For neural networks we have the universal approximation theorem, which states that neural networks can approximate any continuous function on a compact subset of $\mathbb{R}^n$.
Is there a similar result for gradient boosted trees? It seems reasonable since…
asked by Imran (2,381)
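A heuristic sketch (in LaTeX) of why the answer is plausibly yes for uniform approximation on compacts, hedged because this is an informal argument rather than a citation of a published theorem for boosted trees: tree ensembles contain all piecewise-constant functions on axis-aligned grids, and those approximate any continuous function.

```latex
% Informal approximation argument; not a formal theorem statement.
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
Let $f$ be continuous on a compact $K \subset \mathbb{R}^n$ and let
$\varepsilon > 0$. By uniform continuity there is a partition of $K$ into
finitely many axis-aligned cells $C_1,\dots,C_m$ on which $f$ varies by at
most $\varepsilon$. Each indicator $\mathbf{1}_{C_j}$ is computable by a
single decision tree with axis-aligned splits, so
\[
  \hat f(x) \;=\; \sum_{j=1}^{m} f(x_j)\,\mathbf{1}_{C_j}(x),
  \qquad x_j \in C_j,
\]
is a weighted ensemble of trees with
$\sup_{x \in K} |f(x) - \hat f(x)| \le \varepsilon$.
\end{document}
```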
13 votes · 3 answers
Unbalanced classes -- How to minimize false negatives?
I have a dataset that has a binary class attribute. There are 623 instances with class +1 (cancer positive) and 101,671 instances with class -1 (cancer negative).
I've tried various algorithms (Naive Bayes, Random Forest, AODE, C4.5) and all of them…
asked by user798275 (293)
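Two levers that answers to questions like this usually reach for, sketched below on synthetic data with a similarly rare positive class (~1%): reweighting the rare class during training, and lowering the decision threshold at prediction time.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20000, weights=[0.99], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Lever 1: penalise mistakes on the rare (positive) class more heavily.
clf = RandomForestClassifier(class_weight="balanced",
                             random_state=0).fit(X_tr, y_tr)

# Lever 2: predict positive above a threshold lower than the default 0.5.
proba = clf.predict_proba(X_te)[:, 1]
for thr in (0.5, 0.2, 0.1):
    # Higher recall on the positive class means fewer false negatives.
    print(thr, recall_score(y_te, proba >= thr))
```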