A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm.
Questions tagged [decision-trees]
746 questions
113 votes · 9 answers
When should I use Gini Impurity as opposed to Information Gain (Entropy)?
Can someone explain, in practical terms, the rationale behind Gini impurity vs. information gain (based on entropy)?
Which metric is better to use in which scenarios when building decision trees?
asked by Krish Mahajan (1,231)
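A quick way to get a feel for how close the two criteria are (an illustrative sketch; the probe values are made up, and in scikit-learn the choice is just `criterion="gini"` vs `criterion="entropy"`):

```python
# For a binary node with positive-class fraction p:
#   Gini(p)    = 2 * p * (1 - p)
#   Entropy(p) = -p*log2(p) - (1 - p)*log2(1 - p)
# Both peak at p = 0.5 and usually rank candidate splits the same way;
# Gini skips the logarithm, so it is marginally cheaper to compute.
import numpy as np

def gini(p):
    return 2 * p * (1 - p)

def entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard log2(0)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

for p in (0.1, 0.3, 0.5):
    print(f"p={p}: gini={gini(p):.3f} entropy={entropy(p):.3f}")
```

The two criteria disagree on the chosen split only rarely, which is why most answers recommend treating the choice as a minor hyperparameter.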
88 votes · 6 answers
Strings as features in decision trees/random forests
I am working through some problems applying decision trees/random forests. I am trying to fit a problem that has numbers as well as strings (such as country names) as features. Now scikit-learn accepts only numbers as inputs, but I…
asked by user3001408 (1,015)
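A common workaround, sketched below with invented column names (`income`, `country` are placeholders): one-hot encode the string columns so scikit-learn sees only numbers.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy frame with a numeric and a string feature (values are made up).
df = pd.DataFrame({"income": [40, 55, 23, 61],
                   "country": ["US", "FR", "US", "IN"],
                   "label": [0, 1, 0, 1]})

# One-hot encode the string column; each country becomes a 0/1 column.
X = pd.get_dummies(df[["income", "country"]], columns=["country"])
clf = RandomForestClassifier(random_state=0).fit(X, df["label"])
```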
41 votes · 4 answers
Why do we need XGBoost and Random Forest?
I wasn't clear on a couple of concepts:
XGBoost converts weak learners to strong learners. What's the advantage of doing this? Combining many weak learners instead of just using a single tree?
Random Forest uses various samples from the data to create…
asked by John Constantine (707)
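A toy sketch of the "weak learners to strong learner" mechanism the question asks about (this is plain gradient boosting on squared error, not XGBoost's exact algorithm): each depth-1 tree is fit to the residuals of the ensemble so far.

```python
# Many weak learners (stumps), each correcting the previous ones'
# residuals, combine into a strong learner.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

pred, trees, lr = np.zeros_like(y), [], 0.1
for _ in range(100):
    stump = DecisionTreeRegressor(max_depth=1).fit(X, y - pred)  # fit residuals
    pred += lr * stump.predict(X)
    trees.append(stump)

print(np.mean((y - pred) ** 2))  # error shrinks as weak learners accumulate
```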
40 votes · 5 answers
Are decision tree algorithms linear or nonlinear?
Recently a friend of mine was asked whether decision tree algorithms are linear or nonlinear algorithms in an interview. I tried to look for answers to this question but couldn't find any satisfactory explanation. Can anyone answer and explain the…
asked by user2966197 (541)
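A small experiment that settles the question empirically (data invented for illustration): a tree fits a clearly nonlinear target because its prediction is a piecewise-constant step function of the inputs, not a linear combination of them.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=(200, 1))
y = np.sin(X[:, 0])  # no linear model in x can fit this well

tree = DecisionTreeRegressor(max_depth=6).fit(X, y)
residual = y - tree.predict(X)
print(np.abs(residual).max())  # small: the step function tracks the sine curve
```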
38 votes · 3 answers
Is it necessary to normalize data for XGBoost?
MinMaxScaler() in scikit-learn is used for data normalization (a.k.a. feature scaling). Data normalization is not necessary for decision trees. Since XGBoost is based on decision trees, is it necessary to do data normalization using MinMaxScaler()…
asked by user781486 (1,455)
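A sketch that probes the claim directly (synthetic data; small tie-breaking or histogram-binning effects can cause tiny differences, hence "near" rather than "exactly"): tree splits depend only on the ordering of feature values, and min-max scaling preserves that order.

```python
import numpy as np
import xgboost as xgb
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
# Features on wildly different scales.
X = rng.normal(size=(500, 5)) * [1, 10, 100, 1000, 10000]
y = (X[:, 0] + X[:, 1] / 10 > 0).astype(int)

raw = xgb.XGBClassifier(n_estimators=50).fit(X, y).predict(X)
scaled = xgb.XGBClassifier(n_estimators=50).fit(
    MinMaxScaler().fit_transform(X), y).predict(X)
print(np.mean(raw == scaled))  # expect (near) 1.0
```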
33 votes · 1 answer
How is a splitting point chosen for continuous variables in decision trees?
I have two questions related to decision trees:
If we have a continuous attribute, how do we choose the splitting value?
Example: Age = (20, 29, 50, 40, …)
Imagine that we have a continuous attribute $f$ that has values in $\mathbb{R}$. How can I write an…
asked by WALID BELRHALMIA (441)
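The standard recipe for a continuous attribute: sort the observed values, take midpoints between consecutive distinct values as candidate thresholds, and keep the candidate with the lowest weighted impurity. A minimal sketch using the question's Age example (labels invented):

```python
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    order = np.argsort(x)
    x, y = x[order], y[order]
    vals = np.unique(x)
    candidates = (vals[:-1] + vals[1:]) / 2  # midpoints between distinct values

    def weighted(t):
        left, right = y[x <= t], y[x > t]
        return (len(left) * gini(left) + len(right) * gini(right)) / len(y)

    return min(candidates, key=weighted)

age = np.array([20, 29, 50, 40, 35, 61])
label = np.array([0, 0, 1, 1, 0, 1])
print(best_split(age, label))  # 37.5: a perfect split for these labels
```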
26 votes · 4 answers
How to predict probabilities in xgboost using R?
The predict function below is returning negative values as well, so the outputs cannot be probabilities.
param <- list(max.depth = 5, eta = 0.01, objective = "binary:logistic", subsample = 0.9)
bst <- xgboost(param, data = x_mat, label = y_mat, nround = 3000)
pred_s <-…
asked by GeorgeOfTheRF (2,078)
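The question uses R, but the same check can be sketched in Python, which the rest of this page's examples use (synthetic data): when a logistic objective is actually applied, predictions land in (0, 1) and can be read as probabilities; negative outputs usually mean a regression objective, or a raw-margin output, was used instead.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)

dtrain = xgb.DMatrix(X, label=y)
params = {"max_depth": 5, "eta": 0.01,
          "objective": "binary:logistic", "subsample": 0.9}
bst = xgb.train(params, dtrain, num_boost_round=300)

pred = bst.predict(dtrain)
print(pred.min(), pred.max())  # both inside (0, 1): these are probabilities
```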
26 votes · 4 answers
How to make a decision tree with both continuous and categorical variables in the dataset?
Let's say I have 3 categorical and 2 continuous attributes in a dataset. How do I build a decision tree using these 5 variables?
Edit:
For categorical variables, it is easy to say that we will split them just by {yes/no} and calculate the total Gini…
asked by Sahil Chaturvedi (435)
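One standard recipe, sketched with invented column names (three categorical, two continuous, matching the question): one-hot encode the categorical columns, pass the continuous ones through, and let the tree choose thresholds for continuous features and 0/1 splits for the encoded ones.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({"color": ["red", "blue", "red", "green"],
                   "city": ["NY", "SF", "NY", "LA"],
                   "size": ["S", "M", "L", "M"],
                   "age": [23.0, 41.0, 35.0, 29.0],
                   "income": [40.0, 80.0, 55.0, 61.0],
                   "label": [0, 1, 0, 1]})

pre = ColumnTransformer(
    [("cat", OneHotEncoder(), ["color", "city", "size"])],
    remainder="passthrough")  # continuous columns pass through unchanged
model = make_pipeline(pre, DecisionTreeClassifier()).fit(
    df.drop(columns="label"), df["label"])
```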
24 votes · 1 answer
Decision trees: leaf-wise (best-first) and level-wise tree traversal
Issue 1:
I am confused by LightGBM's description of the way the tree is expanded.
They state:
Most decision tree learning algorithms grow trees by level
(depth)-wise, like the following image:
Question 1: Which "most" algorithms…
asked by kkk (463)
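A sketch of how the two strategies surface as parameters (LightGBM grows leaf-wise by default; the "most algorithms" in the quote plausibly refers to depth-constrained learners like scikit-learn's trees or classic XGBoost):

```python
import lightgbm as lgb
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# Leaf-wise: complexity is bounded by the number of leaves, not depth.
leafwise = lgb.LGBMClassifier(num_leaves=31, max_depth=-1).fit(X, y)

# Emulating level-wise behaviour: cap the depth and let leaves fill it.
levelwise = lgb.LGBMClassifier(num_leaves=2 ** 6, max_depth=6).fit(X, y)
```

Leaf-wise growth always expands the leaf with the largest loss reduction, so for a given number of leaves it tends to fit faster but overfit more easily; capping num_leaves is the usual control.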
22 votes · 1 answer
XGBRegressor vs. xgboost.train huge speed difference?
If I train my model using the following code:
import xgboost as xg
params = {'max_depth': 3,
          'min_child_weight': 10,
          'learning_rate': 0.3,
          'subsample': 0.5,
          'colsample_bytree': 0.6,
          'obj': 'reg:linear',
          'n_estimators': 1000,
          'eta': 0.3}
features =…
asked by user1566200 (315)
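A plausible explanation for the speed gap, hedged because the excerpt is truncated: `xgb.train` takes its round count only from `num_boost_round` (default 10), so an `'n_estimators'` key buried inside the params dict is not interpreted as a round count, and the two runs may be training 10 vs. 1000 trees. A sketch of matched calls on synthetic data:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 5)), rng.normal(size=200)

params = {"max_depth": 3, "eta": 0.3, "subsample": 0.5}

# Low-level API: 1000 boosting rounds must be requested explicitly here.
bst = xgb.train(params, xgb.DMatrix(X, label=y), num_boost_round=1000)

# Scikit-learn wrapper: the equivalent knob is n_estimators.
reg = xgb.XGBRegressor(n_estimators=1000, max_depth=3, learning_rate=0.3,
                       subsample=0.5).fit(X, y)
```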
18 votes · 5 answers
Should I use a decision tree or logistic regression for classification?
I am working on a classification problem. I have a dataset containing equal numbers of categorical variables and continuous variables. How do I decide which technique to use, between a decision tree and logistic regression?
Is it right to assume…
asked by Arun (727)
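There is rarely an a-priori answer; the usual advice is to cross-validate both candidates on the actual data. A minimal sketch with a synthetic stand-in dataset:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("decision tree", DecisionTreeClassifier(max_depth=5))]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} ± {scores.std():.3f}")
```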
17 votes · 5 answers
Decision tree vs. KNN
In which cases is it better to use a decision tree, and in which a KNN?
Why use one of them in certain cases and the other in different cases? (Looking at their functionality, not at the algorithm.)
Does anyone have explanations or references…
asked by gchavez1 (173)
17 votes · 2 answers
When to choose linear regression or Decision Tree or Random Forest regression?
I am working on a project and I am having difficulty deciding which algorithm to choose for regression. I want to know under what conditions one should choose linear regression, decision tree regression, or random forest regression. Are there…
asked by Jason Donnald (331)
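Beyond cross-validating the candidates as in the previous sketch, one concrete condition worth knowing (illustrated below on invented data) is extrapolation: a linear model keeps following its fitted trend outside the training range, while tree and forest predictions go flat there.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=1.0, size=200)

X_out = np.array([[20.0]])  # well outside the training range
for model in (LinearRegression(), DecisionTreeRegressor(), RandomForestRegressor()):
    print(type(model).__name__, model.fit(X, y).predict(X_out))
# LinearRegression ≈ 60; the tree models stay near the largest training target (~30).
```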
15 votes · 1 answer
Can gradient boosted trees fit any function?
For neural networks we have the universal approximation theorem, which states that neural networks can approximate any continuous function on a compact subset of $\mathbb{R}^n$.
Is there a similar result for gradient boosted trees? It seems reasonable since…
asked by Imran (2,381)
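A heuristic sketch (in LaTeX) of why the answer is plausibly yes for uniform approximation on compacts, hedged because this is an informal argument rather than a citation of a published theorem for boosted trees: tree ensembles contain all piecewise-constant functions on axis-aligned grids, and those approximate any continuous function.

```latex
% Informal approximation argument; not a formal theorem statement.
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
Let $f$ be continuous on a compact $K \subset \mathbb{R}^n$ and let
$\varepsilon > 0$. By uniform continuity there is a partition of $K$ into
finitely many axis-aligned cells $C_1,\dots,C_m$ on which $f$ varies by at
most $\varepsilon$. Each indicator $\mathbf{1}_{C_j}$ is computable by a
single decision tree with axis-aligned splits, so
\[
  \hat f(x) \;=\; \sum_{j=1}^{m} f(x_j)\,\mathbf{1}_{C_j}(x),
  \qquad x_j \in C_j,
\]
is a weighted ensemble of trees with
$\sup_{x \in K} |f(x) - \hat f(x)| \le \varepsilon$.
\end{document}
```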
13 votes · 3 answers
Unbalanced classes -- How to minimize false negatives?
I have a dataset that has a binary class attribute. There are 623 instances with class +1 (cancer positive) and 101,671 instances with class -1 (cancer negative).
I've tried various algorithms (Naive Bayes, Random Forest, AODE, C4.5) and all of them…
asked by user798275 (293)
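Two levers that answers to questions like this usually reach for, sketched below on synthetic data with a similarly rare positive class (~1%): reweighting the rare class during training, and lowering the decision threshold at prediction time.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20000, weights=[0.99], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Lever 1: penalise mistakes on the rare (positive) class more heavily.
clf = RandomForestClassifier(class_weight="balanced",
                             random_state=0).fit(X_tr, y_tr)

# Lever 2: predict positive above a threshold lower than the default 0.5.
proba = clf.predict_proba(X_te)[:, 1]
for thr in (0.5, 0.2, 0.1):
    # Higher recall on the positive class means fewer false negatives.
    print(thr, recall_score(y_te, proba >= thr))
```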