Questions tagged [catboost]

34 questions
24
votes
1 answer

LightGBM vs XGBoost vs CatBoost

I've seen that in Kaggle competitions people are using LightGBM where they used to use XGBoost. My question is: when would you prefer XGBoost over LightGBM? And what about CatBoost?
David Masip
  • 6,136
  • 2
  • 28
  • 62
6
votes
1 answer

How to obtain SHAP values for a CatBoost model in R?

I'm asked to create a SHAP analysis in R but I cannot find how to obtain it for a CatBoost model. I can get the SHAP values of an XGBoost model with shap_values <- shap.values(xgb_model = model, X_train = train_X), but not for CatBoost. Here is…
user100740
  • 91
  • 2
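
In the Python catboost package, SHAP values are available through get_feature_importance with type='ShapValues'; the R package appears to expose the analogous catboost.get_feature_importance(model, pool, type = 'ShapValues'). A minimal sketch on toy data, using the Python API:

    from catboost import CatBoostClassifier, Pool
    import numpy as np

    # toy stand-ins for train_X / labels
    X = np.random.rand(100, 5)
    y = np.random.randint(0, 2, 100)

    model = CatBoostClassifier(iterations=50, verbose=False)
    model.fit(X, y)

    # Returns an (n_samples, n_features + 1) array: the last column is the
    # expected value (base prediction), the others are per-feature SHAP values.
    shap_values = model.get_feature_importance(Pool(X, y), type='ShapValues')
    print(shap_values.shape)  # (100, 6)
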
4
votes
1 answer

Are linear models better when dealing with too many features? If so, why?

I had to build a classification model in order to predict what the user rating would be from his/her review. (I was dealing with this dataset: Trip Advisor Hotel Reviews.) After some preprocessing, I compared the results of a Logistic…
3
votes
0 answers

What is the concept behind the categorical encoding used in the CatBoost benchmark problems?

I'm working through CatBoost quality benchmark problems (here). I'm particularly intrigued by the methodology adopted to convert categorical features to numerical values as described in the comparison_description.pdf (here). What is the reasoning…
PPR
  • 171
  • 1
  • 5
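
For context, the scheme described in the CatBoost papers is ordered target statistics: each categorical value is replaced by a smoothed mean of the target computed only over examples that precede it in a random permutation, so an example never sees its own label. A rough sketch of the idea (the prior p and smoothing weight a are illustrative):

    import numpy as np

    def ordered_target_stats(cats, targets, p=0.5, a=1.0, seed=0):
        """Encode cats using only previously visited rows in a random permutation."""
        perm = np.random.default_rng(seed).permutation(len(cats))
        sums, counts = {}, {}
        encoded = np.empty(len(cats))
        for i in perm:
            c = cats[i]
            # smoothed target mean over the rows of this category seen so far
            encoded[i] = (sums.get(c, 0.0) + a * p) / (counts.get(c, 0) + a)
            sums[c] = sums.get(c, 0.0) + targets[i]
            counts[c] = counts.get(c, 0) + 1
        return encoded

    cats = np.array(['a', 'b', 'a', 'a', 'b'])
    y = np.array([1, 0, 1, 0, 1])
    print(ordered_target_stats(cats, y))
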
2
votes
1 answer

CatBoost multiclass classification evaluation metric: Kappa & WKappa

I am working on an unbalanced classification problem and I want to use Kappa as my evaluation metric. Considering the classifier accepts weights (which I have given it), should I still be using weighted kappa or just use the standard kappa? I am not…
Musa
  • 31
  • 2
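
CatBoost accepts both 'Kappa' and 'WKappa' as eval_metric names. Note that class weights enter the training loss, while the metric is computed on the resulting predictions either way, so the two choices are largely independent. A minimal sketch (the weights and data are illustrative):

    from catboost import CatBoostClassifier
    import numpy as np

    X = np.random.rand(300, 4)
    y = np.random.randint(0, 3, 300)          # three classes

    model = CatBoostClassifier(
        loss_function='MultiClass',
        eval_metric='WKappa',                 # or 'Kappa'
        class_weights=[1.0, 2.0, 5.0],        # per-class weights in the training loss
        iterations=100,
        verbose=False,
    )
    model.fit(X, y)
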
2
votes
0 answers

Tuning the learning rate parameter for GBDT models

I've always been taught that decreasing the learning rate parameter in GBDT models such as XGBoost, LightGBM and CatBoost will improve the out-of-sample performance, assuming the number of iterations is increased accordingly and all else…
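
The recipe this refers to is usually: lower the learning rate, raise the iteration budget, and let early stopping on a validation set pick the effective number of trees. A sketch with CatBoost on synthetic data (all parameter values illustrative):

    from catboost import CatBoostRegressor
    from sklearn.model_selection import train_test_split
    import numpy as np

    X = np.random.rand(1000, 5)
    y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * np.random.randn(1000)
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

    model = CatBoostRegressor(
        learning_rate=0.03,          # smaller step size...
        iterations=5000,             # ...compensated by a larger budget
        early_stopping_rounds=100,   # stop once validation loss stalls
        verbose=False,
    )
    model.fit(X_tr, y_tr, eval_set=(X_va, y_va))
    print(model.get_best_iteration())
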
2
votes
0 answers

How to do grid search for CatBoost with categorical_cols

I know it's easy to do a grid search for a simple CatBoost model, as shown here: https://medium.com/aiplusoau/hyperparameter-tuning-a5fe69d2a6c7, by running something like cbc = CatBoostRegressor() #create the grid grid = {'max_depth':…
Ian
  • 21
  • 3
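
Since cat_features is a constructor argument, it can be fixed on the estimator while the grid sweeps only the hyperparameters. A sketch with hypothetical column names; CatBoost's own model.grid_search method is an alternative route:

    from catboost import CatBoostRegressor
    from sklearn.model_selection import GridSearchCV
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        'city': np.random.choice(['NY', 'LA', 'SF'], 200),   # categorical column
        'size': np.random.rand(200),
        'price': 100 * np.random.rand(200),
    })
    X, y = df[['city', 'size']], df['price']

    cbc = CatBoostRegressor(cat_features=['city'], verbose=False)
    grid = {'max_depth': [4, 6, 8], 'learning_rate': [0.03, 0.1]}
    search = GridSearchCV(cbc, grid, cv=3)
    search.fit(X, y)
    print(search.best_params_)
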
2
votes
2 answers

RandomizedSearchCV(n_iter=10) doesn't stop after training 10 models

I am using RandomizedSearchCV for hyperparameter optimization. When I run the model, it shows the scores for each model training. The problem is, it trains far more than 10 models, when in fact I expect it to train just 10 models by specifying…
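
This is expected behaviour: n_iter counts sampled parameter settings, not fits. With cv=5 you get n_iter * 5 fits, plus one final refit on the whole training set. A sketch illustrating the arithmetic (parameter ranges are illustrative):

    from catboost import CatBoostClassifier
    from sklearn.datasets import make_classification
    from sklearn.model_selection import RandomizedSearchCV

    X, y = make_classification(n_samples=300, random_state=0)
    search = RandomizedSearchCV(
        CatBoostClassifier(verbose=False),
        param_distributions={
            'depth': [4, 6, 8],
            'learning_rate': [0.03, 0.1],
            'iterations': [100, 200],
        },
        n_iter=10,   # 10 candidate settings x 5 folds = 50 fits, plus the refit
        cv=5,
        random_state=0,
    )
    search.fit(X, y)
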
2
votes
0 answers

CatBoost: Categorical Feature Encoding

I would like to understand all the methods available in CatBoost for encoding categorical features. Unfortunately, the published articles by Yandex ("CatBoost: gradient boosting with categorical features support" and "CatBoost: unbiased boosting…
calpyte
  • 121
  • 2
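
For reference, the main knobs the library exposes are one_hot_max_size (one-hot for low-cardinality features) and the CTR settings (simple_ctr, combinations_ctr) controlling the target statistics. A sketch with a few of the documented CTR types (the specific choices are illustrative):

    from catboost import CatBoostClassifier
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        'color': np.random.choice(['red', 'green', 'blue'], 200),
        'x': np.random.rand(200),
        'y': np.random.randint(0, 2, 200),
    })

    model = CatBoostClassifier(
        one_hot_max_size=10,                 # one-hot if <= 10 distinct values
        simple_ctr=['Borders', 'Counter'],   # CTR types for single features
        combinations_ctr=['Borders'],        # CTR types for feature combinations
        iterations=50,
        verbose=False,
    )
    model.fit(df[['color', 'x']], df['y'], cat_features=['color'])
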
2
votes
1 answer

How do we target-encode categorical features in multiclass classification problems?

Say I have a multiclass problem with a dataset like this:

    user_id | price | target
    --------+-------+--------
          1 |    30 | apple
          1 |    20 | samsung
          2 |    32 | samsung
          2 |    40 | huawei
        ... |   ... | ...

where I have a lot of users, i.e. One Hot…
CutePoison
  • 520
  • 3
  • 10
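
A common approach is one-vs-rest encoding: one new column per class, holding the frequency of that class within the category (in practice smoothed and computed out-of-fold to avoid target leakage). A bare-bones pandas sketch on the data above, without smoothing or fold splitting:

    import pandas as pd

    df = pd.DataFrame({
        'user_id': [1, 1, 2, 2],
        'price':   [30, 20, 32, 40],
        'target':  ['apple', 'samsung', 'samsung', 'huawei'],
    })

    for cls in df['target'].unique():
        indicator = (df['target'] == cls).astype(float)
        # per-user mean of the indicator = P(target == cls | user_id)
        df[f'user_enc_{cls}'] = indicator.groupby(df['user_id']).transform('mean')

    print(df)
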
2
votes
1 answer

How to tell CatBoost which feature is categorical?

I am excited to learn that CatBoost can handle categorical features by itself. One of my features, Department ID, is categorical. However, it looks numeric, since the values are like 1001, 1002, ..., 1218. Those numbers are just IDs of the…
Fred Chang
  • 95
  • 1
  • 2
  • 6
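
The standard fix is to declare the column through cat_features (by name or index); casting the IDs to strings also makes the intent explicit, since CatBoost requires categorical columns to hold integers or strings. A sketch with a hypothetical department_id column:

    from catboost import CatBoostClassifier
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        'department_id': np.random.choice([1001, 1002, 1218], 100),
        'hours': np.random.rand(100),
        'label': np.random.randint(0, 2, 100),
    })
    df['department_id'] = df['department_id'].astype(str)  # IDs, not magnitudes

    model = CatBoostClassifier(cat_features=['department_id'], verbose=False)
    model.fit(df[['department_id', 'hours']], df['label'])
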
1
vote
1 answer

Model Dump Parser (like XGBFI) for LightGBM and CatBoost

Currently my employer has multiple GLMs in a live environment. I am interested in identifying new features and interactions to enhance the accuracy of these GLMs; for now I am limited to the GLM structure, so simply deploying a solution which…
bradS
  • 1,695
  • 9
  • 20
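
Of the two libraries, CatBoost covers part of this natively: get_feature_importance(type='Interaction') returns pairwise interaction scores, which overlaps with what XGBFI extracts from XGBoost model dumps. A minimal sketch:

    from catboost import CatBoostRegressor, Pool
    import numpy as np

    X = np.random.rand(200, 4)
    y = X[:, 0] * X[:, 1] + X[:, 2] + 0.01 * np.random.randn(200)

    model = CatBoostRegressor(iterations=100, verbose=False).fit(X, y)

    # rows of (first_feature_index, second_feature_index, score)
    interactions = model.get_feature_importance(Pool(X, y), type='Interaction')
    print(interactions[:5])
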
1
vote
1 answer

Does Gradient Boosting perform n-ary splits where n > 2?

Do algorithms such as GBM, XGBoost, CatBoost, and LightGBM ever perform more than two splits at a node in their decision trees? Can a node be split into 3 or more branches instead of merely binary ones? Can more than one feature be used in…
1
vote
0 answers

Feature Selection before modeling with Boosting Trees

I have read in some papers that the subset of features chosen for a boosted tree algorithm can make a big difference in performance, so I've been trying RFE, Boruta, variable clustering, correlation, WOE & IV, and chi-square. Let's say I have a…
Mamoud
  • 11
  • 2
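
As a concrete instance of one method from that list, a sketch of recursive feature elimination (RFECV) with a boosted-tree estimator supplying the importances; Boruta and WOE/IV live in separate packages:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.feature_selection import RFECV

    X, y = make_classification(n_samples=300, n_features=20,
                               n_informative=5, random_state=0)

    # drop 2 features per round, keep the cross-validated best subset
    selector = RFECV(GradientBoostingClassifier(random_state=0), step=2, cv=3)
    selector.fit(X, y)
    print(selector.n_features_, selector.support_)
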
1
vote
2 answers

Does gradient boosting error always decrease faster and reach lower values on training data?

I am building another XGBoost model and I'm really trying not to overfit the data. I split my data into train and test sets and fit the model with early stopping based on the test-set error, which results in the following loss plot: I'd say this is…
Xaume
  • 212
  • 3
  • 14
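
A sketch of how such a loss plot is typically produced, assuming a recent xgboost version (>= 1.6, where early_stopping_rounds moved to the constructor): track both train and test error via eval_set; the train curve usually keeps falling after the test curve has flattened.

    import numpy as np
    import xgboost as xgb
    from sklearn.model_selection import train_test_split

    X = np.random.rand(1000, 5)
    y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * np.random.randn(1000)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    model = xgb.XGBRegressor(n_estimators=500, early_stopping_rounds=20,
                             eval_metric='rmse')
    model.fit(X_tr, y_tr, eval_set=[(X_tr, y_tr), (X_te, y_te)], verbose=False)

    history = model.evals_result()   # {'validation_0': {'rmse': [...]}, ...}
    print(model.best_iteration, history['validation_1']['rmse'][-1])
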