
I am using GridSearchCV to tune the hyperparameters of my model, together with a pipeline and cross-validation. When I run the search to tune XGBoost, every score comes back as NaN. However, when I use the same code with other classifiers such as random forest, it works and returns complete results.

from sklearn.model_selection import StratifiedKFold, GridSearchCV
from xgboost import XGBClassifier
from imblearn.under_sampling import RepeatedEditedNearestNeighbours
from imblearn.pipeline import Pipeline  # imblearn's Pipeline accepts resampling steps

kf = StratifiedKFold(n_splits=10, shuffle=False)

SCORING = ['accuracy', 'precision', 'recall', 'f1']

# define parameters for hyperparameter tuning
params = {'Classifier__n_estimators': [5, 10, 20, 50, 100, 200]}

XGB = XGBClassifier()
UnSam = RepeatedEditedNearestNeighbours()

pipe = Pipeline(steps=[('UnderSampling', UnSam), ('Classifier', XGB)])

mod = GridSearchCV(pipe, params, cv=kf, scoring=SCORING, refit='f1', return_train_score=True)
mod.fit(X_train, y_train)

When I run this code, I get the following results:

{'Classifier__n_estimators': 5}
__________________________________________________
F1        :  [nan nan nan nan nan nan]
Recall    :  [nan nan nan nan nan nan]
Accuracy  :  [nan nan nan nan nan nan]
Precision :  [nan nan nan nan nan nan]

Another weird thing: when I apply the same code to tune the penalty in LogisticRegression, it returns NaN for l1 and elasticnet.

from sklearn.linear_model import LogisticRegression

kf = StratifiedKFold(n_splits=10, shuffle=False)

SCORING = ['accuracy', 'precision', 'recall', 'f1']

# define parameters for hyperparameter tuning
params = {'Classifier__penalty': ['l1', 'l2', 'elasticnet']}

LR = LogisticRegression(random_state=0)
UnSam = RepeatedEditedNearestNeighbours()

pipe = Pipeline(steps=[('UnderSampling', UnSam), ('Classifier', LR)])

mod = GridSearchCV(pipe, params, cv=kf, scoring=SCORING, refit='f1', return_train_score=True)
mod.fit(X_train, y_train)

The results are as follows:

{'Classifier__penalty': 'l2'}
__________________________________________________
F1        :  [   nan 0.363    nan]
Recall    :  [   nan 0.4188   nan]
Accuracy  :  [   nan 0.7809   nan]
Precision :  [   nan 0.3215   nan]
Aaron

2 Answers


By default, GridSearchCV provides a score of nan when fitting the model fails. You can change that behavior and raise an error by setting the parameter error_score="raise", or you can try fitting a single model to get the error. You can then use the traceback to help figure out where the problem is.
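For example (a minimal sketch of both approaches, reusing the pipeline, grid, and data from the question):

# Option 1: make the failure visible instead of scoring it as NaN
mod = GridSearchCV(pipe, params, cv=kf, scoring=SCORING, refit='f1',
                   return_train_score=True, error_score="raise")
mod.fit(X_train, y_train)  # now raises the underlying exception

# Option 2: fit a single candidate by hand to get the same traceback
pipe.set_params(Classifier__n_estimators=5)
pipe.fit(X_train, y_train)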

For the LogisticRegression, I can identify the likely culprit: the default solver is lbfgs, which cannot handle L1 or ElasticNet penalty. Use saga.
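A minimal sketch of that fix; note that the elasticnet penalty additionally requires l1_ratio to be set, so a list of parameter grids keeps all the combinations valid (the l1_ratio value and the max_iter bump are illustrative assumptions, not part of the original question):

LR = LogisticRegression(random_state=0, solver='saga', max_iter=5000)  # saga supports all three penalties
params = [
    {'Classifier__penalty': ['l1', 'l2']},
    {'Classifier__penalty': ['elasticnet'], 'Classifier__l1_ratio': [0.5]},  # l1_ratio only applies to elasticnet
]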

I don't immediately see an issue with the XGBoost model or parameters. Get the error traceback using the approach in the first paragraph, and search/ask that as a separate question if needed.

Ben Reiniger

Building on this discussion: I encountered a similar issue when using the random search optimizer with XGBoost. Specifically, when I loaded an XGBClassifier model into a scikit-learn Pipeline and passed it to a RandomizedSearchCV instance as the estimator (with error_score="raise" so the scores are not silently assigned np.nan), I got the following error, in addition to the warning highlighted in XGBoost sklearn_tags Method Error in Python When Loading Model:

Pipeline should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.

Possible solutions I've found include:

  1. Downgrading scikit-learn to 1.5.2: the issues I encountered seem to be version-dependent. Specifically, scikit-learn versions between 1.5.3 and 1.6.0 (the current Colab version, as of writing) all exhibit these errors on Colab; a temporary fix is to downgrade to scikit-learn 1.5.2 (see the version-check sketch after this list):
!pip install scikit-learn==1.5.2
  2. Building a custom XGBClassifier class: as stated in Incompability between scikit-learn and xgboost, building a custom XGBClassifier wrapper class is also a viable solution, along with other approaches mentioned in the linked discussions.
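Since the failures are version-dependent, a quick check of the installed versions (a trivial sketch) can save time before trying either workaround:

import sklearn
import xgboost
print("scikit-learn:", sklearn.__version__)
print("xgboost:", xgboost.__version__)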

PS: I'm not sure whether other tree-based classifiers suffer from a similar issue, but I checked with ExtraTreesClassifier and it worked fine.
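For reference, that sanity check was along these lines (a sketch assuming the same pipeline setup and data as in the question):

from sklearn.ensemble import ExtraTreesClassifier
pipe = Pipeline(steps=[('UnderSampling', UnSam), ('Classifier', ExtraTreesClassifier())])
pipe.fit(X_train, y_train)  # fits without the tags-related error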