
I am using GridSearchCV to tune the hyperparameters of my model, together with a pipeline and cross-validation. When I run the search to tune XGBoost, every score comes back as NaN. However, when I use the same code with other classifiers such as random forest, it works and returns complete results.

from sklearn.model_selection import StratifiedKFold, GridSearchCV
from xgboost import XGBClassifier
from imblearn.under_sampling import RepeatedEditedNearestNeighbours
from imblearn.pipeline import Pipeline  # imblearn's Pipeline accepts resampling steps

kf = StratifiedKFold(n_splits=10, shuffle=False)

SCORING = ['accuracy', 'precision', 'recall', 'f1']

# define parameters for hyperparameter tuning
params = {'Classifier__n_estimators': [5, 10, 20, 50, 100, 200]}

XGB = XGBClassifier()
UnSam = RepeatedEditedNearestNeighbours()

pipe = Pipeline(steps=[('UnderSampling', UnSam), ('Classifier', XGB)])

mod = GridSearchCV(pipe, params, cv=kf, scoring=SCORING, refit='f1', return_train_score=True)
mod.fit(X_train, y_train)

When I run this code, I get the following results:

{'Classifier__n_estimators': 5}
__________________________________________________
F1        :  [nan nan nan nan nan nan]
Recall    :  [nan nan nan nan nan nan]
Accuracy  :  [nan nan nan nan nan nan]
Precision :  [nan nan nan nan nan nan]

Another weird thing: when I apply the same code to tune the penalty in LogisticRegression, it returns NaN for l1 and elasticnet.

from sklearn.linear_model import LogisticRegression

kf = StratifiedKFold(n_splits=10, shuffle=False)

SCORING = ['accuracy', 'precision', 'recall', 'f1']

# define parameters for hyperparameter tuning
params = {'Classifier__penalty': ['l1', 'l2', 'elasticnet']}

LR = LogisticRegression(random_state=0)
UnSam = RepeatedEditedNearestNeighbours()

pipe = Pipeline(steps=[('UnderSampling', UnSam), ('Classifier', LR)])

mod = GridSearchCV(pipe, params, cv=kf, scoring=SCORING, refit='f1', return_train_score=True)
mod.fit(X_train, y_train)

The results are as follows:

{'Classifier__penalty': 'l2'}
__________________________________________________
F1        :  [   nan 0.363    nan]
Recall    :  [   nan 0.4188   nan]
Accuracy  :  [   nan 0.7809   nan]
Precision :  [   nan 0.3215   nan]
Aaron

2 Answers


By default, GridSearchCV provides a score of nan when fitting the model fails. You can change that behavior and raise an error by setting the parameter error_score="raise", or you can try fitting a single model to get the error. You can then use the traceback to help figure out where the problem is.
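For example (a minimal sketch of both approaches, reusing the pipeline, grid, and data from the question):

# Option 1: make the failure visible instead of scoring it as NaN
mod = GridSearchCV(pipe, params, cv=kf, scoring=SCORING, refit='f1',
                   return_train_score=True, error_score="raise")
mod.fit(X_train, y_train)  # now raises the underlying exception

# Option 2: fit a single candidate by hand to get the same traceback
pipe.set_params(Classifier__n_estimators=5)
pipe.fit(X_train, y_train)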

For the LogisticRegression, I can identify the likely culprit: the default solver is lbfgs, which cannot handle L1 or ElasticNet penalty. Use saga.
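A minimal sketch of that fix; note that the elasticnet penalty additionally requires l1_ratio to be set, so a list of parameter grids keeps all the combinations valid (the l1_ratio value and the max_iter bump are illustrative assumptions, not part of the original question):

LR = LogisticRegression(random_state=0, solver='saga', max_iter=5000)  # saga supports all three penalties
params = [
    {'Classifier__penalty': ['l1', 'l2']},
    {'Classifier__penalty': ['elasticnet'], 'Classifier__l1_ratio': [0.5]},  # l1_ratio only applies to elasticnet
]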

I don't immediately see an issue with the XGBoost model or parameters. Get the error traceback using the approach in the first paragraph, and search/ask that as a separate question if needed.

Ben Reiniger

Building on this discussion: I encountered a similar issue when using the random search optimizer with XGBoost. Specifically, when I loaded an XGBClassifier model into a scikit-learn Pipeline and passed it to a RandomizedSearchCV instance as the estimator (with error_score="raise" so the scores are not silently assigned np.nan), I got the following error, in addition to the warning highlighted in XGBoost sklearn_tags Method Error in Python When Loading Model:

Pipeline should either be a classifier to be used with response_method=predict_proba or the response_method should be 'predict'. Got a regressor with response_method=predict_proba instead.

Possible solutions I've found include:

  1. Downgrading scikit-learn to 1.5.2: the issues I encountered seem to be version-dependent. Specifically, scikit-learn versions between 1.5.3 and 1.6.0 (the current Colab version, as of writing) all exhibit these errors on Colab; a temporary fix is to downgrade to scikit-learn 1.5.2 (see the version-check sketch after this list):
!pip install scikit-learn==1.5.2
  2. Building a custom XGBClassifier class: as stated in Incompability between scikit-learn and xgboost, building a custom XGBClassifier wrapper class is also a viable solution, along with other approaches mentioned in the linked discussions.
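Since the failures are version-dependent, a quick check of the installed versions (a trivial sketch) can save time before trying either workaround:

import sklearn
import xgboost
print("scikit-learn:", sklearn.__version__)
print("xgboost:", xgboost.__version__)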

PS: I'm not sure whether other tree-based classifiers suffer from a similar issue, but I checked with ExtraTreesClassifier and it worked fine.
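For reference, that sanity check was along these lines (a sketch assuming the same pipeline setup and data as in the question):

from sklearn.ensemble import ExtraTreesClassifier
pipe = Pipeline(steps=[('UnderSampling', UnSam), ('Classifier', ExtraTreesClassifier())])
pipe.fit(X_train, y_train)  # fits without the tags-related error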