Binary classification works with softmax, but not sigmoid

Question

I am working on a binary seizure classification problem. The full dataset has shapes dataset_X = (154182, 32, 9, 19) and dataset_y = (154182, 1), and I split it into training, validation and test sets.

The unique values for dataset_y are array([0, 1]) with counts array([77127, 77055]). The data is then split into 92508, 30837 and 30837 samples for training, validation and testing respectively.

The configuration using categorical_crossentropy with a final dense layer of size 2 and a softmax activation works very well. However, when I use binary_crossentropy with a final dense layer of size 1 and a sigmoid activation, training and validation report almost the same results, but the predictions on the test dataset are completely wrong.
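
For reference, a two-unit softmax and a one-unit sigmoid describe the same family of models for two classes: the softmax probability of class 1 equals the sigmoid of the difference between the two logits. The sketch below checks this numerically in plain Python. The function names and the logit values are made up for illustration and are not taken from the model in this post.

```python
import math

def softmax2(z0, z1):
    """Softmax over two logits; returns (p_class0, p_class1)."""
    m = max(z0, z1)  # subtract the max for numerical stability
    e0, e1 = math.exp(z0 - m), math.exp(z1 - m)
    return e0 / (e0 + e1), e1 / (e0 + e1)

def sigmoid(z):
    """Logistic sigmoid of a single logit."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical logit pairs, chosen only for illustration.
pairs = [(0.3, 1.7), (-2.0, 0.5), (4.0, -1.0)]
for z0, z1 in pairs:
    p1_softmax = softmax2(z0, z1)[1]
    p1_sigmoid = sigmoid(z1 - z0)
    print(f"{p1_softmax:.6f} {p1_sigmoid:.6f}")
```

Since the two heads can express the same probabilities, a large gap between the two setups is more likely to come from the training or evaluation code than from the activation itself.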

For the softmax model:

The Model:

    def create_cnn_model(X_train_shape, nb_classes):
        inputs = Input(shape=X_train_shape[1:])

        normal1 = BatchNormalization(axis=-1)(inputs)
        reshape1 = Lambda(lambda x: keras.backend.expand_dims(x, axis=-1))(normal1)
        conv1 = Convolution3D(
            32, (3, 3, X_train_shape[-1]), data_format='channels_last',
            padding='valid', strides=(1, 1, 1))(reshape1)

        reshape2 = Lambda(lambda x: keras.backend.squeeze(x, axis=-2))(conv1)

        relu1 = Activation('relu')(reshape2)
        pool1 = MaxPooling2D(pool_size=(2, 1), data_format='channels_last')(relu1)

        normal2 = BatchNormalization(axis=-1)(pool1)

        conv2 = Convolution2D(
            64, (3, 3), data_format='channels_last',
            padding='valid', strides=(1, 1))(normal2)
        relu2 = Activation('relu')(conv2)
        pool2 = MaxPooling2D(pool_size=(2, 1), data_format='channels_last')(relu2)

        normal3 = BatchNormalization(axis=-1)(pool2)

        conv3 = Convolution2D(
            64, (3, 3), data_format='channels_last',
            padding='valid', strides=(1, 1))(normal3)
        relu3 = Activation('relu')(conv3)

        flat = Flatten()(relu3)
        drop1 = Dropout(0.5)(flat)
        dens1 = Dense(256, activation='relu')(drop1)
        drop2 = Dropout(0.5)(dens1)
        dens2 = Dense(nb_classes)(drop2)

        last = Activation('softmax')(dens2)

        model = Model(inputs=inputs, outputs=last)
        return model

The code that creates the model and starts the training:

    cnn_model = create_cnn_model(X_train.shape, nb_classes)
    adam = Adam(lr=1e-4, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
    cnn_model.compile(loss='categorical_crossentropy',
                      optimizer=adam,
                      metrics=['accuracy', 'Recall', 'Precision', 'AUC'])
    Y_train = Y_train.astype('uint8')
    Y_train = np_utils.to_categorical(Y_train, nb_classes)
    Y_val = np_utils.to_categorical(Y_val, nb_classes)
    cnn_model.fit(X_train, Y_train, batch_size=32, epochs=10, validation_data=(X_val, Y_val))

    predictions = cnn_model.predict(X_test, verbose=1)
    y_pred = np_utils.to_categorical(np.argmax(predictions, axis=1), nb_classes)
    y_true = np_utils.to_categorical(Y_test, nb_classes)

    #Converting categorical to numerical
    y_pred_s = y_pred.argmax(1)
    y_true_s = y_true.argmax(1)

    print(np.unique(y_pred_s, return_counts=True))
    print(np.unique(y_true_s, return_counts=True))

    print(y_pred.shape, y_true.shape)
    from sklearn.metrics import f1_score, accuracy_score, recall_score, precision_score, roc_auc_score
    acc_scr = accuracy_score(y_true, y_pred)
    pre_scr = precision_score(y_true, y_pred, average='micro')
    rec_scr = recall_score(y_true, y_pred, average='micro')
    roc_auc_score = roc_auc_score(y_true, y_pred, average='micro')

    f1_test = f1_score(y_true, y_pred, average='weighted')

The training details and testing results after 10 epochs:

Shape: x_train, y_train, X_val, y_val
(92508, 32, 9, 19) (92508, 2) (92508, 32, 9, 19) (30837, 2)
Epoch 1/10
2891/2891 [==============================] - 63s 19ms/step - loss: 0.8718 - accuracy: 0.8860 - recall: 0.8860 - precision: 0.8860 - auc: 0.9474 - val_loss: 0.1635 - val_accuracy: 0.9414 - val_recall: 0.9414 - val_precision: 0.9414 - val_auc: 0.9824
Epoch 2/10
2891/2891 [==============================] - 53s 18ms/step - loss: 0.3728 - accuracy: 0.9361 - recall: 0.9361 - precision: 0.9361 - auc: 0.9813 - val_loss: 0.1891 - val_accuracy: 0.9251 - val_recall: 0.9251 - val_precision: 0.9251 - val_auc: 0.9791
...
Epoch 10/10
2891/2891 [==============================] - 48s 17ms/step - loss: 0.1377 - accuracy: 0.9774 - recall: 0.9774 - precision: 0.9774 - auc: 0.9967 - val_loss: 0.0354 - val_accuracy: 0.9864 - val_recall: 0.9864 - val_precision: 0.9864 - val_auc: 0.9986
964/964 [==============================] - 3s 3ms/step
Shape: X_test, y_test, y_pred
(30837, 32, 9, 19) (30837, 2) (30837, 2)
Accuracy:  0.9854719979245712
Recall:  0.9854719979245712
Precision:  0.9854719979245712
ROC AUC:  0.9854719979245712
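
The four test scores above are identical to every digit. That is what micro averaging on one-hot labels and one-hot hard predictions produces for two classes: each correct sample contributes one true positive and one true negative, and each wrong sample contributes one false positive and one false negative. A small sketch with made-up labels (plain Python, not the data from this post; the helper `one_hot` is hypothetical):

```python
def one_hot(labels, nb_classes=2):
    """Turn integer labels into one-hot rows."""
    return [[1 if c == y else 0 for c in range(nb_classes)] for y in labels]

# Made-up labels for illustration only.
y_true = one_hot([0, 1, 1, 0, 1, 0, 0, 1])
y_pred = one_hot([0, 1, 0, 0, 1, 1, 0, 1])

# Micro averaging pools every (sample, class) cell into one binary problem.
cells = [(t, p) for row_t, row_p in zip(y_true, y_pred)
         for t, p in zip(row_t, row_p)]
tp = sum(1 for t, p in cells if t == 1 and p == 1)
fp = sum(1 for t, p in cells if t == 0 and p == 1)
fn = sum(1 for t, p in cells if t == 1 and p == 0)
tn = sum(1 for t, p in cells if t == 0 and p == 0)

# Exact-match accuracy over rows.
accuracy = sum(rt == rp for rt, rp in zip(y_true, y_pred)) / len(y_true)
precision_micro = tp / (tp + fp)
recall_micro = tp / (tp + fn)
# A hard 0/1 prediction has a single operating point, so AUC = (TPR + TNR) / 2.
auc_micro = (tp / (tp + fn) + tn / (tn + fp)) / 2

print(accuracy, precision_micro, recall_micro, auc_micro)
```

So these four numbers carry the same information as accuracy alone. Per-class precision and recall, or an AUC computed from the predicted probabilities rather than from hard labels, would be more informative.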

For the sigmoid model:

The Model: It is the same model as above but with the following changes:

    dens2 = Dense(1)(drop2)
    last = Activation('sigmoid')(dens2)

The code that creates the model and starts the training:

    cnn_model = create_cnn_model(X_train.shape, nb_classes)  # nb_classes is unused here
    adam = Adam(lr=1e-4, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
    cnn_model.compile(loss='binary_crossentropy',
                      optimizer=adam,
                      metrics=['accuracy', 'Recall', 'Precision', 'AUC'])
    cnn_model.fit(X_train, Y_train, batch_size=32, epochs=10, validation_data=(X_val, Y_val))
    predictions = cnn_model.predict(X_test, verbose=1)
    y_pred = np.argmax(predictions)
    y_true = Y_test

    print(y_pred.shape, y_true.shape)
    from sklearn.metrics import f1_score, accuracy_score, recall_score, precision_score, roc_auc_score
    acc_scr = accuracy_score(y_true, y_pred)
    pre_scr = precision_score(y_true, y_pred)
    rec_scr = recall_score(y_true, y_pred)
    roc_auc_score = roc_auc_score(y_true, y_pred)
    f1_test = f1_score(y_true, y_pred, average='weighted')

The training details and testing results after 10 epochs:

Shape: x_train, y_train, X_val, y_val
(92508, 32, 9, 19) (92508, 1) (30837, 32, 9, 19) (30837, 1)
Epoch 1/10
2891/2891 [==============================] - 80s 24ms/step - loss: 0.0284 - accuracy: 0.9920 - recall: 0.2655 - precision: 0.5381 - auc: 0.9277 -  val_loss: 0.0156 - val_accuracy: 0.9955 - val_recall: 0.5370 - val_precision: 0.8734 - val_auc: 0.9432 
Epoch 2/10
2891/2891 [==============================] - 60s 21ms/step - loss: 0.0129 - accuracy: 0.9959 - recall: 0.6269 - precision: 0.8476 - auc: 0.9800 - val_loss: 0.0079 - val_accuracy: 0.9974 - val_recall: 0.7860 - val_precision: 0.8899 - val_auc: 0.9873 
...
Epoch 10/10
2891/2891 [==============================] - 50s 17ms/step - loss: 0.0853 - accuracy: 0.9660 - recall: 0.9665 - precision: 0.9655 - auc: 0.9952 - val_loss: 0.0865 - val_accuracy: 0.9648 - val_recall: 0.9615 - val_precision: 0.9679 - val_auc: 0.9949
964/964 [==============================] - 3s 3ms/step
Shape: X_test, y_test, y_pred
(30837, 32, 9, 19) (30837, 1) (30837,)
Accuracy:  0.5002432143204592
Recall:  0.0
Precision:  0.0
ROC AUC:  0.5
F1-weighted score: 0.33360360651524557

When printing the y_true and y_pred arrays after running the softmax model, after being converted from categorical to numerical, I get:

y_true:(array([0, 1], dtype=int64), array([15426, 15411], dtype=int64))

y_pred: (array([0, 1], dtype=int64), array([15360, 15477], dtype=int64))

However, when I run the same for the sigmoid model, I get:

y_true: (array([0, 1], dtype=uint8), array([15426, 15411], dtype=int64))

y_pred: (array([0], dtype=int64), array([30837], dtype=int64))

It is apparent that not a single '1' label is predicted, which explains the scores above. What causes this behavior, and how can I fix it?
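
One thing worth checking is the conversion from sigmoid outputs to labels. A sigmoid head returns an array of shape (N, 1) with one probability per row, so np.argmax along axis 1 can only ever return index 0, while np.argmax with no axis returns a single flat index. A minimal sketch with made-up probabilities, assuming the usual 0.5 decision threshold (the threshold is an assumption and can be tuned):

```python
import numpy as np

# Made-up sigmoid outputs of shape (N, 1), standing in for cnn_model.predict(X_test).
predictions = np.array([[0.02], [0.97], [0.61], [0.10], [0.88]])

# No axis: a single scalar, the flat index of the largest probability.
flat_index = np.argmax(predictions)

# axis=1: shape (N,), but always 0 because each row has only one column.
per_row = np.argmax(predictions, axis=1)

# Thresholding turns each probability into a 0/1 label.
y_pred = (predictions > 0.5).astype("int32").ravel()

print(flat_index, per_row, y_pred)
```

The reported y_pred shape of (30837,) containing only zeros is consistent with the axis=1 variant. For roc_auc_score, passing the raw probabilities (for example predictions.ravel()) instead of thresholded labels gives the area under the actual ROC curve.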

Thank you
