
I am using TensorFlow to train a simple neural network (3 sequential dense layers). The problem is that the accuracy changes a lot every time I retrain it from scratch. I understand that, since the weights are initialized randomly, it will not always arrive at exactly the same accuracy; but I get a range of about 4% in the accuracy on the test set.
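
For reference, the model is essentially of the following shape (an illustrative sketch only; the layer sizes, activations, and data here are placeholders, not my exact setup):

import numpy as np
import tensorflow as tf

# Placeholder data just to make the sketch runnable; the real dataset is different.
n_features, n_classes = 20, 3
X_train = np.random.rand(1000, n_features)
y_train = tf.keras.utils.to_categorical(np.random.randint(n_classes, size=1000), n_classes)

# Three sequential dense layers.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(n_features,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="mse", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=10, validation_split=0.2)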

This variation makes it impossible to tell whether a different network configuration or a different preprocessing step performs better or worse because it is genuinely a good/bad idea, or just because I got lucky/unlucky with the random initial weights.

This is an example of the accuracies from 5 consecutive train+test runs. The columns are:

  1. Accuracy on the train set
  2. Accuracy on the validation split (20%)
  3. Accuracy on the test set
  4. MSE
        1.  2.  3.  4.
Run #1: 95  91  74  20
Run #2: 94  92  75  18
Run #3: 94  91  74  20
Run #4: 94  92  73  20
Run #5: 94  91  77  17

I also find it confusing that there is no correlation among the training, validation, and test accuracies.

I have tried different configurations of the ANN, longer and shorter training runs, bigger and smaller validation splits, different optimizers... nothing seems to give me a more stable accuracy across retrainings. The numbers posted here are the best I could get.

Is a 4% range normal? Is there a way to avoid these sub-optimal training runs? Could it be a problem related to local minima?

Mr. Goferito

3 Answers


I suggest applying a non-random weight initialisation in order to see the impact of the random initialization.

For instance, you can use the Nguyen-Widrow weight initialization.

import numpy as np

def initnw(layer):
    """
    Nguyen-Widrow initialization function

    :Parameters:
        layer: core.Layer object
            Initialization layer
    """
    ci = layer.ci
    cn = layer.cn
    w_fix = 0.7 * cn ** (1. / ci)
    w_rand = np.random.rand(cn, ci) * 2 - 1

    # Normalize
    if ci == 1:
        w_rand = w_rand / np.abs(w_rand)
    else:
        w_rand = np.sqrt(1. / np.square(w_rand).sum(axis=1).reshape(cn, 1)) * w_rand

    w = w_fix * w_rand
    b = np.array([0]) if cn == 1 else w_fix * np.linspace(-1, 1, cn) * np.sign(w[:, 0])

    # Scale to inp_active
    amin, amax = layer.transf.inp_active
    amin = -1 if amin == -np.Inf else amin
    amax = 1 if amax == np.Inf else amax

    x = 0.5 * (amax - amin)
    y = 0.5 * (amax + amin)
    w = x * w
    b = x * b + y

    # Scale to inp_minmax
    minmax = layer.inp_minmax.copy()
    minmax[np.isneginf(minmax)] = -1
    minmax[np.isinf(minmax)] = 1

    x = 2. / (minmax[:, 1] - minmax[:, 0])
    y = 1. - minmax[:, 1] * x
    w = w * x
    b = np.dot(w, y) + b

    layer.np['w'][:] = w
    layer.np['b'][:] = b

    return

Source: https://pythonhosted.org/neurolab/_modules/neurolab/init.html
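
Note that the function above comes from the neurolab library. If you are working in TensorFlow/Keras, a simpler way to get a comparable effect (a fixed, reproducible initialization) is to pass seeded initializers to each layer. This is only a sketch of the idea with seeded Glorot initializers, not the exact architecture from the question:

import tensorflow as tf

# Sketch: each Dense layer gets a Glorot initializer with a fixed seed, so every
# rebuild of the model starts from identical weights.
def build_model(n_features, n_classes, seed=0):
    return tf.keras.Sequential([
        tf.keras.layers.Dense(
            64, activation="relu", input_shape=(n_features,),
            kernel_initializer=tf.keras.initializers.GlorotUniform(seed=seed)),
        tf.keras.layers.Dense(
            32, activation="relu",
            kernel_initializer=tf.keras.initializers.GlorotUniform(seed=seed + 1)),
        tf.keras.layers.Dense(
            n_classes, activation="softmax",
            kernel_initializer=tf.keras.initializers.GlorotUniform(seed=seed + 2)),
    ])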

On the other hand, remember that the optimization algorithms used to find a (local) minimum (gradient descent, the Adam optimizer, etc.) have some stochastic behaviour of their own, for example in the choice of the starting point or in the random shuffling and sampling of mini-batches.
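
A practical way to see how much of the run-to-run variation comes from these random factors is to fix the relevant seeds before each training run. A minimal sketch, assuming TensorFlow 2.x:

import random
import numpy as np
import tensorflow as tf

def set_global_seeds(seed=0):
    # Fix the Python, NumPy and TensorFlow random generators so that weight
    # initialization and data shuffling are repeatable across runs.
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)

set_global_seeds(0)
# ... build and train the model as usual; two runs with the same seed should now
# start from the same weights and see the training batches in the same order.

On a GPU, some operations can still be non-deterministic unless deterministic ops are enabled, so seeding reduces, but may not completely eliminate, the variation.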

Nicolas Martin

A possible cause of the problem is that you are using the mean squared error (MSE) as the loss function for a classification problem.

Normally, for classification you would use categorical cross-entropy.
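
Concretely, that just means compiling the model with a cross-entropy loss instead of MSE. A minimal sketch, assuming one-hot encoded labels and illustrative layer sizes:

import tensorflow as tf

# Sketch: the same kind of 3-dense-layer model, compiled with cross-entropy.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",   # use "sparse_categorical_crossentropy" for integer labels
    metrics=["accuracy"],
)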

noe

If you're getting 95% accuracy on the training set but only 75% on the test set, this points to serious overfitting, which none of the measures you've listed are likely to address.

It's also suspicious that the validation results are so close to the training results but far from the test results. This often happens when the validation set changes between trainings, meaning there's effectively no validation set at all, or when you keep training over and over until you obtain the desired accuracy on the validation set, which is also a recipe for overfitting.
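
One way to rule this out is to carve the validation set out once, with a fixed random state, and pass it explicitly instead of relying on a fresh split every run. A sketch, assuming NumPy arrays and scikit-learn (X_train, y_train and model as in the question):

from sklearn.model_selection import train_test_split

# Split once, with a fixed random_state, so the validation set is identical in
# every run; the test set stays untouched until the very end.
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.2, random_state=0)

model.fit(X_tr, y_tr, validation_data=(X_val, y_val), epochs=50)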

IMil