I've created an LSTM model to predict one output value from 8 features. My training loss decreases steadily, and my validation loss also decreases at first, but after a certain number of epochs it starts to increase. Here's a picture of what's going on.
Also, here's my code:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam

file = r'/content/drive/MyDrive/only force/only_force_pt1.csv'
df = pd.read_csv(file)
df.head()

# Columns 1-8 are the input features, column 9 is the target.
X = df.iloc[:, 1:9]
y = df.iloc[:, 9]
#X.head()
print(type(X))
WIN_LEN = 5

def window_size(size, inputdata, targetdata):
    # Slide a window of `size` timesteps over the data; each window of
    # inputs is paired with the target value immediately after it.
    X = []
    y = []
    i = 0
    while (i + size) <= len(inputdata) - 1:
        X.append(inputdata[i: i + size])
        y.append(targetdata[i + size])
        i += 1
    assert len(X) == len(y)
    return (X, y)

X_series, y_series = window_size(WIN_LEN, X, y)
# Chronological 80/20 split (no shuffling across time).
data_split = int(len(X_series) * 0.8)
X_train, X_test = X_series[:data_split], X_series[data_split:]
y_train, y_test = y_series[:data_split], y_series[data_split:]

X_train = np.array(X_train)
X_test = np.array(X_test)
y_train = np.array(y_train)
y_test = np.array(y_test)

n_timesteps, n_features, n_outputs = X_train.shape[1], X_train.shape[2], 1
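# Sanity check (illustrative): with WIN_LEN = 5 and 8 feature columns, the
# windowed inputs should come out shaped (num_windows, 5, 8) and the
# targets (num_windows,).
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)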
verbose, epochs, batch_size = 1, 500, 32
input_shape = (n_timesteps, n_features)
model = Sequential()
model.add(LSTM(64, input_shape=input_shape, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(32, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)))
model.add(Dropout(0.2))
model.add(Dense(32, activation='relu', kernel_regularizer=keras.regularizers.l2(0.001)))
model.add(Dense(1))

earlystopper = EarlyStopping(monitor='val_loss', min_delta=0, patience=60, verbose=1, mode='auto')

model.summary()
model.compile(loss='mse', optimizer=Adam(learning_rate=0.00005),
              metrics=[tf.keras.metrics.RootMeanSquaredError()])
history = model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs, verbose=verbose,
                    validation_data=(X_test, y_test), callbacks=[earlystopper], shuffle=True)
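For reference, the loss/val_loss curves in the picture can be reproduced from the History object returned by fit; a minimal plotting sketch (matplotlib assumed, exact styling may differ):

import matplotlib.pyplot as plt

# Plot training vs. validation loss per epoch from the History object.
plt.plot(history.history['loss'], label='loss')
plt.plot(history.history['val_loss'], label='val_loss')
plt.xlabel('Epoch')
plt.ylabel('MSE loss')
plt.legend()
plt.show()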
I do get much better results when I use train_test_split and shuffle my training and testing data, but that leads to major overfitting problems. Also, since this is time-series data, I don't want to shuffle it anyway.
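For clarity, the shuffled split I'm referring to is along these lines (scikit-learn's train_test_split, which shuffles by default) in place of the chronological slice above:

from sklearn.model_selection import train_test_split

# Shuffled variant (the one I'm trying to avoid): windows from different
# time periods get mixed between train and test.
X_train, X_test, y_train, y_test = train_test_split(
    np.array(X_series), np.array(y_series), test_size=0.2, shuffle=True)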
Does anyone have any suggestions?
