I would like to train my LSTM with a "synthetic gradients" Decoupled Neural Interface (DNI).

How do I decide on the number of layers and neurons for my DNI? Searching for them by trial and error, or worse, by a genetic algorithm, would seem to defeat the purpose of synthetic gradients.

And if my DNI is an LSTM itself, it seems it would take even longer to determine its optimal structure.

Synthetic gradients speed up training by allowing multiple forward passes with immediate weight adjustments, since the DNI predicts the future gradient before the true backward pass arrives.
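
For concreteness, here is a minimal sketch of the mechanism in PyTorch (toy sizes and my own variable names, not code from the paper; the DNI here is the simplest single-linear-layer variant, and the three steps run sequentially even though the whole point is that step 1 does not have to wait for steps 2 and 3):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
B, T, IN, HID, OUT = 4, 10, 8, 32, 1    # toy sizes, purely illustrative

lstm = nn.LSTM(IN, HID, batch_first=True)
head = nn.Linear(HID, OUT)
# The DNI itself: maps a hidden state to a predicted dLoss/dState.
# Its architecture (a single linear layer here) is exactly the
# hyperparameter my question is about.
dni = nn.Linear(HID, HID)
nn.init.zeros_(dni.weight); nn.init.zeros_(dni.bias)  # start as "no gradient"

opt = torch.optim.Adam(
    list(lstm.parameters()) + list(head.parameters()) + list(dni.parameters()),
    lr=1e-3)
mse = nn.MSELoss()

x, y = torch.randn(B, T, IN), torch.randn(B, OUT)

out, _ = lstm(x)
h = out[:, -1, :]
h_dec = h.detach().requires_grad_()     # decoupling boundary: cut the graph

# (1) The LSTM gets its update signal from the DNI's *predicted* gradient
#     right away; in a truly decoupled setup this runs before the loss
#     is even known.
g_hat = dni(h.detach())
h.backward(gradient=g_hat.detach())

# (2) Downstream of the cut, ordinary training with the true loss.
loss = mse(head(h_dec), y)
loss.backward()
g_true = h_dec.grad.detach()            # the true dLoss/dh, now available

# (3) Train the DNI to regress onto the true gradient.
dni_loss = mse(g_hat, g_true)
dni_loss.backward()

opt.step(); opt.zero_grad()
```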

However, we would lose time running a few hundred training sessions just to find the optimal DNI structure, the one that predicts the gradient best.

By that time we could have already finished our training with old-school Backpropagation Through Time (BPTT).

Also, how should we avoid our DNI overfitting, and how can we monitor and ensure it is not happening?

1 Answer

Several months later, I have a couple of insights on it:

Also, how should we avoid our DNI overfitting, and how can we monitor and ensure it is not happening?

I don't think it matters, because the DNI overfitting on the gradient is actually what we want: we want it to figure out the pattern that reduces the error fastest for our data. However, as always, we should keep an eye on the validation performance of the entire network while doing so.
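
For monitoring, one cheap diagnostic I'd use (my own suggestion, not something from the paper) is the cosine similarity between the DNI's predicted gradient and the true gradient; computed on held-out batches it tells you whether the DNI still points the optimization in the right direction. Both tensors are already available in the training loop sketched in the question:

```python
import torch
import torch.nn.functional as F

def dni_gradient_quality(g_hat: torch.Tensor, g_true: torch.Tensor) -> float:
    """Mean cosine similarity between predicted and true gradients.

    Near +1: the DNI points the update in the right direction.
    Near 0 (or negative) on held-out batches: its predictions have
    stopped generalizing, even if the training loss still looks fine.
    """
    with torch.no_grad():
        return F.cosine_similarity(g_hat, g_true, dim=1).mean().item()
```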

That probably means the more DNI neurons the better, as long as the designated training data stays unmodified o_O
If new or extra data is used for training, we should throw away our DNI and train it once more, overfitting it on this new, adjusted training data; a sketch of that reset follows below.
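
Throwing the DNI away is cheap; in the PyTorch sketch from the question it just means re-initializing the module (hypothetical helper):

```python
import torch.nn as nn

def reset_dni(dni: nn.Linear) -> None:
    # Back to "predict no gradient": the DNI re-learns from scratch on
    # the new training data while the main network keeps its weights.
    nn.init.zeros_(dni.weight)
    nn.init.zeros_(dni.bias)
```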

Also, more on Synthetic Gradients here.

I'm still hoping for a better answer, so I won't accept this one; please post if you have one.
