
A minibatch is a collection of examples that are fed into the network (example after example), with backprop done after every single example. We then take the average of these gradients and update our weights. This completes the processing of one minibatch.
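For concreteness, here is a minimal sketch of that procedure in PyTorch (the tiny linear model, random data, and learning rate are all hypothetical placeholders, not anything from the question): gradients are accumulated example by example, averaged, and applied in a single weight update.

```python
import torch
import torch.nn as nn

# Hypothetical tiny model and a minibatch of 4 examples.
model = nn.Linear(10, 1)
inputs = torch.randn(4, 10)   # 4 examples, 10 features each
targets = torch.randn(4, 1)
loss_fn = nn.MSELoss()

model.zero_grad()
for x, y in zip(inputs, targets):
    # Forward and backward pass for one example;
    # .backward() accumulates (sums) gradients into .grad.
    loss = loss_fn(model(x), y)
    loss.backward()

# Average the accumulated gradients and take one weight update.
with torch.no_grad():
    for p in model.parameters():
        p.grad /= len(inputs)
        p -= 0.01 * p.grad   # plain SGD step, learning rate 0.01
```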

I have read these posts.

Question part a:

What would a minibatch look like for an LSTM? Say I want it to reproduce Shakespeare, letter by letter (30 characters to choose from).

I launch the LSTM, let it predict 200 characters of a poem, then perform backpropagation (hence my LSTM works with 200 timesteps). Does this mean my minibatch consists of one example whose length is 200?
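If so, in tensor terms (assuming PyTorch conventions, purely for illustration) that single example would be a batch of shape (1, 200):

```python
import torch

# One poem fragment of 200 character indices from a 30-character
# vocabulary: batch size 1, 200 timesteps.
fragment = torch.randint(30, (1, 200))
print(fragment.shape)   # torch.Size([1, 200])
```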

Question part b:

If I wanted to launch 63 other minibatches in parallel, would I just pick 63 extra poems? (Edit: the answer below doesn't mention this explicitly, but we don't train minibatches in parallel; we train on one minibatch at a time, processing its examples in parallel.)

Question part c:

If I wanted each minibatch to consist of 10 different examples, what would those examples be, and how would they differ from what I currently perceive as a minibatch?

Kari

1 Answer


I think you need to distinguish between training and execution of the model. During training you can use batches, which in your case would be different fragments from Shakespeare. A batch is then a list of fragments, and the language model starts from the first character of each element of the batch and does the forward and backward pass.
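For example, one training step over such a batch might look like the following PyTorch sketch. The vocabulary size of 30 and fragment length of 200 come from the question; the batch size of 64, the embedding and hidden sizes, and the random "fragments" are placeholders, and targets would really be the inputs shifted by one character.

```python
import torch
import torch.nn as nn

vocab_size, seq_len, batch_size = 30, 200, 64

# Hypothetical batch: 64 fragments of 200 character indices each.
inputs  = torch.randint(vocab_size, (batch_size, seq_len))
targets = torch.randint(vocab_size, (batch_size, seq_len))

embed = nn.Embedding(vocab_size, 32)
lstm  = nn.LSTM(32, 128, batch_first=True)
head  = nn.Linear(128, vocab_size)

# One forward/backward pass over the whole batch at once:
# all 64 fragments are processed in parallel, and the averaged
# loss yields a single gradient for one weight update.
out, _ = lstm(embed(inputs))            # (64, 200, 128)
logits = head(out)                      # (64, 200, 30)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
```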

When you execute the model once it is trained, you typically want to generate a single sample, in which case you can set the batch size to one.
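A batch-size-one sampling loop might look like this sketch (the modules mirror the hypothetical ones above and are untrained here, so the output would be random; with trained weights the same loop produces Shakespeare-like text):

```python
import torch
import torch.nn as nn

vocab_size = 30
embed = nn.Embedding(vocab_size, 32)
lstm  = nn.LSTM(32, 128, batch_first=True)
head  = nn.Linear(128, vocab_size)

# Generate 200 characters one at a time with batch size 1.
token = torch.zeros(1, 1, dtype=torch.long)   # (batch=1, step=1) seed
state = None
generated = []
for _ in range(200):
    out, state = lstm(embed(token), state)    # carry the hidden state forward
    probs = head(out[:, -1]).softmax(dim=-1)  # distribution over 30 characters
    token = torch.multinomial(probs, 1)       # sample the next character index
    generated.append(token.item())
```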

I believe this answers your three questions.

Escachator