
I am trying to implement an LSTM in plain numpy for didactic reasons. I understand how to feed in the data, but not how to produce the output. Suppose I give as input a tensor of dimension (n, b, d), where:

  • n is the length of the sequence
  • b is the batch size (timestamps in my case)
  • d is the number of features for each example

Each example (row) in the dataset is labelled 0 or 1. However, when I feed the data to the LSTM, the result I obtain is the hidden state h_out, whose dimension equals the hidden size of the network. How can I obtain a single number that can be compared to my labels and properly backpropagated? I have read that some people add another dense layer on top of the LSTM, but it's not clear to me what dimensions such a layer and its weight matrix should have.

Alexbrini

1 Answer


What you are getting as the output is the internal LSTM state. To obtain a value comparable to your labels, add a dense layer on top of it. The output dimension of the dense layer should be the number of classes you want to predict:

  1. If the labels are 0 and 1, a single output neuron with a sigmoid activation works (see the sketch below).
  2. If there are 5 label classes, then the output dimension of the dense layer should also be 5, typically followed by a softmax.
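
Here is a minimal numpy sketch of the binary case: a dense head with a weight matrix of shape (hidden_size, 1) applied to the final hidden state, followed by a sigmoid and binary cross-entropy. The names (h_out, W_y, b_y, hidden_size) and the random stand-in data are illustrative, not taken from your code:

    import numpy as np

    rng = np.random.default_rng(0)

    b, hidden_size = 32, 64                             # batch size and LSTM hidden size (example values)
    h_out = rng.standard_normal((b, hidden_size))       # stand-in for the LSTM's final hidden state
    y = rng.integers(0, 2, size=(b, 1)).astype(float)   # 0/1 labels, one per example

    # Dense layer: one output neuron for binary classification.
    W_y = rng.standard_normal((hidden_size, 1)) * 0.01  # weight matrix, shape (hidden_size, 1)
    b_y = np.zeros(1)                                   # bias, shape (1,)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Forward pass: (b, hidden_size) @ (hidden_size, 1) -> (b, 1)
    z = h_out @ W_y + b_y
    p = sigmoid(z)                                      # predicted probability of label 1

    # Binary cross-entropy loss, averaged over the batch.
    eps = 1e-12
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

    # Backward pass. For sigmoid + cross-entropy, dL/dz simplifies to (p - y) / b.
    dz = (p - y) / b                                    # shape (b, 1)
    dW_y = h_out.T @ dz                                 # shape (hidden_size, 1)
    db_y = dz.sum(axis=0)                               # shape (1,)
    dh_out = dz @ W_y.T                                 # shape (b, hidden_size)

    print(loss, dW_y.shape, dh_out.shape)

dh_out is the gradient you then backpropagate through the LSTM's own backward pass. For the 5-class case, W_y would instead have shape (hidden_size, 5), with a softmax and categorical cross-entropy in place of the sigmoid and binary cross-entropy.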
user5722540