Questions tagged [stacked-lstm]
15 questions
16
votes
2 answers
Dropout on which layers of LSTM?
Using a multi-layer LSTM with dropout, is it advisable to put dropout on all hidden layers as well as the output Dense layers? In Hinton's paper (which proposed Dropout) he only put Dropout on the Dense layers, but that was because the hidden inner…
BigBadMe
4
votes
1 answer
Connect a dense layer to a LSTM architecture
I am trying to implement an LSTM structure in plain numpy for didactic reasons. I understand how to feed the data in, but not how to produce the output.
Suppose I give as inputs a tensor of dimension (n, b, d) where:
• n is the length of the…
Alexbrini
4
votes
1 answer
How is error back-propagated in a multi-layer RNN
Let's say I have a 2 layer LSTM cell, and I'm using this network to perform regression for input sequences of length 10 along the time axis.
From what I understand, when this network is 'unfolded', it will consist of 20 LSTM cells, 10 for each…
Ali250
2
votes
0 answers
Wiggle in the initial part of an LSTM prediction
I am working on using LSTMs and GRUs to make time series predictions. For the most part the predictions are pretty good.
However, there seems to be a wiggle (or initial up-then-down) before the prediction settles out similar to the left side of this…
AGirlHasNoUsername
2
votes
0 answers
How are multi-layer LSTMs interconnected?
I am trying to understand the layers in an LSTM for my own implementation in Python. I started with Keras to get familiar with the layer flow.
I tried the code below in Keras and made the following observations:
# LSTM…
felix Antony
2
votes
1 answer
Stacking LSTM layers
Can someone please tell me the difference between those stacked LSTM layers?
The first image is given in this question and the second image is given in this article. So far, what I have learned about stacking LSTM layers is based on the second image. When you…
ARAT
1
vote
0 answers
What is the reason behind Keras choice of default (recurrent) activation functions in LSTM networks
Activation function between LSTM layers
In the above link, the question of whether activation functions are required for LSTM layers was answered as follows: as an LSTM unit already consists of multiple non-linear activation functions, it…
Lauramvp
1
vote
0 answers
LSTM for multiple time series regression with extremely large ranges
I have the following question for those who have encountered the same dilemma:
My goal is to develop an LSTM RNN for multi-step prediction on multiple time series representing daily sales of different products. The problem I face is that…
user86335
1
vote
1 answer
What are h(t-1) and c(t-1) for the first LSTM cell?
I know that in an LSTM chain you should pass the h(t) of each cell to the next cell as its h(t-1), and do the same for c(t). But what about the first cell? What does it receive as h(t-1) and c(t-1)?
I would also like to know, if we want to make a multi…
user3486308
1
vote
0 answers
Splitting and training multiple datasets at the same time
I've got 15 different datasets of about 10 GB each. Each dataset comes with a binary 2D ground truth (10486147ish, 1) that I pull from it. I'm trying to figure out how to load each dataset, split them all with scikit-learn's train_test_split, then…
Hexadecimalism
1
vote
0 answers
Initialising states in a multilayer sequence to sequence model
In a sequence-to-sequence model where the encoder and decoder each consist of one layer, the initial state of the decoder is initialised with the final states of the encoder layer.
In the case of a multi-layer sequence-to-sequence…
Aesir
1
vote
0 answers
LSTM Produces Random Predictions
I have trained an LSTM in PyTorch on financial data where a series of 14 values predicts the 15th. I split the data into Train, Test, and Validation sets. I trained the model until the loss stabilized. Everything looked good to me when using the…
Skiddles
1
vote
1 answer
Why does my LSTM perform better when randomizing training subset vs. standard batch training?
I am training a simple LSTM network using Keras to predict time series values. It is a simple 2-layer LSTM.
I get the best performance when I train on subsets of the training set that start at random points. Each subset has a training size of 100…
Mati K
0
votes
1 answer
How can I detect anomalies/outliers in my online streaming data on a real-time basis?
Say I have a huge (effectively infinite) stream of data consisting of alternating sine waves and step pulses. I want my model to parse the data sequence-wise or point-wise, and the first time it parses a sine wave and starts…
Goutam Bose
0
votes
0 answers
Training by multivariate data sets
I have the following task: train on 3 consecutive days to predict the 4th day. Each day's data is one CSV file with dimensions 24x25. Every data point in each CSV file is a pixel. I am required to use models like…
S. M.