Questions tagged [gru]

22 questions
184
votes
6 answers

When to use GRU over LSTM?

The key difference between a GRU and an LSTM is that a GRU has two gates (reset and update gates) whereas an LSTM has three gates (namely input, output and forget gates). Why do we make use of GRU when we clearly have more control on the network…
Sayali Sonawane
  • 2,101
  • 3
  • 13
  • 13
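A quick way to see the gate difference the question describes is to count parameters. A minimal sketch below uses the textbook formulation (one bias vector per gate block; Keras's `reset_after=True` GRU adds a second recurrent bias, so its counts differ slightly) with assumed input dimension `m` and hidden dimension `n`:

```python
# Hedged sketch: per-layer parameter counts for LSTM vs. GRU,
# textbook formulation with a single bias per gate block.
def lstm_params(m, n):
    # 4 weight blocks: input, forget, output gates + cell candidate
    return 4 * (n * (m + n) + n)

def gru_params(m, n):
    # 3 weight blocks: update, reset gates + candidate state
    return 3 * (n * (m + n) + n)

print(lstm_params(10, 32))  # 5504
print(gru_params(10, 32))   # 4128
```

With one fewer gate, a GRU layer carries roughly 25% fewer parameters than an LSTM layer of the same width, which is the usual argument for trying it first on smaller datasets.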
8
votes
1 answer

TensorFlow / Keras: What is stateful = True in LSTM layers?

Could you elaborate on this argument? I found the brief explanation from the docs unsatisfying: stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i…
Leevo
  • 6,445
  • 3
  • 18
  • 52
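The behavior the docs describe can be illustrated without Keras at all. The toy recurrence below (fixed weights, hypothetical `rnn_step`, not the Keras implementation) shows what carrying the final state of sample *i* into the next batch changes:

```python
import numpy as np

def rnn_step(h, x):
    # Toy recurrence with fixed weights; stands in for an LSTM/GRU cell.
    return np.tanh(0.5 * x + 0.5 * h)

def run(batches, stateful):
    h = np.zeros_like(batches[0])
    outs = []
    for x in batches:
        if not stateful:
            h = np.zeros_like(x)  # stateless: fresh zero state every batch
        h = rnn_step(h, x)        # stateful: h survives across batches
        outs.append(h.copy())
    return outs

batches = [np.array([1.0]), np.array([1.0])]
a = run(batches, stateful=False)
b = run(batches, stateful=True)
# Identical inputs, but the second batch's output differs only when
# the state carries over:
print(a[1], b[1])
```

In Keras terms, `stateful=True` makes batch boundaries continuations of one long sequence (sample `i` of batch `k+1` continues sample `i` of batch `k`), and you reset manually with `model.reset_states()`.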
4
votes
1 answer

RNN performing worse than random guessing on large dataset

I have to start off by saying I am 100% a beginner here. I trained an RNN model on a 30-class dataset with over 90,000 samples and it achieved less than 2% accuracy. Training the same model on a small subset of the same data (with only 3 classes), the…
adithom
  • 41
  • 1
2
votes
1 answer

GRU and LSTM do not "take risks" when predicting

I tested LSTM and GRU models to predict the exchange rate between currencies. I do not use the raw price but the delta from the previous day, so the data is stationary around zero. My problem is that my model always predicts really close-to-zero…
alarty
  • 21
  • 1
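One common explanation for the symptom above: if the differenced targets look like noise, the MSE-optimal constant prediction is their mean, which sits near zero. A minimal sketch with synthetic deltas (assumed data, not the asker's):

```python
import numpy as np

rng = np.random.default_rng(0)
deltas = rng.normal(0.0, 0.01, size=1000)  # synthetic day-over-day deltas

# The constant prediction minimizing MSE is the mean of the targets,
# which for noise-like deltas is close to zero -- one reason an
# MSE-trained model "plays it safe" instead of taking risks.
best_const = deltas.mean()
mse_const = np.mean((deltas - best_const) ** 2)
mse_bold = np.mean((deltas - 0.01) ** 2)  # a "risky" constant guess
print(best_const, mse_const < mse_bold)
```

If the model cannot extract signal beyond noise, collapsing to the mean is expected behavior rather than a bug.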
2
votes
0 answers

Custom GRU With 3D Spatial Convolution Layer In Keras

I am trying to implement the custom GRU model shown in this paper: 3D-R2N2. The GRU pipeline looks like this: the original implementation is Theano-based, and I am trying to port the model to TF2/Keras. I have tried to create a custom GRU cell from…
b15h0y
  • 21
  • 3
2
votes
1 answer

Impact of varying sequence length in ensemble GRU model

I am using an ensemble of GRUs for my project, with different cell sizes for each model. For example, the first GRU model has size 16, the second 8, and the third 4. The model runs well, but I don't see any difference in…
2
votes
0 answers

Wiggle in the initial part of an LSTM prediction

I am working on using LSTMs and GRUs to make time series predictions. For the most part the predictions are pretty good. However, there seems to be a wiggle (or initial up-then-down) before the prediction settles out, similar to the left side of this…
2
votes
0 answers

GRU learns small-scale features, but misses large scales

Playing around with weather data, I have set up a simple RNN with one layer of GRUs. It is trained to recover the temperature of the next day, given weather data of the last 5 days, each with 1-hour intervals. What I find peculiar is that after…
rugermini
  • 21
  • 2
2
votes
1 answer

Training 3 models in different order gave different results

I have the following loop to train some models on a time series. my_seed = 7 time_frames = [4,5] layers = [3,4,5] ----- basic data formatting, always gives the same output ----- x1 = numpy.concatenate((x1,x2), axis=0) y1 =…
Stefan
  • 21
  • 1
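A likely cause of order-dependent results is that seeding once before a loop only fixes the starting point of the RNG stream; each training run then consumes part of that stream, so reordering the runs changes every run's draws. A minimal sketch (the `pseudo_train` helper is hypothetical, standing in for weight initialization):

```python
import numpy as np

def pseudo_train(seed=None):
    # Stand-in for one model's random weight initialization.
    if seed is not None:
        np.random.seed(seed)
    return np.random.rand()

np.random.seed(7)
run_a = [pseudo_train() for _ in range(3)]         # order-dependent draws
run_b = [pseudo_train(seed=7) for _ in range(3)]   # reseeded: identical draws
print(run_a, run_b)
```

Reseeding immediately before each model (run_b) makes every model's initialization independent of training order; note that GPU nondeterminism in TensorFlow can still introduce residual variation on top of this.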
2
votes
0 answers

TF: What is the difference between the 'kernel weights' and the 'recurrent kernel weights' in LSTMs/GRUs?

Context: I am trying to understand the differences between the GRU/LSTM cells from TensorFlow and PyTorch (for research reproducibility) and noticed that TensorFlow differentiates between the kernel_initializer and the recurrent_initializer (see…
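The distinction becomes concrete when you write one recurrent step by hand. In the sketch below (assumed dimensions, random weights; Keras concatenates the three gate blocks exactly like this, though its default `reset_after=True` GRU applies the reset gate slightly differently), the "kernel" multiplies the input `x` and the "recurrent kernel" multiplies the previous state `h`:

```python
import numpy as np

# Assumed input dim m=4, hidden dim n=3.
# kernel:           applied to the input x,  shape (m, 3n)
# recurrent_kernel: applied to the state h,  shape (n, 3n)
m, n = 4, 3
rng = np.random.default_rng(0)
kernel = rng.normal(size=(m, 3 * n))
recurrent_kernel = rng.normal(size=(n, 3 * n))

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h):
    zx, rx, cx = np.split(x @ kernel, 3)            # input contributions
    zh, rh, ch = np.split(h @ recurrent_kernel, 3)  # state contributions
    z = sigmoid(zx + zh)      # update gate
    r = sigmoid(rx + rh)      # reset gate
    c = np.tanh(cx + r * ch)  # candidate state
    return (1 - z) * c + z * h

h = gru_step(np.ones(m), np.zeros(n))
print(h.shape)  # (3,)
```

PyTorch stores the same two blocks as `weight_ih_l0` and `weight_hh_l0`, so mapping weights between the frameworks means pairing kernel with `weight_ih` and recurrent kernel with `weight_hh` (plus transposing, since PyTorch stores them as (3n, m)).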
1
vote
1 answer

How to add a Decoder & Attention Layer to Bidirectional Encoder with tensorflow 2.0

I am a beginner in machine learning and I'm trying to create a spelling correction model that spell-checks a small vocabulary (approximately 1,000 phrases). Currently, I am referring to the TensorFlow 2.0 tutorials for 1. NMT with Attention,…
Dom
  • 11
  • 2
1
vote
0 answers

LSTM / GRU prediction with hidden state?

I am trying to predict a value from a time series of 24 periods (predicting the 25th period). While training, I have a validation set with which I monitor the training (RMSE), and each epoch I evaluate on the validation set and receive errors such as: Train RMSE:…
1
vote
1 answer

Using GRU with FeedForward layers in Python

I'm trying to reproduce the code in this paper here for a multi-label problem (11 classes), which uses 1) an embedding layer, 2) a GRU, 3) two feed-forward layers with the ReLU activation function, and 4) a sigmoid unit. I've tried to run the…
Zahra Hnn
  • 33
  • 4
1
vote
1 answer

Keras RNN (batch_size

I created an RNN model for text classification with the LSTM layer, but when I put the batch_size in the fit method, my model trained on the whole dataset instead of mini-batches of batch_size. This also happened when I used GRU and Bidirectional layers…
cho_uc
  • 38
  • 4
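What `batch_size` in `fit` should do is easy to state precisely: slice the samples into consecutive chunks of that size, with a smaller final remainder chunk. A minimal sketch of that slicing (plain numpy, not the Keras internals):

```python
import numpy as np

def batches(x, batch_size):
    # Yield consecutive chunks of at most batch_size samples.
    for i in range(0, len(x), batch_size):
        yield x[i:i + batch_size]

x = np.arange(100)
sizes = [len(b) for b in batches(x, 32)]
print(sizes)  # [32, 32, 32, 4]
```

If Keras appears to train on everything at once, a common culprit is passing the full dataset as a single pre-batched `tf.data.Dataset`, in which case the `batch_size` argument to `fit` is ignored.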
1
vote
1 answer

Aside from trial and error, how do I select the number of layers and unit counts for LSTMS, GRUs, and Transformer units for text and time series?

When deciding on the number of units and layers for text processing or time-series prediction I rely heavily on trial and error. First, I look for a reference or paper on the topic such as the white paper on transformers: Attention Is All You Need.…