I have a binary classification task for time series data. Every 14 rows in my CSV belong to one time slot. How should I prepare this data for use in an LSTM? In other words, how do I feed this data to the model?
3 Answers
I hope the dataset also contains metadata, meaning a one-to-one mapping from each tuple to its label, e.g. dog > good, cat > bad, kittens > bad, puppies > good, etc.
Separate the data into X (the training data) and Y (the labels). Then use a vectorizer and train on X and Y. Once those steps work, use methods such as a train/test split, cross-validation folds, etc.
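For time series, the split mentioned above is often done chronologically rather than by random shuffling, so that the test set only contains samples that come after the training samples. A minimal sketch, assuming X and Y are already NumPy arrays of equal length; the helper name chronological_split and the toy data are made up for illustration:

```python
import numpy as np

def chronological_split(X, Y, test_fraction=0.2):
    """Split X and Y into train/test parts without shuffling (hypothetical helper)."""
    if len(X) != len(Y):
        raise ValueError("X and Y must have the same number of samples")
    n_test = int(round(len(X) * test_fraction))
    split = len(X) - n_test
    # Earlier samples go to training, later samples to testing.
    return X[:split], X[split:], Y[:split], Y[split:]

# Toy data: 10 samples with 3 features each, and binary labels.
X = np.arange(30).reshape(10, 3)
Y = np.array([0, 1, 0, 1, 1, 0, 0, 1, 0, 1])
X_train, X_test, Y_train, Y_test = chronological_split(X, Y)
print(X_train.shape, X_test.shape)
```

Whether a chronological or a shuffled split is appropriate depends on how independent the time slots are, which the question does not state.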
Friendly suggestion: Try seq2seq layers before LSTM (they require more resources).
I'm not sure what you mean by "Every 14 rows in my CSV is relevant to one time slot."; it isn't clear to me.
But going by your comment "How should I load this data to LSTM?So the number of column is 12 ", I believe you are asking how to load multiple features (12 in your case) into a time series model.
If my understanding is correct, this is a "Multiple Parallel Time Series" problem. I have created a similar model in TensorFlow and pushed it to GitHub: Github Source Code for Multiple Parallel TimeSeries
Note: Here instead of 12 features, I have used 3 features.
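Under one possible reading of the question, where each block of 14 consecutive rows forms a single sample with 12 feature columns, the flat table can be reshaped into the 3D layout (samples, timesteps, features) that recurrent layers typically expect. This is only a sketch under that assumption; the array below is random stand-in data, and n_slots is a made-up value:

```python
import numpy as np

rows_per_slot = 14   # from the question
n_features = 12      # from the comment quoted above
n_slots = 5          # made-up number of time slots for this toy example

# Stand-in for the feature columns of the CSV: (n_slots * 14) rows, 12 columns.
rng = np.random.default_rng(0)
flat = rng.normal(size=(n_slots * rows_per_slot, n_features))

# Group every 14 consecutive rows into one sample.
X = flat.reshape(n_slots, rows_per_slot, n_features)
print(X.shape)
```

With this layout, each sample needs exactly one label, so Y would have length n_slots.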
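Here is the code for this:
import numpy as np
import pandas as pd

data = pd.read_csv(filename)  # filename: path to your CSV file
lag = 14  # number of past rows in each input window

# Assuming the target column is the last one
X = []
Y = []
for x in range(lag, len(data)):
    # Window of the previous `lag` rows (all columns, including the target)
    X.append(data.iloc[x - lag:x, :].to_numpy())
    # Label: the target value of the row right after the window
    Y.append(data.iloc[x, -1])
X = np.array(X)  # shape: (samples, lag, n_columns)
Y = np.array(Y)  # shape: (samples,)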
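For larger files, the same overlapping windows can be built without a Python loop. The sketch below uses numpy.lib.stride_tricks.sliding_window_view (available in NumPy 1.20 and later) and assumes, as above, that the target is the last column; the values array is toy data standing in for data.to_numpy():

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

lag = 14
# Toy stand-in for data.to_numpy(): 20 rows, 4 columns, target in the last column.
values = np.arange(80, dtype=float).reshape(20, 4)

# sliding_window_view appends the window axis last: (n_windows, n_columns, lag).
windows = sliding_window_view(values, window_shape=lag, axis=0)

# Reorder to (n_windows, lag, n_columns) and drop the final window,
# because no row follows it to serve as its label.
X = windows.transpose(0, 2, 1)[:-1]
Y = values[lag:, -1]  # target of the row right after each window
print(X.shape, Y.shape)
```

The result is a read-only view on the original array, so copy it with np.array(X) before modifying it in place.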