3

I have a binary classification task for time series data. Every 14 rows in my CSV belong to one time slot. How should I prepare this data for use in an LSTM? In other words, how do I feed this data to the model?

Kaggle

3 Answers

0

I hope the dataset also contains metadata, meaning you need a one-to-one mapping from each tuple to its label, e.g. dog > good, cat > bad, kittens > bad, puppies > good, etc.

Separate the data into X (training data) and Y (labels). Then use a vectorizer and train on X and Y. Once those steps work, use methods such as a train/test split, cross-validation folds, etc.
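As a rough illustration of the split step, here is a minimal sketch that holds out the most recent samples for testing. It assumes the rows are already in time order, so it splits chronologically instead of shuffling; the function name `chronological_split` and the toy arrays are hypothetical and not part of the original question.

```python
import numpy as np

def chronological_split(X, y, test_fraction=0.2):
    """Keep the earliest samples for training and the latest for testing."""
    X = np.asarray(X)
    y = np.asarray(y)
    n_test = int(round(len(X) * test_fraction))
    split = len(X) - n_test
    return X[:split], X[split:], y[:split], y[split:]

# Toy data (hypothetical): 10 samples with 2 features each, in time order.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

X_train, X_test, y_train, y_test = chronological_split(X, y)
print(X_train.shape, X_test.shape)
```

Splitting by position avoids training on samples that come after the ones being tested, which a shuffled split could allow.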

Friendly suggestion: Try seq2seq layers before LSTM (they require more resources).

lmjohns3
yunus
0

I'm not sure what the statement "Every 14 rows in my CSV is relevant to one time slot." means; it isn't clear to me.

But going by your comment "How should I load this data to LSTM?So the number of column is 12", I believe you are asking how to load multiple features (12 in your case) into a time series model.

If my understanding is correct, it's a problem of the type "Multiple Parallel Time Series". I have created a similar model in TensorFlow and pushed it to GitHub: Github Source Code for Multiple Parallel TimeSeries

Note: Here instead of 12 features, I have used 3 features.

vipin bansal
0

Here is the pseudo code for this:

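import pandas as pd
import numpy as np

# filename is the path to your CSV file
data = pd.read_csv(filename)
lag = 14
# assuming the target column is the last one
X = []
Y = []
# each sample is the previous `lag` rows; its label is the target of the current row
for x in range(lag, len(data)):
    X.append(data.iloc[x-lag:x, :].to_numpy())
    Y.append(data.iloc[x, -1])
X = np.array(X)  # shape: (samples, lag, columns)
Y = np.array(Y)  # shape: (samples,)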
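The loop above builds overlapping sliding windows. If instead each block of 14 consecutive rows forms one independent time slot with a single label (an assumption about the question's layout, which the question does not confirm), a plain reshape may be enough. A minimal sketch, where `make_blocks` and the toy data are hypothetical:

```python
import numpy as np

def make_blocks(values, labels, rows_per_slot=14):
    """Group consecutive rows into non-overlapping blocks of rows_per_slot."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    n_slots = len(values) // rows_per_slot
    usable = n_slots * rows_per_slot  # drop any incomplete trailing block
    X = values[:usable].reshape(n_slots, rows_per_slot, values.shape[1])
    # Assumption: the label is constant within a slot, so take the one
    # on the last row of each block.
    y = labels[:usable].reshape(n_slots, rows_per_slot)[:, -1]
    return X, y

# Toy data (hypothetical): 42 rows with 12 feature columns, i.e. 3 time slots.
rng = np.random.default_rng(0)
values = rng.normal(size=(42, 12))
labels = np.repeat([0, 1, 0], 14)

X, y = make_blocks(values, labels)
print(X.shape, y.shape)
```

The resulting `X` has the (samples, timesteps, features) layout that LSTM layers in common frameworks expect, with one binary label per sample in `y`.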
Peter
Adeetya