Binary classification of every time series step based on past and future values

Question

I'm currently facing a Machine Learning problem and I've reached a point where I need some help to proceed.

I have various time series of positional (x, y, z) data tracked by sensors. I've derived some additional features. For example, I rasterized the whole 3D space and calculated a cell_x, cell_y and cell_z for every time step. The time series themselves have variable lengths.
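As a rough illustration of the rasterization step, here is a minimal sketch that assumes a uniform grid. The function name `to_cell`, the `cell_size` of 0.5 and the `origin` are hypothetical values chosen for the example, not the ones used on the real data.

```python
import math

def to_cell(x, y, z, cell_size=0.5, origin=(0.0, 0.0, 0.0)):
    """Map a 3D position to integer grid indices (cell_x, cell_y, cell_z).

    cell_size and origin are assumed values for illustration only.
    """
    ox, oy, oz = origin
    return (
        math.floor((x - ox) / cell_size),
        math.floor((y - oy) / cell_size),
        math.floor((z - oz) / cell_size),
    )

# Two made-up sensor readings (x, y, z), one per time step.
positions = [(0.2, 1.4, -0.3), (0.7, 1.6, 0.1)]
cells = [to_cell(*p) for p in positions]
```

Using `math.floor` rather than `int` keeps negative coordinates in the correct cell, since `int` would round them toward zero.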

My goal is to build a model which classifies every time step with the label 0 or 1 (binary classification based on past and future values). For this I have many training time series whose labels are already set.
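One possible way to give a per-step classifier access to past and future values is to build a fixed-size context window around every time step. The sketch below is only an assumption about how this could be set up: the helper `context_windows`, the window sizes and the padding value are hypothetical.

```python
def context_windows(series, past=2, future=2, pad=None):
    """Return one window per time step.

    Each window holds `past` earlier steps, the step itself and `future`
    later steps. Positions outside the series are filled with `pad`.
    """
    n = len(series)
    windows = []
    for t in range(n):
        window = []
        for i in range(t - past, t + future + 1):
            window.append(series[i] if 0 <= i < n else pad)
        windows.append(window)
    return windows

# A short made-up series; real series may have any length.
example = context_windows([1, 2, 3, 4], past=1, future=1, pad=0)
```

Because exactly one window is produced per step, series of variable length need no special handling: each series simply yields as many training samples as it has steps.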

One thing which could be very problematic is that very few time steps are labeled 1 (for example, only 3 of 800 samples).

It would be great if someone could point me in the right direction, because there are many possible problems:

Wrong hyperparameters
Incorrect model
Too few labels of 1, but I think that's not a big problem because I only need the model to suggest the right time steps. So I would only use the peaks of the output.
Bad or too little training data
Bad features
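To make the idea of using only the peaks of the output concrete, here is a minimal sketch that assumes the model emits one score per time step. The function `peak_steps` and the threshold of 0.5 are hypothetical choices for illustration.

```python
def peak_steps(scores, threshold=0.5):
    """Return indices of local maxima in `scores` that reach `threshold`.

    A step counts as a peak when its score is strictly greater than its
    left neighbour and at least as large as its right neighbour.
    Missing neighbours at the ends are treated as minus infinity.
    """
    peaks = []
    last = len(scores) - 1
    for t, s in enumerate(scores):
        left = scores[t - 1] if t > 0 else float("-inf")
        right = scores[t + 1] if t < last else float("-inf")
        if s >= threshold and s > left and s >= right:
            peaks.append(t)
    return peaks

# Made-up per-step model outputs.
scores = [0.1, 0.2, 0.9, 0.3, 0.1, 0.6, 0.4, 0.05]
suggested = peak_steps(scores)
```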

I appreciate any help and tips.

score 11 · Answer 1 · answered May 08 '18 at 11:32

You are facing a very common problem: handling imbalanced data. For neural networks, typical procedures are:

1. Having the proper metrics: global accuracy should not be used, because a model that always predicts the majority class already scores very high.
2. Oversampling the minority class: randomly generate replicas of the minority class until the imbalance disappears. You can also perform data augmentation on the minority class. Synthetic data can be generated from the feature space using the SMOTE algorithm, but I don't know how it applies to neural networks.
3. Under-sampling the majority class: randomly remove instances of the majority class. This can degrade performance on the majority class.
4. Including class weights in the loss function: the idea is to penalize the misclassification of the minority class. The weights are usually inversely proportional to the occurrence frequency of each class.
5. Using different learning rates per class: you can use a bigger learning rate for the majority class, so that the net stops learning from the majority class earlier than from the minority class.
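The points about metrics, oversampling and class weights can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the weight formula `n / (n_classes * count)` is one common convention for "inversely proportional to frequency", and the label 1 is assumed to be the minority class. In practice a deep learning framework would supply its own equivalents.

```python
import math
import random

def class_weights(labels):
    """Weights inversely proportional to class frequency: n / (n_classes * count)."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return {c: n / (len(counts) * k) for c, k in counts.items()}

def weighted_bce(labels, probs, weights, eps=1e-7):
    """Mean binary cross-entropy, each step scaled by the weight of its true class."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * loss
    return total / len(labels)

def precision_recall(labels, preds):
    """Precision and recall for class 1 (0.0 when the denominator is zero)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def oversample_minority(samples, labels, seed=0):
    """Replicate random class-1 samples until both classes are equally frequent."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = majority + minority + extra
    rng.shuffle(idx)
    return [samples[i] for i in idx], [labels[i] for i in idx]

# The imbalance from the question: 3 positive steps out of 800.
labels = [0] * 797 + [1] * 3
weights = class_weights(labels)  # roughly {0: 0.50, 1: 133.33}

# A model that always predicts 0 looks excellent on accuracy alone ...
all_zero = [0] * 800
accuracy = sum(y == p for y, p in zip(labels, all_zero)) / len(labels)  # 0.99625
# ... but precision and recall for class 1 are both 0.0.
precision, recall = precision_recall(labels, all_zero)
```

When oversampling, the replication should be applied to the training split only, so that validation metrics still reflect the real class distribution.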

I'd recommend a combination of 1 (proper metrics, always), 2 (oversampling) and 4 (class weights). For deeper insight into this topic, which is very important, I recommend reading:
