
I am building an anomaly detection system for event sequence data (transactions). At each timestep, a transaction can be in any of 76 different stages.

My dataset is therefore a 3D array of size (m, t, N).

m is the number of samples (~40 000), t is the length of the longest transaction (I zero-pad all shorter sequences), and N is the number of possible events at each timestep. $x \in \{0,1\}^{m \times t \times N}$, where $x_{v,u,w} = 1$ if transaction $v$ is in event $w$ at timestep $u$, and 0 otherwise.
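To make the encoding concrete, here is a minimal sketch of how such a one-hot array can be built with NumPy. The function name `encode_sequences` and the toy input are my own placeholders, not part of my actual pipeline:

```python
import numpy as np

def encode_sequences(sequences, n_events):
    """Encode variable-length event-index sequences as a zero-padded
    one-hot array of shape (m, t, n_events)."""
    m = len(sequences)
    t = max(len(s) for s in sequences)      # length of the longest transaction
    x = np.zeros((m, t, n_events), dtype=np.int8)
    for v, seq in enumerate(sequences):
        for u, w in enumerate(seq):
            x[v, u, w] = 1                  # transaction v is in event w at timestep u
    return x                                # shorter sequences stay all-zero (padding)

# Two toy transactions over 3 possible events
x = encode_sequences([[0, 2, 1], [1, 1]], n_events=3)
```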

So far I have built an LSTM Variational Autoencoder in Python with Keras, and I do outlier detection by applying Local Outlier Factor to the latent representations of the transactions. It's based on this paper https://arxiv.org/abs/1906.10896 and on various blog posts such as https://blog.keras.io/building-autoencoders-in-keras.html.
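For reference, the LOF step looks roughly like this (a sketch: `Z` stands in for the latent vectors from the encoder, e.g. `encoder.predict(x)`; here it is synthetic data with one planted anomaly):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
Z = np.vstack([rng.normal(0, 1, size=(40, 8)),   # latent vectors of normal transactions
               rng.normal(8, 1, size=(1, 8))])   # one far-away (anomalous) transaction

lof = LocalOutlierFactor(n_neighbors=10)
labels = lof.fit_predict(Z)                      # -1 marks detected outliers
scores = -lof.negative_outlier_factor_           # larger = more anomalous
```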

I would like to compare the results with other techniques. Does anyone have suggestions for other algorithms and techniques for anomaly detection in unlabeled event sequence data?

kspr

1 Answer


A simple comparison is to use the reconstruction error as the anomaly score. This can be done instead of, or in addition to, the Local Outlier Factor you have now. You can also try other ways of computing an anomaly score from the latent feature vector. Some popular alternatives are:

  • Mahalanobis distance on the latent vectors
  • Fitting a Gaussian Mixture Model and using its probability density function as the score
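A rough sketch of these scores on the latent vectors (assumptions: `Z` stands in for the encoder output, here synthetic with one planted anomaly in the last row; the reconstruction-error line is commented out because it needs your decoder output):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
Z = np.vstack([rng.normal(0, 1, size=(200, 4)),  # latent vectors of normal transactions
               rng.normal(6, 1, size=(1, 4))])   # last row = planted anomaly

# Reconstruction error (X_hat would come from the VAE decoder):
# score_rec = np.mean((x - x_hat) ** 2, axis=(1, 2))

# Mahalanobis distance of each latent vector to the fitted Gaussian
mu = Z.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(Z, rowvar=False))
diff = Z - mu
score_maha = np.sqrt(np.einsum('ij,jk,ik->i', diff, cov_inv, diff))

# GMM fitted on (mostly normal) training data; low log-density = anomalous
gmm = GaussianMixture(n_components=2, random_state=0).fit(Z[:200])
score_gmm = -gmm.score_samples(Z)
```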

A completely different approach would be a self-supervised training setup instead of the autoencoder, such as Skip-Gram or predicting the next time-frame.
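For the next-step-prediction variant, the anomaly score is typically the negative log-likelihood the trained predictor assigns to the events that actually occurred. A sketch of just the scoring side (here `probs` is a placeholder for the per-timestep softmax output of e.g. an LSTM, and `x` is the one-hot array from the question; padded timesteps are masked out):

```python
import numpy as np

def nll_score(x, probs, eps=1e-12):
    """Negative log-likelihood of the observed events under the
    predicted distributions, ignoring zero-padded timesteps."""
    observed = x.sum(axis=2) > 0                  # mask: real (non-padding) timesteps
    p_true = (probs * x).sum(axis=2)              # predicted prob. of the observed event
    return -np.where(observed, np.log(p_true + eps), 0.0).sum(axis=1)

# Toy example: two sequences over 3 events; sequence 1 is padded at step 1
x = np.array([[[1, 0, 0], [0, 1, 0]],
              [[1, 0, 0], [0, 0, 0]]])
probs = np.array([[[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]],
                  [[0.1, 0.8, 0.1],  [0.4, 0.3, 0.3]]])
scores = nll_score(x, probs)
```

Sequences the predictor finds surprising get high scores, which gives you a ranking directly comparable to your LOF results.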

Jon Nordby