I am new to data science and have what I think is a typical problem. I have a data set with three metrics: metric1, metric2 and metric3. The metrics are interdependent, and I want to detect anomalies in metric3. I am currently using NuPIC from numenta.org for my analysis, but it doesn't seem to be effective. Is there an ML library that can detect anomalies across multiple parameters?
2 Answers
One way to use both metric1 and metric2 in order to find anomalies in metric3 is to consider residual analysis.
In your case, this would require creating a predictive model with metric1 and metric2 as the predictors and metric3 as the response variable.
Then, calculate the residual for each observation of metric3 as its true value minus its predicted value. Now you can report all members of the lowest decile [or any other percentile] of the residuals as one kind of anomaly (metric3 lower than expected) and all members of the highest decile [or any other percentile] as another kind of anomaly (metric3 higher than expected).
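The steps above can be sketched in Python. This is only a minimal illustration, not a definitive implementation: it assumes a plain linear least-squares model (any other predictive model would do), the function name `residual_anomalies` and its parameters are hypothetical, and the data is synthetic with two anomalies injected by hand.

```python
import numpy as np

def residual_anomalies(metric1, metric2, metric3, lower_pct=10, upper_pct=90):
    """Fit metric3 ~ metric1 + metric2 by least squares and flag extreme residuals."""
    x1 = np.asarray(metric1, dtype=float)
    x2 = np.asarray(metric2, dtype=float)
    y = np.asarray(metric3, dtype=float)

    # Design matrix with an intercept column.
    X = np.column_stack([np.ones_like(x1), x1, x2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Residual = true value minus predicted value.
    residuals = y - X @ coef

    # Percentile cut-offs; 10 and 90 correspond to the lowest and highest decile.
    low_cut, high_cut = np.percentile(residuals, [lower_pct, upper_pct])
    return residuals, residuals < low_cut, residuals > high_cut

# Synthetic data, made up purely for illustration.
rng = np.random.default_rng(0)
m1 = rng.normal(size=200)
m2 = rng.normal(size=200)
m3 = 2.0 * m1 - 1.0 * m2 + rng.normal(scale=0.1, size=200)
m3[5] += 5.0    # inject an unusually high value
m3[17] -= 5.0   # inject an unusually low value

residuals, low_anomalies, high_anomalies = residual_anomalies(m1, m2, m3)
print("unusually low: ", np.flatnonzero(low_anomalies))
print("unusually high:", np.flatnonzero(high_anomalies))
```

Note that a fixed percentile always flags the same fraction of the data, whether or not those points are truly unusual, so the cut-offs are worth tuning against what you know about your metrics.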
If you label metric 3 as $x_3 \in \{0, 1\}$, where $1$ means it is an anomaly, this becomes a logistic regression problem where $\operatorname{logit}\,\mathbb{P}(X_3 = 1) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$.
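As a rough sketch of this idea, the following fits the coefficients $\beta_0, \beta_1, \beta_2$ by gradient descent on the logistic loss using only NumPy. It assumes you already have labelled examples of anomalies. The helper names (`fit_logistic`, `predict_proba`), the learning rate, the iteration count and the synthetic labels are all assumptions made for illustration. In practice a library implementation of logistic regression would be the more usual choice.

```python
import numpy as np

def sigmoid(z):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, labels, lr=0.1, n_iter=5000):
    """Estimate [beta_0, beta_1, beta_2] by gradient descent on the mean log-loss."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(labels, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = sigmoid(A @ beta)
        beta -= lr * A.T @ (p - y) / len(y)    # gradient of the mean log-loss
    return beta

def predict_proba(beta, X):
    """Return P(X_3 = 1) for each row of X = [x_1, x_2]."""
    X = np.asarray(X, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])
    return sigmoid(A @ beta)

# Synthetic predictors and made-up anomaly labels, for illustration only:
# an anomaly is more likely when x_1 + x_2 is large.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
true_logit = -2.0 + 1.5 * X[:, 0] + 1.5 * X[:, 1]
labels = (rng.random(300) < sigmoid(true_logit)).astype(int)

beta = fit_logistic(X, labels)
probs = predict_proba(beta, X)
print("estimated coefficients:", beta)
```

Observations with a high predicted probability can then be reported as likely anomalies. The threshold is a judgment call that depends on how costly false alarms are.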