Using a meta-algorithm is the key: How to detect outliers in time series data
What the automated outlier detection system does for each time series:
- Classifies the metric and selects a model based on that classification: Is it a “smooth” (stationary) time series, or is its distribution multimodal, sparse, discrete, etc.? This step is critical to the performance of the outlier detection system, because the distribution determines the model, which in turn determines which algorithms can be used to detect outliers.
- Initializes that model: The system reads in new data points sequentially, updating and tuning the model to learn the normal behavior of that metric. Since a metric’s normal behavior may include seasonality, we use a proprietary algorithm, “Vivaldi” (based on autocorrelation with subsampling), to detect it. Vivaldi is highly accurate yet computationally inexpensive, which allows Anodot to find outliers in a way that both eliminates false positives and performs well, even when analyzing millions of metrics.
- Updates and refines the model: Once a data point is read, we determine if it’s an outlier. If not, that point is used to update and tune the model. If it is indeed an outlier, the system will label it as such.
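The classification step in the first bullet can be sketched with simple summary statistics. Here is a toy illustration; the category names, thresholds, and function name are assumptions for the sake of the example, not Anodot’s actual logic:

```python
import numpy as np

def classify_metric(values, zero_ratio=0.5, max_discrete_levels=10):
    """Toy model-selection step: map a series to a model family.

    The categories and thresholds here are illustrative assumptions,
    not Anodot's actual classification logic.
    """
    v = np.asarray(values, dtype=float)
    if np.mean(v == 0) > zero_ratio:              # mostly zeros -> sparse
        return "sparse"
    if len(np.unique(v)) <= max_discrete_levels:  # few distinct levels -> discrete
        return "discrete"
    return "smooth"                               # default: smooth/continuous
```

A real classifier would look at many more properties (gaps, variance structure, sample rate), but the principle is the same: cheap statistics route each metric to a model family.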
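Vivaldi itself is proprietary, but the core idea named in the second bullet, finding a strong peak in the autocorrelation function, can be sketched generically. The peak-search heuristic and the 0.3 significance threshold below are assumptions; the subsampling trick that makes Vivaldi cheap at scale is not shown:

```python
import numpy as np

def detect_period(series, max_lag=None, min_peak=0.3):
    """Generic ACF-based seasonality detector (a simplified sketch,
    not Vivaldi itself)."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    n = len(x)
    max_lag = max_lag or n // 2
    denom = np.dot(x, x)
    if denom == 0:
        return None  # constant series has no seasonality
    acf = np.array([np.dot(x[:n - k], x[k:]) / denom
                    for k in range(1, max_lag + 1)])
    # Skip the initial high-correlation region: start searching for a
    # peak only after the ACF first dips below zero.
    neg = np.flatnonzero(acf < 0)
    if neg.size == 0:
        return None
    start = neg[0]
    best = start + int(np.argmax(acf[start:]))
    return best + 1 if acf[best] > min_peak else None
```

On a clean sinusoid with a 50-sample period, the strongest ACF peak after the initial decay sits at lag 50, which this sketch recovers.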
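The read-test-update loop in the last bullet can be sketched with a simple running-statistics model: flag a point if it falls outside mean ± k·std, and fold only normal points back into the estimates. The z-score test, the k=3 threshold, and the warmup length are illustrative assumptions, not Anodot’s actual model:

```python
import math

class OnlineDetector:
    """Sketch of the read-test-update loop using Welford's online
    algorithm for mean/variance. Illustrative only."""

    def __init__(self, k=3.0, warmup=10):
        self.k, self.warmup = k, warmup
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def _update(self, x):
        # Welford's incremental mean/variance update.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def observe(self, x):
        """Return True if x is labeled an outlier."""
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) > self.k * std:
                return True  # outlier: do not fold into the model
        self._update(x)      # normal point: update and tune the model
        return False
```

Note that an outlier leaves the model untouched here; the persistence-based re-weighting described next is what would eventually let a "new normal" back in.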
If the outlier is persistent (a change to a new normal), Anodot will update the model to take the anomalous behavior into account. It starts by giving the new outlier a much smaller weight than a normal data point, and increases that weight gradually the longer the anomaly persists, until it carries equal weight with non-anomalous data. This adaptability allows the system to adjust to permanent, substantial changes in a metric’s normal behavior while still alerting users to the change the moment it happens. When finding outliers in real time at the scale of millions of metrics, manually re-selecting and re-tuning a model is simply impractical.
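The gradual re-weighting described above can be sketched as a ramp on the model-update step. The linear schedule, the ramp length, and the EMA-style update are illustrative assumptions, not Anodot’s actual weighting scheme:

```python
def anomaly_weight(steps_persisted, ramp=20):
    """Weight for a persistent anomalous point: near zero at first,
    growing linearly until it equals a normal point's weight (1.0).
    The linear schedule and ramp length are assumptions."""
    return min(1.0, steps_persisted / ramp)

def weighted_update(mean, x, lr=0.05, weight=1.0):
    """EMA-style model update whose step size is scaled by `weight`,
    so early anomalous points barely move the model."""
    return mean + lr * weight * (x - mean)
```

With weight 0 a fresh anomaly leaves the model unchanged; once it has persisted for a full ramp, it pulls the model exactly as hard as a normal point would.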
This top-level meta-algorithm is necessary for two important reasons: the initial model selection, and the continuous model re-selection and re-tuning as the business realities underlying the metrics change. As we’ve mentioned above, at the scale of millions of metrics, manual model selection, re-selection and tuning is impossible. Far more time and money would be spent managing such a manual system than investigating and acting on the insights it generates.