Multivariate Outlier Detection

This method detects anomalies across multiple time series. An anomaly is flagged when the observed values deviate from the predicted steady-state behavior of the system of metrics. That steady-state behavior is predicted by modeling the linear interdependencies between the time series with a VAR (vector autoregression) model. The approach is especially suited to multivariate anomalies: deviations that are small in any single series but persistent across a large number of time series.

In addition to identifying an anomalous event, this method provides utilities that flag the specific time series that were affected, which supports a high-level root cause analysis. For more details about the approach, please refer to this note.
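The idea described above can be illustrated with a minimal NumPy sketch. It is not the Kats implementation: it assumes a VAR model with a single lag fitted by ordinary least squares, and it assumes the overall score is the Mahalanobis distance of the one-step forecast error, which may differ from the statistic Kats actually computes. The helper names `fit_var1` and `anomaly_scores` are hypothetical.

```python
import numpy as np


def fit_var1(train):
    """Least-squares fit of x[t] = c + A @ x[t-1] + e[t] on the training rows."""
    # Design matrix: intercept column plus the lag-1 values of every series.
    past = np.hstack([np.ones((len(train) - 1, 1)), train[:-1]])
    coef, *_ = np.linalg.lstsq(past, train[1:], rcond=None)
    resid = train[1:] - past @ coef
    # The residual covariance describes "normal" one-step forecast errors.
    sigma = np.cov(resid, rowvar=False)
    return coef, sigma


def anomaly_scores(data, coef, sigma):
    """Score every row t >= 1 by how unusual its forecast error is."""
    past = np.hstack([np.ones((len(data) - 1, 1)), data[:-1]])
    resid = data[1:] - past @ coef
    sigma_inv = np.linalg.inv(sigma)
    # Overall score (assumed): Mahalanobis distance of the residual vector.
    overall = np.einsum("ti,ij,tj->t", resid, sigma_inv, resid)
    # Individual score (assumed): absolute residual in residual-std units.
    per_series = np.abs(resid) / np.sqrt(np.diag(sigma))
    return overall, per_series


# Synthetic data: 10 series that each follow x[t] = 0.5 * x[t-1] + noise.
rng = np.random.default_rng(0)
n_points, n_series = 400, 10
data = np.zeros((n_points, n_series))
for t in range(1, n_points):
    data[t] = 0.5 * data[t - 1] + rng.normal(size=n_series)
data[300] += 4.0  # simultaneous shift injected into every series

coef, sigma = fit_var1(data[:200])          # train on the first 200 rows
overall, per_series = anomaly_scores(data, coef, sigma)
worst = int(np.argmax(overall)) + 1         # +1 because scores start at row 1
print("highest overall score at row", worst)
```

Because the score combines the forecast errors of all series, a shift shared by many series adds up in the overall score, while the per-series values indicate which series contributed.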

API:

class MultivariateAnomalyDetector(data, params, training_days)

Parameters:

data: TimeSeriesData - Note that data should be deseasonalized and detrended prior
       to detection
params: [VARParams](https://fb.quip.com/iYpgAq8zh1x4) class initialized with appropriate parameters
        for the VAR model training
training_days: Initial number of days (can be a fraction) to use for training the model.
               As a result, the data points in this initial training window are excluded
               from the results.
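Two of the parameter notes above can be made concrete with a small standard-library sketch. It assumes seasonal differencing as one simple way to remove a stable seasonal pattern and a linear trend (the Time Series Decomposition utility is another route), and it assumes the training window covers every timestamp strictly before `first timestamp + training_days`; the exact boundary rule Kats applies is not stated here. Both helper names are hypothetical.

```python
from datetime import datetime, timedelta


def seasonal_difference(values, period):
    """Return values[t] - values[t - period] for every t >= period."""
    return [values[t] - values[t - period] for t in range(period, len(values))]


def count_training_points(timestamps, training_days):
    """Count the leading timestamps that fall inside the training window."""
    cutoff = timestamps[0] + timedelta(days=training_days)
    return sum(1 for ts in timestamps if ts < cutoff)


# Five days of hourly data: a linear trend plus a repeating 24-hour pattern.
start = datetime(2021, 1, 1)
timestamps = [start + timedelta(hours=h) for h in range(24 * 5)]
daily_pattern = [10 if 8 <= h < 20 else 0 for h in range(24)]
values = [2 * h + daily_pattern[h % 24] for h in range(24 * 5)]

# Differencing at the seasonal period cancels the daily pattern and turns
# the linear trend into a constant offset (slope * period).
flattened = seasonal_difference(values, period=24)

# A fractional training_days of 0.5 covers the first 12 hourly points.
n_train = count_training_points(timestamps, training_days=0.5)
print(flattened[:3], n_train)
```

The constant offset left by differencing can be removed by subtracting the mean of the differenced series.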

Methods

detector():
# Fit the VAR model and calculate the overall and individual anomaly scores
Returns:
DataFrame with one column for the overall anomaly score and one column for the
individual anomaly score of each time series

plot():
# Plot the time series metrics and the overall anomaly score at each timestep.
# Useful for choosing a threshold on the overall anomaly score

get_anomaly_timepoints(threshold):
# Helper function to return the list of time instants when an anomaly was
# detected based on the chosen threshold
Args:
    threshold: Threshold on the overall anomaly score

get_anomalous_metrics(t, top_k):
# Helper function to get 'top_k' time series that were affected at the
# identified anomalous time instant 't'
Args:
    t: Anomalous time instant
    top_k: Number of highest ranked time series to display
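The two helper methods can be pictured as a filter followed by a ranking. The sketch below uses plain dictionaries in place of the score DataFrame and is only an illustration: whether Kats compares with `>` or `>=`, and how it orders the individual scores, are assumptions. The function names and the sample scores are hypothetical.

```python
def anomaly_timepoints(overall_scores, threshold):
    """Return the time instants whose overall score exceeds the threshold."""
    return [t for t, score in overall_scores.items() if score > threshold]


def anomalous_metrics(individual_scores, t, top_k):
    """Return the top_k (metric, score) pairs at time t, highest score first."""
    row = individual_scores[t]
    ranked = sorted(row.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical scores keyed by timestamp string.
overall_scores = {
    "2021-01-01 00:00": 3.2,
    "2021-01-01 01:00": 57.0,
    "2021-01-01 02:00": 12.5,
}
individual_scores = {
    "2021-01-01 01:00": {"metric_a": 1.1, "metric_b": 6.4, "metric_c": 4.9},
}

anomalies = anomaly_timepoints(overall_scores, threshold=40)
top_two = anomalous_metrics(individual_scores, anomalies[0], top_k=2)
print(anomalies, top_two)
```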

Example

We use CDN working set size data to illustrate this multivariate anomaly detection approach below:

import pandas as pd
from infrastrategy.kats.consts import TimeSeriesData
from infrastrategy.kats.models.var import VARParams
from infrastrategy.kats.detectors.outlierDetection import (
 MultivariateAnomalyDetector
)

# read data and convert to TimeSeriesData structure
DATA_multi = pd.read_csv("../data/cdn_working_set.csv")
TSData_multi = TimeSeriesData(DATA_multi)

# select parameters to use for VAR modeling
params = VARParams(maxlags=3)

# detect anomalies in a rolling fashion
d = MultivariateAnomalyDetector(TSData_multi,
 params,
 training_days=3)
anomaly_score_df = d.detector()

# choose a threshold based on plot of anomaly scores for various anomalies
d.plot()

# get time instants for identified anomalous events
threshold = 40
anomalies = d.get_anomaly_timepoints(threshold)

# get the top 5 anomalous metrics during the first of these instants
d.get_anomalous_metrics(anomalies[0], top_k=5)

Output of plot:

Copyright © 2021 Kats Project @ Facebook