Dataswarm Operators

CusumOperator

This Dataswarm operator performs CUSUM detection.

Steps:

  1. The operator pulls data from an existing Hive table using history_query. The query must return two columns: time, holding the timestamps, and value (or y), holding the variable values.
  2. The operator runs CUSUM changepoint detection on the data. Default parameters are used unless custom ones are supplied through cusum_params. See regressionDetection.CusumDetector for more details.
  3. The operator uploads the results to the table specified by output_table. Each row in the output table corresponds to a changepoint. The columns are: changepoint_found, direction, changepoint, mu0, mu1, changetime, stable_changepoint, delta, llr_int, llr, p_value, regression_detected. For more details on the columns, see regressionDetection.CusumDetector.
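
The three steps above can be sketched in plain Python. This is a simplified, single-changepoint illustration and not the Kats implementation: validate_columns, find_changepoint, detect, and the toy rows are hypothetical, and only a subset of the output columns is produced.

```python
from statistics import mean


def validate_columns(rows):
    """Step 1 check: every row needs `time` plus `value` (or `y`)."""
    for row in rows:
        if "time" not in row or not (row.keys() & {"value", "y"}):
            raise ValueError("query must return `time` and `value` (or `y`)")
    return [(row["time"], row.get("value", row.get("y"))) for row in rows]


def find_changepoint(values):
    """Step 2, simplified: index where the cumulative sum of deviations
    from the overall mean is most extreme (a basic CUSUM statistic)."""
    overall = mean(values)
    cusum, running = [], 0.0
    for v in values:
        running += v - overall
        cusum.append(running)
    # Skip the last index so the "after" segment is never empty.
    return max(range(len(values) - 1), key=lambda i: abs(cusum[i]))


def detect(rows):
    """Step 3, simplified: build one output row for the changepoint."""
    series = validate_columns(rows)
    values = [v for _, v in series]
    cp = find_changepoint(values)
    mu0, mu1 = mean(values[: cp + 1]), mean(values[cp + 1:])
    return {
        "changepoint": cp,
        "changetime": series[cp][0],
        "mu0": mu0,
        "mu1": mu1,
        "delta": mu1 - mu0,
        "direction": "increase" if mu1 > mu0 else "decrease",
    }


# Toy data: the level jumps from 10 to 20 after the sixth observation.
rows = [
    {"time": f"2021-01-{d:02d}", "value": 10.0 if d <= 6 else 20.0}
    for d in range(1, 11)
]
result = detect(rows)
print(result)
```

The real detector also reports significance columns such as llr and p_value, which this sketch omits.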

API:

# Abstract Parent Class
class CusumOperator(BashOperator):
    def __init__(
        self,
        user,
        schedule,
        dep_list,
        owner,
        history_query,
        history_namespace,
        output_table,
        output_namespace,
        ds_partition="<DATEID>",
        retention=90,
        datetime_format="%Y-%m-%d",
        cusum_params=None,
    ):

Parameters (in addition to parent class BashOperator):

history_query: str. SQL query to pull the data on which to perform CUSUM detection.
               Must retrieve only two columns, named time and value (or y).
history_namespace: str. Namespace for history_query
output_table: str. Output table to write results to
output_namespace: str. Namespace for output_table
ds_partition (optional): str. ds partition associated with the uploaded output.
                         Default "<DATEID>"
retention (optional): int. How long output data will be retained. Default 90.
datetime_format (optional): str. Datetime format for output data. Default "%Y-%m-%d"
cusum_params (optional): dict. Custom params for CUSUM detection. Default None.
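
The default datetime_format looks like a standard Python strftime pattern. As a minimal sketch, assuming the operator applies it the way datetime.strftime does (how it is used internally is not stated here), the default would render a change time as follows. The date is taken from the sample output on this page.

```python
from datetime import datetime

datetime_format = "%Y-%m-%d"  # the documented default

# Hypothetical change time, matching 5/1/2054 in the sample output.
changetime = datetime(2054, 5, 1)

formatted = changetime.strftime(datetime_format)
print(formatted)  # 2054-05-01

# The same pattern parses the string back into a datetime.
parsed = datetime.strptime(formatted, datetime_format)
```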

Example:

#!/usr/bin/env python3

from dataswarm.operators import GlobalDefaults
from dataswarm_extension_kats.cusumoperator import CusumOperator


GlobalDefaults.set(
    user="rohanfb",
    oncall="kats_dev",
    secure_group="kats",
    schedule="@never",
    partition="ds=<DATEID>",
    depends_on_past=True,
    num_retries=3,
    task_tags=["python-version-3"],
)

history_query = """
SELECT
    *
FROM test_cusum_kats_dev
"""
history_namespace = "di"

cusum_params = {
    "threshold": 0.01,
    "max_iter": 10,
    "delta_std_ratio": 1.0,
    "min_abs_change": 0,
    "start_point": None,
    "change_directions": None,
    "interest_window": [0, 100],
    "magnitude_quantile": None,
    "magnitude_ratio": 1.3,
    "magnitude_comparable_day": 0.5,
}

wait_cusum_detector = CusumOperator(
    dep_list=[],
    history_query=history_query,
    history_namespace=history_namespace,
    output_table="test_cusum_operator_dev",
    output_namespace="di",
    retention=30,
    cusum_params=cusum_params,
    owner="kats",
)

Output Table Result:

changepoint_found  direction  changepoint  mu0        mu1        changetime  stable_changepoint  delta      llr_int     llr        p_value  regression_detected
TRUE               increase   64           175.23077  366.74684  5/1/2054    TRUE                191.51607  100.33933   145.3254   0        TRUE
TRUE               decrease   98           217.05051  419.44444  3/1/2057    FALSE               202.39394  -106.11898  137.72771  0        FALSE
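
Downstream code would typically keep only the rows flagged as regressions. The sketch below copies a few columns from the two sample rows into dicts; the regressions helper is hypothetical, and reading the output table into Python is left out. In this sample, delta matches mu1 - mu0 up to rounding, which the last check confirms for these two rows only.

```python
# A subset of columns from the two sample output rows.
rows = [
    {"changepoint": 64, "direction": "increase", "mu0": 175.23077,
     "mu1": 366.74684, "delta": 191.51607, "regression_detected": True},
    {"changepoint": 98, "direction": "decrease", "mu0": 217.05051,
     "mu1": 419.44444, "delta": 202.39394, "regression_detected": False},
]


def regressions(rows):
    """Keep only changepoints flagged as regressions."""
    return [r for r in rows if r["regression_detected"]]


flagged = regressions(rows)
print([r["changepoint"] for r in flagged])

# Sanity check on the sample: delta agrees with mu1 - mu0 within rounding.
consistent = all(
    abs((r["mu1"] - r["mu0"]) - r["delta"]) < 1e-4 for r in rows
)
```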
Copyright © 2021 Kats Project @ Facebook