BOCPD: Residual Translation
Predictor-based detectors generally work as follows: compute the residual (i.e., the difference between the predicted and the observed value), and translate it into a false-alarm probability according to its magnitude. This translation often assumes that residuals are normally distributed. In practice, residuals are often non-normal, and sometimes asymmetric. This module “learns” the distribution of the residuals using kernel density estimation, and outputs a false-alarm probability based on it.
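To illustrate why the normality assumption matters, here is a minimal sketch in plain Python (standard library only). It is not this module's implementation. The synthetic data, the helper names `normal_upper_tail` and `kde_upper_tail`, the one-sided tail definition of the false-alarm probability, and the Silverman rule-of-thumb bandwidth are all assumptions made for the example.

```python
import random
from statistics import NormalDist, fmean, stdev


def normal_upper_tail(residuals, x):
    """P(R >= x) if the residuals are assumed to be normally distributed."""
    dist = NormalDist(mu=fmean(residuals), sigma=stdev(residuals))
    return 1.0 - dist.cdf(x)


def kde_upper_tail(residuals, x, bandwidth):
    """P(R >= x) under a Gaussian KDE.

    A Gaussian KDE is an equal-weight mixture of normal kernels centred at
    each training residual, so its tail probability is the average of the
    kernels' tail probabilities.
    """
    std_normal = NormalDist()
    return fmean(1.0 - std_normal.cdf((x - r) / bandwidth) for r in residuals)


rng = random.Random(0)
# Made-up, right-skewed training residuals: exponential draws shifted to mean 0.
train = [rng.expovariate(1.0) - 1.0 for _ in range(2000)]

# Silverman's rule-of-thumb bandwidth (an assumption; any bandwidth rule could be used).
bandwidth = 1.06 * stdev(train) * len(train) ** (-1 / 5)

x = 3.0  # a large positive residual to score
p_normal = normal_upper_tail(train, x)
p_kde = kde_upper_tail(train, x, bandwidth)
print(f"normal assumption: {p_normal:.5f}, KDE: {p_kde:.5f}")
```

On right-skewed residuals such as these, the normal fit assigns a much smaller tail probability to a large positive residual than the KDE does, so a detector relying on the normal assumption would treat ordinary large residuals as more anomalous than they are.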
API
class KDEResidualTranslator(
ignore_below_frac: float = 0,
ignore_above_frac: float = 1,
)
"""
Translates residuals (difference between outcome and prediction)
to false-alarm probability using kernel density estimation
on the residuals.
"""
def __init__(
self,
ignore_below_frac: float = 0,
ignore_above_frac: float = 1,
) -> None:
if ignore_below_frac < 0 or ignore_above_frac > 1:
raise ValueError("Illegal ignore fractions")
if ignore_below_frac > ignore_above_frac:
raise ValueError("Illegal ignore fractions")
self._ignore_below_frac = ignore_below_frac
self._ignore_above_frac = ignore_above_frac
Parameters
ignore_below_frac: Lower quantile to ignore during training
(makes the translator more robust to outliers); default 0.
ignore_above_frac: Upper quantile to ignore during training
(makes the translator more robust to outliers); default 1.
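As a rough illustration of what ignoring the lower and upper quantiles can mean, the sketch below drops the smallest and largest residuals by rank before any density is fitted. The helper `trim_by_fraction` is hypothetical, and rank-based trimming is an assumption; how this module computes and applies the quantiles internally is not shown in this document. The validation mirrors the checks in `__init__` above.

```python
def trim_by_fraction(residuals, ignore_below_frac=0.0, ignore_above_frac=1.0):
    """Hypothetical helper: keep only residuals between two empirical quantiles."""
    if ignore_below_frac < 0 or ignore_above_frac > 1:
        raise ValueError("Illegal ignore fractions")
    if ignore_below_frac > ignore_above_frac:
        raise ValueError("Illegal ignore fractions")
    ordered = sorted(residuals)
    n = len(ordered)
    lo = round(ignore_below_frac * n)  # index of the first residual kept
    hi = round(ignore_above_frac * n)  # one past the last residual kept
    return ordered[lo:hi]


# Made-up residuals: 98 well-behaved values plus two extreme outliers.
data = [-1000.0, 1000.0] + [float(i) for i in range(98)]

# Ignore the lowest 5% and the highest 5% of the training residuals.
kept = trim_by_fraction(data, ignore_below_frac=0.05, ignore_above_frac=0.95)
print(len(kept), min(kept), max(kept))
```

With the defaults (0 and 1), nothing is trimmed, which matches the documented default behaviour of using all residuals.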
Methods
def fit(
self,
y: Optional[TimeSeriesData] = None,
yhat: Optional[TimeSeriesData] = None,
yhat_lower: Optional[TimeSeriesData] = None,
yhat_upper: Optional[TimeSeriesData] = None,
residual: Optional[TimeSeriesData] = None,
) -> "KDEResidualTranslator":
"""
Fits a model of the residuals to the input data.
Arguments:
df: A pandas DataFrame containing the following columns:
1. Either
a. `residual`, or
b. `y` and `yhat` with optionally both `yhat_lower` and `yhat_upper`
2. At most one of `ds` and `ts`
"""
def predict_log_proba(self,
y: Optional[TimeSeriesData] = None,
yhat: Optional[TimeSeriesData] = None,
yhat_lower: Optional[TimeSeriesData] = None,
yhat_upper: Optional[TimeSeriesData] = None,
residual: Optional[TimeSeriesData] = None,
) -> TimeSeriesData:
"""
Predicts the natural-log probability of a residual
Arguments:
df: A pandas DataFrame containing the following columns:
1. Either
a. `residual`, or
b. `y` and `yhat` with optionally both `yhat_lower` and `yhat_upper`
2. At most one of `ds` and `ts`
Returns:
A series containing the natural-log probability corresponding to
each instance (row) in the input.
"""
@property
def kde_(self) -> KernelDensity:
"""
Returns:
KernelDensity object fitted to the residuals.
"""
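Since `predict_log_proba` returns natural-log probabilities, one way to turn them into alarms is to compare them against a threshold in log space. The sketch below assumes the returned values can be read as log false-alarm probabilities, as the introduction suggests. The helper `flag_alarms`, the threshold `alpha`, and the input values are hypothetical and not part of this module.

```python
import math


def flag_alarms(log_probs, alpha=0.01):
    """Hypothetical helper: flag points whose probability is below alpha.

    The comparison is done in log space, which avoids underflow when the
    probabilities are extremely small.
    """
    log_alpha = math.log(alpha)
    return [lp < log_alpha for lp in log_probs]


# Made-up log probabilities for three points (probabilities 0.5, 0.2, 0.001).
log_probs = [math.log(0.5), math.log(0.2), math.log(0.001)]
alarms = flag_alarms(log_probs, alpha=0.01)
print(alarms)
```

Only the last point falls below the 1% threshold, so it is the only one flagged.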
Example
We use the classic Peyton Manning dataset for the demo.
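import pandas as pd
# TimeSeriesData is used below but was not imported; this path is assumed to mirror the detector import.
from infrastrategy.kats.consts import TimeSeriesData
from infrastrategy.kats.detectors.seasonalityDetection import SeasonalityDetector
# Load the data and rename the columns to the names TimeSeriesData expects.
DATA = pd.read_csv("../data/example_wp_log_peyton_manning.csv")
DATA.columns = ["time", "y"]
# Keep one year of observations.
DATA = DATA[(DATA.time > '2012-05-01') & (DATA.time < '2013-05-01')]
TSData = TimeSeriesData(DATA)
# Run seasonality detection on the series.
SD = SeasonalityDetector(data=TSData)
SD.detector()
# returns {'seasonality_presence': True, 'seasonalities': ['weekly']}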