BOCPD: Bayesian Online Changepoint Detection
Bayesian online changepoint detection (BOCPD) detects sudden changes in a time series that persist over a period of time. Compared to other changepoint detection methods, it has some distinctive features:
- This is an online model: as new data arrives, it revises its predictions. It only needs to look a few steps ahead (the number is specified by the user) to detect a changepoint, and does not need the entire time series a priori.
- Since this is a Bayesian model, the user can specify their prior belief about the probability of a changepoint, as well as the parameters of the underlying model governing the time series.
This faithfully implements the algorithm from Adams & MacKay, 2007, "Bayesian Online Changepoint Detection" (https://arxiv.org/abs/0710.3742).
The basic idea is to check whether new values are improbable under a Bayesian predictive model built from the previous observations.
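For reference, the core recursion of Adams & MacKay can be summarized as follows. Here r_t is the run length (the number of steps since the last changepoint), H is a constant hazard rate (the per-step prior probability of a changepoint; treating `changepoint_prior` as this hazard is our assumption), and π_t^(r) is the UPM's predictive probability of x_t given the observations in the current run of length r_{t-1}:

```latex
\begin{aligned}
P(r_t = r_{t-1} + 1,\, x_{1:t}) &= P(r_{t-1},\, x_{1:t-1})\, \pi_t^{(r)}\, (1 - H) \\
P(r_t = 0,\, x_{1:t}) &= \sum_{r_{t-1}} P(r_{t-1},\, x_{1:t-1})\, \pi_t^{(r)}\, H \\
P(r_t \mid x_{1:t}) &= \frac{P(r_t,\, x_{1:t})}{\sum_{r_t} P(r_t,\, x_{1:t})}
\end{aligned}
```

A new value that is improbable under every long run shifts posterior mass toward short run lengths, which is what signals a changepoint.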
The user needs to specify two classes: the changepoint detector, and the underlying predictive model (UPM). The UPM specifies the generative model from which successive points in the time series are drawn.
For the changepoint detection model, the user needs to specify:
- The data, which is a univariate time series
- The lag, which specifies how many steps to look ahead to find a changepoint
For the underlying predictive model, we currently only support a Normal distribution with unknown mean and known variance. We will add more distributions in the future. For the UPM, the user needs to specify:
- Whether to use an empirical prior derived from the data
- If not using an empirical prior, the mean and precision of the prior distribution on the mean, as well as the known precision. The model is formulated in terms of precision (the inverse of the variance) for convenience.
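For reference, this is the standard conjugate update for a Normal likelihood with known precision λ and a Normal prior on the mean with mean μ0 and precision λ0 (the symbols are ours, not parameter names from the API). After n observations in a run, the posterior over the mean and the predictive distribution of the next point are:

```latex
\begin{aligned}
\mu &\sim \mathcal{N}\!\left(\mu_0,\, \lambda_0^{-1}\right), \qquad
x_i \mid \mu \sim \mathcal{N}\!\left(\mu,\, \lambda^{-1}\right) \\
\lambda_n &= \lambda_0 + n\lambda, \qquad
\mu_n = \frac{\lambda_0 \mu_0 + \lambda \sum_{i=1}^{n} x_i}{\lambda_n} \\
x_{n+1} \mid x_{1:n} &\sim \mathcal{N}\!\left(\mu_n,\, \lambda_n^{-1} + \lambda^{-1}\right)
\end{aligned}
```

Precisions simply add under this update, which is presumably the convenience of working in precision rather than variance.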
API
class BayesOnlineChangePoint(data=ts_df, lag=10, debug=True)
class NormalKnownPrec(data=ts_df, empirical=True)
Methods
# to run the detector
detector(model=this_model, changepoint_prior=0.01, threshold=0.4)
# to plot the results
plot()
# to adjust the parameters and plot again
adjust_parameters(threshold=0.2, lag=10)
# The user does not need to query the UPM directly.
# However, these are its public methods:
pred_prob(self, t: int, x: float)
pred_mean(self, t: int, x: float)
pred_var(self, t: int, x: float)
update_sufficient_stats(self, x: float)
t: time
x: value of the time series at time t
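To make these four methods concrete, here is a minimal sketch of a UPM with the same method names, assuming a Normal likelihood with known precision and one posterior over the mean per candidate run length. The class name `NormalKnownPrecSketch` and its constructor arguments are hypothetical; the sketch also assumes that `t` counts from 0, that each `pred_*` method returns one value per run length 0..t, and that it is called before `update_sufficient_stats` at each step. It is not the library's implementation.

```python
import numpy as np
from scipy.stats import norm


class NormalKnownPrecSketch:
    """Hypothetical UPM: Normal likelihood with known precision and a Normal
    prior on the mean. Keeps one posterior over the mean per run length."""

    def __init__(self, prior_mean: float, prior_prec: float, known_prec: float):
        self.prior_mean = prior_mean
        self.prior_prec = prior_prec
        self.known_prec = known_prec
        # Entry r is the posterior over the mean given the last r observations;
        # before any data, only run length 0 (the prior itself) exists.
        self.post_mean = np.array([prior_mean], dtype=float)
        self.post_prec = np.array([prior_prec], dtype=float)

    def pred_mean(self, t: int, x: float) -> np.ndarray:
        # Predictive mean under each run length 0..t (x is not needed here).
        return self.post_mean[: t + 1]

    def pred_var(self, t: int, x: float) -> np.ndarray:
        # Predictive variance = uncertainty about the mean + observation noise.
        return 1.0 / self.post_prec[: t + 1] + 1.0 / self.known_prec

    def pred_prob(self, t: int, x: float) -> np.ndarray:
        # Predictive density of x under each run-length hypothesis.
        return norm.pdf(x, loc=self.pred_mean(t, x), scale=np.sqrt(self.pred_var(t, x)))

    def update_sufficient_stats(self, x: float) -> None:
        # Conjugate update of every existing run with the new observation.
        new_prec = self.post_prec + self.known_prec
        new_mean = (self.post_prec * self.post_mean + self.known_prec * x) / new_prec
        # A run that restarts (run length 0) begins again from the prior.
        self.post_mean = np.concatenate(([self.prior_mean], new_mean))
        self.post_prec = np.concatenate(([self.prior_prec], new_prec))
```

Keeping a vector of posteriors, one per run length, is what lets a detector weigh "no change since r steps ago" against every other r in a single vectorized call.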
Parameters
# Constructor
class BayesOnlineChangePoint(data=ts_df, lag=10, debug=True)
data: univariate time series of class TimeSeriesData
lag: positive integer. How many steps to look ahead.
debug: boolean. If True, generates additional plots showing the
probability of the next data point, as well as the mean and variance
of the predictive model.
# detector
detector(model=this_model, changepoint_prior=0.01, threshold=0.4)
model: the UPM. Currently this is an object of class NormalKnownPrec.
changepoint_prior: float between 0 and 1.
Prior belief about the probability of observing a changepoint.
threshold: float between 0 and 1.
The algorithm calculates the posterior probability of a changepoint
at each point. If this probability is above the threshold,
that point is marked as a changepoint.
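The interplay of `changepoint_prior`, `lag` and `threshold` can be illustrated with a compact, self-contained sketch of the Adams & MacKay run-length recursion. It assumes a Normal UPM with known precision, treats `changepoint_prior` as a constant hazard rate, and reads the change probability for index s as the posterior, `lag` observations later, that the current run began at s. These are our assumptions, not confirmed details of this implementation; the function name `bocpd_sketch` and its prior arguments are hypothetical, and the returned keys simply mirror the documented output.

```python
import numpy as np
from scipy.stats import norm


def bocpd_sketch(data, changepoint_prior=0.01, threshold=0.4, lag=10,
                 prior_mean=0.0, prior_prec=1.0, known_prec=1.0):
    """Hypothetical sketch of BOCPD (Adams & MacKay) with a constant hazard
    and a Normal, known-precision predictive model. Not the library's code."""
    x_all = np.asarray(data, dtype=float)
    n = len(x_all)
    hazard = changepoint_prior  # assumption: the prior is used as a constant hazard
    # rt_post[r, c]: P(run length = r | first c observations)
    rt_post = np.zeros((n + 1, n + 1))
    rt_post[0, 0] = 1.0
    mean = np.array([prior_mean])
    prec = np.array([prior_prec])
    for t, x in enumerate(x_all):
        # Predictive density of x under each current run length.
        pred = norm.pdf(x, loc=mean, scale=np.sqrt(1.0 / prec + 1.0 / known_prec))
        weighted = rt_post[: t + 1, t] * pred
        growth = weighted * (1.0 - hazard)  # the current run continues
        reset = weighted.sum() * hazard     # a changepoint resets the run to 0
        joint = np.concatenate(([reset], growth))
        rt_post[: t + 2, t + 1] = joint / joint.sum()
        # Conjugate update of each run; run length 0 restarts from the prior.
        new_prec = prec + known_prec
        new_mean = (prec * mean + known_prec * x) / new_prec
        mean = np.concatenate(([prior_mean], new_mean))
        prec = np.concatenate(([prior_prec], new_prec))
    # Change probability for index s: the posterior, `lag` observations later,
    # that the current run started at s (assumption about how lag is used).
    change_probs = np.zeros(n)
    for s in range(1, n - lag + 1):
        change_probs[s] = rt_post[lag, s + lag]
    change_points = np.where(change_probs > threshold)[0]
    return {"change_probs": change_probs, "change_points": change_points}
```

For example, on a series of 50 zeros followed by 50 fives, `bocpd_sketch(data, changepoint_prior=0.01, threshold=0.4, lag=10)` flags only index 50. A larger lag waits for more evidence before scoring a point, trading detection delay for confidence; the last `lag` points cannot be scored at all.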
# adjust parameters
adjust_parameters(threshold=0.2, lag=10)
threshold: same as defined above
lag: same as defined above for the constructor
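If, as assumed here, threshold and lag only change how the run-length posterior is read and not the posterior itself, re-deriving changepoints can reuse a cached posterior instead of rerunning the filter. A minimal sketch, assuming the posterior is stored as a matrix `rt_post[r, c]` (probability of run length r after c observations); the function name `rethreshold` is hypothetical, and this is not necessarily how `adjust_parameters` works internally.

```python
import numpy as np


def rethreshold(rt_post: np.ndarray, threshold: float, lag: int) -> dict:
    """Hypothetical helper: re-derive change points from a cached run-length
    posterior rt_post[r, c] without re-running the filter."""
    n = rt_post.shape[1] - 1  # number of observations
    change_probs = np.zeros(n)
    for s in range(1, n - lag + 1):
        # Posterior, `lag` observations later, that a new run started at index s.
        change_probs[s] = rt_post[lag, s + lag]
    return {"change_probs": change_probs,
            "change_points": np.where(change_probs > threshold)[0]}
```

Only the cheap read-out step is repeated, so lowering the threshold from 0.4 to 0.2 costs O(n) instead of another O(n^2) pass over the data.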
Output
Output is a dict with two keys:
'change_probs': an array of change probabilities at each point
'change_points': an array with the index of each changepoint
Minimal Example
# import stuff and simulate data
import numpy as np
import pandas as pd
from scipy.stats import norm
from infrastrategy.kats.consts import TimeSeriesData
from infrastrategy.kats.detectors.ChangePointDetection import (BayesOnlineChangePoint, NormalKnownPrec)
# make some time series data with changepoints
def make_ts():
    np.random.seed(seed=100)
    # constants
    t_start = 0
    t_end = 450
    # calculation
    num_points = t_end - t_start
    y_val = norm.rvs(loc=1.35, scale=0.05, size=num_points)
    # make changepoints: shift the level down on two segments
    y_val[100:200] = y_val[100:200] - 0.2
    y_val[350:450] = y_val[350:450] - 0.15
    df = pd.DataFrame({'time': list(range(t_start, t_end)), 'value': y_val})
    return df
# wrap the DataFrame so the detector and the UPM can consume it
ts_df = TimeSeriesData(make_ts())
# standard function call
this_model = NormalKnownPrec(data=ts_df, empirical=True)
change_point = BayesOnlineChangePoint(data=ts_df, lag=10)
output = change_point.detector(model=this_model, changepoint_prior=0.01, threshold=0.4)
change_point.plot()
# more advanced, call with debug, adjust parameters
this_model = NormalKnownPrec(data=ts_df, empirical=True)
change_point = BayesOnlineChangePoint(data=ts_df, lag=10, debug=True)
output = change_point.detector(model=this_model, changepoint_prior=0.01, threshold=0.4)
change_point.plot()
cp_output = change_point.adjust_parameters(threshold=0.2, lag=10)