Empirical Confidence Interval
Producing high-quality prediction intervals (or uncertainty intervals) is challenging, and different models quantify the uncertainty of their forecasts in different ways. We implemented a unified framework that calculates the uncertainty interval for any forecasting model using an empirical approach. The procedure is as follows:
- Run K-fold cross-validation for the given model and data, where each fold forecasts h (the horizon) steps ahead
- For each horizon, calculate the standard deviation of the K error terms (S.E.)
- Fit a linear model: S.E. ~ Horizon
- Use the fitted linear model to estimate the S.E. at each horizon of the true future
- Lower/Upper = Point Estimate -/+ Z_Score * S.E.
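The procedure above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the function name empirical_interval and the fold_errors layout (one list of h errors per fold) are hypothetical, the sample standard deviation is used for the S.E., and Z_Score is assumed to be the two-sided standard normal quantile for the given confidence level.

```python
from statistics import NormalDist, mean, stdev


def empirical_interval(fold_errors, point_forecast, confidence_level=0.8):
    """Hypothetical helper illustrating the five steps above.

    fold_errors: K lists, each holding the h forecast errors of one fold (K >= 2, h >= 2).
    point_forecast: point estimates for the true future, one per step.
    """
    h = len(fold_errors[0])

    # Step 2: standard deviation of the K errors at each horizon.
    se = [stdev([fold[j] for fold in fold_errors]) for j in range(h)]

    # Step 3: fit S.E. ~ Horizon by ordinary least squares.
    horizons = list(range(1, h + 1))
    x_bar, y_bar = mean(horizons), mean(se)
    sxx = sum((x - x_bar) ** 2 for x in horizons)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(horizons, se))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Step 4: estimate the S.E. for each step of the true future.
    se_future = [intercept + slope * step
                 for step in range(1, len(point_forecast) + 1)]

    # Step 5: Lower/Upper = Point Estimate -/+ Z_Score * S.E.
    # Assumption: Z_Score is the two-sided normal quantile.
    z_score = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    lower = [p - z_score * s for p, s in zip(point_forecast, se_future)]
    upper = [p + z_score * s for p, s in zip(point_forecast, se_future)]
    return lower, upper, se_future
```

Because the S.E. is extrapolated from a straight line, the forecast can extend beyond the h steps covered by cross-validation, which is also why the quality of the linear fit matters.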
API
class EmpConfidenceInt(error_methods, data, params, train_percentage, test_percentage,
sliding_steps, model_class, multi, confidence_level)
Parameters
error_methods: list of error methods from ["mape", "smape", "mae", "mase", "mse", "rmse"]
data: input data with TimeSeriesData format
params: model parameters
train_percentage: percentage for training data set
test_percentage: percentage for testing data set
sliding_steps: steps for a moving sliding window
model_class: model class
multi: flag to use multiprocessing; defaults to True
confidence_level: confidence level; defaults to 0.8
Methods
We provide three useful methods:
get_eci
: calculate the empirical confidence interval and return the forecasted series with lower and upper bounds, for the specified steps and freq
diagnose
: our method depends on the fit of S.E. ~ Horizon, so it's useful to check the linear model fit plot by calling ci.diagnose()
plot
: plot the forecasted series with lower and upper bounds; if the model's default confidence/uncertainty interval exists, plot both
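Since the interval depends on how well a straight line describes S.E. as a function of the horizon, a numeric summary can complement the visual check. The sketch below computes the coefficient of determination (R^2) of that linear fit. It is an illustration only: linear_fit_r2 is a hypothetical helper, not part of the class, and it assumes the per-horizon S.E. values are already available as a list.

```python
from statistics import mean


def linear_fit_r2(horizons, se):
    """Hypothetical helper: R^2 of the least-squares fit S.E. ~ Horizon.

    Assumes at least two distinct horizons and a non-constant S.E. series
    (a constant series would make the total sum of squares zero).
    """
    x_bar, y_bar = mean(horizons), mean(se)
    sxx = sum((x - x_bar) ** 2 for x in horizons)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(horizons, se))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual and total sums of squares.
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(horizons, se))
    ss_tot = sum((y - y_bar) ** 2 for y in se)
    return 1.0 - ss_res / ss_tot
```

A value close to 1 suggests the linear assumption is reasonable; a low value suggests the extrapolated S.E., and therefore the interval, should be treated with caution.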
Example
We demonstrate the empirical confidence interval using the air passengers data with the Prophet model:
from infrastrategy.kats.utils.empConfidenceInt import EmpConfidenceInt
from infrastrategy.kats.consts import TimeSeriesData
from infrastrategy.kats.models.prophet import ProphetParams, ProphetModel
import pandas as pd
DATA = pd.read_csv("../data/example_air_passengers.csv")
DATA.columns = ["time", "y"]
TSData = TimeSeriesData(DATA)
params = ProphetParams(seasonality_mode="multiplicative")
ALL_ERRORS = ["mape", "smape", "mae", "mase", "mse", "rmse"]
# define the empConfidenceInt obj with required params
ci = EmpConfidenceInt(
    ALL_ERRORS,
    TSData,
    params,
    50,
    25,
    12.5,
    ProphetModel,
    confidence_level=0.9,
)
# get the ECI; returns a dataframe with the calculated lower and upper bounds
eci = ci.get_eci(steps=100, freq="MS")
# diagnose the linear model fit
ci.diagnose()
# compare the empirical confidence interval with the model's default one
ci.plot()