KatsEnsemble
Ensemble learning combines multiple learning algorithms to obtain better predictive results than any of the individual algorithms alone. We apply this approach to time series forecasting and provide two types of ensemble algorithms: median ensemble and weighted average ensemble.
The following diagram shows the procedure of our ensemble, and the high-level steps are summarized below. After basic data cleaning, adjustment, etc., we run a seasonality detector; the downstream operations depend on its result.
- If no seasonality is detected, we can safely fit the non-seasonal models and proceed to aggregation. (Note: the seasonal-capable models can be used here as well, because most of them fit non-seasonal data naturally.)
- If seasonality is detected, we apply STL decomposition to split the original time series into two components: (a) seasonal and (b) de-seasonalized. We assume the seasonal component stays the same in the future and model the de-seasonalized component with the non-seasonal models, then sum the two components to get the final forecasts. Meanwhile, models that can handle seasonality directly are fitted on the original data. We then proceed to the aggregation step to produce the final results.
- We currently support two types of ensemble/aggregation:
  - Median: take the median of the individual forecasts at each timestamp as the final forecast.
  - Weighted average: backtest each individual model to obtain a performance metric such as MAPE, then assign each model a weight proportional to the inverse of that metric, i.e., 1/MAPE.
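The two aggregation strategies can be sketched in a few lines of plain Python. This is a minimal illustration and not the KatsEnsemble implementation: the helper names (`mape`, `median_ensemble`, `weighted_avg_ensemble`), the model names, and all numbers are hypothetical, and the weights are assumed to be normalized so that they sum to 1.

```python
from statistics import median


def mape(actual, predicted):
    """Mean absolute percentage error; assumes `actual` contains no zeros."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def median_ensemble(forecasts):
    """Take the median across models at each timestamp."""
    return [median(values) for values in zip(*forecasts.values())]


def weighted_avg_ensemble(forecasts, errors):
    """Weight each model by 1/error, normalized so the weights sum to 1."""
    inverse = {name: 1.0 / errors[name] for name in forecasts}
    total = sum(inverse.values())
    weights = {name: inv / total for name, inv in inverse.items()}
    steps = len(next(iter(forecasts.values())))
    return [
        sum(weights[name] * forecasts[name][t] for name in forecasts)
        for t in range(steps)
    ]


# Hypothetical forecasts from three models over two future timestamps.
forecasts = {
    "model_a": [100.0, 110.0],
    "model_b": [104.0, 112.0],
    "model_c": [130.0, 150.0],
}
# Hypothetical backtest MAPE per model (lower is better).
errors = {"model_a": 0.05, "model_b": 0.10, "model_c": 0.20}

median_forecast = median_ensemble(forecasts)
weighted_forecast = weighted_avg_ensemble(forecasts, errors)
```

In this toy setup the median ignores the outlying forecasts of `model_c` entirely, while the weighted average still includes them but with the smallest weight, because `model_c` has the largest backtest error.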
API
# model class
class KatsEnsemble(data, params)
Parameters
data : TimeSeriesData
params : dict with the following keys
    models : EnsembleParams containing the individual model params,
        i.e., [BaseModelParams, ...]
    aggregation : aggregation method; median ("median") and weighted average ("weightedavg") are supported
    fitExecutor : callable executor used to fit the individual models
    forecastExecutor : callable executor used to fit and predict the individual models
    seasonality_length : the length of the seasonality (TODO: determine automatically)
    decomposition_method : type of decomposition; "additive" and "multiplicative" are supported
Please note that both fitExecutor and forecastExecutor are optional; only services that want to improve computational performance need to implement their own.
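As a rough illustration of how seasonality_length and decomposition_method could enter the forecast, the sketch below extends a seasonal component into the future by repeating its last full cycle and recombines it with a de-seasonalized forecast. The helpers `extend_seasonal` and `recombine` are hypothetical and not part of the KatsEnsemble API. Summing the components follows the description above; multiplying them in the "multiplicative" case is an assumption based on the usual definition of a multiplicative decomposition.

```python
def extend_seasonal(seasonal, seasonality_length, steps):
    """Repeat the last full seasonal cycle to cover `steps` future points."""
    last_cycle = seasonal[-seasonality_length:]
    # The history ends with `last_cycle`, so the first future point
    # reuses the first value of that cycle.
    return [last_cycle[i % seasonality_length] for i in range(steps)]


def recombine(deseasonalized_forecast, future_seasonal, decomposition_method):
    """Combine the de-seasonalized forecast with the future seasonal values."""
    pairs = zip(deseasonalized_forecast, future_seasonal)
    if decomposition_method == "additive":
        return [d + s for d, s in pairs]
    if decomposition_method == "multiplicative":
        return [d * s for d, s in pairs]
    raise ValueError(f"unsupported decomposition_method: {decomposition_method}")


# Hypothetical additive seasonal component with a period of 3.
seasonal = [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0, -1.0]
future_seasonal = extend_seasonal(seasonal, seasonality_length=3, steps=4)

# Hypothetical forecast of the de-seasonalized component from a non-seasonal model.
deseasonalized_forecast = [10.0, 11.0, 12.0, 13.0]
final_forecast = recombine(deseasonalized_forecast, future_seasonal, "additive")
```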
Methods
fit : fit individual models by calling fitExecutor
predict : predict future time series values for a given number of steps
forecast : combination of the fit and predict methods
aggregate : combine the individual predictions into the final forecast
plot : plot the historical and predicted values
Example
from infrastrategy.kats.consts import TimeSeriesData
from infrastrategy.kats.models.ensemble import EnsembleParams, BaseModelParams
from infrastrategy.kats.models.KatsEnsemble import KatsEnsemble
from infrastrategy.kats.models import (
    arima,
    holtwinters,
    linearModel,
    prophet,
    quadraticModel,
    sarima,
    theta,
)
import pandas as pd
DATA = pd.read_csv("../data/example_air_passengers.csv")
DATA.columns = ["time", "y"]
TSData = TimeSeriesData(DATA)
model_params = EnsembleParams(
    [
        BaseModelParams("arima", arima.ARIMAParams(p=1, d=1, q=1)),
        BaseModelParams(
            "sarima",
            sarima.SARIMAParams(
                p=2,
                d=1,
                q=1,
                trend="ct",
                seasonal_order=(1, 0, 1, 12),
                enforce_invertibility=False,
                enforce_stationarity=False,
            ),
        ),
        BaseModelParams("prophet", prophet.ProphetParams()),
        BaseModelParams("linear", linearModel.LinearModelParams()),
        BaseModelParams("quadratic", quadraticModel.QuadraticModelParams()),
        BaseModelParams("theta", theta.ThetaParams(m=12)),
    ]
)
KatsEnsembleParam = {
    "models": model_params,
    "aggregation": "median",
    "seasonality_length": 12,
    "decomposition_method": "multiplicative",
}
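m = KatsEnsemble(data=TSData, params=KatsEnsembleParam)
m.fit()  # fit the individual models
m.predict(steps=30)  # predict the next 30 steps with each model
m.aggregate()  # combine the individual predictions into the final forecast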