Model Hyperparameter Tuning

In statistical modeling or machine learning, **hyperparameter tuning** (or hyperparameter optimization) is the process of choosing a set of optimal hyperparameters for a model or ML algorithm. A hyperparameter is a parameter whose value controls the modeling process itself. This is in contrast to the **parameters of interest** (or **model parameters**), typically the feature/predictor/independent-variable weights, which are learned during model fitting. Hyperparameters differ in type and value across models and ML algorithms, and we tune them to achieve optimal model generalizability, i.e., performance on hold-out data sets that were not used for training.

Hyperparameter tuning selects the optimal set of hyperparameters by evaluating candidate configurations on a hold-out data set (the validation set). Often the validation set is divided into several splits and the optimal hyperparameters are chosen based on the aggregate performance over those splits. One key component of hyperparameter tuning is how we search for the optimal parameters; we use three approaches in our modules:

  • Grid search: search for the optimal parameters over a predefined grid of the parameter space, i.e., the Cartesian product of all possible combinations of the defined parameter values (see the sketch after this list)
  • Random search: search for the optimal parameters over random samples drawn from the defined parameter ranges or values
  • Bayesian optimization search (to be implemented): search for the optimal parameters through an iterative Bayesian optimization process
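As a library-free illustration of the first two strategies, the sketch below enumerates a full grid versus drawing a fixed budget of random candidates; the parameter names and values are hypothetical and simply mirror the ARIMA grid used later in the example.

import itertools
import random

# Hypothetical parameter values, for illustration only.
param_values = {
    "p": [1, 2],
    "d": [1, 2],
    "q": [1, 2],
}

# Grid search: the Cartesian product of all defined values (2 x 2 x 2 = 8 candidates).
grid_candidates = [
    dict(zip(param_values, combo))
    for combo in itertools.product(*param_values.values())
]

# Random search: sample a fixed budget of candidates from the same value lists.
random.seed(123)
random_candidates = [
    {name: random.choice(values) for name, values in param_values.items()}
    for _ in range(4)
]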

The general ideas of grid search and random search are demonstrated below through a toy evaluation function; for Bayesian optimization we only give a general illustration and will add concrete examples once it is implemented. A nice summary of Bayesian optimization can be found in the linked reference. At a high level, this approach looks for the optimal parameters by performing an efficient parameter search at each iteration to achieve the maximum gain on the objective function, where information about the objective function comes from its posterior distribution under a Bayesian framework. The prior is often a Gaussian process (GP), and at each iteration the next parameter values are chosen by maximizing an acquisition function; a commonly used one is the expected improvement:

EI(x) = E[max(f(x) − f*, 0)]

where f(x) is the unknown objective function to be maximized (minimization is just the opposite) and f* is the best observation in the current iteration. The idea is illustrated by the following animation: [Animation: bo_1d_opt.gif, one-dimensional Bayesian optimization]
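For concreteness, the snippet below evaluates the closed-form expected improvement under a Gaussian posterior with mean mu and standard deviation sigma at each candidate point. This is a generic sketch of the acquisition function, not the yet-to-be-implemented Kats interface, and it assumes sigma > 0 everywhere.

import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    # Closed-form EI for maximization under a Gaussian posterior with sigma > 0:
    # EI(x) = (mu - f*) * Phi(z) + sigma * phi(z), where z = (mu - f*) / sigma.
    mu, sigma = np.asarray(mu, dtype=float), np.asarray(sigma, dtype=float)
    z = (mu - f_best) / sigma
    return (mu - f_best) * norm.cdf(z) + sigma * norm.pdf(z)

# The candidate with the highest EI is evaluated next.
print(expected_improvement(mu=[0.20, 0.50, 0.40], sigma=[0.05, 0.30, 0.01], f_best=0.45))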

API

Classes and methods:

class TimeSeriesParameterTuning(ABC)

# methods
validate_parameters_format(parameters): # validate parameter input types and values
get_search_space(): # return the parameter search space
generator_run_for_search_method(evaluation_function, generator_run): # run and return the results of the experiment trials using the evaluation function and the run generator
generate_evaluate_new_parameter_values(evaluation_function, arm_count): # abstract wrapper method to kick off experiment runs
list_parameter_value_scores(): # return the parameter tuning results dataframe

# Search objects factory metaclass
class SearchMethodFactory(metaclass=Final)
# create a search method class object based on selected_search_method; the supported search methods are "GRID_SEARCH", "RANDOM_SEARCH_UNIFORM", "RANDOM_SEARCH_SOBOL", and "BAYES_OPT"
create_search_method(parameters, selected_search_method, experiment_name, objective_name):

class GridSearch(TimeSeriesParameterTuning)
# return parameter tuning results dataframe from a grid search run generator repeated by "arm_count" number of times
generate_evaluate_new_parameter_values(evaluation_function, arm_count):

class RandomSearch(TimeSeriesParameterTuning)
# return parameter tuning results dataframe from a random search run generator repeated by "arm_count" number of times, with the class variable "random_strategy" defining the actual random search strategy
generate_evaluate_new_parameter_values(evaluation_function, arm_count):

class BayesianOptSearch(TimeSeriesParameterTuning)
# return parameter tuning results dataframe from a Bayesian optimization run generator repeated by "arm_count" number of times
generate_evaluate_new_parameter_values(evaluation_function, arm_count):

class SearchForMultipleSpaces
# Wrapper for the above three hyperparameter tuning classes, with an additional selected_model parameter specifying which model's search space to tune
generate_evaluate_new_parameter_values(selected_model, evaluation_function, arm_count)

Example

import numpy as np
import pandas as pd
import infrastrategy.kats.parameter_tuning.time_series_parameter_tuning as tpt

from ax.core.parameter import ChoiceParameter, FixedParameter, ParameterType
from ax.models.random.sobol import SobolGenerator
from ax.models.random.uniform import UniformGenerator
from infrastrategy.kats.consts import ModelEnum, SearchMethodEnum, TimeSeriesData
from infrastrategy.kats.models.arima import ARIMAModel, ARIMAParams
from infrastrategy.kats.models.prophet import ProphetModel, ProphetParams

# Define a random state
seed=123
random_state = np.random.RandomState(seed=seed)
# A toy evaluation function for illustration purposes only; it ignores params and returns a random error
def toy_evaluation(params):
        error = random_state.random()
        sem = 0.0  # standard error of the mean of model's estimation error.
        return error, sem

pd.set_option('display.max_colwidth', -1)  # show full cell contents (use None instead of -1 on newer pandas)
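
In practice the evaluation function would fit a candidate model on a training split and score it on a hold-out split. Below is a hedged sketch using the ARIMA imports above: the ts_data object, the 30-step hold-out, and the MAE metric are assumptions, and it presumes ARIMAModel follows the usual Kats fit()/predict(steps=...) pattern returning a dataframe with an "fcst" column; adjust to the actual API as needed.

# Hypothetical: ts_data is a TimeSeriesData object loaded elsewhere.
def arima_evaluation(params):
    steps = 30
    df = ts_data.to_dataframe()
    train = TimeSeriesData(df.iloc[:-steps])   # training split
    holdout = df.iloc[-steps:]                 # validation split
    model = ARIMAModel(
        data=train,
        params=ARIMAParams(p=params["p"], d=params["d"], q=params["q"]),
    )
    model.fit()
    fcst = model.predict(steps=steps)
    error = float(np.mean(np.abs(fcst["fcst"].values - holdout["value"].values)))
    sem = 0.0  # unknown; report 0 as in the toy example
    return error, sem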

Grid search

parameters_grid_search = [
    {
        "name": "p",
        "type": "choice",
        "values": list(range(1, 3)),
        "value_type": "int",
        "is_ordered": True,
    },
    {
        "name": "d",
        "type": "choice",
        "values": list(range(1, 3)),
        "value_type": "int",
        "is_ordered": True,
    },
    {
        "name": "q",
        "type": "choice",
        "values": list(range(1, 3)),
        "value_type": "int",
        "is_ordered": True,
    },
]

# Create search method object
parameter_tuner_grid = tpt.SearchMethodFactory.create_search_method(
    objective_name="toy_metric",
    parameters=parameters_grid_search,
    selected_search_method=SearchMethodEnum.GRID_SEARCH,
)
# Kick off a parameter tuning trial; "arm_count=-1" means all combinations in the grid are evaluated
parameter_tuner_grid.generate_evaluate_new_parameter_values(
    evaluation_function=toy_evaluation, arm_count=-1
)
# Retrieve parameter tuning results
parameter_tuning_results_grid = (
    parameter_tuner_grid.list_parameter_value_scores()
)

parameter_tuning_results_grid

[Image: grid search parameter tuning results dataframe]
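
To pick a winning configuration, the results dataframe can be sorted by the objective. The column names used below ("mean" for the evaluated objective and "parameters" for the arm's configuration) are assumptions about the output of list_parameter_value_scores and may need adjusting to the actual schema.

# Since the toy metric is an error, smaller is better.
best_row = parameter_tuning_results_grid.sort_values("mean").iloc[0]
best_params = best_row["parameters"]  # assumed column holding the arm's parameter dict
print(best_params)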

Random search

parameters_random_search = [
    {
        "name": "seasonality_prior_scale",
        "type": "choice",
        "value_type": "float",
        "values": list(np.logspace(-2, 1, 10, endpoint=True)),
        "is_ordered": True,
    },
    {
        "name": "yearly_seasonality",
        "type": "choice",
        "value_type": "bool",
        "values": [True, False],
    },
    {
        "name": "seasonality_mode",
        "type": "choice",
        "value_type": "str",
        "values": ["additive", "multiplicative"],
    },
    {
        "name": "changepoint_prior_scale",
        "type": "choice",
        "value_type": "float",
        "values": list(np.logspace(-3, 0, 10, endpoint=True)),
        "is_ordered": True,
    },
    {
        "name": "changepoint_range",
        "type": "range",
        "value_type": "float",
        "bounds": [0.5, 0.8],
        "is_ordered": True,
    },
]

# Create search method object
parameter_tuner_random = tpt.SearchMethodFactory.create_search_method(
    objective_name="toy_metric",
    parameters=parameters_random_search,
    selected_search_method=SearchMethodEnum.RANDOM_SEARCH_UNIFORM,
)
# Kick off parameter tuning trials, total number of random combinations will be num_trials*num_arms
num_trials=3
num_arms=4
for _ in range(num_trials):
    parameter_tuner_random.generate_evaluate_new_parameter_values(
        evaluation_function=toy_evaluation, arm_count=num_arms
    )
# Retrieve parameter tuning results
parameter_tuning_results_random = (
    parameter_tuner_random.list_parameter_value_scores()
)
parameter_tuning_results_random

[Image: random search parameter tuning results dataframe]
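
The factory also supports a quasi-random Sobol strategy (RANDOM_SEARCH_SOBOL, listed in the API section above), which spreads samples more evenly over the search space than uniform sampling. A sketch of the same workflow with only the selected_search_method argument changed:

parameter_tuner_sobol = tpt.SearchMethodFactory.create_search_method(
    objective_name="toy_metric",
    parameters=parameters_random_search,
    selected_search_method=SearchMethodEnum.RANDOM_SEARCH_SOBOL,
)
parameter_tuner_sobol.generate_evaluate_new_parameter_values(
    evaluation_function=toy_evaluation, arm_count=4
)
parameter_tuning_results_sobol = parameter_tuner_sobol.list_parameter_value_scores()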
