skclean.models.RobustForest

class skclean.models.RobustForest(method='simple', K=5, n_estimators=100, max_leaf_nodes=128, random_state=None, n_jobs=None)

Uses a random forest to compute pairwise similarity/distance between samples, then applies a simple K-nearest-neighbor classifier to that similarity matrix. For a pair of samples, the similarity value is proportional to how frequently they fall in the same leaf. See [LM17] for details.
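The leaf co-occurrence idea described above can be sketched with plain scikit-learn. This is an illustrative approximation of the ‘simple’ similarity, not skclean's actual implementation; the helper name `leaf_similarity` and the toy dataset are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def leaf_similarity(forest, A, B):
    """Hypothetical helper: fraction of trees in which two samples share a leaf."""
    # forest.apply returns, for each sample, the index of the leaf it reaches
    # in every tree: shape (n_samples, n_estimators).
    leaves_a = forest.apply(A)
    leaves_b = forest.apply(B)
    # Compare leaf indices tree by tree for every (a, b) pair, then average
    # over the trees to get a value in [0, 1].
    same_leaf = leaves_a[:, None, :] == leaves_b[None, :, :]
    return same_leaf.mean(axis=2)


# Toy data, used only to make the sketch runnable.
X, y = make_classification(n_samples=60, n_features=8, random_state=0)
forest = RandomForestClassifier(
    n_estimators=25, max_leaf_nodes=16, random_state=0
).fit(X, y)

S = leaf_similarity(forest, X, X)  # (60, 60) similarity matrix
```

Each sample always shares a leaf with itself, so the diagonal of `S` is 1, and the matrix is symmetric when both arguments are the same data.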

Parameters
  • method (string, default='simple') – There are two ways to compute the similarity matrix. With the ‘simple’ method, the similarity value is the percentage of trees in which two samples fall in the same leaf. The ‘weighted’ method also takes the size of those leaves into account; it exactly matches the algorithm in [LM17], but it is computationally slow.

  • K (int, default=5) – Number of nearest neighbors to consider for the final classification.

  • n_estimators (int, default=100) – Number of trees in the random forest.

  • max_leaf_nodes (int, default=128) – Maximum number of leaves in each tree.

  • n_jobs (int, default=None) – Number of parallel CPU cores to use.

  • random_state (int, default=None) – Set this value for reproducibility
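To show how these parameters interact, here is a minimal sketch of the full pipeline (forest, leaf-based distance, K nearest neighbors) written with scikit-learn only. It assumes the ‘simple’ similarity and a distance of one minus similarity; the helper `leaf_distance` and the toy data are hypothetical, and skclean's own code may differ in its details.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

# Toy data, split by hand into training and test portions.
X, y = make_classification(n_samples=120, n_features=8, random_state=0)
X_train, y_train, X_test = X[:90], y[:90], X[90:]

# n_estimators, max_leaf_nodes, random_state and n_jobs mirror the
# parameters documented above.
forest = RandomForestClassifier(
    n_estimators=100, max_leaf_nodes=128, random_state=0, n_jobs=None
).fit(X_train, y_train)


def leaf_distance(forest, A, B):
    """Hypothetical helper: 1 - fraction of trees where a and b share a leaf."""
    same_leaf = forest.apply(A)[:, None, :] == forest.apply(B)[None, :, :]
    return 1.0 - same_leaf.mean(axis=2)


# K plays the role of n_neighbors; the classifier receives distances
# directly instead of raw features.
knn = KNeighborsClassifier(n_neighbors=5, metric="precomputed")
knn.fit(leaf_distance(forest, X_train, X_train), y_train)

# At prediction time the matrix has shape (n_test, n_train).
pred = knn.predict(leaf_distance(forest, X_test, X_train))
```

Using `metric="precomputed"` keeps the neighbor search independent of how the distance was produced, so a different similarity (such as a weighted variant) could be swapped in without changing the classifier.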

Methods

__init__([method, K, n_estimators, …])

Initialize self.

fit(X, y)

get_params([deep])

Get parameters for this estimator.

pairwise_distance(train_X, test_X)

predict(X)

score(X, y[, sample_weight])

Return the mean accuracy on the given test data and labels.

set_params(**params)

Set the parameters of this estimator.