polyfemos.data.outlierremover

A collection of functions for removing outliers from data.

With default parameter values, the execution times of one call of each function on a (242820 x 2) data set were:

Function     Execution time    Relative to STALTA
STALTA       1.43 s            100%
DTR          1.19 s            84%
Lipschitz    0.58 s            41%

copyright

2019, University of Oulu, Sodankyla Geophysical Observatory

license

GNU Lesser General Public License v3.0 or later (https://spdx.org/licenses/LGPL-3.0-or-later.html)

Public Functions

polyfemos.data.outlierremover.dtr(data, maxdepth=0, scale=24000, medlim=10, **kwargs)

Removes outliers using a decision tree regressor.

The data are approximated with a DecisionTreeRegressor, and the median of the error between the data and the approximation is calculated. A datapoint is excluded if its error exceeds medlim times this median.
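The exclusion step can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the polyfemos code: `median_error_mask` is a hypothetical helper, the approximation is passed in rather than fitted with DecisionTreeRegressor, the error is assumed to be an absolute difference, and True is assumed to mark a datapoint that is kept.

```python
import numpy as np


def median_error_mask(y, y_approx, medlim=10.0):
    """Hypothetical helper: True = keep, False = excluded (assumed convention)."""
    y = np.asarray(y, dtype=float)
    y_approx = np.asarray(y_approx, dtype=float)
    # Assumption: the error is the absolute difference to the approximation
    err = np.abs(y - y_approx)
    med = np.median(err)
    # Exclude datapoints whose error exceeds medlim times the median error
    return err <= medlim * med


# Toy data: a flat signal with small noise and one spike at index 4.
# y_approx stands in for the DecisionTreeRegressor prediction.
y = np.array([1.0, 1.1, 0.9, 1.0, 5.0, 1.05, 0.95])
y_approx = np.ones_like(y)
mask = median_error_mask(y, y_approx, medlim=10.0)
```

Here the median error is 0.05, so the limit is 0.5 and only the spike (error 4.0) is excluded. Note that if more than half of the points match the approximation exactly, the median error is 0 and this sketch excludes every point with a nonzero error.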

scale is used to select maxdepth from the data length N: if N > scale, maxdepth = 2; if N > 10 * scale, maxdepth = 4; and so forth. If maxdepth is given, scale is ignored.
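A minimal sketch of the maxdepth selection rule, reading "and so forth" as two extra levels of depth per factor of 10 in N. `select_maxdepth` is hypothetical, and the value returned for N <= scale (0 here) is an assumption, since the behaviour in that case is not documented.

```python
def select_maxdepth(n, scale=24000.0):
    """Hypothetical reading: N > scale -> 2, N > 10 * scale -> 4, ..."""
    depth = 0  # assumption: no depth selected when N <= scale (undocumented)
    threshold = scale
    while n > threshold:
        depth += 2          # two more levels of depth ...
        threshold *= 10     # ... for every factor of 10 in N
    return depth


print(select_maxdepth(242820))  # 4
```

For the (242820 x 2) benchmark data set with the default scale of 24000, this reading gives maxdepth = 4, because 242820 exceeds both 24000 and 240000.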

Parameters
  • data (ndarray) – x-y data in Nx2 array, shape (N, 2)

  • maxdepth (int) – The maximum depth of the tree.

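  • scale (float) – Data-length scale used to select maxdepth when maxdepth is not given.

  • medlim (float) – Multiple of the median error above which a datapoint is excluded.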

Return type

ndarray

Returns

mask array containing bool values

polyfemos.data.outlierremover.lipschitz(data, itern=1, klim=7e-06, **kwargs)

Removes outliers based on Lipschitz continuity. The slope K of the function y = f(x) between two datapoints x0 and x1 is calculated as

K = |f(x1) - f(x0)| / |x1 - x0|

Datapoints that produce too steep a slope (K > klim) are removed.
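The slope test can be sketched in NumPy as below. Which point of a steep pair is removed is not specified above, so this sketch assumes a point is dropped only when the slopes to both its neighbour k samples before and its neighbour k samples after exceed klim, for k = 1 to itern; endpoints are therefore never dropped. `lipschitz_mask` is a hypothetical name, and True is assumed to mark a kept point.

```python
import numpy as np


def lipschitz_mask(data, itern=1, klim=7e-06):
    """Hypothetical sketch: True = keep, False = outlier (assumed convention)."""
    data = np.asarray(data, dtype=float)
    x, y = data[:, 0], data[:, 1]
    n = len(x)
    keep = np.ones(n, dtype=bool)
    for k in range(1, min(itern, n - 1) + 1):
        # Slope K = |f(x1) - f(x0)| / |x1 - x0| for every pair (i, i + k)
        with np.errstate(divide="ignore", invalid="ignore"):
            slope = np.abs(y[k:] - y[:-k]) / np.abs(x[k:] - x[:-k])
        steep = slope > klim
        before = np.zeros(n, dtype=bool)
        after = np.zeros(n, dtype=bool)
        before[k:] = steep   # slope from point i - k to point i is too steep
        after[:-k] = steep   # slope from point i to point i + k is too steep
        # Assumption: drop a point only when it is steep on both sides
        keep &= ~(before & after)
    return keep


# Toy data: a flat line with a single spike at index 5
x = np.arange(10, dtype=float)
y = np.zeros(10)
y[5] = 1.0
mask = lipschitz_mask(np.column_stack([x, y]), itern=1, klim=0.5)
```

Each pass over k touches every datapoint once, which matches the stated complexity of N * itern.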

Parameters
  • data (ndarray) – x-y data in Nx2 array, shape (N, 2)

  • itern (int) – The maximum interval between the datapoints x0 and x1. The computational complexity is N * itern.

  • klim (float) – The maximum allowed slope K.

Return type

ndarray

Returns

mask array containing bool values

polyfemos.data.outlierremover.stalta(data, nsta=3, nlta=10, threson=1.08, thresoff=1.05, offset=40, **kwargs)

Removes outliers using classic_sta_lta(), a classic short-term average / long-term average (STA/LTA) characteristic function.
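For illustration, the classic STA/LTA ratio can be written in a few lines of NumPy: the mean of the squared signal over the last nsta samples divided by its mean over the last nlta samples, with the first nlta - 1 values set to 0. This is a hedged sketch, not the implementation of classic_sta_lta() itself; `sta_lta_ratio` is a hypothetical name, and applying it to the y column of data is an assumption.

```python
import numpy as np


def sta_lta_ratio(a, nsta=3, nlta=10):
    """Hypothetical sketch of the classic STA/LTA ratio of the squared signal."""
    a = np.asarray(a, dtype=float)
    n = len(a)
    ratio = np.zeros(n)  # first nlta - 1 values stay 0 (LTA window not yet full)
    if n < nlta:
        return ratio
    # csum[j] is the sum of a**2 over samples 0 .. j - 1
    csum = np.concatenate(([0.0], np.cumsum(a ** 2)))
    end = np.arange(nlta, n + 1)  # exclusive window ends for samples nlta - 1 .. n - 1
    sta = (csum[end] - csum[end - nsta]) / nsta  # short-term average
    lta = (csum[end] - csum[end - nlta]) / nlta  # long-term average
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio[nlta - 1:] = np.where(lta > 0, sta / lta, 0.0)
    return ratio


# Toy data: constant level 10 with a single spike at sample 20
y = np.full(30, 10.0)
y[20] = 20.0
ratio = sta_lta_ratio(y, nsta=3, nlta=10)
```

With a constant level of 10 the ratio is exactly 1; the spike raises it to about 1.54 for the three samples whose STA window contains the spike, and then drops it to about 0.77 while the spike is still inside the LTA window.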

Parameters
  • data (ndarray) – x-y data in Nx2 array, shape (N, 2)

  • nsta (int) – Length of short time average window in samples

  • nlta (int) – Length of long time average window in samples

  • threson (float) – Upper threshold: the trigger is activated when the characteristic function rises above this value.

  • thresoff (float) – Lower threshold: the trigger is deactivated when the characteristic function falls below this value.

  • offset (int) – Number of additional samples removed before each trigger activation and after each trigger deactivation.
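The on/off trigger logic and the offset padding can be sketched as a simple state machine over the characteristic function. `trigger_mask` is hypothetical; it assumes that True marks a kept sample, that the off trigger is the first sample below thresoff, and that a trigger still active at the end of the data removes everything to the end.

```python
import numpy as np


def trigger_mask(cft, threson=1.08, thresoff=1.05, offset=40):
    """Hypothetical sketch: True = keep; samples in a padded trigger window are False."""
    cft = np.asarray(cft, dtype=float)
    n = len(cft)
    keep = np.ones(n, dtype=bool)
    on = None  # index where the current trigger was activated, if any
    for i, value in enumerate(cft):
        if on is None and value > threson:
            on = i  # trigger activated
        elif on is not None and value < thresoff:
            # Trigger deactivated at sample i: drop on - offset .. i + offset
            keep[max(on - offset, 0):min(i + offset + 1, n)] = False
            on = None
    if on is not None:
        # Assumption: a trigger still active at the end removes the rest
        keep[max(on - offset, 0):] = False
    return keep


# Toy characteristic function: trigger on at sample 5, off at sample 7
cft = np.array([1.0] * 5 + [1.5, 1.5, 0.8] + [1.0] * 5)
mask = trigger_mask(cft, offset=2)
```

With offset=2, samples 3 to 9 are removed: the triggered samples 5 to 7 plus two samples of padding on each side.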

Return type

ndarray

Returns

mask array containing bool values

Private Functions

polyfemos.data.outlierremover._get_mask(b, N, indices, nanindices=[])

Helper function for forming mask arrays.

Return type

ndarray

Returns

mask array containing bool values