polyfemos.data.outlierremover
A collection of functions for removing outliers from data.
With default parameter values and a (242820 x 2) data set, the execution time of one call of each function was:

Function | Time (1 call) | Relative to STALTA
STALTA | 1.43 s | 100%
DTR | 1.19 s | 84%
Lipschitz | 0.58 s | 41%
- Copyright: 2019, University of Oulu, Sodankyla Geophysical Observatory
- License: GNU Lesser General Public License v3.0 or later (https://spdx.org/licenses/LGPL-3.0-or-later.html)
Public Functions
- polyfemos.data.outlierremover.dtr(data, maxdepth=0, scale=24000, medlim=10, **kwargs)
A function to remove outliers using a decision tree.
The given data is approximated with a DecisionTreeRegressor. The median of the error between the data and the approximation is calculated; if the error between a data point and its approximated value is greater than medlim times the median, the data point is excluded. scale is used to select maxdepth according to the data length N: if N > scale, maxdepth = 2; if N > 10 * scale, maxdepth = 4; and so forth. If maxdepth is given, scale is ignored.
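A minimal sketch of the procedure above, assuming the DecisionTreeRegressor is scikit-learn's sklearn.tree.DecisionTreeRegressor and that x (column 0) is the single feature used to predict y (column 1). The names dtr_mask and select_maxdepth are hypothetical, as are three conventions the text does not state: maxdepth=0 means "derive it from scale", a depth of 1 is used when N <= scale, and the returned mask is True for kept points. It illustrates the idea; it is not the polyfemos implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def select_maxdepth(n, scale):
    """Hypothetical helper: depth 2 if n > scale, 4 if n > 10*scale, and so on."""
    depth = 0
    threshold = scale
    while n > threshold:
        depth += 2
        threshold *= 10
    # Assumption: behaviour for n <= scale is undocumented; DecisionTreeRegressor
    # needs max_depth >= 1, so fall back to 1.
    return max(depth, 1)


def dtr_mask(data, maxdepth=0, scale=24000, medlim=10):
    """Hypothetical sketch: True marks points that are kept."""
    x = data[:, 0].reshape(-1, 1)  # scikit-learn expects a 2-D feature array
    y = data[:, 1]
    # Assumption: maxdepth=0 (the default) means "derive it from scale"
    if not maxdepth:
        maxdepth = select_maxdepth(len(data), scale)
    approx = DecisionTreeRegressor(max_depth=maxdepth).fit(x, y).predict(x)
    error = np.abs(y - approx)
    # Exclude points whose error exceeds medlim times the median error
    return error <= medlim * np.median(error)


# Usage: a noisy step signal with one injected spike at index 250
rng = np.random.default_rng(0)
x = np.arange(1000, dtype=float)
y = np.where(x < 500, 0.0, 1.0) + rng.normal(0.0, 0.05, x.size)
y[250] += 5.0
mask = dtr_mask(np.column_stack([x, y]), maxdepth=1)
print(mask.sum())
```

A depth-1 tree fits the step with two constant pieces, so the spike stands far above the median error and is dropped. A deeper tree could, in principle, isolate a lone spike in its own leaf and fit it exactly, which is one reason the depth is worth tying to the data length.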
- polyfemos.data.outlierremover.lipschitz(data, itern=1, klim=7e-06, **kwargs)
A function to remove outliers based on Lipschitz continuity. It calculates the change (slope, K) of the function y = f(x) between two data points:
K = |f(x1) - f(x0)| / |x1 - x0|
Data points that cause too steep a slope are removed.
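The slope test can be sketched as a single forward pass that keeps a point only if the slope K from the most recently kept point is at most klim. This sketch assumes that klim is the slope limit, that x (column 0) is strictly increasing, that the first point is valid, and that the returned mask is True for kept points. The name lipschitz_mask is hypothetical, and itern is left out because its exact role is not documented.

```python
import numpy as np


def lipschitz_mask(data, klim=7e-06):
    """Hypothetical sketch: True marks points that are kept.

    Assumes x (column 0) is strictly increasing and the first point is valid.
    """
    x, y = data[:, 0], data[:, 1]
    mask = np.ones(len(data), dtype=bool)
    last = 0  # index of the most recently kept point
    for i in range(1, len(data)):
        # K = |f(x1) - f(x0)| / |x1 - x0|, measured from the last kept point
        k = abs(y[i] - y[last]) / abs(x[i] - x[last])
        if k > klim:
            mask[i] = False  # too steep: drop the point, keep the old reference
        else:
            last = i
    return mask


# Usage: a gentle linear trend with one spike at index 5
x = np.arange(10, dtype=float)
y = 1e-6 * x
y[5] += 1.0
mask = lipschitz_mask(np.column_stack([x, y]))
print(mask)
```

Measuring from the last kept point (rather than the immediate neighbour) lets the pass skip over runs of consecutive outliers; the trade-off in this sketch is that an outlier in the first position would make later valid points look too steep.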
- polyfemos.data.outlierremover.stalta(data, nsta=3, nlta=10, threson=1.08, thresoff=1.05, offset=40, **kwargs)
Utilises classic_sta_lta() to remove outliers.
Parameters:
  - data (ndarray) – x-y data in an Nx2 array, shape (N, 2)
  - nsta (int) – Length of the short time average window in samples
  - nlta (int) – Length of the long time average window in samples
  - threson (float) – Value above which the trigger (of the characteristic function) is activated (higher threshold)
  - thresoff (float) – Value below which the trigger (of the characteristic function) is deactivated (lower threshold)
  - offset (int) – Number of additional samples removed before an on-trigger and after an off-trigger
Returns:
  mask array containing bool values
Private Functions