Evaluating Detectors

In scikit-clean, a Detector only identifies/detects mislabelled samples; it is not a complete classifier, but rather one component of one. The procedure for evaluating detectors is therefore different from that for classifiers.
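For context, here is a minimal sketch of how a detector is typically combined with a handler to form a complete classifier, following scikit-clean's basic usage pattern (the particular choice of KDN, Filter and SVC below is illustrative):

from sklearn.svm import SVC
from skclean.detectors import KDN
from skclean.handlers import Filter
from skclean.pipeline import make_pipeline   # scikit-clean's pipeline, not scikit-learn's

# The detector (KDN) only scores how likely each sample is to be mislabelled;
# the handler (Filter) uses those scores to drop suspicious samples before
# the final classifier (SVC) is trained on the remaining data.
clf = make_pipeline(KDN(), Filter(SVC()))

In this notebook, however, we evaluate detectors on their own, without the rest of such a pipeline.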

We can view a noise detector as a binary classifier: its job is to produce a score (conf_score) denoting whether a sample is “mislabelled” or “clean”, with higher scores meaning the sample is more likely to be correctly labelled. We can therefore use binary classification metrics that work on continuous output: Brier score, log loss, area under the ROC curve, etc.

[1]:
# Suppress warnings; remove this before modifying this notebook
def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn

import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss, log_loss, roc_auc_score

from skclean.tests.common_stuff import NOISE_DETECTORS  # All noise detectors in skclean
from skclean.utils import load_data
from skclean.detectors.base import BaseDetector
from skclean.simulate_noise import flip_labels_uniform
[2]:
class DummyDetector(BaseDetector):
    # Baseline: assigns a random conf_score to every sample
    def detect(self, X, y):
        return np.random.uniform(size=y.shape)

from skclean.detectors import KDN, RkDN

class WkDN:
    # Simple ensemble: averages the conf_scores of KDN and RkDN
    def detect(self, X, y):
        return .5 * KDN().detect(X, y) + .5 * RkDN().detect(X, y)

ALL_DETECTORS = [DummyDetector(), WkDN()] + NOISE_DETECTORS
[3]:
X, y = make_classification(1800, 10)
#X, y = load_data('breast_cancer')

yn = flip_labels_uniform(y, .3)  # Flip 30% of the labels uniformly at random
clean_idx = (y == yn)            # Boolean mask: True for correctly labelled samples
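As a quick sanity check (illustrative snippet), we can verify that roughly 30% of the labels were indeed flipped:

print(f"Fraction of flipped labels: {(y != yn).mean():.2f}")  # expected: about 0.30
print(f"Fraction of clean samples:  {clean_idx.mean():.2f}")  # expected: about 0.70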
[4]:
df = pd.DataFrame()
for d in ALL_DETECTORS:
    conf_score = d.detect(X, yn)  # Score each sample using the noisy labels
    for name, loss_func in zip(['log', 'brier', 'roc'],
                               [log_loss, brier_score_loss, roc_auc_score]):
        loss = loss_func(clean_idx, conf_score)
        df.at[d.__class__.__name__, name] = np.round(loss, 3)
df
[4]:
                        log  brier    roc
DummyDetector         0.999  0.333  0.501
WkDN                  0.664  0.183  0.811
ForestKDN             1.099  0.131  0.858
InstanceHardness      0.448  0.141  0.902
KDN                   0.830  0.173  0.818
RkDN                  3.371  0.227  0.749
MCS                   0.294  0.071  0.955
PartitioningDetector  0.942  0.072  0.950
RandomForestDetector  0.464  0.145  0.908

Note that for log loss and Brier score lower is better, whereas for roc_auc_score higher is better.
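For instance, to rank the detectors from best to worst by ROC-AUC (illustrative snippet):

df.sort_values('roc', ascending=False)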
