PyOD is a comprehensive Python toolkit to identify outlying objects in multivariate data with both unsupervised and supervised approaches.
Linear Models for Outlier Detection:
1. PCA: Principal Component Analysis (use the weighted projected distances to the eigenvector hyperplane as the outlier scores) [10]
2. MCD: Minimum Covariance Determinant (use the Mahalanobis distances as the outlier scores) [11, 12]
3. One-Class Support Vector Machines [3]
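The MCD idea can be sketched with scikit-learn's `MinCovDet` estimator. It fits a robust location and covariance on the subset of points whose covariance matrix has the smallest determinant, and each point's Mahalanobis distance under that estimate serves as its outlier score. This is a minimal illustration on synthetic data (the data and the three planted outliers are assumptions). It calls scikit-learn directly rather than PyOD's `MCD` class.

```python
import numpy as np
from sklearn.covariance import MinCovDet

# Synthetic data (assumption): 200 correlated Gaussian inliers plus 3 planted outliers
rng = np.random.RandomState(42)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
X_inliers = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=200)
# The planted points lie against the correlation direction, far from the inlier cloud
X_outliers = np.array([[4.0, -4.0], [-4.0, 4.0], [3.0, -3.0]])
X = np.vstack([X_inliers, X_outliers])

# Robust location/covariance from the subset whose covariance has the smallest determinant
mcd = MinCovDet(random_state=42).fit(X)

# mahalanobis() returns squared Mahalanobis distances; larger means more outlying
scores = mcd.mahalanobis(X)

# Indices of the three highest scores; the planted rows are 200, 201 and 202
top3 = np.argsort(scores)[-3:]
print(sorted(top3.tolist()))
```

Squared distances rank points exactly as plain distances do, so they can be used as scores without taking a square root.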
Proximity-Based Outlier Detection Models:
3. kNN: k Nearest Neighbors (use the distance to the kth nearest neighbor as the outlier score)
4. Average kNN Outlier Detection (use the average distance to k nearest neighbors as the outlier score)
5. Median kNN Outlier Detection (use the median distance to k nearest neighbors as the outlier score)
6. HBOS: Histogram-based Outlier Score [5]
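The three kNN variants differ only in how the k nearest-neighbor distances are aggregated: the largest, the mean, or the median. A brute-force NumPy sketch follows. The helper `knn_scores` and its parameters are hypothetical. PyOD's `KNN` class offers the same choice through its `method` argument.

```python
import numpy as np

def knn_scores(X_train, X_test, k=5, method="largest"):
    """Hypothetical helper: kNN outlier scores of X_test relative to X_train.

    method="largest": distance to the k-th nearest neighbor
    method="mean":    average distance to the k nearest neighbors
    method="median":  median distance to the k nearest neighbors
    """
    # Pairwise Euclidean distances, shape (n_test, n_train)
    diff = X_test[:, None, :] - X_train[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Sort each row and keep the k smallest distances
    nearest = np.sort(dist, axis=1)[:, :k]
    if method == "largest":
        return nearest[:, -1]
    if method == "mean":
        return nearest.mean(axis=1)
    if method == "median":
        return np.median(nearest, axis=1)
    raise ValueError("method must be 'largest', 'mean' or 'median'")

rng = np.random.RandomState(0)
X_train = rng.normal(size=(100, 2))
X_test = np.array([[0.0, 0.0], [6.0, 6.0]])  # one typical point, one far-away point

for method in ("largest", "mean", "median"):
    print(method, knn_scores(X_train, X_test, k=5, method=method))
```

The full distance matrix costs O(n_test × n_train) memory, so a tree-based neighbor search is the practical choice for larger data. The sketch also scores points that are not in the reference set. Scoring the reference points themselves would require skipping each point's zero distance to itself.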
Probabilistic Models for Outlier Detection:
Outlier Ensembles and Combination Frameworks:
from __future__ import division
from __future__ import print_function
import os
import sys
from time import time
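# temporary solution for relative imports in case pyod is not installed
# if pyod is installed, no need to use the following line
# ("__file__" is a string literal because __file__ is undefined in a notebook;
# its dirname is '', so this appends the parent of the current working directory)
sys.path.append(
    os.path.abspath(os.path.join(os.path.dirname("__file__"), '..')))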
# suppress warnings for clean output
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from scipy.io import loadmat
from pyod.models.abod import ABOD
from pyod.models.cblof import CBLOF
from pyod.models.feature_bagging import FeatureBagging
from pyod.models.hbos import HBOS
from pyod.models.iforest import IForest
from pyod.models.knn import KNN
from pyod.models.lof import LOF
from pyod.models.mcd import MCD
from pyod.models.ocsvm import OCSVM
from pyod.models.pca import PCA
from pyod.utils.utility import standardizer
from pyod.utils.utility import precision_n_scores
from sklearn.metrics import roc_auc_score
# Define the data files to benchmark (X and y are read from each file in the loop below)
mat_file_list = ['arrhythmia.mat',
'cardio.mat',
'glass.mat',
'ionosphere.mat',
'letter.mat',
'lympho.mat',
'mnist.mat',
'musk.mat',
'optdigits.mat',
'pendigits.mat',
'pima.mat',
'satellite.mat',
'satimage-2.mat',
'shuttle.mat',
'vertebral.mat',
'vowels.mat',
'wbc.mat']
# Define ten outlier detection tools to be compared
random_state = np.random.RandomState(42)
df_columns = ['Data', '#Samples', '# Dimensions', 'Outlier Perc',
'ABOD', 'CBLOF', 'FB', 'HBOS', 'IForest', 'KNN', 'LOF', 'MCD',
'OCSVM', 'PCA']
roc_df = pd.DataFrame(columns=df_columns)
prn_df = pd.DataFrame(columns=df_columns)
time_df = pd.DataFrame(columns=df_columns)
for mat_file in mat_file_list:
    print("\n... Processing", mat_file, '...')
    mat = loadmat(os.path.join('data', mat_file))
    X = mat['X']
    y = mat['y'].ravel()
    outliers_fraction = np.count_nonzero(y) / len(y)
    outliers_percentage = round(outliers_fraction * 100, ndigits=4)
    # construct containers for saving results
    roc_list = [mat_file[:-4], X.shape[0], X.shape[1], outliers_percentage]
    prn_list = [mat_file[:-4], X.shape[0], X.shape[1], outliers_percentage]
    time_list = [mat_file[:-4], X.shape[0], X.shape[1], outliers_percentage]
    # 60% data for training and 40% for testing
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4,
                                                        random_state=random_state)
    # standardizing data for processing
    X_train_norm, X_test_norm = standardizer(X_train, X_test)
    classifiers = {
        'Angle-based Outlier Detector (ABOD)': ABOD(
            contamination=outliers_fraction),
        'Cluster-based Local Outlier Factor': CBLOF(
            contamination=outliers_fraction, check_estimator=False,
            random_state=random_state),
        'Feature Bagging': FeatureBagging(contamination=outliers_fraction,
                                          check_estimator=False,
                                          random_state=random_state),
        'Histogram-base Outlier Detection (HBOS)': HBOS(
            contamination=outliers_fraction),
        'Isolation Forest': IForest(contamination=outliers_fraction,
                                    random_state=random_state),
        'K Nearest Neighbors (KNN)': KNN(contamination=outliers_fraction),
        'Local Outlier Factor (LOF)': LOF(
            contamination=outliers_fraction),
        'Minimum Covariance Determinant (MCD)': MCD(
            contamination=outliers_fraction, random_state=random_state),
        # One-class SVM is deterministic; recent PyOD versions of OCSVM
        # do not accept a random_state argument
        'One-class SVM (OCSVM)': OCSVM(contamination=outliers_fraction),
        'Principal Component Analysis (PCA)': PCA(
            contamination=outliers_fraction, random_state=random_state),
    }
    for clf_name, clf in classifiers.items():
        # the measured time covers both fitting and scoring the test set
        t0 = time()
        clf.fit(X_train_norm)
        test_scores = clf.decision_function(X_test_norm)
        t1 = time()
        duration = round(t1 - t0, ndigits=4)
        time_list.append(duration)
        roc = round(roc_auc_score(y_test, test_scores), ndigits=4)
        prn = round(precision_n_scores(y_test, test_scores), ndigits=4)
        print('{clf_name} ROC:{roc}, precision @ rank n:{prn}, '
              'execution time: {duration}s'.format(
                  clf_name=clf_name, roc=roc, prn=prn, duration=duration))
        roc_list.append(roc)
        prn_list.append(prn)
    # append one row per dataset to each result table
    temp_df = pd.DataFrame(time_list).transpose()
    temp_df.columns = df_columns
    time_df = pd.concat([time_df, temp_df], axis=0)
    temp_df = pd.DataFrame(roc_list).transpose()
    temp_df.columns = df_columns
    roc_df = pd.concat([roc_df, temp_df], axis=0)
    temp_df = pd.DataFrame(prn_list).transpose()
    temp_df.columns = df_columns
    prn_df = pd.concat([prn_df, temp_df], axis=0)
... Processing arrhythmia.mat ...
Angle-based Outlier Detector (ABOD) ROC:0.7687, precision @ rank n:0.3571, execution time: 0.171s
Cluster-based Local Outlier Factor ROC:0.778, precision @ rank n:0.5, execution time: 0.0344s
Feature Bagging ROC:0.7736, precision @ rank n:0.5, execution time: 0.5254s
Histogram-base Outlier Detection (HBOS) ROC:0.8511, precision @ rank n:0.5714, execution time: 0.2122s
Isolation Forest ROC:0.8217, precision @ rank n:0.5, execution time: 0.2036s
K Nearest Neighbors (KNN) ROC:0.782, precision @ rank n:0.5, execution time: 0.0853s
Local Outlier Factor (LOF) ROC:0.7787, precision @ rank n:0.4643, execution time: 0.0702s
Minimum Covariance Determinant (MCD) ROC:0.8228, precision @ rank n:0.4286, execution time: 0.5374s
One-class SVM (OCSVM) ROC:0.7986, precision @ rank n:0.5, execution time: 0.0481s
Principal Component Analysis (PCA) ROC:0.7997, precision @ rank n:0.5, execution time: 0.0572s

... Processing cardio.mat ...
Angle-based Outlier Detector (ABOD) ROC:0.5668, precision @ rank n:0.209, execution time: 0.5818s
Cluster-based Local Outlier Factor ROC:0.8987, precision @ rank n:0.5075, execution time: 0.031s
Feature Bagging ROC:0.5667, precision @ rank n:0.1194, execution time: 0.909s
Histogram-base Outlier Detection (HBOS) ROC:0.8102, precision @ rank n:0.3731, execution time: 0.0469s
Isolation Forest ROC:0.8726, precision @ rank n:0.3433, execution time: 0.2626s
K Nearest Neighbors (KNN) ROC:0.7252, precision @ rank n:0.2388, execution time: 0.1524s
Local Outlier Factor (LOF) ROC:0.5313, precision @ rank n:0.1493, execution time: 0.0898s
Minimum Covariance Determinant (MCD) ROC:0.7966, precision @ rank n:0.3284, execution time: 0.4314s
One-class SVM (OCSVM) ROC:0.9055, precision @ rank n:0.3731, execution time: 0.0922s
Principal Component Analysis (PCA) ROC:0.9237, precision @ rank n:0.4925, execution time: 0.004s

... Processing glass.mat ...
Angle-based Outlier Detector (ABOD) ROC:0.7605, precision @ rank n:0.0, execution time: 0.0681s
Cluster-based Local Outlier Factor ROC:0.7457, precision @ rank n:0.0, execution time: 0.0192s
Feature Bagging ROC:0.758, precision @ rank n:0.2, execution time: 0.0156s
Histogram-base Outlier Detection (HBOS) ROC:0.6346, precision @ rank n:0.0, execution time: 0.0156s
Isolation Forest ROC:0.5556, precision @ rank n:0.0, execution time: 0.1253s
K Nearest Neighbors (KNN) ROC:0.8198, precision @ rank n:0.0, execution time: 0.0091s
Local Outlier Factor (LOF) ROC:0.8395, precision @ rank n:0.2, execution time: 0.002s
Minimum Covariance Determinant (MCD) ROC:0.7728, precision @ rank n:0.0, execution time: 0.0356s
One-class SVM (OCSVM) ROC:0.5506, precision @ rank n:0.0, execution time: 0.0s
Principal Component Analysis (PCA) ROC:0.5506, precision @ rank n:0.0, execution time: 0.0s

... Processing ionosphere.mat ...
Angle-based Outlier Detector (ABOD) ROC:0.92, precision @ rank n:0.8333, execution time: 0.0937s
Cluster-based Local Outlier Factor ROC:0.812, precision @ rank n:0.6111, execution time: 0.0156s
Feature Bagging ROC:0.9004, precision @ rank n:0.7407, execution time: 0.0625s
Histogram-base Outlier Detection (HBOS) ROC:0.6005, precision @ rank n:0.4259, execution time: 0.0313s
Isolation Forest ROC:0.8587, precision @ rank n:0.6667, execution time: 0.16s
K Nearest Neighbors (KNN) ROC:0.9378, precision @ rank n:0.8704, execution time: 0.0157s
Local Outlier Factor (LOF) ROC:0.9063, precision @ rank n:0.7407, execution time: 0.0156s
Minimum Covariance Determinant (MCD) ROC:0.9513, precision @ rank n:0.8704, execution time: 0.0468s
One-class SVM (OCSVM) ROC:0.8497, precision @ rank n:0.7593, execution time: 0.0s
Principal Component Analysis (PCA) ROC:0.8025, precision @ rank n:0.6481, execution time: 0.0s

... Processing letter.mat ...
Angle-based Outlier Detector (ABOD) ROC:0.8992, precision @ rank n:0.3438, execution time: 0.4973s
Cluster-based Local Outlier Factor ROC:0.5905, precision @ rank n:0.0625, execution time: 0.0156s
Feature Bagging ROC:0.8938, precision @ rank n:0.4062, execution time: 0.7252s
Histogram-base Outlier Detection (HBOS) ROC:0.6328, precision @ rank n:0.0312, execution time: 0.0625s
Isolation Forest ROC:0.6445, precision @ rank n:0.0312, execution time: 0.2632s
K Nearest Neighbors (KNN) ROC:0.8972, precision @ rank n:0.3438, execution time: 0.1263s
Local Outlier Factor (LOF) ROC:0.8821, precision @ rank n:0.3125, execution time: 0.0943s
Minimum Covariance Determinant (MCD) ROC:0.8766, precision @ rank n:0.125, execution time: 0.9454s
One-class SVM (OCSVM) ROC:0.6071, precision @ rank n:0.0938, execution time: 0.0762s
Principal Component Analysis (PCA) ROC:0.5265, precision @ rank n:0.0625, execution time: 0.004s

... Processing lympho.mat ...
Angle-based Outlier Detector (ABOD) ROC:0.7155, precision @ rank n:0.0, execution time: 0.0401s
Cluster-based Local Outlier Factor ROC:0.9914, precision @ rank n:0.5, execution time: 0.017s
Feature Bagging ROC:0.9483, precision @ rank n:0.5, execution time: 0.023s
Histogram-base Outlier Detection (HBOS) ROC:1.0, precision @ rank n:1.0, execution time: 0.0071s
Isolation Forest ROC:1.0, precision @ rank n:1.0, execution time: 0.1614s
K Nearest Neighbors (KNN) ROC:0.9397, precision @ rank n:0.5, execution time: 0.005s
Local Outlier Factor (LOF) ROC:0.9569, precision @ rank n:0.5, execution time: 0.002s
Minimum Covariance Determinant (MCD) ROC:0.9483, precision @ rank n:0.5, execution time: 0.031s
One-class SVM (OCSVM) ROC:0.9655, precision @ rank n:0.5, execution time: 0.002s
Principal Component Analysis (PCA) ROC:0.9914, precision @ rank n:0.5, execution time: 0.002s

... Processing mnist.mat ...
Angle-based Outlier Detector (ABOD) ROC:0.7747, precision @ rank n:0.384, execution time: 7.7986s
Cluster-based Local Outlier Factor ROC:0.8431, precision @ rank n:0.365, execution time: 0.067s
Feature Bagging ROC:0.7246, precision @ rank n:0.3422, execution time: 42.0965s
Histogram-base Outlier Detection (HBOS) ROC:0.5769, precision @ rank n:0.1217, execution time: 0.9069s
Isolation Forest ROC:0.8033, precision @ rank n:0.2966, execution time: 1.4098s
K Nearest Neighbors (KNN) ROC:0.8431, precision @ rank n:0.4183, execution time: 5.7465s
Local Outlier Factor (LOF) ROC:0.7101, precision @ rank n:0.3384, execution time: 5.7854s
Minimum Covariance Determinant (MCD) ROC:0.9059, precision @ rank n:0.5133, execution time: 2.0246s
One-class SVM (OCSVM) ROC:0.851, precision @ rank n:0.3802, execution time: 4.2719s
Principal Component Analysis (PCA) ROC:0.8497, precision @ rank n:0.3688, execution time: 0.125s

... Processing musk.mat ...
Angle-based Outlier Detector (ABOD) ROC:0.2716, precision @ rank n:0.0714, execution time: 2.2028s
Cluster-based Local Outlier Factor ROC:1.0, precision @ rank n:1.0, execution time: 0.028s
Feature Bagging ROC:0.6591, precision @ rank n:0.2143, execution time: 10.1455s
Histogram-base Outlier Detection (HBOS) ROC:1.0, precision @ rank n:1.0, execution time: 0.5653s
Isolation Forest ROC:1.0, precision @ rank n:1.0, execution time: 0.8163s
K Nearest Neighbors (KNN) ROC:0.8247, precision @ rank n:0.2857, execution time: 1.5487s
Local Outlier Factor (LOF) ROC:0.6128, precision @ rank n:0.2143, execution time: 1.466s
Minimum Covariance Determinant (MCD) ROC:1.0, precision @ rank n:0.9762, execution time: 8.3801s
One-class SVM (OCSVM) ROC:1.0, precision @ rank n:1.0, execution time: 1.3012s
Principal Component Analysis (PCA) ROC:1.0, precision @ rank n:1.0, execution time: 0.1484s

... Processing optdigits.mat ...
Angle-based Outlier Detector (ABOD) ROC:0.4971, precision @ rank n:0.0299, execution time: 2.8534s
Cluster-based Local Outlier Factor ROC:0.5922, precision @ rank n:0.0, execution time: 0.025s
Feature Bagging ROC:0.4715, precision @ rank n:0.0448, execution time: 10.828s
Histogram-base Outlier Detection (HBOS) ROC:0.8553, precision @ rank n:0.209, execution time: 0.4581s
Isolation Forest ROC:0.7033, precision @ rank n:0.0299, execution time: 0.658s
K Nearest Neighbors (KNN) ROC:0.4029, precision @ rank n:0.0, execution time: 1.7452s
Local Outlier Factor (LOF) ROC:0.4934, precision @ rank n:0.0448, execution time: 1.9845s
Minimum Covariance Determinant (MCD) ROC:0.4041, precision @ rank n:0.0, execution time: 1.2785s
One-class SVM (OCSVM) ROC:0.4808, precision @ rank n:0.0, execution time: 1.5436s
Principal Component Analysis (PCA) ROC:0.5016, precision @ rank n:0.0, execution time: 0.0537s

... Processing pendigits.mat ...
Angle-based Outlier Detector (ABOD) ROC:0.6957, precision @ rank n:0.1127, execution time: 1.8649s
Cluster-based Local Outlier Factor ROC:0.9329, precision @ rank n:0.2394, execution time: 0.0669s
Feature Bagging ROC:0.4277, precision @ rank n:0.0986, execution time: 4.0121s
Histogram-base Outlier Detection (HBOS) ROC:0.9322, precision @ rank n:0.3099, execution time: 0.1376s
Isolation Forest ROC:0.9756, precision @ rank n:0.3944, execution time: 0.5828s
K Nearest Neighbors (KNN) ROC:0.7694, precision @ rank n:0.1549, execution time: 0.5576s
Local Outlier Factor (LOF) ROC:0.4056, precision @ rank n:0.0986, execution time: 0.5517s
Minimum Covariance Determinant (MCD) ROC:0.8413, precision @ rank n:0.0986, execution time: 1.609s
One-class SVM (OCSVM) ROC:0.9376, precision @ rank n:0.3662, execution time: 0.8713s
Principal Component Analysis (PCA) ROC:0.9384, precision @ rank n:0.3803, execution time: 0.005s

... Processing pima.mat ...
Angle-based Outlier Detector (ABOD) ROC:0.6623, precision @ rank n:0.4906, execution time: 0.1805s
Cluster-based Local Outlier Factor ROC:0.7654, precision @ rank n:0.5755, execution time: 0.022s
Feature Bagging ROC:0.6523, precision @ rank n:0.4811, execution time: 0.0851s
Histogram-base Outlier Detection (HBOS) ROC:0.7016, precision @ rank n:0.5283, execution time: 0.0168s
Isolation Forest ROC:0.696, precision @ rank n:0.5283, execution time: 0.1829s
K Nearest Neighbors (KNN) ROC:0.71, precision @ rank n:0.5094, execution time: 0.0156s
Local Outlier Factor (LOF) ROC:0.6455, precision @ rank n:0.4717, execution time: 0.0156s
Minimum Covariance Determinant (MCD) ROC:0.6755, precision @ rank n:0.5094, execution time: 0.0468s
One-class SVM (OCSVM) ROC:0.6024, precision @ rank n:0.4528, execution time: 0.0s
Principal Component Analysis (PCA) ROC:0.6508, precision @ rank n:0.5, execution time: 0.0s

... Processing satellite.mat ...
Angle-based Outlier Detector (ABOD) ROC:0.5821, precision @ rank n:0.4077, execution time: 2.2945s
Cluster-based Local Outlier Factor ROC:0.5428, precision @ rank n:0.3153, execution time: 0.0313s
Feature Bagging ROC:0.5469, precision @ rank n:0.3957, execution time: 6.9375s
Histogram-base Outlier Detection (HBOS) ROC:0.7471, precision @ rank n:0.5612, execution time: 0.2903s
Isolation Forest ROC:0.7153, precision @ rank n:0.5743, execution time: 0.5562s
K Nearest Neighbors (KNN) ROC:0.6868, precision @ rank n:0.5072, execution time: 0.8947s
Local Outlier Factor (LOF) ROC:0.5509, precision @ rank n:0.3993, execution time: 0.8896s
Minimum Covariance Determinant (MCD) ROC:0.8059, precision @ rank n:0.6906, execution time: 1.679s
One-class SVM (OCSVM) ROC:0.6564, precision @ rank n:0.5372, execution time: 1.225s
Principal Component Analysis (PCA) ROC:0.5902, precision @ rank n:0.4712, execution time: 0.0313s

... Processing satimage-2.mat ...
Angle-based Outlier Detector (ABOD) ROC:0.8216, precision @ rank n:0.2, execution time: 1.8979s
Cluster-based Local Outlier Factor ROC:0.9372, precision @ rank n:0.6, execution time: 0.0468s
Feature Bagging ROC:0.3658, precision @ rank n:0.04, execution time: 5.081s
Histogram-base Outlier Detection (HBOS) ROC:0.9862, precision @ rank n:0.68, execution time: 0.242s
Isolation Forest ROC:0.9959, precision @ rank n:0.88, execution time: 0.4869s
K Nearest Neighbors (KNN) ROC:0.9528, precision @ rank n:0.28, execution time: 0.7402s
Local Outlier Factor (LOF) ROC:0.3717, precision @ rank n:0.0417, execution time: 0.7539s
Minimum Covariance Determinant (MCD) ROC:0.9949, precision @ rank n:0.52, execution time: 1.4754s
One-class SVM (OCSVM) ROC:0.9975, precision @ rank n:0.96, execution time: 1.1365s
Principal Component Analysis (PCA) ROC:0.9901, precision @ rank n:0.8, execution time: 0.0022s

... Processing shuttle.mat ...
Angle-based Outlier Detector (ABOD) ROC:0.6164, precision @ rank n:0.1785, execution time: 19.6385s
Cluster-based Local Outlier Factor ROC:0.9899, precision @ rank n:0.95, execution time: 0.0781s
Feature Bagging ROC:0.5342, precision @ rank n:0.0836, execution time: 57.0473s
Histogram-base Outlier Detection (HBOS) ROC:0.9847, precision @ rank n:0.9657, execution time: 0.4604s
Isolation Forest ROC:0.9963, precision @ rank n:0.9022, execution time: 2.8161s
K Nearest Neighbors (KNN) ROC:0.6409, precision @ rank n:0.2028, execution time: 7.8509s
Local Outlier Factor (LOF) ROC:0.5373, precision @ rank n:0.1329, execution time: 10.3425s
Minimum Covariance Determinant (MCD) ROC:0.9898, precision @ rank n:0.7192, execution time: 7.8624s
One-class SVM (OCSVM) ROC:0.9919, precision @ rank n:0.9559, execution time: 38.0809s
Principal Component Analysis (PCA) ROC:0.9901, precision @ rank n:0.9507, execution time: 0.0373s

... Processing vertebral.mat ...
Angle-based Outlier Detector (ABOD) ROC:0.3435, precision @ rank n:0.0, execution time: 0.0671s
Cluster-based Local Outlier Factor ROC:0.504, precision @ rank n:0.0, execution time: 0.0156s
Feature Bagging ROC:0.4013, precision @ rank n:0.0, execution time: 0.0346s
Histogram-base Outlier Detection (HBOS) ROC:0.4278, precision @ rank n:0.0, execution time: 0.0s
Isolation Forest ROC:0.3628, precision @ rank n:0.0, execution time: 0.1377s
K Nearest Neighbors (KNN) ROC:0.3226, precision @ rank n:0.0, execution time: 0.0156s
Local Outlier Factor (LOF) ROC:0.4382, precision @ rank n:0.0, execution time: 0.0s
Minimum Covariance Determinant (MCD) ROC:0.3772, precision @ rank n:0.0, execution time: 0.0534s
One-class SVM (OCSVM) ROC:0.3868, precision @ rank n:0.0, execution time: 0.001s
Principal Component Analysis (PCA) ROC:0.427, precision @ rank n:0.0, execution time: 0.001s

... Processing vowels.mat ...
Angle-based Outlier Detector (ABOD) ROC:0.9718, precision @ rank n:0.5, execution time: 0.395s
Cluster-based Local Outlier Factor ROC:0.5681, precision @ rank n:0.0, execution time: 0.0156s
Feature Bagging ROC:0.9304, precision @ rank n:0.3636, execution time: 0.2297s
Histogram-base Outlier Detection (HBOS) ROC:0.7306, precision @ rank n:0.1818, execution time: 0.0072s
Isolation Forest ROC:0.7791, precision @ rank n:0.2727, execution time: 0.1966s
K Nearest Neighbors (KNN) ROC:0.9661, precision @ rank n:0.4545, execution time: 0.0731s
Local Outlier Factor (LOF) ROC:0.9277, precision @ rank n:0.3182, execution time: 0.0313s
Minimum Covariance Determinant (MCD) ROC:0.6799, precision @ rank n:0.0455, execution time: 0.5917s
One-class SVM (OCSVM) ROC:0.7739, precision @ rank n:0.3182, execution time: 0.0313s
Principal Component Analysis (PCA) ROC:0.6362, precision @ rank n:0.2727, execution time: 0.0156s

... Processing wbc.mat ...
Angle-based Outlier Detector (ABOD) ROC:0.9723, precision @ rank n:0.7273, execution time: 0.1057s
Cluster-based Local Outlier Factor ROC:0.9658, precision @ rank n:0.6364, execution time: 0.0156s
Feature Bagging ROC:0.9562, precision @ rank n:0.7273, execution time: 0.0703s
Histogram-base Outlier Detection (HBOS) ROC:0.9678, precision @ rank n:0.7273, execution time: 0.0171s
Isolation Forest ROC:0.9516, precision @ rank n:0.6364, execution time: 0.1392s
K Nearest Neighbors (KNN) ROC:0.962, precision @ rank n:0.6364, execution time: 0.0156s
Local Outlier Factor (LOF) ROC:0.9542, precision @ rank n:0.7273, execution time: 0.0179s
Minimum Covariance Determinant (MCD) ROC:0.9523, precision @ rank n:0.5455, execution time: 0.0541s
One-class SVM (OCSVM) ROC:0.9555, precision @ rank n:0.6364, execution time: 0.005s
Principal Component Analysis (PCA) ROC:0.951, precision @ rank n:0.6364, execution time: 0.002s
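Precision @ rank n, as reported in the log above, treats the n highest-scored test points as predicted outliers and reports the fraction of them that are true outliers. By default, n is the number of true outliers in the test set. The hypothetical helper below is a minimal NumPy sketch of that definition. It is not PyOD's `precision_n_scores`, which may, for instance, handle tied scores differently.

```python
import numpy as np

def precision_at_rank_n(y_true, scores, n=None):
    """Hypothetical helper: fraction of true outliers among the n highest scores.

    If n is None, it defaults to the number of true outliers (label 1) in y_true.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    if n is None:
        n = int(np.count_nonzero(y_true))
    if n == 0:
        raise ValueError("precision @ rank n is undefined without any outliers")
    # Indices of the n largest scores (ties are broken by argsort order)
    top_n = np.argsort(scores)[::-1][:n]
    return float(np.mean(y_true[top_n]))

y_true = np.array([0, 0, 1, 0, 1, 0, 0, 0, 1, 0])
scores = np.array([0.1, 0.2, 0.9, 0.3, 0.4, 0.8, 0.05, 0.15, 0.7, 0.25])
# n = 3 true outliers; the top-3 scores hit two of them
print(precision_at_rank_n(y_true, scores))
```

When n equals the number of true outliers, precision at that cutoff also equals recall. The metric can then only take the values 0, 1/n, 2/n, ..., 1.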
print('Time complexity')
time_df
Time complexity
| | Data | #Samples | # Dimensions | Outlier Perc (%) | ABOD | CBLOF | FB | HBOS | IForest | KNN | LOF | MCD | OCSVM | PCA |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | arrhythmia | 452 | 274 | 14.6018 | 0.171 | 0.0344 | 0.5254 | 0.2122 | 0.2036 | 0.0853 | 0.0702 | 0.5374 | 0.0481 | 0.0572 |
0 | cardio | 1831 | 21 | 9.6122 | 0.5818 | 0.031 | 0.909 | 0.0469 | 0.2626 | 0.1524 | 0.0898 | 0.4314 | 0.0922 | 0.004 |
0 | glass | 214 | 9 | 4.2056 | 0.0681 | 0.0192 | 0.0156 | 0.0156 | 0.1253 | 0.0091 | 0.002 | 0.0356 | 0 | 0 |
0 | ionosphere | 351 | 33 | 35.8974 | 0.0937 | 0.0156 | 0.0625 | 0.0313 | 0.16 | 0.0157 | 0.0156 | 0.0468 | 0 | 0 |
0 | letter | 1600 | 32 | 6.25 | 0.4973 | 0.0156 | 0.7252 | 0.0625 | 0.2632 | 0.1263 | 0.0943 | 0.9454 | 0.0762 | 0.004 |
0 | lympho | 148 | 18 | 4.0541 | 0.0401 | 0.017 | 0.023 | 0.0071 | 0.1614 | 0.005 | 0.002 | 0.031 | 0.002 | 0.002 |
0 | mnist | 7603 | 100 | 9.2069 | 7.7986 | 0.067 | 42.0965 | 0.9069 | 1.4098 | 5.7465 | 5.7854 | 2.0246 | 4.2719 | 0.125 |
0 | musk | 3062 | 166 | 3.1679 | 2.2028 | 0.028 | 10.1455 | 0.5653 | 0.8163 | 1.5487 | 1.466 | 8.3801 | 1.3012 | 0.1484 |
0 | optdigits | 5216 | 64 | 2.8758 | 2.8534 | 0.025 | 10.828 | 0.4581 | 0.658 | 1.7452 | 1.9845 | 1.2785 | 1.5436 | 0.0537 |
0 | pendigits | 6870 | 16 | 2.2707 | 1.8649 | 0.0669 | 4.0121 | 0.1376 | 0.5828 | 0.5576 | 0.5517 | 1.609 | 0.8713 | 0.005 |
0 | pima | 768 | 8 | 34.8958 | 0.1805 | 0.022 | 0.0851 | 0.0168 | 0.1829 | 0.0156 | 0.0156 | 0.0468 | 0 | 0 |
0 | satellite | 6435 | 36 | 31.6395 | 2.2945 | 0.0313 | 6.9375 | 0.2903 | 0.5562 | 0.8947 | 0.8896 | 1.679 | 1.225 | 0.0313 |
0 | satimage-2 | 5803 | 36 | 1.2235 | 1.8979 | 0.0468 | 5.081 | 0.242 | 0.4869 | 0.7402 | 0.7539 | 1.4754 | 1.1365 | 0.0022 |
0 | shuttle | 49097 | 9 | 7.1511 | 19.6385 | 0.0781 | 57.0473 | 0.4604 | 2.8161 | 7.8509 | 10.3425 | 7.8624 | 38.0809 | 0.0373 |
0 | vertebral | 240 | 6 | 12.5 | 0.0671 | 0.0156 | 0.0346 | 0 | 0.1377 | 0.0156 | 0 | 0.0534 | 0.001 | 0.001 |
0 | vowels | 1456 | 12 | 3.4341 | 0.395 | 0.0156 | 0.2297 | 0.0072 | 0.1966 | 0.0731 | 0.0313 | 0.5917 | 0.0313 | 0.0156 |
0 | wbc | 378 | 30 | 5.5556 | 0.1057 | 0.0156 | 0.0703 | 0.0171 | 0.1392 | 0.0156 | 0.0179 | 0.0541 | 0.005 | 0.002 |
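Many small entries in this table are exactly 0 or 0.0156 s (about 1/64 s). This suggests that `time()` advanced in roughly 15.6 ms steps on the machine that produced these numbers, so durations below a few hundredths of a second should be read as approximate. One way to tighten the measurement is to use `time.perf_counter()`, Python's high-resolution clock intended for timing, and keep the best of a few repeats. The helper `timed_fit_score` and the toy `DistanceToMean` detector below are hypothetical. They are written only to mirror the `fit` / `decision_function` interface that the benchmark loop uses.

```python
import time
import numpy as np

def timed_fit_score(clf, X_train, X_test, repeats=3):
    """Hypothetical helper: fit clf, score X_test, return (scores, best wall time in s)."""
    best = float("inf")
    scores = None
    for _ in range(repeats):
        t0 = time.perf_counter()          # high-resolution clock for short intervals
        clf.fit(X_train)
        scores = clf.decision_function(X_test)
        best = min(best, time.perf_counter() - t0)
    return scores, best


class DistanceToMean:
    """Toy detector for illustration: score = Euclidean distance to the training mean."""

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        return self

    def decision_function(self, X):
        return np.linalg.norm(X - self.mean_, axis=1)


rng = np.random.RandomState(0)
scores, seconds = timed_fit_score(DistanceToMean(), rng.normal(size=(500, 4)),
                                  rng.normal(size=(200, 4)))
print(scores.shape, "{:.6f}s".format(seconds))
```

Taking the minimum over repeats filters out one-off delays from other processes. It assumes the detector can be refit cleanly. Randomized detectors that share one `RandomState` object draw different random numbers on each repeat, so only the timing, not the scores, is comparable across repeats.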
Analyze the ROC AUC and precision @ rank n performance
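ROC AUC can be read as the probability that a randomly chosen outlier receives a higher score than a randomly chosen inlier. A value of 0.5 therefore corresponds to random ranking, and values below 0.5 mean outliers tend to be scored lower than inliers. Nearly every detector falls below 0.5 on vertebral. The sketch below computes AUC directly from that pairwise definition. The helper name is hypothetical, and its quadratic cost makes `sklearn.metrics.roc_auc_score`, used in the loop, the practical choice.

```python
import numpy as np

def roc_auc_by_pairs(y_true, scores):
    """Hypothetical helper: ROC AUC as P(outlier score > inlier score), ties counted as 1/2."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    # Compare every outlier score with every inlier score (n_pos x n_neg matrix)
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (pos.size * neg.size)

y_true = np.array([0, 0, 1, 0, 1, 0, 0, 0, 1, 0])
scores = np.array([0.1, 0.2, 0.9, 0.3, 0.4, 0.8, 0.05, 0.15, 0.7, 0.25])
print(roc_auc_by_pairs(y_true, scores))   # good ranking, well above 0.5
print(roc_auc_by_pairs(y_true, -scores))  # reversed ranking, below 0.5
```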
print('ROC Performance')
roc_df
ROC Performance
| | Data | #Samples | # Dimensions | Outlier Perc (%) | ABOD | CBLOF | FB | HBOS | IForest | KNN | LOF | MCD | OCSVM | PCA |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | arrhythmia | 452 | 274 | 14.6018 | 0.7687 | 0.778 | 0.7736 | 0.8511 | 0.8217 | 0.782 | 0.7787 | 0.8228 | 0.7986 | 0.7997 |
0 | cardio | 1831 | 21 | 9.6122 | 0.5668 | 0.8987 | 0.5667 | 0.8102 | 0.8726 | 0.7252 | 0.5313 | 0.7966 | 0.9055 | 0.9237 |
0 | glass | 214 | 9 | 4.2056 | 0.7605 | 0.7457 | 0.758 | 0.6346 | 0.5556 | 0.8198 | 0.8395 | 0.7728 | 0.5506 | 0.5506 |
0 | ionosphere | 351 | 33 | 35.8974 | 0.92 | 0.812 | 0.9004 | 0.6005 | 0.8587 | 0.9378 | 0.9063 | 0.9513 | 0.8497 | 0.8025 |
0 | letter | 1600 | 32 | 6.25 | 0.8992 | 0.5905 | 0.8938 | 0.6328 | 0.6445 | 0.8972 | 0.8821 | 0.8766 | 0.6071 | 0.5265 |
0 | lympho | 148 | 18 | 4.0541 | 0.7155 | 0.9914 | 0.9483 | 1 | 1 | 0.9397 | 0.9569 | 0.9483 | 0.9655 | 0.9914 |
0 | mnist | 7603 | 100 | 9.2069 | 0.7747 | 0.8431 | 0.7246 | 0.5769 | 0.8033 | 0.8431 | 0.7101 | 0.9059 | 0.851 | 0.8497 |
0 | musk | 3062 | 166 | 3.1679 | 0.2716 | 1 | 0.6591 | 1 | 1 | 0.8247 | 0.6128 | 1 | 1 | 1 |
0 | optdigits | 5216 | 64 | 2.8758 | 0.4971 | 0.5922 | 0.4715 | 0.8553 | 0.7033 | 0.4029 | 0.4934 | 0.4041 | 0.4808 | 0.5016 |
0 | pendigits | 6870 | 16 | 2.2707 | 0.6957 | 0.9329 | 0.4277 | 0.9322 | 0.9756 | 0.7694 | 0.4056 | 0.8413 | 0.9376 | 0.9384 |
0 | pima | 768 | 8 | 34.8958 | 0.6623 | 0.7654 | 0.6523 | 0.7016 | 0.696 | 0.71 | 0.6455 | 0.6755 | 0.6024 | 0.6508 |
0 | satellite | 6435 | 36 | 31.6395 | 0.5821 | 0.5428 | 0.5469 | 0.7471 | 0.7153 | 0.6868 | 0.5509 | 0.8059 | 0.6564 | 0.5902 |
0 | satimage-2 | 5803 | 36 | 1.2235 | 0.8216 | 0.9372 | 0.3658 | 0.9862 | 0.9959 | 0.9528 | 0.3717 | 0.9949 | 0.9975 | 0.9901 |
0 | shuttle | 49097 | 9 | 7.1511 | 0.6164 | 0.9899 | 0.5342 | 0.9847 | 0.9963 | 0.6409 | 0.5373 | 0.9898 | 0.9919 | 0.9901 |
0 | vertebral | 240 | 6 | 12.5 | 0.3435 | 0.504 | 0.4013 | 0.4278 | 0.3628 | 0.3226 | 0.4382 | 0.3772 | 0.3868 | 0.427 |
0 | vowels | 1456 | 12 | 3.4341 | 0.9718 | 0.5681 | 0.9304 | 0.7306 | 0.7791 | 0.9661 | 0.9277 | 0.6799 | 0.7739 | 0.6362 |
0 | wbc | 378 | 30 | 5.5556 | 0.9723 | 0.9658 | 0.9562 | 0.9678 | 0.9516 | 0.962 | 0.9542 | 0.9523 | 0.9555 | 0.951 |
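Because the best detector changes from dataset to dataset, one common way to summarize a table like this is the average rank. Rank the detectors within each row (1 = highest ROC) and average those ranks over the datasets. Below is a minimal pandas sketch with a hypothetical helper and a toy frame of made-up numbers, not the benchmark results. On the real results it could be called as `average_rank(roc_df, df_columns[4:])`, since `df_columns[4:]` holds the ten detector names.

```python
import pandas as pd

def average_rank(score_df, detector_cols):
    """Hypothetical helper: mean per-dataset rank of each detector (1 = best)."""
    # Frames built with pd.concat onto an empty DataFrame may hold object dtype
    scores = score_df[detector_cols].astype(float)
    # Higher score is better; tied detectors share the average of their ranks
    ranks = scores.rank(axis=1, ascending=False, method="average")
    return ranks.mean(axis=0).sort_values()

# Toy frame with made-up values, for illustration only
toy = pd.DataFrame({
    "Data": ["a", "b", "c"],
    "KNN": [0.90, 0.60, 0.80],
    "LOF": [0.85, 0.70, 0.80],
    "PCA": [0.70, 0.50, 0.95],
})
print(average_rank(toy, ["KNN", "LOF", "PCA"]))
```

Lower average rank is better. Sharing ranks between tied detectors matters for these results, because several detectors reach exactly 1.0 on musk and lympho.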
print('Precision @ n Performance')
prn_df
Precision @ n Performance
| | Data | #Samples | # Dimensions | Outlier Perc (%) | ABOD | CBLOF | FB | HBOS | IForest | KNN | LOF | MCD | OCSVM | PCA |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | arrhythmia | 452 | 274 | 14.6018 | 0.3571 | 0.5 | 0.5 | 0.5714 | 0.5 | 0.5 | 0.4643 | 0.4286 | 0.5 | 0.5 |
0 | cardio | 1831 | 21 | 9.6122 | 0.209 | 0.5075 | 0.1194 | 0.3731 | 0.3433 | 0.2388 | 0.1493 | 0.3284 | 0.3731 | 0.4925 |
0 | glass | 214 | 9 | 4.2056 | 0 | 0 | 0.2 | 0 | 0 | 0 | 0.2 | 0 | 0 | 0 |
0 | ionosphere | 351 | 33 | 35.8974 | 0.8333 | 0.6111 | 0.7407 | 0.4259 | 0.6667 | 0.8704 | 0.7407 | 0.8704 | 0.7593 | 0.6481 |
0 | letter | 1600 | 32 | 6.25 | 0.3438 | 0.0625 | 0.4062 | 0.0312 | 0.0312 | 0.3438 | 0.3125 | 0.125 | 0.0938 | 0.0625 |
0 | lympho | 148 | 18 | 4.0541 | 0 | 0.5 | 0.5 | 1 | 1 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
0 | mnist | 7603 | 100 | 9.2069 | 0.384 | 0.365 | 0.3422 | 0.1217 | 0.2966 | 0.4183 | 0.3384 | 0.5133 | 0.3802 | 0.3688 |
0 | musk | 3062 | 166 | 3.1679 | 0.0714 | 1 | 0.2143 | 1 | 1 | 0.2857 | 0.2143 | 0.9762 | 1 | 1 |
0 | optdigits | 5216 | 64 | 2.8758 | 0.0299 | 0 | 0.0448 | 0.209 | 0.0299 | 0 | 0.0448 | 0 | 0 | 0 |
0 | pendigits | 6870 | 16 | 2.2707 | 0.1127 | 0.2394 | 0.0986 | 0.3099 | 0.3944 | 0.1549 | 0.0986 | 0.0986 | 0.3662 | 0.3803 |
0 | pima | 768 | 8 | 34.8958 | 0.4906 | 0.5755 | 0.4811 | 0.5283 | 0.5283 | 0.5094 | 0.4717 | 0.5094 | 0.4528 | 0.5 |
0 | satellite | 6435 | 36 | 31.6395 | 0.4077 | 0.3153 | 0.3957 | 0.5612 | 0.5743 | 0.5072 | 0.3993 | 0.6906 | 0.5372 | 0.4712 |
0 | satimage-2 | 5803 | 36 | 1.2235 | 0.2 | 0.6 | 0.04 | 0.68 | 0.88 | 0.28 | 0.0417 | 0.52 | 0.96 | 0.8 |
0 | shuttle | 49097 | 9 | 7.1511 | 0.1785 | 0.95 | 0.0836 | 0.9657 | 0.9022 | 0.2028 | 0.1329 | 0.7192 | 0.9559 | 0.9507 |
0 | vertebral | 240 | 6 | 12.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | vowels | 1456 | 12 | 3.4341 | 0.5 | 0 | 0.3636 | 0.1818 | 0.2727 | 0.4545 | 0.3182 | 0.0455 | 0.3182 | 0.2727 |
0 | wbc | 378 | 30 | 5.5556 | 0.7273 | 0.6364 | 0.7273 | 0.7273 | 0.6364 | 0.6364 | 0.7273 | 0.5455 | 0.6364 | 0.6364 |
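Several datasets contain very few outliers, for example 4.05% of lympho's 148 samples and 1.22% of satimage-2's 5,803. The random 40% test split therefore holds only a handful of them, and precision @ rank n moves in coarse steps: every lympho value in this table is 0, 0.5 or 1.0. The split in the benchmark is not stratified, so the number of test outliers also depends on the random draw. A possible variation, which is not what this benchmark does, is to pass `stratify=y` to scikit-learn's `train_test_split`. This keeps the outlier ratio the same in both halves, up to rounding. The labels are used only to split the data, and the detectors still train without them. A sketch on synthetic labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in (assumption): 200 samples, 10 of which are labelled outliers (y == 1)
rng = np.random.RandomState(42)
X = rng.normal(size=(200, 3))
y = np.zeros(200, dtype=int)
y[:10] = 1

# Plain random split, as in the benchmark loop: the test outlier count depends on the draw
_, _, _, y_test_plain = train_test_split(X, y, test_size=0.4, random_state=0)

# Stratified split: the 5% outlier ratio is preserved in both halves (up to rounding)
_, _, _, y_test_strat = train_test_split(X, y, test_size=0.4, random_state=0,
                                         stratify=y)

print("outliers in test (plain):", int(y_test_plain.sum()))
print("outliers in test (stratified):", int(y_test_strat.sum()))
```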