Internal Evaluation of Unsupervised Outlier Detection

Cited by: 21
Authors
Marques, Henrique O. [1]
Campello, Ricardo J. G. B. [2]
Sander, Jorg [3]
Zimek, Arthur [4]
Affiliations
[1] Univ Sao Paulo, Inst Math & Comp Sci ICMC, BR-13566590 Sao Carlos, SP, Brazil
[2] Univ Newcastle, Sch Math & Phys Sci MAPS, Univ Dr, Callaghan, NSW 2308, Australia
[3] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2E8, Canada
[4] Univ Southern Denmark, Dept Math & Comp Sci IMADA, Campusvej 55, DK-5230 Odense, Denmark
Funding
Swedish Research Council; Natural Sciences and Engineering Research Council of Canada; São Paulo Research Foundation, Brazil
Keywords
Outlier detection; unsupervised evaluation; validation; distance-based outliers
DOI
10.1145/3394053
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Although there is a large and growing literature that tackles the unsupervised outlier detection problem, the unsupervised evaluation of outlier detection results is still virtually untouched in the literature. The so-called internal evaluation, based solely on the data and the assessed solutions themselves, is required if one wants to statistically validate (in absolute terms) or just compare (in relative terms) the solutions provided by different algorithms or by different parameterizations of a given algorithm in the absence of labeled data. However, in contrast to unsupervised cluster analysis, where indexes for internal evaluation and validation of clustering solutions have been conceived and shown to be very useful, in the outlier detection domain, this problem has been notably overlooked. Here we discuss this problem and provide a solution for the internal evaluation of outlier detection results. Specifically, we describe an index called Internal, Relative Evaluation of Outlier Solutions (IREOS) that can evaluate and compare different candidate outlier detection solutions. Initially, the index is designed to evaluate binary solutions only, referred to as top-n outlier detection results. We then extend IREOS to the general case of non-binary solutions, consisting of outlier detection scorings. We also statistically adjust IREOS for chance and extensively evaluate it in several experiments involving different collections of synthetic and real datasets.
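The abstract distinguishes binary "top-n" solutions from non-binary outlier scorings. As a minimal sketch of that distinction only (not an implementation of IREOS, whose computation this record does not describe), the following shows how a scoring can be reduced to a top-n solution. The function name `top_n_solution` is hypothetical, and the sketch assumes that a higher score means a point is more outlying.

```python
import numpy as np


def top_n_solution(scores, n):
    """Reduce an outlier scoring to a binary top-n solution (True = outlier).

    Assumption (not from the source): higher scores are more outlying.
    """
    scores = np.asarray(scores, dtype=float)
    if not 0 <= n <= scores.size:
        raise ValueError("n must be between 0 and the number of points")
    labels = np.zeros(scores.size, dtype=bool)
    if n > 0:
        # Stable sort on negated scores: ties keep their original order.
        top = np.argsort(-scores, kind="stable")[:n]
        labels[top] = True
    return labels


# Illustrative scores for five points; the values are made up.
scores = [0.1, 0.9, 0.3, 0.7, 0.2]
labels = top_n_solution(scores, 2)
```

Different detectors or parameterizations would each produce their own `labels` vector for the same data; candidate solutions of this kind are what an internal index is meant to compare when no ground-truth labels exist.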
Pages: 42
Related papers
50 in total
  • [1] On the Internal Evaluation of Unsupervised Outlier Detection
    Marques, Henrique O.
    Campello, Ricardo J. G. B.
    Zimek, Arthur
    Sander, Jorg
    PROCEEDINGS OF THE 27TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, 2015,
  • [2] Similarity-Based Unsupervised Evaluation of Outlier Detection
    Marques, Henrique O.
    Zimek, Arthur
    Campello, Ricardo J. G. B.
    Sander, Jorg
    SIMILARITY SEARCH AND APPLICATIONS (SISAP 2022), 2022, 13590 : 234 - 248
  • [3] On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study
    Guilherme O. Campos
    Arthur Zimek
    Jörg Sander
    Ricardo J. G. B. Campello
    Barbora Micenková
    Erich Schubert
    Ira Assent
    Michael E. Houle
    Data Mining and Knowledge Discovery, 2016, 30 : 891 - 927
  • [4] On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study
    Campos, Guilherme O.
    Zimek, Arthur
    Sander, Jorg
    Campello, Ricardo J. G. B.
    Micenkova, Barbora
    Schubert, Erich
    Assent, Ira
    Houle, Michael E.
    DATA MINING AND KNOWLEDGE DISCOVERY, 2016, 30 (04) : 891 - 927
  • [5] RDPOD: an unsupervised approach for outlier detection
    Abhaya Abhaya
    Bidyut Kr. Patra
    Neural Computing and Applications, 2022, 34 : 1065 - 1077
  • [6] Unsupervised outlier detection in multidimensional data
    Atiq ur Rehman
    Samir Brahim Belhaouari
    Journal of Big Data, 8
  • [7] A new unsupervised outlier detection method
    Zheng, Lina
    Chen, Lijun
    Wang, Yini
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2024, 46 (01) : 1713 - 1734
  • [8] RDPOD: an unsupervised approach for outlier detection
    Abhaya, Abhaya
    Patra, Bidyut Kr
    NEURAL COMPUTING & APPLICATIONS, 2022, 34 (02) : 1065 - 1077
  • [9] Unsupervised outlier detection in multidimensional data
    Ur Rehman, Atiq
    Belhaouari, Samir Brahim
    JOURNAL OF BIG DATA, 2021, 8 (01)
  • [10] Bagged Subspaces for Unsupervised Outlier Detection
    Pasillas-Diaz, Jose Ramon
    Ratte, Sylvie
    COMPUTATIONAL INTELLIGENCE, 2017, 33 (03) : 507 - 523