Reverse Nearest Neighbors in Unsupervised Distance-Based Outlier Detection

Cited by: 163
Authors
Radovanovic, Milos [1]
Nanopoulos, Alexandros [2]
Ivanovic, Mirjana [3]
Affiliations
[1] Univ Novi Sad, Fac Sci, Dept Math & Informat, Novi Sad 21000, Serbia
[2] Univ Eichstaett Ingolstadt, Ingolstadt Sch Management, Ingolstadt, Germany
[3] Univ Novi Sad, Fac Sci, Novi Sad 21000, Serbia
Keywords
Outlier detection; reverse nearest neighbors; high-dimensional data; distance concentration; algorithms
DOI
10.1109/TKDE.2014.2365790
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Outlier detection in high-dimensional data presents various challenges resulting from the "curse of dimensionality." A prevailing view is that distance concentration, i.e., the tendency of distances in high-dimensional data to become indiscernible, hinders the detection of outliers by making distance-based methods label all points as almost equally good outliers. In this paper, we provide evidence supporting the opinion that such a view is too simple, by demonstrating that distance-based methods can produce more contrasting outlier scores in high-dimensional settings. Furthermore, we show that high dimensionality can have a different impact, by reexamining the notion of reverse nearest neighbors in the unsupervised outlier-detection context. Namely, it was recently observed that the distribution of points' reverse-neighbor counts becomes skewed in high dimensions, resulting in the phenomenon known as hubness. We provide insight into how some points (antihubs) appear very infrequently in k-NN lists of other points, and explain the connection between antihubs, outliers, and existing unsupervised outlier-detection methods. By evaluating the classic k-NN method, the angle-based technique designed for high-dimensional data, the density-based local outlier factor and influenced outlierness methods, and antihub-based methods on various synthetic and real-world data sets, we offer novel insight into the usefulness of reverse neighbor counts in unsupervised outlier detection.
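The abstract contrasts the classic k-NN outlier score with antihub-based scores built on reverse-neighbor counts, where N_k(x) is the number of other points whose k-NN lists contain x. The following is a minimal sketch of both ideas, not the authors' implementation. It assumes brute-force Euclidean k-NN search and the transform 1 / (N_k(x) + 1) as the antihub score. All function names are hypothetical, chosen for illustration.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def knn_lists(points: List[Point], k: int) -> List[List[int]]:
    """For each point, return the indices of its k nearest neighbors.

    Brute force with Euclidean distance; a point is never its own neighbor.
    Ties in distance are broken by the smaller index.
    """
    if not 0 < k < len(points):
        raise ValueError("k must satisfy 0 < k < number of points")
    result = []
    for i, p in enumerate(points):
        ranked = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        result.append([j for _, j in ranked[:k]])
    return result


def reverse_neighbor_counts(points: List[Point], k: int) -> List[int]:
    """N_k(x): number of other points' k-NN lists in which x appears."""
    counts = [0] * len(points)
    for neighbors in knn_lists(points, k):
        for j in neighbors:
            counts[j] += 1
    return counts


def antihub_scores(points: List[Point], k: int) -> List[float]:
    """Antihub-style outlier score: fewer reverse neighbors means a higher score.

    ASSUMPTION: the monotone transform 1 / (N_k(x) + 1) is used for
    illustration; an antihub with N_k = 0 gets the top score 1.0.
    """
    return [1.0 / (c + 1) for c in reverse_neighbor_counts(points, k)]


def knn_distance_scores(points: List[Point], k: int) -> List[float]:
    """Classic k-NN outlier score: distance to the k-th nearest neighbor."""
    return [
        math.dist(points[i], points[neighbors[-1]])
        for i, neighbors in enumerate(knn_lists(points, k))
    ]


if __name__ == "__main__":
    # Toy data: the corners of a unit square plus one far-away point.
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    print("N_k:", reverse_neighbor_counts(pts, k=2))
    print("antihub:", antihub_scores(pts, k=2))
    print("k-NN distance:", knn_distance_scores(pts, k=2))
```

In this toy example the point (10, 10) appears in no other point's 2-NN list. Its N_k is therefore 0, which gives it the maximal antihub score, and it also has the largest k-NN distance. Because every k-NN list holds exactly k entries, the counts always sum to n·k. The brute-force search costs O(n² log n), so larger data sets would need an index structure or approximate k-NN search.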
Pages: 1369-1382
Number of pages: 14
Related papers
  • [1] An Empirical Analysis of Hubness in Unsupervised Distance-Based Outlier Detection
    Flexer, Arthur
    2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), 2016, pp. 716-723
  • [2] Distance-Based k-Nearest Neighbors Outlier Detection Method in Large-Scale Traffic Data
    Dang, Taurus T.
    Ngan, Henry E. T.
    Liu, Wei
    2015 IEEE International Conference on Digital Signal Processing (DSP), 2015, pp. 507-510
  • [3] A Data Stream Outlier Detection Algorithm Based on Reverse k Nearest Neighbors
    Zhang, ZhongPing
    Liang, YongXin
    Advanced Research on Automation, Communication, Architectonics and Materials, Pts 1 and 2, 2011, 225-226 (1-2), pp. 1032-1035
  • [4] Distance-based Outlier Detection in Data Streams
    Tran, Luan
    Fan, Liyue
    Shahabi, Cyrus
    Proceedings of the VLDB Endowment, 2016, 9 (12), pp. 1089-1100
  • [5] Distance-based outlier detection on uncertain data
    Yu, Hao
    Wang, Bin
    Xiao, Gang
    Yang, Xiaochun
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2010, 47 (03), pp. 474-484
  • [6] RSOD: Efficient Technique for Outlier Detection using Reverse Nearest Neighbors Statistics
    Uttarkabat, Satarupa
    Sunkara, Naga Dhanunjay
    Patra, Bidyut Kr
    2020 4th International Conference on Computational Intelligence and Networks (CINE 2020), 2020
  • [7] GPU Strategies for Distance-Based Outlier Detection
    Angiulli, Fabrizio
    Basta, Stefano
    Lodi, Stefano
    Sartori, Claudio
    IEEE Transactions on Parallel and Distributed Systems, 2016, 27 (11), pp. 3256-3268
  • [8] Shared nearest neighbors based outlier detection for biological sequences
    Zhang, Lisheng
    He, Zehua
    Lei, Dajiang
    International Journal of Digital Content Technology and its Applications, 2012, 6 (12), pp. 1-10
  • [9] An iterative approach to unsupervised outlier detection using ensemble method and distance-based data filtering
    Chakraborty, Bodhan
    Chaterjee, Agneet
    Malakar, Samir
    Sarkar, Ram
    Complex & Intelligent Systems, 2022, 8 (04), pp. 3215-3230