Detection of local and clustered outliers based on the density-distance decision graph

Cited by: 20
Authors
Li, Kangsheng [1]
Gao, Xin [1]
Jia, Xin [1]
Xue, Bing [1]
Fu, Shiyuan [1]
Liu, Zhiyu [1]
Huang, Xu [1]
Huang, Zijian [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing, Peoples R China
Keywords
Outlier detection; Anomaly detection; Local reachable density; Kernel density estimation; Density lifting distance; Density-distance decision graph;
DOI
10.1016/j.engappai.2022.104719
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Outlier detection is the task of identifying objects whose characteristics differ from those of normal observations. Most existing approaches take a global perspective: they detect global outliers and most clustered outliers effectively, but miss local outliers when the normal samples form clusters of different densities. Methods based on local outlier factors detect local outliers effectively, but as the number of outliers grows, clustered outliers occur more often and detection performance degrades. We propose an outlier detection method based on a density-distance decision graph that detects local, global, and clustered outliers simultaneously. First, kernel density estimation and local reachable distance are combined to calculate the local density; the ratio of the density of an instance's neighbors to its own density serves as the degree of local outlierness. Then, we propose a metric named density lifting distance as the degree of global outlierness, calculated from the distance between an instance and its k nearest neighbors with higher density. The density ratio and the density lifting distance are combined to draw the density-distance decision graph, and the product of the two metrics is the final outlier score. Comprehensive experiments on 8 synthetic datasets and 16 real-world datasets compare the method with 12 state-of-the-art methods. The results show that the proposed method works well both when the samples form clusters of different densities and when the percentage of outliers varies, and that it outperforms the tested state-of-the-art methods in terms of AUC.
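The scoring pipeline summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact formulas, so the function name `decision_graph_scores`, the Gaussian kernel, the `bandwidth` parameter, the LOF-style reachability distance, the brute-force neighbor search, and the choice to assign a lifting distance of 0 to the single densest point are all assumptions made for illustration.

```python
import numpy as np


def decision_graph_scores(X, k=5, bandwidth=1.0):
    """Hypothetical sketch: return (density_ratio, lifting_distance, score)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Brute-force pairwise Euclidean distances; a point is never its own neighbor.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)

    # Indices and distances of the k nearest neighbors of every point.
    nn_idx = np.argsort(dist, axis=1)[:, :k]
    nn_dist = np.take_along_axis(dist, nn_idx, axis=1)
    k_dist = nn_dist[:, -1]

    # Assumption: LOF-style reachability distance max(d(p, o), k-distance(o)),
    # passed through a Gaussian kernel and averaged to get a local density.
    reach = np.maximum(nn_dist, k_dist[nn_idx])
    density = np.exp(-(reach ** 2) / (2.0 * bandwidth ** 2)).mean(axis=1)
    density = np.maximum(density, 1e-12)  # guard against division by zero

    # Local outlier degree: mean density of the neighbors over own density.
    ratio = density[nn_idx].mean(axis=1) / density

    # Global outlier degree: mean distance to the (up to) k nearest points
    # with strictly higher density. Assumption: 0 when no such point exists.
    lift = np.zeros(n)
    for i in range(n):
        higher = dist[i, density > density[i]]
        if higher.size:
            lift[i] = np.sort(higher)[:k].mean()

    # Final score: product of the two decision-graph coordinates.
    return ratio, lift, ratio * lift
```

Plotting `ratio` against `lift` gives a decision graph in the sense of the abstract: under these assumptions, points far from the origin on both axes are the strongest outlier candidates, and sorting by the returned score ranks them.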
Pages: 15