Pre-Cutoff Value Calculation Method for Accelerating Metric Space Outlier Detection

Citations: 0
Authors
Xu, Honglong [1]
Liang, Zhonghao [1]
Huang, Kaide [2]
Huang, Guoshun [3]
He, Yan [1]
Affiliations
[1] Foshan Univ, Foshan, Peoples R China
[2] Foshan Univ, Comp Sci, Foshan, Peoples R China
[3] Foshan Univ, Artificial Intelligence, Foshan, Peoples R China
Keywords
distance triangle inequality; index; metric space; outlier detection; pre-cutoff value; MINING OUTLIERS; ALGORITHMS
DOI
10.4018/TJGHPC.334125
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Outlier detection is an important data mining technique. This article leverages the triangle inequality of distances to design a pre-cutoff value (PCV) algorithm that calculates a pre-threshold for the outlier degree without additional distance computations. The algorithm can accelerate a range of metric space outlier detection algorithms. Experimental results on multiple real datasets show that the PCV algorithm reduces the runtime and the number of distance computations of the iORCA algorithm by 14.59% and 15.73%, respectively. Even compared with the recent high-performance algorithm ADPOD, the PCV algorithm achieves reductions of 1.41% and 0.45%, respectively. Notably, non-outlier exclusion for the first data block in the dataset improves significantly, with an exclusion rate of up to 36.5%, leading to a 23.54% reduction in detection time for that data block. Alongside these gains, the PCV algorithm retains the data-type generality of metric space algorithms.
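The abstract does not spell out how the pre-threshold is derived, so the following is only an illustrative sketch of the general idea of bounding outlier scores with the triangle inequality. It is not the authors' PCV algorithm. It assumes the outlier degree is the distance to the k-th nearest neighbor, a single pivot whose distances to all points are already known, and a top-n outlier query. The names `pivot_bounds`, `pre_cutoff`, and `exact_scores` are hypothetical helpers introduced here.

```python
import math

def kth_smallest(values, k):
    """Return the k-th smallest value (1-indexed)."""
    return sorted(values)[k - 1]

def pivot_bounds(points, pivot_index, k, dist=math.dist):
    """Bound every point's k-NN distance using only distances to one pivot.

    For any x, y and pivot p, the triangle inequality gives
    |d(p,x) - d(p,y)| <= d(x,y) <= d(p,x) + d(p,y).
    Taking the k-th smallest of each side over all y != x bounds the
    k-NN distance of x without computing any d(x,y) directly.
    """
    n = len(points)
    d_pivot = [dist(points[pivot_index], x) for x in points]
    lower, upper = [], []
    for i in range(n):
        lo = [abs(d_pivot[i] - d_pivot[j]) for j in range(n) if j != i]
        hi = [d_pivot[i] + d_pivot[j] for j in range(n) if j != i]
        lower.append(kth_smallest(lo, k))
        upper.append(kth_smallest(hi, k))
    return lower, upper

def pre_cutoff(lower, n_outliers):
    """Assumed pre-threshold: the n-th largest lower bound.

    At least n points have a true score >= this value, so it cannot
    exceed the final cutoff (the n-th largest true score).
    """
    return sorted(lower, reverse=True)[n_outliers - 1]

def exact_scores(points, k, dist=math.dist):
    """Brute-force k-NN distances, used here only to check the bounds."""
    scores = []
    for i, x in enumerate(points):
        ds = [dist(x, y) for j, y in enumerate(points) if j != i]
        scores.append(kth_smallest(ds, k))
    return scores

# Toy data (an assumption for illustration): a tight cluster and one far point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (5.0, 5.0)]
lower, upper = pivot_bounds(points, pivot_index=0, k=2)
pcv = pre_cutoff(lower, n_outliers=1)

# Points whose upper bound is below the pre-threshold cannot be top-n outliers.
pruned = [i for i, u in enumerate(upper) if u < pcv]
print("pre-threshold:", pcv, "pruned indices:", pruned)
```

In this sketch the only distance computations are the ones from the pivot to each point; every bound is derived from those values alone. Because the bounds hold in any metric space, the same code works with any `dist` function that satisfies the triangle inequality.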
Pages: 17