A High-Dimensional Outlier Detection Approach Based on Local Coulomb Force

Times cited: 10
Authors
Zhu, Pengyun [1]
Zhang, Chaowei [2]
Li, Xiaofeng [1]
Zhang, Jifu [1]
Qin, Xiao [3]
Affiliations
[1] Taiyuan Univ Sci & Technol TYUST, Sch Comp Sci & Technol, Taiyuan 030024, Shanxi, Peoples R China
[2] Yangzhou Univ, Dept Comp Sci, Yangzhou 225127, Jiangsu, Peoples R China
[3] Auburn Univ, Samuel Ginn Coll Engn, Dept Comp Sci & Software Engn, Auburn, AL 36849 USA
Funding
National Science Foundation (USA); National Natural Science Foundation of China;
Keywords
Force; Anomaly detection; Task analysis; Interference; Force measurement; Indexes; Euclidean distance; High-dimensional outlier detection; similarity metric; outlier coulomb resultant force; local outlier coulomb force; neighborhood outlier factor; ALGORITHM;
DOI
10.1109/TKDE.2022.3172167
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Traditional outlier detection methods are inadequate for high-dimensional data analysis because pairwise distances tend to concentrate (the "curse of dimensionality"). Inspired by Coulomb's law, we propose a new similarity measure vector for high-dimensional data, which consists of the outlier Coulomb force and the outlier Coulomb resultant force. The outlier Coulomb force not only effectively gauges the similarity between data objects, but also fully reflects the differences among their dimensions through vector projection onto each dimension. More importantly, the outlier Coulomb resultant force can effectively measure the deviation of data objects from a data center, making detection results interpretable. We introduce a new neighborhood outlier factor, on which we build a high-dimensional outlier detection algorithm. In our approach, attribute values with a high degree of deviation are treated as interpretable information about outlier data. Finally, we implement and evaluate our algorithm on UCI and synthetic datasets. Our experimental results show that the algorithm effectively alleviates the interference of the "curse of dimensionality" and confirm that the high-dimensional outliers identified by the algorithm are interpretable.
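To make the abstract's idea of projecting a force vector onto each dimension more concrete, here is a minimal Python sketch. It is not the paper's definition of the outlier Coulomb force, the outlier Coulomb resultant force, or the neighborhood outlier factor, since the abstract gives no formulas. Instead, it assumes a plain repulsive inverse-square law with unit charges and a brute-force k-nearest-neighbor search. All helper names (`coulomb_force`, `resultant_force`, `k_nearest`, `dominant_dimension`) are hypothetical. Reading the largest force component as the most deviant attribute is also an illustrative assumption.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def coulomb_force(p: Point, q: Point, k_e: float = 1.0) -> List[float]:
    """Force on p exerted by q. ASSUMPTION: unit charges and a repulsive
    inverse-square law, F = k_e * (p - q) / ||p - q||^3 (not the paper's formula).
    Component i is the projection of the force onto dimension i."""
    diff = [pi - qi for pi, qi in zip(p, q)]
    r = math.sqrt(sum(d * d for d in diff))
    if r == 0.0:
        # Coincident points: the force is undefined, so contribute nothing.
        return [0.0] * len(diff)
    scale = k_e / r ** 3  # magnitude k_e / r^2 times the unit vector diff / r
    return [scale * d for d in diff]


def resultant_force(p: Point, neighbors: Sequence[Point], k_e: float = 1.0) -> List[float]:
    """Vector sum of the forces that every neighbor exerts on p."""
    total = [0.0] * len(p)
    for q in neighbors:
        for i, f in enumerate(coulomb_force(p, q, k_e)):
            total[i] += f
    return total


def k_nearest(data: Sequence[Point], idx: int, n_neighbors: int) -> List[Point]:
    """The n_neighbors nearest points to data[idx] by Euclidean distance (brute force)."""
    others = [j for j in range(len(data)) if j != idx]
    others.sort(key=lambda j: math.dist(data[idx], data[j]))
    return [data[j] for j in others[:n_neighbors]]


def dominant_dimension(force: Sequence[float]) -> int:
    """Index of the largest absolute force component. ASSUMPTION: read here
    as the attribute that most explains the point's deviation."""
    return max(range(len(force)), key=lambda i: abs(force[i]))


if __name__ == "__main__":
    data = [
        (0.0, 0.0, 0.0),
        (0.1, 0.0, 0.0),
        (0.0, 0.1, 0.0),
        (0.1, 0.1, 0.0),
        (0.05, 0.05, 3.0),  # deviates from the others only in dimension 2
    ]
    f = resultant_force(data[4], k_nearest(data, 4, n_neighbors=3))
    print(f, dominant_dimension(f))
```

In the toy data, the last point lies close to the others in dimensions 0 and 1 but far from them in dimension 2. Nearly all of its resultant force therefore lies along dimension 2, and `dominant_dimension` returns 2. An inverse-square force weakens with distance, so its magnitude alone is not a usable outlier score in this sketch. For scoring, the paper introduces its own neighborhood outlier factor.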
Pages: 5506-5520
Page count: 15