A Robust Outlier Detection Method in High-Dimensional Data Based on Mutual Information and Principal Component Analysis

Cited by: 1
Authors
Wang, Hanlin [1 ]
Li, Zhijian [1 ]
Affiliations
[1] BNU HKBU United Int Coll, Guangdong Prov Key Lab Interdisciplinary Res & Ap, Zhuhai 519000, Peoples R China
Source
ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT I, ICIC 2024 | 2024 / Vol. 14875
Keywords
Anomaly Detection; Local Outlier Detection; Mutual Information; Principal Component Analysis; High-dimensional Datasets
DOI
10.1007/978-981-97-5663-6_23
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Outlier detection is pivotal in data analysis, particularly as the dimensionality of datasets keeps increasing and introduces the "curse of dimensionality". The traditional Local Outlier Factor (LOF) algorithm, though effective in lower-dimensional spaces, struggles with high-dimensional data. In this paper, we propose the InfoPrincipal Local Outlier Factor (IP-LOF), which enhances LOF by integrating Mutual Information and Principal Component Analysis for improved outlier detection in high-dimensional spaces. IP-LOF processes data through dual pathways, applying LOF to the subsets identified by these two methods, which enables a nuanced data analysis. Evaluations on synthetic and real-world datasets demonstrate IP-LOF's superior performance over LOF and other benchmark algorithms, particularly in terms of the Area Under the Receiver Operating Characteristic Curve (AUC). Our method shows robust adaptability and precision in outlier detection across diverse datasets, addressing the challenges posed by high-dimensional data while ensuring computational efficiency.
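To make the two building blocks named in the abstract concrete, the following is a minimal sketch of plain LOF scoring and of the AUC used for evaluation. It is not the IP-LOF method: the abstract does not specify how the Mutual Information and PCA subsets are chosen or how the two pathways are combined, so those steps are left out. The function names `lof_scores` and `roc_auc`, the toy data, and the choice `k = 3` are assumptions made for illustration, and the neighbourhood is simplified to exactly k neighbours of distinct points.

```python
import math


def lof_scores(points, k):
    """Return a Local Outlier Factor per point (higher = more outlying).

    Simplified sketch: uses exactly k neighbours (ties broken by index)
    and assumes all points are distinct.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbours, k_dist = [], []
    for i in range(n):
        # Indices of the other points, sorted by distance to point i.
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])  # distance to the k-th neighbour

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbours relative to the point's own density.
    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


def roc_auc(labels, scores):
    """AUC as the probability that an outlier outscores an inlier (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up): a 3x3 grid of inliers plus one distant point.
points = [(float(x), float(y)) for x in range(3) for y in range(3)] + [(10.0, 10.0)]
labels = [0] * 9 + [1]  # 1 marks the outlier

scores = lof_scores(points, k=3)
auc = roc_auc(labels, scores)
print(scores.index(max(scores)), auc)
```

On this toy set the distant point receives the largest LOF score, so the ranking separates it from the grid points. In a dual-pathway setting such as the one the abstract describes, a scorer like this would presumably be run on each reduced feature subset rather than on the raw features.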
Pages: 270-281
Page count: 12
Related Papers
23 in total
  • [1] Aggarwal CC, 2001, LECT NOTES COMPUT SC, V1973, P420
  • [2] Aggarwal CC, 2014, CH CRC DATA MIN KNOW, P1
  • [3] A Review of Machine Learning and Deep Learning Techniques for Anomaly Detection in IoT Data
    Al-amri, Redhwan
    Murugesan, Raja Kumar
    Man, Mustafa
    Abdulateef, Alaa Fareed
    Al-Sharafi, Mohammed A.
    Alkahtani, Ammar Ahmed
    [J]. APPLIED SCIENCES-BASEL, 2021, 11 (12):
  • [4] Beyer K, 1999, LECT NOTES COMPUT SC, V1540, P217
  • [5] LOF: Identifying density-based local outliers
    Breunig, MM
    Kriegel, HP
    Ng, RT
    Sander, J
    [J]. SIGMOD RECORD, 2000, 29 (02) : 93 - 104
  • [6] Phase I Analysis of Nonlinear Profiles Using Anomaly Detection Techniques
    Cheng, Chuen-Sheng
    Chen, Pei-Wen
    Wu, Yu-Tang
    [J]. APPLIED SCIENCES-BASEL, 2023, 13 (04):
  • [7] Special Issue on Unsupervised Anomaly Detection
    Goldstein, Markus
    [J]. APPLIED SCIENCES-BASEL, 2023, 13 (10):
  • [8] THE MEANING AND USE OF THE AREA UNDER A RECEIVER OPERATING CHARACTERISTIC (ROC) CURVE
    HANLEY, JA
    MCNEIL, BJ
    [J]. RADIOLOGY, 1982, 143 (01) : 29 - 36
  • [9] Hinneburg A, 1999, PROCEEDINGS OF THE TWENTY-FIFTH INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, P506
  • [10] Hybrid Machine Learning-Statistical Method for Anomaly Detection in Flight Data
    Jasra, Sameer Kumar
    Valentino, Gianluca
    Muscat, Alan
    Camilleri, Robert
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (20):