A Robust Outlier Detection Method in High-Dimensional Data Based on Mutual Information and Principal Component Analysis

被引:1
|
作者
Wang, Hanlin [1 ]
Li, Zhijian [1 ]
机构
[1] BNU HKBU United Int Coll, Guangdong Prov Key Lab Interdisciplinary Res & Ap, Zhuhai 519000, Peoples R China
关键词
Anomaly Detection; Local Outlier Detection; Mutual Information; Principal Component Analysis; High-dimensional Datasets; ANOMALY DETECTION;
D O I
10.1007/978-981-97-5663-6_23
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Outlier detection is pivotal in data analysis, particularly with the ever-increasing dimensionality of datasets, which introduces the challenge of the "curse of dimensionality". The traditional Local Outlier Factor (LOF) algorithm, though effective in lower-dimensional spaces, struggles with high-dimensional data. In this paper, we propose an innovative approach, the InfoPrincipal Local Outlier Factor (IP-LOF), which is an enhanced method by integrating Mutual Information and Principal Component Analysis for improved outlier detection in high-dimensional spaces. IP-LOF processes data through dual pathways, applying LOF to subsets identified by these two methods, enabling a nuanced data analysis. Evaluations on synthetic and real-world datasets demonstrate IP-LOF's superior performance over LOF and other benchmark algorithms, particularly in terms of the Area Under the Receiver Operating Characteristic Curve (AUC). Our method illustrates robust adaptability and precision in outlier detection across diverse datasets, addressing the challenges posed by high-dimensional data while ensuring computational efficiency.
引用
收藏
页码:270 / 281
页数:12
相关论文
共 50 条
  • [31] A Robust Principal Component Analysis for Outlier Identification in Messy Microcalorimeter Data
    Fowler, J. W.
    Alpert, B. K.
    Joe, Y-I
    O'Neil, G. C.
    Swetz, D. S.
    Ullom, J. N.
    JOURNAL OF LOW TEMPERATURE PHYSICS, 2020, 199 (3-4) : 745 - 753
  • [32] A Robust Principal Component Analysis for Outlier Identification in Messy Microcalorimeter Data
    J. W. Fowler
    B. K. Alpert
    Y.-I. Joe
    G. C. O’Neil
    D. S. Swetz
    J. N. Ullom
    Journal of Low Temperature Physics, 2020, 199 : 745 - 753
  • [33] A Feature Subset Selection Method Based On High-Dimensional Mutual Information
    Zheng, Yun
    Kwoh, Chee Keong
    ENTROPY, 2011, 13 (04) : 860 - 901
  • [34] Exploring high-dimensional biological data with sparse contrastive principal component analysis
    Boileau, Philippe
    Hejazi, Nima S.
    Dudoit, Sandrine
    BIOINFORMATICS, 2020, 36 (11) : 3422 - 3430
  • [35] Adaptive local Principal Component Analysis improves the clustering of high-dimensional data
    Migenda, Nico
    Moeller, Ralf
    Schenck, Wolfram
    PATTERN RECOGNITION, 2024, 146
  • [36] High-dimensional principal component analysis with heterogeneous missingness
    Zhu, Ziwei
    Wang, Tengyao
    Samworth, Richard J.
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2022, 84 (05) : 2000 - 2031
  • [37] Test for high-dimensional outliers with principal component analysis
    Nakayama, Yugo
    Yata, Kazuyoshi
    Aoshima, Makoto
    JAPANESE JOURNAL OF STATISTICS AND DATA SCIENCE, 2024, 7 (02) : 739 - 766
  • [38] PRINCIPAL COMPONENT ANALYSIS IN VERY HIGH-DIMENSIONAL SPACES
    Lee, Young Kyung
    Lee, Eun Ryung
    Park, Byeong U.
    STATISTICA SINICA, 2012, 22 (03) : 933 - 956
  • [39] An Unbiased Distance-Based Outlier Detection Approach for High-Dimensional Data
    Hoang Vu Nguyen
    Gopalkrishnan, Vivekanand
    Assent, Ira
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PT I, 2011, 6587 : 138 - +
  • [40] High-dimensional data stream outlier detection algorithm based on angle distribution
    Lu, S. (lusheng@cqupt.edu.cn), 1600, Shanghai Jiaotong University (48):