A Robust Outlier Detection Method in High-Dimensional Data Based on Mutual Information and Principal Component Analysis

Cited by: 1
Authors
Wang, Hanlin [1]
Li, Zhijian [1]
Affiliation
[1] BNU HKBU United Int Coll, Guangdong Prov Key Lab Interdisciplinary Res & Ap, Zhuhai 519000, Peoples R China
Keywords
Anomaly Detection; Local Outlier Detection; Mutual Information; Principal Component Analysis; High-dimensional Datasets
DOI
10.1007/978-981-97-5663-6_23
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Outlier detection is pivotal in data analysis, particularly as the ever-increasing dimensionality of datasets introduces the "curse of dimensionality". The traditional Local Outlier Factor (LOF) algorithm, though effective in low-dimensional spaces, struggles with high-dimensional data. In this paper, we propose the InfoPrincipal Local Outlier Factor (IP-LOF), which enhances LOF by integrating Mutual Information (MI) and Principal Component Analysis (PCA) to improve outlier detection in high-dimensional spaces. IP-LOF processes data through two pathways, applying LOF to the subsets identified by MI and by PCA, which enables a more nuanced analysis of the data. Evaluations on synthetic and real-world datasets show that IP-LOF outperforms LOF and other benchmark algorithms, particularly in terms of the Area Under the Receiver Operating Characteristic Curve (AUC). The method adapts robustly and detects outliers precisely across diverse datasets, addressing the challenges posed by high-dimensional data while remaining computationally efficient.
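The abstract names LOF as the base detector and AUC as the evaluation metric. Below is a minimal, dependency-free sketch of the standard LOF score (Breunig et al., 2000) and a pairwise AUC, under stated assumptions: Euclidean distance, exactly k neighbours per point (ties broken by index rather than the full k-distance neighbourhood), and a small guard for duplicate points. It does not reproduce IP-LOF itself, because the abstract does not specify how the MI and PCA pathways select their subsets or how their scores are combined. The function names `knn`, `lof_scores`, and `roc_auc` are hypothetical, chosen for illustration.

```python
import math


def knn(points, i, k):
    """Return the k nearest neighbours of points[i] as (distance, index) pairs.

    Euclidean distance; point i itself is excluded. Ties at the k-th distance
    are broken by index (a simplification of the full k-distance neighbourhood).
    """
    pairs = sorted(
        (math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i
    )
    return pairs[:k]


def lof_scores(points, k=3):
    """Standard LOF score for every point; values well above 1 suggest outliers."""
    n = len(points)
    neighbours = [knn(points, i, k) for i in range(n)]
    # k-distance of each point: distance to its k-th nearest neighbour.
    k_dist = [nb[-1][0] for nb in neighbours]
    # Local reachability density: inverse of the mean reachability distance,
    # where reach_dist(p, o) = max(k_dist(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbours[i]]
        lrd.append(1.0 / max(sum(reach) / k, 1e-12))  # guard against duplicates
    # LOF: mean ratio of the neighbours' densities to the point's own density.
    return [sum(lrd[j] for _, j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


def roc_auc(labels, scores):
    """AUC as P(random outlier outscores random inlier); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: a 3x3 grid of inliers plus one distant point (index 9).
    data = [(x, y) for x in range(3) for y in range(3)] + [(10.0, 10.0)]
    labels = [0] * 9 + [1]
    scores = lof_scores(data, k=3)
    print(max(range(len(data)), key=scores.__getitem__))  # -> 9
    print(roc_auc(labels, scores))  # -> 1.0
```

In a two-pathway setup like the one described, `lof_scores` would presumably be called once per reduced representation (an MI-selected feature subset and a PCA projection); the fusion of the two score vectors is left open here because the abstract does not state it.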
Pages: 270-281
Number of pages: 12