A Robust Outlier Detection Method in High-Dimensional Data Based on Mutual Information and Principal Component Analysis

Cited by: 1
Authors
Wang, Hanlin [1]
Li, Zhijian [1]
Affiliations
[1] BNU HKBU United Int Coll, Guangdong Prov Key Lab Interdisciplinary Res & Ap, Zhuhai 519000, Peoples R China
Keywords
Anomaly Detection; Local Outlier Detection; Mutual Information; Principal Component Analysis; High-dimensional Datasets
DOI
10.1007/978-981-97-5663-6_23
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Outlier detection is pivotal in data analysis, and the ever-increasing dimensionality of datasets exposes it to the "curse of dimensionality". The traditional Local Outlier Factor (LOF) algorithm, though effective in lower-dimensional spaces, struggles with high-dimensional data. In this paper, we propose the InfoPrincipal Local Outlier Factor (IP-LOF), which integrates Mutual Information and Principal Component Analysis to improve outlier detection in high-dimensional spaces. IP-LOF processes data through dual pathways, applying LOF to the subsets identified by each of these two methods, which enables a more nuanced analysis of the data. Evaluations on synthetic and real-world datasets demonstrate IP-LOF's superior performance over LOF and other benchmark algorithms, particularly in terms of the Area Under the Receiver Operating Characteristic Curve (AUC). Our method demonstrates robust adaptability and precision in outlier detection across diverse datasets, addressing the challenges posed by high-dimensional data while remaining computationally efficient.
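For orientation, the two standard building blocks named in the abstract, the LOF score and the AUC metric, can be sketched in plain Python. This is a minimal illustration and not the authors' IP-LOF: the abstract does not specify how the Mutual Information and PCA subsets are chosen or how the two pathways are combined, so those steps are left out. The function names `lof_scores` and `roc_auc` and the toy data are assumptions made for this example, and the neighbour search simplifies ties by keeping exactly k neighbours.

```python
import math

def lof_scores(points, k):
    """Textbook Local Outlier Factor, brute force (O(n^2)).

    Simplifications (assumptions of this sketch): exactly k neighbours are
    kept even when distances tie, and points are assumed distinct.
    """
    n = len(points)
    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]
    neighbours, k_dist = [], []
    for i in range(n):
        # Indices of the other points, nearest first.
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])  # distance to the k-th neighbour
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]

def roc_auc(labels, scores):
    """AUC as the probability that a random outlier outscores a random inlier."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy data: a 3x3 grid of inliers plus one isolated point labelled as the outlier.
points = [(float(x), float(y)) for x in range(3) for y in range(3)] + [(10.0, 10.0)]
labels = [0] * 9 + [1]
scores = lof_scores(points, k=3)
auc = roc_auc(labels, scores)
print(auc)
```

On this toy data the isolated point receives the highest LOF score, so the AUC is 1.0. In a dual-pathway setting such as the one the abstract describes, `lof_scores` would presumably be applied to each reduced feature subset rather than to the raw points.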
Pages: 270-281
Page count: 12