A Robust Outlier Detection Method in High-Dimensional Data Based on Mutual Information and Principal Component Analysis

Cited by: 1
Authors
Wang, Hanlin [1]
Li, Zhijian [1]
Affiliations
[1] BNU HKBU United Int Coll, Guangdong Prov Key Lab Interdisciplinary Res & Ap, Zhuhai 519000, Peoples R China
Keywords
Anomaly Detection; Local Outlier Detection; Mutual Information; Principal Component Analysis; High-dimensional Datasets
DOI
10.1007/978-981-97-5663-6_23
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Outlier detection is pivotal in data analysis, particularly as the dimensionality of datasets keeps increasing and brings with it the "curse of dimensionality". The traditional Local Outlier Factor (LOF) algorithm, though effective in lower-dimensional spaces, struggles with high-dimensional data. In this paper, we propose the InfoPrincipal Local Outlier Factor (IP-LOF), which enhances LOF by integrating Mutual Information and Principal Component Analysis to improve outlier detection in high-dimensional spaces. IP-LOF processes data through two pathways, applying LOF to the subsets identified by each of these two methods. Evaluations on synthetic and real-world datasets show that IP-LOF outperforms LOF and other benchmark algorithms, particularly in terms of the Area Under the Receiver Operating Characteristic Curve (AUC). The method adapts robustly and precisely to diverse datasets, addressing the challenges posed by high-dimensional data while remaining computationally efficient.
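The dual-pathway idea in the abstract can be illustrated with a rough sketch. The abstract does not say how features are ranked by mutual information without labels, how many features or components are kept, or how the two LOF results are combined, so everything below is an assumption rather than the authors' method: the function name `ip_lof_sketch`, the parameter `k`, ranking each feature by its mean mutual information with the remaining features, and averaging rank-normalized LOF scores are all hypothetical choices. The sketch uses scikit-learn.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import mutual_info_regression
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler


def lof_scores(X, n_neighbors=20):
    """Return LOF scores where a larger value means more outlying."""
    lof = LocalOutlierFactor(n_neighbors=n_neighbors)
    lof.fit(X)
    # scikit-learn stores the negated LOF, so flip the sign.
    return -lof.negative_outlier_factor_


def rank01(scores):
    """Map scores to [0, 1] by rank so the two pathways are comparable."""
    ranks = np.argsort(np.argsort(scores))
    return ranks / (len(scores) - 1)


def ip_lof_sketch(X, k=5, n_neighbors=20, random_state=0):
    """Hypothetical two-pathway LOF; not the paper's exact algorithm."""
    X = StandardScaler().fit_transform(X)
    n_features = X.shape[1]

    # Pathway 1 (assumed rule): keep the k features with the highest
    # mean mutual information with all other features.
    mean_mi = np.zeros(n_features)
    for j in range(n_features):
        others = np.delete(X, j, axis=1)
        mi = mutual_info_regression(others, X[:, j], random_state=random_state)
        mean_mi[j] = mi.mean()
    top_features = np.argsort(mean_mi)[-k:]
    mi_scores = lof_scores(X[:, top_features], n_neighbors)

    # Pathway 2: project onto the first k principal components.
    Z = PCA(n_components=k, random_state=random_state).fit_transform(X)
    pca_scores = lof_scores(Z, n_neighbors)

    # Combination (assumed rule): average the rank-normalized scores.
    return 0.5 * (rank01(mi_scores) + rank01(pca_scores))
```

Calling `ip_lof_sketch(X)` on an array of shape `(n_samples, n_features)` with `k <= n_features` returns one score per sample in [0, 1], with higher values indicating more outlying points. Rank averaging is used here only because it needs no tuning; the paper may combine the pathways differently.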
Pages: 270-281
Page count: 12