A Robust Outlier Detection Method in High-Dimensional Data Based on Mutual Information and Principal Component Analysis

Cited by: 1

Authors:

Wang, Hanlin [1]

Li, Zhijian [1]

Affiliation:

[1] BNU HKBU United Int Coll, Guangdong Prov Key Lab Interdisciplinary Res & Ap, Zhuhai 519000, Peoples R China

Source:

ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT I, ICIC 2024 | 2024 / Vol. 14875

Keywords:

Anomaly Detection; Local Outlier Detection; Mutual Information; Principal Component Analysis; High-dimensional Datasets

DOI:

10.1007/978-981-97-5663-6_23

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Outlier detection is pivotal in data analysis, particularly as the dimensionality of datasets keeps increasing, which introduces the "curse of dimensionality". The traditional Local Outlier Factor (LOF) algorithm, though effective in lower-dimensional spaces, struggles with high-dimensional data. In this paper, we propose the InfoPrincipal Local Outlier Factor (IP-LOF), which enhances LOF by integrating Mutual Information (MI) and Principal Component Analysis (PCA) for improved outlier detection in high-dimensional spaces. IP-LOF processes data through dual pathways, applying LOF to the subsets identified by these two methods, which enables a more nuanced analysis of the data. Evaluations on synthetic and real-world datasets demonstrate IP-LOF's superior performance over LOF and other benchmark algorithms, particularly in terms of the Area Under the Receiver Operating Characteristic Curve (AUC). Our method shows robust adaptability and precision in outlier detection across diverse datasets, addressing the challenges posed by high-dimensional data while remaining computationally efficient.
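The dual-pathway idea in the abstract can be illustrated with a minimal Python sketch using scikit-learn. This is not the authors' implementation: the abstract does not say how mutual information selects features or how the two LOF results are merged. The sketch assumes that features are ranked by their mean mutual information with the other features, and that the two pathways are combined by averaging rank-normalized LOF scores. The function names (lof_scores, mi_feature_subset, ip_lof_sketch) and all parameter defaults are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import mutual_info_regression
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler


def lof_scores(X, n_neighbors=20):
    """Return LOF scores where a larger value means more outlying."""
    lof = LocalOutlierFactor(n_neighbors=n_neighbors)
    lof.fit(X)
    # scikit-learn stores the negated LOF, so flip the sign.
    return -lof.negative_outlier_factor_


def mi_feature_subset(X, n_keep, random_state=0):
    """Pick the n_keep features with the highest mean mutual information
    with the remaining features (an assumed selection criterion)."""
    n_features = X.shape[1]
    relevance = np.zeros(n_features)
    for j in range(n_features):
        others = np.delete(X, j, axis=1)
        mi = mutual_info_regression(others, X[:, j], random_state=random_state)
        relevance[j] = mi.mean()
    return np.argsort(relevance)[::-1][:n_keep]


def rank01(scores):
    """Map scores to ranks scaled to [0, 1] so both pathways are comparable."""
    ranks = np.argsort(np.argsort(scores))
    return ranks / (len(scores) - 1)


def ip_lof_sketch(X, n_keep=5, n_components=5, n_neighbors=20):
    """Combine LOF on an MI-selected feature subset with LOF on PCA components.
    Averaging the rank-normalized scores is an assumption of this sketch."""
    Xs = StandardScaler().fit_transform(X)

    # Pathway 1: LOF on the feature subset chosen by mutual information.
    mi_idx = mi_feature_subset(Xs, n_keep)
    s_mi = lof_scores(Xs[:, mi_idx], n_neighbors)

    # Pathway 2: LOF on the leading principal components.
    Z = PCA(n_components=n_components).fit_transform(Xs)
    s_pca = lof_scores(Z, n_neighbors)

    return 0.5 * (rank01(s_mi) + rank01(s_pca))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inliers = rng.normal(size=(200, 20))
    outliers = rng.choice([-8.0, 8.0], size=(5, 20)) + rng.normal(size=(5, 20))
    X = np.vstack([inliers, outliers])
    scores = ip_lof_sketch(X)
    # Indices of the five highest combined scores.
    print(np.argsort(scores)[-5:])
```

Rank normalization is used here only because raw LOF values from spaces of different dimensionality are not directly comparable; other combination rules (for example taking the maximum of the two scores) would fit the same outline.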


Pages: 270-281

Page count: 12

