Outlier mining based on Variance of Angle technology research in High-Dimensional Data

Cited by: 1
Authors
Liu, Wenting [1]
Pan, Ruikai [2]
Affiliations
[1] Hohai Univ, Coll Comp & Informat, Nanjing, Jiangsu, Peoples R China
[2] Xinhua News Agcy, Xinhua Daily Press Grp, Nanjing, Jiangsu, Peoples R China
Source
2015 10TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS AND KNOWLEDGE ENGINEERING (ISKE) | 2015
Keywords
outlier; high dimensional data; variance;
DOI
10.1109/ISKE.2015.64
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Outlier mining in high-dimensional data is currently one of the hot areas of data mining. Existing outlier mining methods are based on distance in the full-dimensional Euclidean space. In high-dimensional data, these methods are bound to deteriorate because of the notorious curse of dimensionality, under which distance measures no longer express their original physical meaning and computational efficiency is low. This paper improves the angle-based outlier factor method and proposes a variance-of-angle-based outlier factor method. It introduces the related theory to guarantee the reliability of the method. Empirical experiments on synthetic data sets show that the method is efficient and scalable to large high-dimensional data sets.
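The angle-based idea named in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a simplified score, the plain variance of the cosines of the angles that a point forms with every pair of other points, without the distance weighting used in published angle-based outlier factors. The function name angle_variance_scores is hypothetical, and the sketch assumes at least three points with no exact duplicates.

```python
import numpy as np


def angle_variance_scores(X):
    """Simplified angle-variance score for each row of X (illustrative only).

    For point i, take the unit vectors from i to every other point and
    compute the variance of the cosines between all pairs of those vectors.
    A low variance means the rest of the data lies in a narrow cone as seen
    from point i, which suggests that point i is an outlier.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    scores = np.empty(n)
    for i in range(n):
        diffs = np.delete(X, i, axis=0) - X[i]      # vectors from i to the others
        norms = np.linalg.norm(diffs, axis=1)
        units = diffs / norms[:, None]              # assumes no duplicate points
        cosines = units @ units.T                   # pairwise cosines of angles
        upper = np.triu_indices(n - 1, k=1)         # each unordered pair once
        scores[i] = cosines[upper].var()
    return scores
```

The loop costs O(n^2) per point and O(n^3) overall, so a sketch like this is only practical for small data sets; the scalability claimed in the abstract would need the paper's own construction.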
Pages: 598-603
Number of pages: 6
Related papers
50 in total
[31]   High-Dimensional Data Visualization Based on User Knowledge [J].
Liu, Qiaolian ;
Zhao, Jianfei ;
Guo, Naiwang ;
Xiao, Ding ;
Shi, Chuan .
DATA MINING AND BIG DATA, DMBD 2016, 2016, 9714 :321-329
[32]   Mining high-dimensional data for information fusion: A database-centric approach [J].
Milenova, BL ;
Campos, MM .
2005 7TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION), VOLS 1 AND 2, 2005, :638-645
[33]   High-dimensional data clustering [J].
Bouveyron, C. ;
Girard, S. ;
Schmid, C. .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2007, 52 (01) :502-519
[34]   Autoencoder-based outlier detection for sparse, high dimensional data [J].
Chen, Wanghu ;
Li, Huijun ;
Li, Jing ;
Arshad, Ali .
2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, :2735-2742
[35]   Outlier-resistant high-dimensional regression modelling based on distribution-free outlier detection and tuning parameter selection [J].
Park, Heewon .
JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2017, 87 (09) :1799-1812
[36]   An angle-based subspace anomaly detection approach to high-dimensional data: With an application to industrial fault detection [J].
Zhang, Liangwei ;
Lin, Jing ;
Karim, Ramin .
RELIABILITY ENGINEERING & SYSTEM SAFETY, 2015, 142 :482-497
[37]   Review of Existing Research Contribution Toward Dimensional Reduction Methods in High-Dimensional Data [J].
Ambika, P. R. ;
Malakreddy, A. Bharathi .
INTERNATIONAL CONFERENCE ON COMPUTER NETWORKS AND COMMUNICATION TECHNOLOGIES (ICCNCT 2018), 2019, 15 :409-419
[38]   Missing Data Imputation with High-Dimensional Data [J].
Brini, Alberto ;
van den Heuvel, Edwin R. .
AMERICAN STATISTICIAN, 2024, 78 (02) :240-252
[39]   Robust linear regression for high-dimensional data: An overview [J].
Filzmoser, Peter ;
Nordhausen, Klaus .
WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS, 2021, 13 (04)
[40]   Model-based clustering of high-dimensional data: A review [J].
Bouveyron, Charles ;
Brunet-Saumard, Camille .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2014, 71 :52-78