Outlier mining based on Variance of Angle technology research in High-Dimensional Data

Cited by: 1
Authors
Liu, Wenting [1]
Pan, Ruikai [2]
Affiliations
[1] Hohai Univ, Coll Comp & Informat, Nanjing, Jiangsu, Peoples R China
[2] Xinhua News Agcy, Xinhua Daily Press Grp, Nanjing, Jiangsu, Peoples R China
Source
2015 10TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS AND KNOWLEDGE ENGINEERING (ISKE) | 2015
Keywords
outlier; high dimensional data; variance;
DOI
10.1109/ISKE.2015.64
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Outlier mining in high-dimensional data is currently one of the most active areas of data mining. Existing outlier mining methods are based on distances in the full-dimensional Euclidean space. In high-dimensional data, these methods inevitably deteriorate because of the notorious curse of dimensionality: distance measures lose their original physical meaning, and computational efficiency drops. This paper improves the angle-based outlier factor (ABOF) method and proposes a variance-of-angle-based outlier factor method. It also presents the supporting theory that guarantees the reliability of the method. Experiments on synthetic data sets show that the method is efficient and scales to large high-dimensional data sets.
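The angle-based idea the abstract builds on can be illustrated with a short sketch. This is a minimal brute-force version of the standard angle-based outlier factor (ABOF) of Kriegel, Schubert and Zimek (KDD 2008), assuming that is the ABOF method the abstract says it improves; the authors' own variance-of-angle factor is not reproduced here. For a point A, ABOF is the variance, over all pairs of other points B and C, of <AB, AC> / (|AB|^2 |AC|^2), weighted by 1 / (|AB| |AC|). A point inside a cluster sees the other points in all directions, so this variance is large; an outlier sees the rest of the data inside a narrow cone, so its variance is small. The function name `abof` and the demo data are hypothetical.

```python
import itertools
import math
import random


def abof(points, i):
    """Weighted angle-based outlier factor of points[i] (Kriegel et al., 2008).

    For every pair (B, C) of other points, take v = <AB, AC> / (|AB|^2 |AC|^2)
    and return the variance of v weighted by w = 1 / (|AB| |AC|).
    Low values indicate likely outliers.
    """
    a = points[i]
    others = [p for j, p in enumerate(points) if j != i]
    weights, values = [], []
    for b, c in itertools.combinations(others, 2):
        ab = [bk - ak for ak, bk in zip(a, b)]
        ac = [ck - ak for ak, ck in zip(a, c)]
        nab = sum(x * x for x in ab)  # |AB|^2
        nac = sum(x * x for x in ac)  # |AC|^2
        if nab == 0.0 or nac == 0.0:
            continue  # B or C coincides with A: the angle is undefined
        dot = sum(x * y for x, y in zip(ab, ac))
        weights.append(1.0 / math.sqrt(nab * nac))
        values.append(dot / (nab * nac))
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total


# Hypothetical demo data: a Gaussian cluster in 5 dimensions plus one far point.
rng = random.Random(0)
data = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(30)]
data.append([10.0] * 5)

scores = [abof(data, i) for i in range(len(data))]
ranking = sorted(range(len(data)), key=scores.__getitem__)
print("lowest ABOF (most outlying) index:", ranking[0])
```

The appended far point should rank first (lowest ABOF). This exact version costs O(n^2) per point, so O(n^3) for the whole data set; Kriegel et al. also describe faster approximations, which are not shown here.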
Pages: 598-603
Number of pages: 6