A Hybrid Semi-Supervised Anomaly Detection Model for High-Dimensional Data

Cited by: 84
Authors
Song, Hongchao [1]
Jiang, Zhuqing [1]
Men, Aidong [1]
Yang, Bo [1]
Affiliation
[1] Beijing Univ Posts & Telecommun, Informat & Telecommun Engn Coll, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
SUPPORT;
DOI
10.1155/2017/8501683
Chinese Library Classification
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Anomaly detection, which aims to identify observations that deviate from a nominal sample, is a challenging task for high-dimensional data. Traditional distance-based anomaly detection methods compute neighborhood distances between observations and suffer from the curse of dimensionality in high-dimensional space; for example, the distances between any pair of samples become similar, so each sample may appear to be an outlier. In this paper, we propose a hybrid semi-supervised anomaly detection model for high-dimensional data that consists of two parts: a deep autoencoder (DAE) and an ensemble of k-nearest neighbor graph (K-NNG) based anomaly detectors. Benefiting from its capacity for nonlinear mapping, the DAE is first trained to learn the intrinsic features of a high-dimensional dataset and to represent the data in a more compact subspace. Several nonparametric KNN-based anomaly detectors are then built from different subsets that are randomly sampled from the whole dataset. The final prediction combines the outputs of all the anomaly detectors. The performance of the proposed method is evaluated on several real-life datasets, and the results confirm that the proposed hybrid model improves detection accuracy and reduces computational complexity.
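The second stage described in the abstract (several KNN-based detectors built on random subsets, then combined) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the DAE stage is omitted and the inputs are assumed to be already-compressed features, the anomaly score is assumed to be the mean distance to the k nearest nominal points, and the detectors are assumed to be combined by simple averaging. The function names `knn_scores` and `ensemble_knn_scores` and all parameter values are hypothetical.

```python
import numpy as np


def knn_scores(train, queries, k):
    """Mean Euclidean distance from each query to its k nearest training points."""
    # Pairwise distances via broadcasting: result has shape (n_queries, n_train).
    diffs = queries[:, None, :] - train[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Keep the k smallest distances per query and average them.
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)


def ensemble_knn_scores(nominal, queries, n_detectors=10, subset_size=64, k=5, seed=0):
    """Average the k-NN scores of detectors built on random subsets of nominal data.

    Averaging is an assumption made for this sketch; the abstract only says
    that the final prediction is made by all the detectors.
    """
    rng = np.random.default_rng(seed)
    subset_size = min(subset_size, len(nominal))
    scores = np.zeros(len(queries))
    for _ in range(n_detectors):
        # Each detector sees a different random subset, sampled without replacement.
        idx = rng.choice(len(nominal), size=subset_size, replace=False)
        scores += knn_scores(nominal[idx], queries, k)
    return scores / n_detectors


# Synthetic stand-in for DAE-encoded features (8 dimensions, nominal data only).
rng = np.random.default_rng(42)
nominal = rng.normal(size=(500, 8))
normal_queries = rng.normal(size=(5, 8))
anomalous_queries = rng.normal(size=(5, 8)) + 6.0  # shifted far from the nominal cloud

normal_scores = ensemble_knn_scores(nominal, normal_queries)
anomalous_scores = ensemble_knn_scores(nominal, anomalous_queries)
print("nominal-like scores:", np.round(normal_scores, 2))
print("shifted scores:     ", np.round(anomalous_scores, 2))
```

Because each detector only searches a subset of `subset_size` points, the cost per query scales with `n_detectors * subset_size` rather than with the full dataset size, which is one plausible reading of the reduced computational complexity claimed in the abstract. A threshold on the averaged score would still have to be chosen separately, for example from the score distribution on held-out nominal data.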
Pages: 9