SVDD-based outlier detection on uncertain data

Cited by: 114
Authors
Liu, Bo [1]
Xiao, Yanshan [2]
Cao, Longbing [3]
Hao, Zhifeng [2]
Deng, Feiqi [4]
Affiliations
[1] Guangdong Univ Technol, Fac Automat, Guangzhou, Guangdong, Peoples R China
[2] Guangdong Univ Technol, Fac Comp, Guangzhou, Guangdong, Peoples R China
[3] Univ Technol Sydney, Fac Engn & Informat Technol, Sydney, NSW 2007, Australia
[4] S China Univ Technol, Sch Automat Sci & Engn, Guangzhou, Guangdong, Peoples R China
Keywords
Outlier detection; Data of uncertainty; Support vector data description; Support vector data
DOI
10.1007/s10115-012-0484-y
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Outlier detection is an important problem that has been studied across diverse research areas and application domains. Most existing methods assume that each example can be categorized exactly as either normal or an outlier. However, in many real-life applications, data are inherently uncertain because of various errors or incompleteness. This uncertainty makes outliers far more difficult to detect than in clearly separable data. The key challenge in handling uncertain data for outlier detection is how to reduce the impact of uncertain examples on the learned distinctive classifier. This paper proposes a new approach, based on support vector data description (SVDD), to detect outliers in uncertain data. The proposed approach operates in two steps. In the first step, a pseudo-training set is generated by assigning each input example a confidence score, which indicates the likelihood that the example belongs to the normal class. In the second step, the confidence scores are incorporated into the SVDD training phase to construct a global distinctive classifier for outlier detection. In this phase, the examples with the lowest confidence scores contribute less to the construction of the decision boundary. The experiments show that the proposed approach outperforms state-of-the-art outlier detection techniques.
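
The two-step idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the confidence heuristic (distance to the coordinate-wise median), the linear-kernel primal objective R^2 + C * sum_i w_i * max(0, ||x_i - a||^2 - R^2), the subgradient solver, and all names and parameter values (`confidence_scores`, `fit_weighted_svdd`, `C`, `steps`, `lr`) are assumptions chosen only to show how per-example confidence can shrink the influence of doubtful points on the boundary.

```python
import math
from statistics import median


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def confidence_scores(points):
    """Step 1 (hypothetical heuristic): confidence in (0, 1] of being normal.

    Points far from the coordinate-wise median get scores close to 0.
    """
    dim = len(points[0])
    centre = [median(p[d] for p in points) for d in range(dim)]
    dists = [math.sqrt(sq_dist(p, centre)) for p in points]
    scale = median(dists) or 1.0  # avoid dividing by zero
    return [1.0 / (1.0 + (d / scale) ** 2) for d in dists]


def objective(points, weights, centre, r2, C):
    """R^2 + C * sum_i w_i * max(0, ||x_i - a||^2 - R^2)."""
    slack = sum(w * max(0.0, sq_dist(p, centre) - r2)
                for p, w in zip(points, weights))
    return r2 + C * slack


def fit_weighted_svdd(points, weights, C=4.0, steps=2000, lr=0.05):
    """Step 2 (assumed formulation): subgradient descent on the weighted
    objective, returning the best (centre, squared radius) seen."""
    dim = len(points[0])
    total = sum(weights)
    # Start at the confidence-weighted mean with a zero radius.
    centre = [sum(w * p[d] for p, w in zip(points, weights)) / total
              for d in range(dim)]
    r2 = 0.0
    best = (objective(points, weights, centre, r2, C), centre, r2)
    for t in range(1, steps + 1):
        step = lr / math.sqrt(t)  # diminishing step size
        g_r2, g_c = 1.0, [0.0] * dim
        for p, w in zip(points, weights):
            if sq_dist(p, centre) > r2:  # point lies outside the sphere
                g_r2 -= C * w
                for d in range(dim):
                    g_c[d] += 2.0 * C * w * (centre[d] - p[d])
        r2 = max(0.0, r2 - step * g_r2)
        centre = [c - step * g for c, g in zip(centre, g_c)]
        value = objective(points, weights, centre, r2, C)
        if value < best[0]:
            best = (value, centre, r2)
    return best[1], best[2]


# Toy data: eight points near the origin and one distant point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0),
          (0.5, 0.5), (-0.5, -0.5), (0.5, -0.5), (8.0, 8.0)]
weights = confidence_scores(points)
centre, r2 = fit_weighted_svdd(points, weights)
flags = [sq_dist(p, centre) > r2 for p in points]  # True = outside the sphere
print(flags)
```

Because the distant point receives a confidence close to zero, its slack is almost free, so the sphere is shaped by the high-confidence points rather than stretched toward it. Points that sit right on the boundary may fall on either side, since the solver is only approximate.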
Pages: 597-618
Number of pages: 22