Thresholds based outlier detection approach for mining class outliers: An empirical case study on software measurement datasets

Cited by: 20
Authors
Alan, Oral [1]
Catal, Cagatay [1]
Affiliations
[1] Inst Informat Technol, Sci & Technol Res Council Turkey TUBITAK, Natl Res Inst Elect & Cryptol UEKAE, TR-41470 Kocaeli, Turkey
Keywords
Outlier detection; Software metrics thresholds; Software fault prediction; Empirical software engineering
DOI
10.1016/j.eswa.2010.08.130
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Predicting the fault-proneness labels of software program modules is an emerging software quality assurance activity, and the quality of datasets collected from previous software versions affects the performance of fault prediction models. In this paper, we propose an outlier detection approach using metrics thresholds and class labels to identify class outliers. We evaluate our approach on public NASA datasets from the PROMISE repository. Experiments reveal that this novel outlier detection method improves the performance of robust software fault prediction models based on Naive Bayes and Random Forests machine learning algorithms. (C) 2010 Elsevier Ltd. All rights reserved.
Pages: 3440-3445
Page count: 6