The Impact of Feature Selection on Defect Prediction Performance: An Empirical Comparison

Cited by: 96
Authors
Xu, Zhou [1]
Liu, Jin [1]
Yang, Zijiang [2]
An, Gege [1]
Jia, Xiangyang [1]
Affiliations
[1] Wuhan Univ, Sch Comp, State Key Lab Software Engn, Wuhan, Hubei, Peoples R China
[2] Western Michigan Univ, Dept Comp Sci, Kalamazoo, MI 49008 USA
Source
2016 IEEE 27TH INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING (ISSRE) | 2016
Keywords
defect prediction; feature selection; Scott-Knott test; SOFTWARE; CLASSIFICATION; INFORMATION; FRAMEWORK; SEARCH;
DOI
10.1109/ISSRE.2016.13
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Software defect prediction aims to determine whether a software module is defect-prone by constructing prediction models. The performance of such models is susceptible to the high dimensionality of the datasets, which may include irrelevant and redundant features. Feature selection is applied to alleviate this issue. Because many feature selection methods have been proposed, there is an imperative need to analyze and compare them. Prior empirical studies may have potential controversies and limitations, such as contradictory results, the use of private datasets, and inappropriate statistical test techniques. This observation leads us to conduct a careful empirical study that reinforces confidence in the experimental conclusions by considering several potential sources of bias, such as noise in the dataset and the dataset types. In this paper, we investigate the impact of 32 feature selection methods on defect prediction performance over two versions of the NASA dataset (i.e., the noisy and clean NASA datasets) and one open source AEEEM dataset. We use a state-of-the-art double Scott-Knott test technique to analyze these methods. Experimental results show that the effectiveness of these feature selection methods on defect prediction performance varies significantly over all the datasets.
Pages: 309-320
Page count: 12