An Empirical Study on the Effectiveness of Feature Selection for Cross-Project Defect Prediction

Cited by: 34
Authors
Yu, Qiao [1]
Qian, Junyan [2]
Jiang, Shujuan [3, 4]
Wu, Zhenhua [5]
Zhang, Gongjie [1]
Affiliations
[1] Jiangsu Normal Univ, Sch Comp Sci & Technol, Xuzhou 221116, Jiangsu, Peoples R China
[2] Guilin Univ Elect Technol, Guangxi Key Lab Trusted Software, Guilin 541004, Peoples R China
[3] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou 221116, Jiangsu, Peoples R China
[4] Minist Educ, Engn Res Ctr Mine Digitalizat, Xuzhou 221116, Jiangsu, Peoples R China
[5] China Univ Min & Technol, Sch Environm Sci & Spatial Informat, Xuzhou 221116, Jiangsu, Peoples R China
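Funding
National Natural Science Foundation of China
Keywords
Software defect prediction; cross-project defect prediction; feature selection; feature ranking; metrics
DOI
10.1109/ACCESS.2019.2895614
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract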
Software defect prediction has attracted much attention from software engineering researchers. Feature selection approaches have been introduced into software defect prediction and can effectively improve the performance of traditional defect prediction, known as within-project defect prediction (WPDP). However, feature selection has not been studied sufficiently for cross-project defect prediction (CPDP). In this paper, we use feature subset selection and feature ranking approaches to explore the effectiveness of feature selection for CPDP. An empirical study is conducted on the NASA and PROMISE datasets. The results show that both feature subset selection and feature ranking approaches can improve the performance of CPDP. Therefore, future studies should select a representative feature subset or set a reasonable proportion of selected features to improve the performance of CPDP.
Pages: 35710-35718
Number of pages: 9