Two-Stage Cost-Sensitive Learning for Software Defect Prediction

Cited by: 99
Authors
Liu, Mingxia [1, 2]
Miao, Linsong [1]
Zhang, Daoqiang [1]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Sch Comp Sci & Technol, Nanjing 210016, Jiangsu, Peoples R China
[2] Taishan Univ, Sch Informat Sci & Technol, Tai An 271021, Shandong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cost-sensitive learning; feature selection; software defect prediction; STATIC CODE ATTRIBUTES; FEATURE-SELECTION; NEURAL-NETWORKS; QUALITY; CLASSIFICATION; METRICS; MODELS; MACHINE; MODULES;
DOI
10.1109/TR.2014.2316951
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Software defect prediction (SDP), which classifies software modules into defect-prone and not-defect-prone categories, provides an effective way to maintain high-quality software systems. Most existing SDP models attempt to attain lower classification error rates rather than lower misclassification costs. However, in many real-world applications, misclassifying defect-prone modules as not-defect-prone ones usually leads to higher costs than misclassifying not-defect-prone modules as defect-prone ones. In this paper, we first propose a new two-stage cost-sensitive learning (TSCS) method for SDP, which uses cost information not only in the classification stage but also in the feature selection stage. Then, specifically for the feature selection stage, we develop three novel cost-sensitive feature selection algorithms, namely Cost-Sensitive Variance Score (CSVS), Cost-Sensitive Laplacian Score (CSLS), and Cost-Sensitive Constraint Score (CSCS), by incorporating cost information into traditional feature selection algorithms. The proposed methods are evaluated on seven real data sets from NASA projects. Experimental results suggest that our TSCS method achieves better performance in software defect prediction than existing single-stage cost-sensitive classifiers. Our experiments also show that the proposed cost-sensitive feature selection methods outperform traditional cost-blind feature selection methods, validating the efficacy of using cost information in the feature selection stage.
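The distinction the abstract draws between error rate and misclassification cost can be illustrated with a minimal sketch. This is not the paper's TSCS method or any of its feature selection scores; the function names, the toy labels, and the cost values (a missed defect-prone module costing 5 times a false alarm) are assumptions chosen only for illustration.

```python
# Illustrative only: the cost values below are hypothetical, not from the paper.
# Label 1 = defect-prone module, label 0 = not-defect-prone module.

def error_rate(y_true, y_pred):
    """Fraction of modules whose predicted label differs from the true label."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)

def total_cost(y_true, y_pred, cost_fn=5.0, cost_fp=1.0):
    """Sum of misclassification costs.

    cost_fn: assumed cost of predicting a defect-prone module as not-defect-prone.
    cost_fp: assumed cost of predicting a not-defect-prone module as defect-prone.
    """
    cost = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 0:
            cost += cost_fn
        elif t == 0 and p == 1:
            cost += cost_fp
    return cost

y_true = [1, 1, 0, 0, 0, 0]
pred_a = [0, 0, 0, 0, 0, 0]  # misses both defect-prone modules
pred_b = [1, 1, 1, 1, 0, 0]  # finds both, but raises two false alarms

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name, round(error_rate(y_true, pred), 3), total_cost(y_true, pred))
```

Both toy predictors make two mistakes out of six, so their error rates are identical, yet under the assumed costs predictor A is five times as expensive as predictor B. A cost-blind criterion cannot tell them apart, which is the motivation the abstract gives for using cost information.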
Pages: 676-686
Number of pages: 11