Two-Stage Cost-Sensitive Learning for Software Defect Prediction

Times cited: 99
Authors
Liu, Mingxia [1, 2]
Miao, Linsong [1]
Zhang, Daoqiang [1]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Sch Comp Sci & Technol, Nanjing 210016, Jiangsu, Peoples R China
[2] Taishan Univ, Sch Informat Sci & Technol, Tai An 271021, Shandong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cost-sensitive learning; feature selection; software defect prediction; STATIC CODE ATTRIBUTES; FEATURE-SELECTION; NEURAL-NETWORKS; QUALITY; CLASSIFICATION; METRICS; MODELS; MACHINE; MODULES;
DOI
10.1109/TR.2014.2316951
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline code
0812;
Abstract
Software defect prediction (SDP), which classifies software modules into defect-prone and not-defect-prone categories, provides an effective way to maintain high-quality software systems. Most existing SDP models attempt to attain lower classification error rates rather than lower misclassification costs. However, in many real-world applications, misclassifying defect-prone modules as not-defect-prone ones usually leads to higher costs than misclassifying not-defect-prone modules as defect-prone ones. In this paper, we first propose a new two-stage cost-sensitive learning (TSCS) method for SDP, which utilizes cost information not only in the classification stage but also in the feature selection stage. Then, specifically for the feature selection stage, we develop three novel cost-sensitive feature selection algorithms, namely Cost-Sensitive Variance Score (CSVS), Cost-Sensitive Laplacian Score (CSLS), and Cost-Sensitive Constraint Score (CSCS), by incorporating cost information into traditional feature selection algorithms. The proposed methods are evaluated on seven real data sets from NASA projects. Experimental results suggest that our TSCS method achieves better software defect prediction performance than existing single-stage cost-sensitive classifiers. Our experiments also show that the proposed cost-sensitive feature selection methods outperform traditional cost-blind feature selection methods, validating the efficacy of using cost information in the feature selection stage.
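The abstract contrasts minimizing the classification error rate with minimizing the misclassification cost. The small sketch below illustrates that contrast. It is not the TSCS method, and it is not any of the CSVS, CSLS, or CSCS algorithms. It only shows how total cost is counted and how the standard minimum-expected-cost decision rule works for a binary defect-prone / not-defect-prone label. The cost values `COST_FN` and `COST_FP` and the helper names `error_rate`, `total_cost`, and `min_cost_predict` are hypothetical choices made for illustration. None of them are taken from the paper.

```python
# Hypothetical cost values for illustration only; NOT taken from the paper.
COST_FN = 5.0  # defect-prone module predicted as not-defect-prone (missed defect)
COST_FP = 1.0  # not-defect-prone module predicted as defect-prone (wasted inspection)


def error_rate(y_true, y_pred):
    """Fraction of misclassified modules; labels are 1 = defect-prone, 0 = not."""
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return errors / len(y_true)


def total_cost(y_true, y_pred, cost_fn=COST_FN, cost_fp=COST_FP):
    """Sum of misclassification costs, weighting the two error types differently."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return fn * cost_fn + fp * cost_fp


def min_cost_predict(p_defect, cost_fn=COST_FN, cost_fp=COST_FP):
    """Choose the label with the lower expected cost, given P(defect-prone)."""
    # Predicting 0 risks a false negative: expected cost = p * cost_fn.
    # Predicting 1 risks a false positive: expected cost = (1 - p) * cost_fp.
    return 1 if p_defect * cost_fn > (1 - p_defect) * cost_fp else 0


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 0, 0]
    miss_all = [0, 0, 0, 0, 0, 0]    # misses both defect-prone modules
    flag_extra = [1, 1, 1, 1, 0, 0]  # catches both but flags two clean modules
    for name, y_pred in [("miss_all", miss_all), ("flag_extra", flag_extra)]:
        print(name, error_rate(y_true, y_pred), total_cost(y_true, y_pred))
    print(min_cost_predict(0.2), min_cost_predict(0.1))
```

With these assumed costs, both predictors misclassify two of six modules, so they share the same error rate of 1/3. Their total costs differ sharply, however: 10.0 when both defect-prone modules are missed, and 2.0 when two clean modules are flagged instead. The decision rule flags a module whenever its estimated defect probability exceeds `COST_FP / (COST_FP + COST_FN)`, which is 1/6 here.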
Pages: 676-686
Number of pages: 11
References
83 in total