Software defect number prediction: Unsupervised vs supervised methods

Cited by: 78
Authors
Chen, Xiang [1, 2, 3]
Zhang, Dun [1]
Zhao, Yingquan [1]
Cui, Zhanqi [3, 4]
Ni, Chao [3]
Affiliations
[1] Nantong Univ, Sch Comp Sci & Technol, Nantong, Peoples R China
[2] Guilin Univ Elect Technol, Guangxi Key Lab Trusted Software, Guilin, Peoples R China
[3] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Jiangsu, Peoples R China
[4] Beijing Informat Sci & Technol Univ, Comp Sch, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Software defect prediction; Software defect number prediction; Supervised method; Unsupervised method; Class imbalance learning; Differential evolutionary; Empirical study; OPTIMIZATION; IMBALANCE; METRICS;
DOI
10.1016/j.infsof.2018.10.003
Chinese Library Classification
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Context: Software defect number prediction (SDNP) ranks program modules according to their predicted number of defects, which helps optimize the allocation of testing resources.
Objective: In previous studies, the comparison of supervised and unsupervised methods has been an active issue for just-in-time defect prediction and file-level defect prediction under effort-aware performance measures. However, this issue has not been investigated for SDNP. To the best of our knowledge, we are the first to make a thorough comparison of these two types of methods for SDNP.
Method: In our empirical study, we consider 7 real open-source projects with 24 versions in total, use FPA and Kendall as our effort-aware performance measures, and consider three performance evaluation scenarios (within-version, cross-version, and cross-project).
Result: We first identify the two unsupervised methods with the best performance. These two methods simply rank modules in descending order of the LOC metric and the RFC metric, respectively. We then compare 9 state-of-the-art supervised methods that incorporate SMOTEND, which handles the class imbalance problem, with the unsupervised method based on the LOC metric (the LOC_D method). The results show that LOC_D performs significantly better than, or the same as, these supervised methods. Motivated by a recent study by Agrawal and Menzies, we then apply differential evolution (DE) to optimize the parameter values of SMOTEND used by these supervised methods and find that DE effectively improves their performance for SDNP as well. Finally, we compare LOC_D with these DE-optimized supervised methods; LOC_D still has a performance advantage, especially in the cross-version and cross-project scenarios.
Conclusion: Based on these results, we suggest that researchers use the unsupervised method LOC_D as a baseline when evaluating novel methods proposed for the SDNP problem.
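Illustration: the LOC_D baseline described in the abstract can be sketched in a few lines. The sketch below is not the authors' implementation. It assumes each module is described only by its LOC value and its actual defect count, and it assumes the commonly used definition of FPA (fault-percentile-average): the mean, over every cutoff m, of the fraction of all defects contained in the top m ranked modules. The function names and the sample data are hypothetical.

```python
from typing import List, Sequence


def loc_d_ranking(loc: Sequence[float]) -> List[int]:
    """Return module indices ordered by LOC, largest first (the LOC_D idea)."""
    return sorted(range(len(loc)), key=lambda i: loc[i], reverse=True)


def fpa(ranking: Sequence[int], defects: Sequence[int]) -> float:
    """Fault-percentile-average of a ranking (assumed definition, see lead-in).

    For each cutoff m = 1..K, take the fraction of all defects found in the
    top m modules of the ranking, then average these K fractions.
    """
    total = sum(defects)
    if total == 0:
        raise ValueError("FPA is undefined when there are no defects")
    found = 0
    acc = 0.0
    for idx in ranking:
        found += defects[idx]      # defects covered by the modules seen so far
        acc += found / total       # fraction of all defects at this cutoff
    return acc / len(ranking)


# Hypothetical data: four modules with their LOC and actual defect counts.
loc = [120, 30, 400, 75]
defects = [2, 0, 5, 1]

ranking = loc_d_ranking(loc)
score = fpa(ranking, defects)
print(ranking, score)
```

No training data is needed for this ranking, which is why it can serve as a cheap baseline in the within-version, cross-version, and cross-project scenarios alike.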
Pages: 161-181
Page count: 21
Related papers
58 in total
[1]   Is "Better Data" Better Than "Better Data Miners"? On the Benefits of Tuning SMOTE for Defect Prediction [J].
Agrawal, Amritanshu ;
Menzies, Tim .
PROCEEDINGS 2018 IEEE/ACM 40TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE), 2018, :1050-1061
[2]   CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING [J].
BENJAMINI, Y ;
HOCHBERG, Y .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) :289-300
[3]   MAHAKIL: Diversity Based Oversampling Approach to Alleviate the Class Imbalance Issue in Software Defect Prediction [J].
Bennin, Kwabena Ebo ;
Keung, Jacky ;
Phannachitta, Passakorn ;
Monden, Akito ;
Mensah, Solomon .
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2018, 44 (06) :534-550
[4]   SMOTE: Synthetic minority over-sampling technique [J].
Chawla, Nitesh V. ;
Bowyer, Kevin W. ;
Hall, Lawrence O. ;
Kegelmeyer, W. Philip .
2002, American Association for Artificial Intelligence (16)
[5]   Tackling class overlap and imbalance problems in software defect prediction [J].
Chen, Lin ;
Fang, Bin ;
Shang, Zhaowei ;
Tang, Yuanyan .
SOFTWARE QUALITY JOURNAL, 2018, 26 (01) :97-125
[6]  
Chen Mingming., 2015, SEKE, P397
[7]   MULTI: Multi-objective effort-aware just-in-time software defect prediction [J].
Chen, Xiang ;
Zhao, Yingquan ;
Wang, Qiuping ;
Yuan, Zhidan .
INFORMATION AND SOFTWARE TECHNOLOGY, 2018, 93 :1-13
[8]  
Di Martino Sergio, 2011, Product-Focused Software Process Improvement. Proceedings 12th International Conference, PROFES 2011, P247, DOI 10.1007/978-3-642-21843-9_20
[9]  
Drucker H., 1997, P 14 INT C MACH LEAR, P107
[10]  
Fu W., 2016, ARXIV160902613