Comparing Hyperparameter Optimization in Cross- and Within-Project Defect Prediction: A Case Study

Author
Muhammed Maruf Öztürk
Affiliation
[1] Suleyman Demirel University, Department of Computer Engineering, Faculty of Engineering
Source
Arabian Journal for Science and Engineering | 2019 / Vol. 44
Keywords
Defect prediction; Hyperparameter optimization; Cross-project defect prediction; Random forest; SVM
DOI
Not available
Abstract
Various studies of cross-project defect prediction (CPDP) have been reported in the defect prediction literature. These studies follow a methodology in which the training and testing data sets are taken from different projects, or from different versions of the same project, and may have the same number of features. The configurable parameters of machine learning algorithms should not be disregarded during defect prediction. In this study, the effects of hyperparameter optimization are investigated in CPDP and within-project defect prediction (WPDP). To this end, this work proposes a novel method that shows how hyperparameter optimization should be performed in CPDP; specifically, two new procedures are proposed that take the structure of heterogeneous data sets into account. First, a defect prediction model is built on 20 data sets. Various hyperparameters are then optimized, and the performance of CPDP and WPDP is compared. The results show that: (i) on average, CPDP is superior to WPDP under hyperparameter optimization; (ii) the linear kernel of SVM outperforms the polynomial and radial kernels in terms of hyperparameter optimization; (iii) maximum tree depth (interaction.depth) is crucial for increasing accuracy when a tree-based algorithm is used.
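The two evaluation settings compared in the abstract differ mainly in where the data for tuning, training, and testing come from. In WPDP, all three come from disjoint splits of one project. In CPDP, the hyperparameter is tuned and the model is trained on a source project, then tested on a different target project. The following is a minimal sketch of that difference using only the Python standard library. The k-nearest-neighbour classifier, the grid over k, the 70/30 splits, and the synthetic `make_project` data are all hypothetical illustrations. They are not the paper's models, data sets, or its two proposed procedures. The sketch also assumes homogeneous CPDP, in which source and target share the same metric columns.

```python
import random
from collections import Counter

# Hypothetical classifier: k-nearest neighbours, with k as the tuned hyperparameter.
def knn_predict(train_X, train_y, x, k):
    # Rank training rows by squared Euclidean distance to x; majority vote over the k nearest.
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k):
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


def split(X, y, ratio, seed):
    # Seeded hold-out split: the first `ratio` share of shuffled rows becomes the training part.
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * ratio)

    def take(ids):
        return [X[i] for i in ids], [y[i] for i in ids]

    return take(idx[:cut]), take(idx[cut:])


def tune_k(X, y, grid, seed=0):
    # Grid search that sees only training data: pick k by accuracy on an inner validation split.
    (tr_X, tr_y), (va_X, va_y) = split(X, y, 0.7, seed)
    return max(grid, key=lambda k: accuracy(tr_X, tr_y, va_X, va_y, k))


def wpdp(project, grid, seed=0):
    # Within-project: tune, train, and test on disjoint splits of one project.
    X, y = project
    (tr_X, tr_y), (te_X, te_y) = split(X, y, 0.7, seed)
    k = tune_k(tr_X, tr_y, grid, seed)
    return k, accuracy(tr_X, tr_y, te_X, te_y, k)


def cpdp(source, target, grid, seed=0):
    # Cross-project: tune and train on the source project, test on the whole target project.
    # Assumes both projects share the same metric columns (homogeneous CPDP).
    src_X, src_y = source
    tgt_X, tgt_y = target
    k = tune_k(src_X, src_y, grid, seed)
    return k, accuracy(src_X, src_y, tgt_X, tgt_y, k)


def make_project(n, shift, seed):
    # Hypothetical synthetic data set: label 1 ("defective") rows have larger metric values.
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        centre = 3.0 * label + shift
        X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    grid = [1, 3, 5, 7]
    project_a = make_project(60, 0.0, seed=1)
    project_b = make_project(60, 0.5, seed=2)
    print("WPDP (A):", wpdp(project_a, grid))
    print("CPDP (A -> B):", cpdp(project_a, project_b, grid))
```

To mirror the comparisons summarised in the abstract more closely, the k-NN classifier could be replaced by an SVM with the kernel type in the grid, or by a boosted tree model with maximum tree depth in the grid. Heterogeneous CPDP, where source and target have different metrics, would also need a feature-matching step, which this sketch omits.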
Pages: 3515–3530
Page count: 15