A three-stage transfer learning framework for multi-source cross-project software defect prediction

Cited by: 24
Authors
Bai, Jiaojiao [1]
Jia, Jingdong [1]
Capretz, Luiz Fernando [2]
Affiliations
[1] Beihang Univ, Sch Software, 37 Xueyuan Rd, Beijing 100191, Peoples R China
[2] Western Univ, Elect & Comp Engn, London, ON, Canada
Keywords
Transfer learning; Cross-project defect prediction; Source selection; Multi-source utilization; 3SW-MSTL; Support vector machine; Models
DOI
10.1016/j.infsof.2022.106985
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Context: Transfer learning techniques have proven effective for cross-project defect prediction (CPDP). However, some questions remain open. First, the difference in conditional distributions between source and target projects has not been considered. Second, when multiple source projects are available, most studies rarely address source selection or multi-source data utilization; instead, they use all available projects and merge the multi-source data into a single dataset.
Objective: To address these issues, we propose a three-stage weighting framework for multi-source transfer learning (3SW-MSTL) in CPDP. In stage 1, a source selection strategy selects a suitable number of source projects from all available projects. In stage 2, a transfer technique is applied to minimize marginal distribution differences. In stage 3, a multi-source data utilization scheme that uses conditional distribution information helps guide researchers in using the multi-source transferred data.
Method: First, we designed five source selection strategies and four multi-source utilization schemes, compared their influence on prediction performance, and chose the best of each for stages 1 and 3 of 3SW-MSTL. Second, to validate the performance of 3SW-MSTL, we compared it with four multi-source and six single-source CPDP methods, a baseline within-project defect prediction (WPDP) method, and two unsupervised methods on data from 30 widely used open-source projects.
Results: The experiments selected bellwether as the source selection strategy and weighted vote as the multi-source utilization scheme for 3SW-MSTL. Our results indicate that 3SW-MSTL outperforms the four multi-source CPDP methods, the six single-source CPDP methods, and the two unsupervised methods, and that it is comparable to the WPDP method.
Conclusion: The proposed 3SW-MSTL model is more effective because it takes both of the issues raised above into account.
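The abstract names the techniques chosen for stages 1 and 3 (bellwether source selection and weighted vote) but does not define them. The Python sketch below is one plausible, simplified reading and is not the authors' implementation. It assumes a precomputed matrix of cross-project scores, ranks candidate source projects by their median score on the other projects, keeps a hypothetical top-k, and combines per-module defect predictions from the models trained on each selected source by weighted voting. The function names, the median criterion, k, and the vote weights are all illustrative assumptions. In 3SW-MSTL the weights are derived from conditional distribution information; this sketch simply takes them as input, and it omits stage 2 (marginal distribution alignment) entirely.

```python
from statistics import median
from typing import Dict, List, Sequence

# Hypothetical layout: perf[src][tgt] = score (e.g. F1) of a classifier
# trained on project `src` and evaluated on project `tgt`.
PerfMatrix = Dict[str, Dict[str, float]]


def rank_by_bellwether(perf: PerfMatrix) -> List[str]:
    """Rank candidate source projects, best transfer performance first.

    A project's score is the median of its scores on every other project
    (self-pairs are skipped), in the spirit of the bellwether idea: the best
    source is the one whose model predicts the other projects best.
    """
    def transfer_score(src: str) -> float:
        return median([s for tgt, s in perf[src].items() if tgt != src])

    return sorted(perf, key=transfer_score, reverse=True)


def select_sources(perf: PerfMatrix, k: int) -> List[str]:
    """Stage 1 sketch: keep the top-k ranked projects (k is an assumption)."""
    return rank_by_bellwether(perf)[:k]


def weighted_vote(predictions: Sequence[Sequence[int]],
                  weights: Sequence[float]) -> List[int]:
    """Stage 3 sketch: combine 0/1 predictions (1 = defective) per module.

    Each source-trained model adds its weight to the label it predicts; the
    label with the larger total wins, and ties fall back to 0 (clean).
    """
    if len(predictions) != len(weights):
        raise ValueError("expected one weight per model")
    combined = []
    for votes in zip(*predictions):  # one label per model for this module
        defective = sum(w for v, w in zip(votes, weights) if v == 1)
        clean = sum(w for v, w in zip(votes, weights) if v == 0)
        combined.append(1 if defective > clean else 0)
    return combined


if __name__ == "__main__":
    # Made-up scores for four hypothetical projects A to D.
    perf = {
        "A": {"B": 0.6, "C": 0.7, "D": 0.5},
        "B": {"A": 0.4, "C": 0.5, "D": 0.45},
        "C": {"A": 0.65, "B": 0.7, "D": 0.72},
        "D": {"A": 0.3, "B": 0.35, "C": 0.4},
    }
    print(select_sources(perf, 2))  # ['C', 'A']
    # Three source models, two target modules, made-up weights.
    print(weighted_vote([[1, 0], [0, 0], [0, 1]], [0.9, 0.3, 0.4]))  # [1, 0]
```

Running the file prints `['C', 'A']` and `[1, 0]`. In the second call, the first module is labeled defective even though two of the three models predict clean, because the dissenting model carries more weight (0.9 against 0.3 + 0.4). This is what distinguishes a weighted vote from a simple majority vote.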
Pages: 16