ARRAY: Adaptive triple feature-weighted transfer Naive Bayes for cross-project defect prediction

Cited by: 6

Authors:

Tong, Haonan [1]

Lu, Wei [1]

Xing, Weiwei [1]

Wang, Shihai [2]

Affiliations:

[1] Beijing Jiaotong Univ, Sch Software Engn, Beijing 100044, Peoples R China

[2] Beihang Univ, Sch Reliabil & Syst Engn, Sci & Technol Reliabil & Environm Engn Lab, Beijing 100191, Peoples R China

Source:

JOURNAL OF SYSTEMS AND SOFTWARE | 2023 / Vol. 202

Keywords:

Cross-project defect prediction; Common metrics; Transfer learning; Feature weighting; Model adaptation; FEATURE-SELECTION; SOFTWARE DEFECTS; MODEL; QUALITY; SUITE;

DOI:

10.1016/j.jss.2023.111721

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Context: Cross-project defect prediction (CPDP) aims to predict defects in target data by using prediction models trained on a source dataset. However, owing to the large distribution difference between source and target data, building high-performance CPDP models remains a challenge. Objective: We propose a novel high-performance CPDP method named adaptive triple feature-weighted transfer naive Bayes (ARRAY). Methods: ARRAY is characterized by feature-weighted similarity, feature-weighted instance weights, and adaptive model adjustment. Experiments are performed on 34 defect datasets. We compare ARRAY with seven state-of-the-art CPDP methods in terms of area under the ROC curve (AUC), F1, and Matthews correlation coefficient (MCC), using statistical tests. Results: Experimental results show that: (1) on average, ARRAY improves MCC, AUC, and F1 over the baselines by at least 18.4%, 6.5%, and 4.5%, respectively; (2) ARRAY performs significantly better than each baseline on most datasets; (3) ARRAY significantly outperforms all baselines with a non-negligible effect size according to the post-hoc test. Conclusion: (1) The proposed feature-weighted similarity, feature-weighted instance weights, and adaptive model adjustment are very helpful for improving the performance of CPDP models; (2) ARRAY is a promising alternative for CPDP with common metrics. (c) 2023 Elsevier Inc. All rights reserved.
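The three evaluation measures named in the abstract (AUC, F1, and MCC) can be computed from predictions as in the minimal sketch below. This is not the authors' evaluation code: the function names (`confusion_counts`, `f1_score`, `mcc`, `auc`) and the toy labels and scores are assumptions made purely for illustration, with label 1 standing for "defective".

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 = defective."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen defective instance gets a
    higher score than a randomly chosen clean one (ties count as 0.5).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one instance of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only, not taken from the paper's datasets).
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.3, 0.6, 0.8, 0.1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(f"F1={f1_score(tp, fp, fn):.3f} MCC={mcc(tp, tn, fp, fn):.3f} AUC={auc(y_true, scores):.3f}")
```

F1 and MCC depend on a classification threshold, whereas AUC is computed from the ranking of scores alone, which is one reason studies of this kind often report all three.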


Pages: 16
