ARRAY: Adaptive triple feature-weighted transfer Naive Bayes for cross-project defect prediction

Cited by: 6
Authors
Tong, Haonan [1]
Lu, Wei [1]
Xing, Weiwei [1]
Wang, Shihai [2]
Affiliations
[1] Beijing Jiaotong Univ, Sch Software Engn, Beijing 100044, Peoples R China
[2] Beihang Univ, Sch Reliabil & Syst Engn, Sci & Technol Reliabil & Environm Engn Lab, Beijing 100191, Peoples R China
Keywords
Cross-project defect prediction; Common metrics; Transfer learning; Feature weighting; Model adaptation; FEATURE-SELECTION; SOFTWARE DEFECTS; MODEL; QUALITY; SUITE
DOI
10.1016/j.jss.2023.111721
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Context: Cross-project defect prediction (CPDP) aims to predict defects in target data by using prediction models trained on a source dataset. However, owing to the large distribution difference between source and target data, building high-performance CPDP models remains a challenge. Objective: We propose a novel high-performance CPDP method named adaptive triple feature-weighted transfer naive Bayes (ARRAY). Methods: ARRAY is characterized by feature-weighted similarity, feature-weighted instance weights, and adaptive model adjustment. Experiments are performed on 34 defect datasets. We compare ARRAY with seven state-of-the-art CPDP methods in terms of area under the ROC curve (AUC), F1, and Matthews correlation coefficient (MCC), using statistical tests. Results: Experimental results show that: (1) on average, ARRAY improves MCC, AUC, and F1 over the baselines by at least 18.4%, 6.5%, and 4.5%, respectively; (2) ARRAY performs significantly better than each baseline on most datasets; (3) ARRAY significantly outperforms all baselines with a non-negligible effect size according to a post-hoc test. Conclusion: We conclude that: (1) the proposed feature-weighted similarity, feature-weighted instance weights, and adaptive model adjustment are very helpful for improving the performance of CPDP models; (2) ARRAY is a promising alternative for CPDP with common metrics. (c) 2023 Elsevier Inc. All rights reserved.
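The three evaluation measures named in the abstract (AUC, F1, and MCC) have standard textbook definitions. The sketch below illustrates those definitions only; it is not the authors' evaluation code. The function names `f1_and_mcc` and `auc` are hypothetical, and the convention of returning 0.0 when a denominator is zero is an assumption made for this example.

```python
import math


def f1_and_mcc(tp, fp, tn, fn):
    """Return (F1, MCC) from confusion-matrix counts.

    Assumption for this sketch: undefined ratios (zero denominator)
    are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen defective instance
    (label 1) receives a higher score than a randomly chosen clean
    instance (label 0); ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one instance of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

MCC uses all four cells of the confusion matrix, which is one reason it is often reported alongside F1 on class-imbalanced defect data. The pairwise AUC above is quadratic in the number of instances and is meant for illustration, not for large datasets.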
Pages: 16