Improve cross-project just-in-time defect prediction with dynamic transfer learning

Cited by: 1
Authors
Dai, Hongming [1, 2]
Xi, Jianqing [1]
Dai, Hong-Liang [3]
Affiliations
[1] South China Univ Technol, Sch Software, Guangzhou 510006, Peoples R China
[2] Guangdong Polytech Sci & Trade, Sch Informat, Guangzhou 510430, Peoples R China
[3] Guangzhou Univ, Sch Econ & Stat, Guangzhou 510006, Peoples R China
Keywords
CatBoost; Correlation alignment; Cross-project; Just-in-time defect prediction; Kernel variance matching; MODEL
DOI
10.1016/j.jss.2024.112214
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Cross-project just-in-time software defect prediction (CP-JIT-SDP) is a prominent research topic in the field of software engineering. This approach is characterized by its immediacy, accuracy, real-time feedback, and traceability, enabling it to effectively address the challenges of defect prediction in new projects or projects with limited training data. However, CP-JIT-SDP faces significant challenges due to the differences in the feature distribution between the source and target projects. To address this issue, researchers have proposed methods for adjusting marginal or conditional probability distributions. This study introduces a transfer-learning approach that integrates dynamic distribution adaptation. The kernel variance matching (KVM) method is proposed to adjust the disparity in the marginal probability distribution by recalculating the variance of the source and target projects within the reproducing kernel Hilbert space (RKHS) to minimize the variance disparity. The categorical boosting (CatBoost) algorithm is used to construct models, while the improved CORrelation ALignment (CORAL) method is applied to develop the loss function to address the difference in the conditional probability distribution. This method is abbreviated as KCC, where K stands for KVM, the first C for CatBoost, and the second C for improved CORAL. The KCC method aims to optimize the joint probability distribution of the source project so that it closely agrees with that of the target project through iterative and dynamic integration. Six well-known open-source projects were used to evaluate the effectiveness of the proposed method. The empirical findings indicate that the KCC method exhibited significant improvements over the baseline methods. In particular, the KCC method demonstrated an average increase of 18% in the geometric mean (G-mean), 105.4% in the Matthews correlation coefficient (MCC), 25.6% in the F1-score, and 16.9% in the area under the receiver operating characteristic curve (AUC) when compared to the baseline methods. Furthermore, the KCC method demonstrated greater stability.
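For readers unfamiliar with the threshold-based metrics named in the abstract, the following is a minimal sketch of how G-mean, MCC, and F1-score are conventionally computed from a binary confusion matrix. It uses the standard textbook definitions, not code from the paper; the function names and the toy labels are hypothetical, and label 1 is assumed to mean "defect-inducing change". AUC is omitted because it needs predicted scores rather than hard labels.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def g_mean(tp, tn, fp, fn):
    """Geometric mean of recall (true positive rate) and specificity."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(recall * specificity)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def f1_score(tp, tn, fp, fn):
    """Harmonic mean of precision and recall (tn is unused, kept for a uniform signature)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: 3 defective and 5 clean changes.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)  # (tp, tn, fp, fn) = (2, 4, 1, 1)
scores = {
    "g_mean": g_mean(*counts),
    "mcc": mcc(*counts),
    "f1": f1_score(*counts),
}
```

G-mean and MCC are commonly preferred over plain accuracy in defect prediction because the defective class is usually a small minority, and both penalize a model that simply predicts "clean" for every change.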
Pages: 20