Improve cross-project just-in-time defect prediction with dynamic transfer learning

Cited by: 1
Authors
Dai, Hongming [1, 2]
Xi, Jianqing [1 ]
Dai, Hong-Liang [3 ]
Affiliations
[1] South China Univ Technol, Sch Software, Guangzhou 510006, Peoples R China
[2] Guangdong Polytech Sci & Trade, Sch Informat, Guangzhou 510430, Peoples R China
[3] Guangzhou Univ, Sch Econ & Stat, Guangzhou 510006, Peoples R China
Keywords
CatBoost; Correlation alignment; Cross-project; Just-in-time defect prediction; Kernel variance matching; MODEL;
DOI
10.1016/j.jss.2024.112214
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Cross-project just-in-time software defect prediction (CP-JIT-SDP) is a prominent research topic in software engineering. The approach is characterized by its immediacy, accuracy, real-time feedback, and traceability, which let it address defect prediction in new projects or projects with limited training data. However, CP-JIT-SDP faces significant challenges because the feature distributions of the source and target projects differ. To address this issue, researchers have proposed methods for adjusting marginal or conditional probability distributions. This study introduces a transfer-learning approach that integrates dynamic distribution adaptation. The kernel variance matching (KVM) method is proposed to reduce the disparity in the marginal probability distribution: it recalculates the variance of the source and target projects within the reproducing kernel Hilbert space (RKHS) and minimizes the difference between them. The categorical boosting (CatBoost) algorithm is used to construct models, while an improved CORrelation ALignment (CORAL) method is used to build the loss function that addresses the difference in the conditional probability distribution. The method is abbreviated as KCC, where K stands for KVM, the first C for CatBoost, and the second C for the improved CORAL. Through iterative and dynamic integration, KCC aims to bring the joint probability distribution of the source project into close agreement with that of the target project. Six well-known open-source projects were used to evaluate the proposed method. The empirical findings indicate that KCC improved significantly over the baseline methods. In particular, compared to the baseline methods, KCC showed an average increase of 18% in the geometric mean (G-mean), 105.4% in the Matthews correlation coefficient (MCC), 25.6% in the F1-score, and 16.9% in the area under the receiver operating characteristic curve (AUC). Furthermore, KCC demonstrated greater stability.
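For readers unfamiliar with the threshold-based metrics named in the abstract, the sketch below computes G-mean, MCC, and F1-score from a binary confusion matrix. It is an illustration only, not the authors' evaluation code: the function names are hypothetical, and G-mean is assumed to be the square root of recall times specificity, which is a common definition in defect prediction but is not confirmed by this record. AUC is omitted because it requires ranked scores rather than hard labels.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = defect-inducing, 0 = clean)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def g_mean(tp, fp, tn, fn):
    """Assumed definition: sqrt(recall * specificity)."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(recall * specificity)


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def f1_score(tp, fp, tn, fn):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy labels, not data from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
print("G-mean:", round(g_mean(*counts), 4))
print("MCC:", round(mcc(*counts), 4))
print("F1:", round(f1_score(*counts), 4))
```

G-mean and MCC both account for the true-negative rate, which is why they are often preferred over accuracy on class-imbalanced defect data.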
Pages: 20