Just-in-time software defect prediction method for non-stationary and imbalanced data streams

Cited by: 0
Authors
Wu, Qikai [1]
Wang, Xingqi [1, 2]
Wei, Dan [1, 2]
Chen, Bin [1, 2]
Dang, Qingguo [1]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp Sci, Hangzhou 310018, Peoples R China
[2] Key Lab Discrete Ind Internet Things Zhejiang Prov, Hangzhou, Peoples R China
Keywords
Just-in-time software defect prediction; Online learning; Concept drift; Verification latency; Class imbalance learning
DOI
10.1007/s11219-025-09711-w
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Compared with traditional software defect prediction, Just-In-Time Software Defect Prediction (JIT-SDP) is finer-grained: it predicts defects at the level of individual software changes. In online data stream scenarios, however, JIT-SDP datasets suffer from verification latency, concept drift, and an evolving class imbalance, all of which severely degrade predictive performance. This paper introduces JNAI (JIT-SDP method for Non-stationary And Imbalanced data streams), which addresses these three problems. First, it proposes a verification latency framework to correct data labels. Second, it uses a concept drift adaptation mechanism that combines intra-project and cross-project data filtering to mitigate concept drift while avoiding the prediction bias caused by cross-project data. Third, it designs a dynamic classifier selection method integrating a tiered AdaBoost, in which classifiers trained on preceding data iteratively predict the labels of subsequent data, to address the imbalanced class distribution in data streams. Finally, the Hoeffding Tree is selected as the base classifier and trained on the processed dataset, forming the final JNAI model. Experiments on six public JIT-SDP datasets and ten open-source GitHub projects show that JNAI effectively improves the predictive performance of JIT-SDP.
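The label-correction idea behind verification latency can be illustrated with a minimal sketch. This is not the authors' JNAI implementation: the Commit record, the release_training_examples helper, and the 90-day waiting period are all hypothetical and chosen only for illustration. The sketch assumes a common online JIT-SDP convention in which a change is treated as clean once a waiting period has passed without a linked defect, and is relabeled as defect-inducing if a defect is discovered later.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Commit:
    """Hypothetical record for one software change (times are in days)."""
    commit_id: str
    timestamp: int
    defect_found_at: Optional[int] = None  # None if no defect was ever linked


def release_training_examples(
    commits: List[Commit], now: int, waiting_period: int
) -> List[Tuple[str, int]]:
    """Return (commit_id, label) pairs whose labels are observable at `now`.

    Label 1 means defect-inducing, label 0 means clean. A commit is labeled
    1 as soon as its defect is known, and 0 only after `waiting_period` days
    have passed without a known defect. Commits that are still inside the
    waiting period are withheld from training.
    """
    released = []
    for c in commits:
        if c.timestamp > now:
            continue  # the change has not been committed yet
        if c.defect_found_at is not None and c.defect_found_at <= now:
            released.append((c.commit_id, 1))
        elif now - c.timestamp >= waiting_period:
            released.append((c.commit_id, 0))
    return released


# Illustrative stream: "b" is defective, but the defect surfaces after the
# waiting period, so its first label is wrong and is corrected later.
commits = [
    Commit("a", timestamp=0, defect_found_at=10),
    Commit("b", timestamp=0, defect_found_at=120),
    Commit("c", timestamp=50),
]

early = release_training_examples(commits, now=100, waiting_period=90)
later = release_training_examples(commits, now=130, waiting_period=90)
```

In this sketch, commit "b" is first released as clean and later as defect-inducing, which is the kind of label noise a verification latency framework has to correct.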
Pages: 34
Related papers
50 in total
  • [21] Mobile Application Online Cross-Project Just-in-Time Software Defect Prediction Framework
    Jiang, Siyu
    He, Zhenhang
    Chen, Yuwen
    Zhang, Mingrong
    Ma, Le
    ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 2024, 33 (06)
  • [22] Learning with ensembles from non-stationary data streams
    Verdecia-Cabrera, Alberto
    Frias-Blanco, Isvani
    Quintero-Dominguez, Luis
    Sarabia, Yanet Rodriguez
    INTELIGENCIA ARTIFICIAL-IBEROAMERICAN JOURNAL OF ARTIFICIAL INTELLIGENCE, 2018, 21 (62): 145 - 158
  • [23] Scarcity of Labels in Non-Stationary Data Streams: A Survey
    Fahy, Conor
    Yang, Shengxiang
    Gongora, Mario
    ACM COMPUTING SURVEYS, 2023, 55 (02)
  • [24] Learning with ensembles from non-stationary data streams
    Verdecia-Cabrera A.
    Frías-Blanco I.
    Quintero-Domínguez L.
    Sarabia Y.R.
    2018, Asociacion Espanola de Inteligencia Artificial (21) : 145 - 158
  • [25] An online adaptive classifier ensemble for mining non-stationary data streams
    Verdecia-Cabrera, Alberto
    Blanco, Isvani Frias
    Carvalho, Andre C. P. L. F.
    INTELLIGENT DATA ANALYSIS, 2018, 22 (04) : 787 - 806
  • [26] Online neural network model for non-stationary and imbalanced data stream classification
    Ghazikhani, Adel
    Monsefi, Reza
    Yazdi, Hadi Sadoghi
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2014, 5 (01) : 51 - 62
  • [27] Online neural network model for non-stationary and imbalanced data stream classification
    Adel Ghazikhani
    Reza Monsefi
    Hadi Sadoghi Yazdi
    International Journal of Machine Learning and Cybernetics, 2014, 5 : 51 - 62
  • [28] FlexSketch: Estimation of Probability Density for Stationary and Non-Stationary Data Streams
    Park, Namuk
    Kim, Songkuk
    SENSORS, 2021, 21 (04) : 1 - 19
  • [29] Preserving Differential Privacy and Utility of Non-Stationary Data Streams
    Khavkin, Michael
    Last, Mark
    2018 18TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW), 2018: 29 - 34
  • [30] Dirichlet process mixture models for non-stationary data streams
    Casado, Ioar
    Perez, Aritz
    2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2022: 873 - 878