Just-in-time software defect prediction method for non-stationary and imbalanced data streams

Cited by: 0
Authors
Wu, Qikai [1]
Wang, Xingqi [1, 2]
Wei, Dan [1, 2]
Chen, Bin [1, 2]
Dang, Qingguo [1]
Affiliations
[1] Hangzhou Dianzi University, School of Computer Science, Hangzhou 310018, People's Republic of China
[2] Key Laboratory of Discrete Industrial Internet of Things of Zhejiang Province, Hangzhou, People's Republic of China
Keywords
Just-in-time software defect prediction; Online learning; Concept drift; Verification latency; Class imbalance learning
DOI
10.1007/s11219-025-09711-w
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Compared with traditional software defect prediction, Just-In-Time Software Defect Prediction (JIT-SDP) is a finer-grained approach that predicts defects at the level of individual software changes. However, in online data stream scenarios, JIT software defect datasets suffer from validation delay (i.e., verification latency), concept drift, and evolving class imbalance, all of which severely degrade the predictive performance of JIT-SDP. This paper introduces JNAI (JIT-SDP method for Non-stationary And Imbalanced data streams), a just-in-time software defect prediction method designed to address the validation delay, concept drift, and class imbalance problems of existing JIT-SDP techniques. First, it proposes a validation delay framework that corrects data labels, together with a concept drift adaptation mechanism that combines intra-project and cross-project data filtering to mitigate concept drift while avoiding the prediction bias that cross-project data can introduce. Next, it designs a dynamic classifier selection method integrating a tiered AdaBoost, which iteratively uses classifiers trained on earlier data to predict the labels of subsequent data, thereby addressing class distribution imbalance in data streams. Finally, a Hoeffding Tree is selected as the base classifier and trained on the processed dataset to form the final JNAI model. Experiments on six public JIT-SDP datasets and ten open-source GitHub projects show that JNAI effectively improves the predictive performance of just-in-time software defect prediction.
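The abstract does not specify how JNAI's validation delay framework assigns or corrects labels. A common scheme in the online JIT-SDP literature is a waiting-time rule: a change is labeled defect-inducing as soon as a bug-fixing commit is traced back to it, is assumed clean once a waiting period W passes with no such link, and receives a corrective defect-inducing example if a fix is linked only after W. The Python sketch below illustrates that generic rule under stated assumptions; it is not JNAI's actual framework, and the `Change` record, the `label_events` function, the day-based time unit, and the 90-day default for W are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Change:
    """Hypothetical software change record (field names are illustrative, not from the paper)."""
    change_id: str
    created: float                       # commit time in days (assumed unit)
    fix_linked: Optional[float] = None   # time a bug-fixing commit was traced back to this change


# (time the label becomes available, change_id, label); label 1 = defect-inducing, 0 = clean
Event = Tuple[float, str, int]


def label_events(changes: List[Change], now: float, waiting_days: float = 90.0) -> List[Event]:
    """Return the labeled training examples available at time `now`, in time order.

    Generic waiting-time rule (an assumption, not JNAI's framework): a change with no
    linked fix after `waiting_days` is assumed clean; if a fix is linked later, a
    corrective defect-inducing example is emitted for it.
    """
    events: List[Event] = []
    for c in changes:
        deadline = c.created + waiting_days
        if c.fix_linked is not None and c.fix_linked <= deadline:
            # Fix found inside the waiting window: label directly as defect-inducing.
            events.append((c.fix_linked, c.change_id, 1))
            continue
        # No fix inside the window: assume clean once the window expires.
        events.append((deadline, c.change_id, 0))
        if c.fix_linked is not None:
            # Fix surfaced after the window: correct the earlier (noisy) clean label.
            events.append((c.fix_linked, c.change_id, 1))
    # Only labels whose availability time has already passed can be used for training.
    return sorted(e for e in events if e[0] <= now)


if __name__ == "__main__":
    changes = [
        Change("a", created=0.0, fix_linked=10.0),    # defect found early
        Change("b", created=5.0),                     # never linked to a fix
        Change("c", created=20.0, fix_linked=200.0),  # defect found after the window
    ]
    print(label_events(changes, now=150.0))
    # [(10.0, 'a', 1), (95.0, 'b', 0), (110.0, 'c', 0)]
    print(label_events(changes, now=250.0)[-1])
    # (200.0, 'c', 1)
```

Emitting the correction as a new example, rather than rewriting earlier labels, keeps the training stream append-only, which suits incremental learners such as the Hoeffding Tree named in the abstract.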
Pages: 34