Just-in-time software defect prediction method for non-stationary and imbalanced data streams

Authors
Wu, Qikai [1]
Wang, Xingqi [1, 2]
Wei, Dan [1, 2]
Chen, Bin [1, 2]
Dang, Qingguo [1]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp Sci, Hangzhou 310018, Peoples R China
[2] Key Lab Discrete Ind Internet Things Zhejiang Prov, Hangzhou, Peoples R China
Keywords
Just-in-time software defect prediction; Online learning; Concept drift; Verification latency; Class imbalance learning
DOI
10.1007/s11219-025-09711-w
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Compared with traditional software defect prediction, Just-In-Time Software Defect Prediction (JIT-SDP) is finer-grained: it predicts defects at the level of individual software changes. However, in online data stream scenarios, JIT software defect datasets suffer from verification latency, concept drift, and evolving class imbalance, all of which severely degrade the predictive performance of JIT-SDP. This paper introduces JNAI (JIT-SDP method for Non-stationary And Imbalanced data streams), a just-in-time software defect prediction method for non-stationary and imbalanced data streams that addresses these three issues. It proposes a verification latency framework to correct data labels, and a concept drift adaptation mechanism that combines intra-project and cross-project data filtering to mitigate concept drift while avoiding the prediction bias caused by cross-project data. Next, a dynamic classifier selection method integrating a tiered AdaBoost is designed; it iteratively uses classifiers trained on preceding data to predict the labels of subsequent data, thereby addressing class distribution imbalance in the data stream. Finally, the Hoeffding Tree is selected as the base classifier and trained on the processed dataset to form the final model. Experiments on six public JIT-SDP datasets and ten open-source GitHub projects show that JNAI effectively improves the predictive performance of just-in-time software defect prediction.
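To make the label-delay problem in the abstract concrete, the following is a minimal Python sketch of how an online JIT-SDP stream might release training labels under a delay between a change being committed and its true label becoming known (the "verification latency" of the keywords). It is not the JNAI framework, whose details are not given in this record. The `Commit` fields, the `release_labels` function, and the fixed `waiting_period` are hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Commit:
    commit_id: str
    committed_at: int               # time step at which the change was made
    defect_found_at: Optional[int]  # time step a defect was linked to it, or None


def release_labels(
    pending: List[Commit], now: int, waiting_period: int
) -> Tuple[List[Tuple[str, int]], List[Commit]]:
    """Split pending commits into labelled examples and still-unlabelled ones.

    Assumed rule (illustrative only):
      - a commit whose defect has already been found by `now` is labelled 1;
      - a commit older than `waiting_period` with no defect found so far is
        provisionally labelled 0 (clean);
      - everything else stays pending.
    """
    labelled: List[Tuple[str, int]] = []
    still_pending: List[Commit] = []
    for c in pending:
        if c.defect_found_at is not None and c.defect_found_at <= now:
            labelled.append((c.commit_id, 1))
        elif now - c.committed_at >= waiting_period:
            labelled.append((c.commit_id, 0))
        else:
            still_pending.append(c)
    return labelled, still_pending
```

Under this assumed rule, a commit whose defect is discovered only after the waiting period has elapsed is first released as clean, which is the kind of noisy label a correction step would have to revisit. The sketch deliberately omits that correction, along with drift handling and class rebalancing.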
Pages: 34