Just-in-time software defect prediction method for non-stationary and imbalanced data streams

Cited by: 0
Authors
Wu, Qikai [1]
Wang, Xingqi [1,2]
Wei, Dan [1,2]
Chen, Bin [1,2]
Dang, Qingguo [1]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Comp Sci, Hangzhou 310018, Peoples R China
[2] Key Lab Discrete Ind Internet Things Zhejiang Prov, Hangzhou, Peoples R China
Keywords
Just-in-time software defect prediction; Online learning; Concept drift; Verification latency; Class imbalance learning;
DOI
10.1007/s11219-025-09711-w
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Compared with traditional software defect prediction, Just-In-Time Software Defect Prediction (JIT-SDP) is finer-grained: it predicts defects at the level of individual software changes. However, JIT software defect datasets in online data stream scenarios suffer from validation delay (verification latency), concept drift, and an evolving class imbalance, all of which severely degrade the predictive performance of JIT-SDP. This paper introduces JNAI (JIT-SDP method for Non-stationary And Imbalanced data streams), a just-in-time software defect prediction method for non-stationary and imbalanced data streams. The method addresses the validation delay, concept drift, and class imbalance issues in existing JIT software defect processing technology. First, it proposes a validation delay framework to correct data labels, together with a concept drift adaptation mechanism that combines intra-project and cross-project data filtering to mitigate concept drift while avoiding the prediction bias caused by cross-project data. Next, a dynamic classifier selection method integrating a tiered AdaBoost is designed; it iteratively uses classifiers trained on preceding data to predict the labels of subsequent data, thereby addressing the class distribution imbalance in data streams. Finally, the Hoeffding Tree is selected as the base classifier and trained on the processed dataset, forming the final model of the method. Experiments on six public JIT-SDP datasets and ten open-source GitHub projects show that JNAI effectively improves the predictive performance of just-in-time software defect prediction.
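The validation delay mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the JNAI framework, whose details the abstract does not give. It only shows a common simplification of online JIT-SDP labeling: a change is labeled defect-inducing as soon as a defect is linked to it, and labeled clean once a waiting period passes with no defect found. The names `Commit`, `release_labels`, and `waiting_period` are hypothetical and chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Tuple


@dataclass
class Commit:
    commit_id: str
    timestamp: int                          # time of the change (e.g. in days)
    defect_found_at: Optional[int] = None   # time a defect was linked, if any


def release_labels(
    pending: Deque[Commit], now: int, waiting_period: int
) -> List[Tuple[str, int]]:
    """Return (commit_id, label) pairs whose label is available at `now`.

    Label 1 means defect-inducing, label 0 means clean (assumed conventions).
    Commits whose label is not yet known stay in `pending`.
    """
    released: List[Tuple[str, int]] = []
    still_pending: Deque[Commit] = deque()
    for commit in pending:
        if commit.defect_found_at is not None and commit.defect_found_at <= now:
            # A defect has already been linked to this change.
            released.append((commit.commit_id, 1))
        elif now - commit.timestamp >= waiting_period:
            # Waiting period elapsed with no defect found: assume clean.
            released.append((commit.commit_id, 0))
        else:
            still_pending.append(commit)
    # Keep only the commits that are still waiting for a label.
    pending.clear()
    pending.extend(still_pending)
    return released
```

In this sketch a commit released as clean is never revisited, so a defect discovered after the waiting period would leave a wrong label in the training stream. That kind of label noise is what a label-correction step, such as the one the abstract describes, would need to handle.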
Pages: 34
Related papers
50 in total
  • [31] An ensemble-based semi-supervised learning approach for non-stationary imbalanced data streams with label scarcity
    Abdi, Yousef
    Asadpour, Mohammad
    Feizi-Derakhshi, Mohammad-Reza
    APPLIED SOFT COMPUTING, 2024, 167
  • [32] Drift Detection over Non-stationary Data Streams Using Evolving Spiking Neural Networks
    Lobo, Jesus L.
    Del Ser, Javier
    Lana, Ibai
    Nekane Bilbao, Miren
    Kasabov, Nikola
    INTELLIGENT DISTRIBUTED COMPUTING XII, 2018, 798 : 82 - 94
  • [33] Recursive least square perceptron model for non-stationary and imbalanced data stream classification
    Ghazikhani A.
    Monsefi R.
    Sadoghi Yazdi H.
    Ghazikhani, A. (a_ghazikhani@yahoo.com), 1600, Springer Verlag (04): : 119 - 131
  • [34] Enhancing Just-in-Time Defect Prediction Using Change Request-based Metrics
    Tessema, Hailemelekot Demtse
    Abebe, Surafel Lemma
    2021 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING (SANER 2021), 2021, : 511 - 515
  • [35] Detection of evolving concepts in non-stationary data streams: A multiple kernel learning approach
    Siahroudi, Sajjad Kamali
    Moodi, Poorya Zare
    Beigy, Hamid
    EXPERT SYSTEMS WITH APPLICATIONS, 2018, 91 : 187 - 197
  • [36] An Online Learning Algorithm for Non-stationary Imbalanced Data by Extra-Charging Minority Class
    Siahroudi, Sajjad Kamali
    Kudenko, Daniel
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2021, PT I, 2021, 12712 : 603 - 615
  • [37] Concept-evolution detection in non-stationary data streams: a fuzzy clustering approach
    ZareMoodi, Poorya
    Siahroudi, Sajjad Kamali
    Beigy, Hamid
    KNOWLEDGE AND INFORMATION SYSTEMS, 2019, 60 (03) : 1329 - 1352
  • [38] Concept-evolution detection in non-stationary data streams: a fuzzy clustering approach
    Poorya ZareMoodi
    Sajjad Kamali Siahroudi
    Hamid Beigy
    Knowledge and Information Systems, 2019, 60 : 1329 - 1352
  • [39] STDS: self-training data streams for mining limited labeled data in non-stationary environment
    Shirin Khezri
    Jafar Tanha
    Ali Ahmadi
    Arash Sharifi
    Applied Intelligence, 2020, 50 : 1448 - 1467
  • [40] Graph-based method for autonomous adaptation in online learning of non-stationary data
    Alvarenga, W. J.
    Costa, A. C. A. A.
    Campos, F. V.
    Torres, L. C. B.
    Braga, A. P.
    INFORMATION SCIENCES, 2025, 700