IRJIT: A simple, online, information retrieval approach for just-in-time software defect prediction

Times cited: 0
Authors
Sahar, Hareem [1]
Bangash, Abdul Ali [2]
Hindle, Abram [1]
Barbosa, Denilson
Affiliations
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada
[2] Queens Univ, Dept Comp Sci, Kingston, ON, Canada
Keywords
Defect Prediction; Just-in-time; Information Retrieval
DOI
10.1007/s10664-024-10514-z
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Just-in-time software defect prediction (JIT-SDP) prevents defects from being introduced into software by identifying them at commit check-in time. Current software defect prediction approaches rely on manually crafted features, such as change metrics, and on machine learning or deep learning models that are expensive to train. Training these models can require significant computational resources and time, which makes it difficult to update them in real time as new examples become available and may limit their suitability for fast online defect prediction. Furthermore, their reliance on a complex underlying model often makes these approaches less explainable, meaning that developers cannot understand the reasons behind the models' predictions. An approach that is not explainable might not be adopted in real-life development environments because developers lack trust in its results. To address these limitations, we propose IRJIT, an approach that applies information retrieval to source code and labels new commits as buggy or clean based on their similarity to past buggy or clean commits. IRJIT is online, because it can learn from new data without expensive retraining, and explainable, because developers can see the documents that support a prediction, which provides additional context. Through an evaluation on 10 open-source datasets in a within-project setting, we show that our approach is up to 112 times faster than state-of-the-art ML and DL approaches, offers explainability at the commit and line levels, and achieves performance comparable to the state of the art.
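The core idea in the abstract (label a new commit by its similarity to previously labeled commits, and absorb new labeled commits without retraining) can be illustrated with a small retrieval-based classifier. This is a minimal sketch under stated assumptions, not IRJIT's implementation: the regex tokenizer, the TF-IDF weighting with cosine similarity, the k-nearest-neighbour majority vote, and the names `OnlineIRClassifier`, `add`, and `predict` are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(code: str) -> list[str]:
    # Assumption: a simple identifier-level tokenizer for source code.
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code)


class OnlineIRClassifier:
    """Hypothetical retrieval-based labeler: index past commits, vote by nearest neighbours."""

    def __init__(self, k: int = 3):
        self.k = k
        self.docs: list[tuple[Counter, str]] = []  # (term counts, label) per indexed commit
        self.df: Counter = Counter()  # number of indexed commits containing each term

    def add(self, code: str, label: str) -> None:
        # Online update: append the document and bump document frequencies; nothing is retrained.
        counts = Counter(tokenize(code))
        self.docs.append((counts, label))
        self.df.update(counts.keys())

    def _vector(self, counts: Counter) -> dict[str, float]:
        n = len(self.docs)
        # Smoothed TF-IDF weights (assumed scheme); IDF is recomputed from the current index.
        return {t: c * (math.log((1 + n) / (1 + self.df[t])) + 1) for t, c in counts.items()}

    @staticmethod
    def _cosine(a: dict[str, float], b: dict[str, float]) -> float:
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def predict(self, code: str) -> tuple[str, list[int]]:
        if not self.docs:
            raise ValueError("index is empty; add labeled commits first")
        query = self._vector(Counter(tokenize(code)))
        scored = sorted(
            ((self._cosine(query, self._vector(c)), label, i) for i, (c, label) in enumerate(self.docs)),
            reverse=True,
        )
        top = scored[: self.k]
        # Majority vote among the k most similar past commits.
        label = Counter(lbl for _, lbl, _ in top).most_common(1)[0][0]
        # Indices of the supporting commits, which a developer could inspect as evidence.
        return label, [i for _, _, i in top]


# Usage with toy, made-up commit snippets (not data from the paper).
clf = OnlineIRClassifier(k=3)
clf.add("buffer = malloc(size); memcpy(buffer, src, size)", "buggy")
clf.add("buffer = malloc(size); memcpy(buffer, src, len)", "buggy")
clf.add("log.info(message); return result", "clean")
clf.add("logger.debug(message); return value", "clean")
label, support = clf.predict("buf = malloc(size); memcpy(buf, src, size)")
```

Adding a commit only appends a document and updates term counts, which is what makes this style of model cheap to keep current. The indices returned by `predict` identify the past commits behind a label, giving a developer something concrete to inspect. Scoring every indexed document per query is only practical for toy inputs; a larger system would typically rely on an inverted index.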
Pages: 34