Cross-Project Online Just-In-Time Software Defect Prediction

Cited by: 16
Authors
Tabassum, Sadia [1]
Minku, Leandro L. [1]
Feng, Danyi [2]
Affiliations
[1] Univ Birmingham, Sch Comp Sci, Birmingham B15 2TT, England
[2] Xiliu Tech, Beijing 100050, Peoples R China
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Training; Software; Training data; Predictive models; Codes; Resource management; Open source software; Software defect prediction; cross-project learning; transfer learning; online learning; verification latency; concept drift; MACHINE;
DOI
10.1109/TSE.2022.3150153
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Cross-Project (CP) Just-In-Time Software Defect Prediction (JIT-SDP) makes use of CP data to overcome the lack of data necessary to train well-performing JIT-SDP classifiers at the beginning of software projects. However, such approaches have never been investigated in realistic online learning scenarios, where Within-Project (WP) software changes naturally arrive over time and can be used to automatically update the classifiers. We provide the first investigation of when and to what extent CP data are useful for JIT-SDP in such realistic scenarios. To this end, we propose three different online CP JIT-SDP approaches that can be updated with incoming CP and WP training examples over time. We also collect data on 9 proprietary software projects and use 10 open source software projects to analyse these approaches. We find that training classifiers with incoming CP+WP data can lead to absolute improvements in G-mean of up to 53.89% and up to 35.02% at the initial stage of the projects compared to classifiers using WP-only and CP-only data, respectively. Using CP+WP data was also shown to be beneficial after a large amount of WP data had been received. Using CP data to supplement WP data helped the classifiers to reduce or prevent large drops in predictive performance that may occur over time, leading to absolute G-mean improvements of up to 37.35% and 48.16% compared to WP-only and CP-only data during such periods, respectively. During periods of stable predictive performance, absolute improvements were of up to 29.03% and up to 41.25% compared to WP-only and CP-only classifiers, respectively. Our results highlight the importance of using both CP and WP data together in realistic online JIT-SDP scenarios.
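The abstract reports its results as absolute improvements in G-mean. As a minimal sketch, assuming the common definition of G-mean as the geometric mean of the recall on the defect-inducing class and the recall on the clean class (the abstract itself does not spell out the formula), the metric and an absolute difference in percentage points could be computed as follows. The function names `g_mean` and `absolute_improvement` and the label encoding (1 = defect-inducing, 0 = clean) are illustrative assumptions, not taken from the paper.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls for binary labels.

    Assumed encoding (hypothetical): 1 = defect-inducing change, 0 = clean change.
    """
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("G-mean needs at least one example of each class")

    # Correctly predicted examples of each class.
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    true_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    recall_defect = true_pos / positives
    recall_clean = true_neg / negatives
    return math.sqrt(recall_defect * recall_clean)


def absolute_improvement(g_mean_a, g_mean_b):
    """Difference between two G-mean values, expressed in percentage points."""
    return (g_mean_a - g_mean_b) * 100.0


# Toy labels for illustration only; these are not data from the study.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
score = g_mean(y_true, y_pred)
```

Because G-mean multiplies the two recalls, a classifier that ignores one class entirely scores zero, which is why it is often preferred over accuracy for class-imbalanced defect data.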
Pages: 268-287
Number of pages: 20