An Improved Method for Training Data Selection for Cross-Project Defect Prediction

Cited by: 0
Authors
Nayeem Ahmad Bhat
Sheikh Umar Farooq
Affiliation
[1] University of Kashmir, Department of Computer Sciences, North Campus
Source
Arabian Journal for Science and Engineering | 2022 / Vol. 47
Keywords
Cross-project defect prediction; Class imbalance learning; Distributional difference; Data normalization; Software quality assurance; Training data selection;
DOI
Not available
Abstract
The selection of relevant training data significantly improves the quality of the cross-project defect prediction (CPDP) process. We propose a training data selection approach, BurakMHD, and compare its performance against the Burak filter and the Peter filter on the Bug Prediction Dataset. In our approach, a data transformation is first applied to the datasets. Then, for each instance of the target project, the k instances at minimum Hamming distance are selected separately from the transformed multi-source defective and non-defective data instances and added to the filtered training dataset (filtered TDS). Compared to using all the cross-project data, the false positive rate decreases by 10.6%, accompanied by a 2.6% decrease in defect detection rate, while the overall performance measures nMCC, Balance, and G-measure increase by 2.9%, 5.7%, and 6.6%, respectively. Compared to the Burak filter and the Peter filter, the defect detection rate increases by 1.5% and 1.8%, respectively, and the false positive rate decreases by 6.4%. The overall performance measures nMCC, Balance, and G-measure increase by 3%, 5.3%, and 6.8% compared to the Burak filter and by 3.2%, 5.5%, and 7.1% compared to the Peter filter. Compared to within-project predictions, nMCC, Balance, and G-measure increase by 1.1%, 3.4%, and 4%, respectively, while the defect detection rate and false positive rate decrease by 9.2% and 13.1%, respectively. Overall, our approach improved performance significantly compared to the Burak filter, the Peter filter, cross-project prediction, and within-project prediction. We therefore conclude that applying a data transformation and filtering training data separately from the defective and non-defective instances of cross-project data helps select relevant data for CPDP.
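The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which data transformation is used, so the median-based binarization below is an assumption chosen only so that Hamming distance is well defined, and the function names (`binarize_by_median`, `hamming_distances`, `select_training_data`) and the default `k` are hypothetical.

```python
import numpy as np


def binarize_by_median(X):
    """ASSUMED transformation (not specified in the abstract):
    1 where a metric exceeds its column median in this dataset, else 0."""
    X = np.asarray(X, dtype=float)
    return (X > np.median(X, axis=0)).astype(np.uint8)


def hamming_distances(row, M):
    """Number of positions in which `row` differs from each row of M."""
    return np.count_nonzero(M != row, axis=1)


def select_training_data(X_src, y_src, X_tgt, k=10):
    """Return sorted indices of source instances kept as filtered training data.

    For every target instance, the k nearest source instances (by Hamming
    distance on the transformed data) are taken from the non-defective pool
    and, separately, from the defective pool. The union is returned, so a
    source instance picked by several target instances appears once.
    """
    y_src = np.asarray(y_src)
    B_src = binarize_by_median(X_src)
    B_tgt = binarize_by_median(X_tgt)

    selected = set()
    for label in (0, 1):  # 0 = non-defective, 1 = defective
        pool = np.flatnonzero(y_src == label)
        if pool.size == 0:
            continue
        for row in B_tgt:
            d = hamming_distances(row, B_src[pool])
            nearest = np.argsort(d, kind="stable")[:k]
            selected.update(pool[nearest].tolist())
    return np.array(sorted(selected), dtype=int)
```

Calling `select_training_data(X_src, y_src, X_tgt, k)` with pooled multi-source metrics and labels yields row indices that can be used to slice the source data before fitting any classifier. Drawing neighbours from each class separately keeps defective instances in the filtered set even when they are a small minority of the source data.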
Pages: 1939-1954
Number of pages: 15