An Improved Method for Training Data Selection for Cross-Project Defect Prediction

Authors
Nayeem Ahmad Bhat
Sheikh Umar Farooq
Affiliation
[1] University of Kashmir, Department of Computer Sciences, North Campus
Source
Arabian Journal for Science and Engineering | 2022 / Vol. 47
Keywords
Cross-project defect prediction; Class imbalance learning; Distributional difference; Data normalization; Software quality assurance; Training data selection
DOI
Not available
Abstract
Selecting relevant training data significantly improves the quality of the cross-project defect prediction (CPDP) process. We propose a training data selection approach and compare its performance against the Burak filter and the Peter filter on the Bug Prediction Dataset. In our approach (BurakMHD), a data transformation is first applied to the datasets. Then, each instance of the target project adds to the filtered training dataset (filtered TDS) the k instances at minimum Hamming distance from the transformed multi-source defective data and, separately, the k instances at minimum Hamming distance from the transformed multi-source non-defective data. Compared to using all the cross-project data, the false positive rate decreases by 10.6%, accompanied by a 2.6% decrease in the defect detection rate. The overall performance measures nMCC, Balance, and G-measure increase by 2.9%, 5.7%, and 6.6%, respectively. Compared to the Burak filter and the Peter filter, the defect detection rate increases by 1.5% and 1.8%, respectively, and the false positive rate decreases by 6.4%. The overall performance measures nMCC, Balance, and G-measure increase by 3%, 5.3%, and 6.8% compared to the Burak filter, and by 3.2%, 5.5%, and 7.1% compared to the Peter filter. Compared to within-project predictions, nMCC, Balance, and G-measure increase by 1.1%, 3.4%, and 4%, respectively, while the defect detection rate and the false positive rate decrease by 9.2% and 13.1%, respectively. In general, our approach improved performance significantly compared to the Burak filter, the Peter filter, cross-project prediction, and within-project prediction. We therefore conclude that applying a data transformation and filtering training data separately from the defective and non-defective instances of cross-project data helps select relevant data for CPDP.
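The selection step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say which data transformation is used, so `binarize_by_median` is a hypothetical stand-in chosen only so that Hamming distance is well defined. The function names, the default `k`, and the tie-breaking rule (lowest source index first) are likewise assumptions.

```python
import numpy as np


def binarize_by_median(X):
    """Hypothetical transformation (an assumption, not taken from the paper):
    map each metric to 1 if it exceeds its column median, else 0."""
    X = np.asarray(X, dtype=float)
    return (X > np.median(X, axis=0)).astype(int)


def hamming_distances(row, pool):
    """Count the positions in which `row` differs from each row of `pool`."""
    return np.count_nonzero(pool != row, axis=1)


def select_training_data(source_X, source_y, target_X, k=2):
    """Return sorted indices of the source instances kept in the filtered TDS.

    Each target instance contributes its k nearest source instances
    (by Hamming distance) from the non-defective pool and, separately,
    its k nearest from the defective pool.
    """
    source_X = np.asarray(source_X)
    source_y = np.asarray(source_y)
    target_X = np.asarray(target_X)
    selected = set()
    for label in (0, 1):  # 0 = non-defective, 1 = defective
        pool_idx = np.flatnonzero(source_y == label)
        if pool_idx.size == 0:
            continue  # no source instances carry this label
        pool = source_X[pool_idx]
        for row in target_X:
            distances = hamming_distances(row, pool)
            # Stable sort, so ties go to the lower source index (assumption).
            nearest = np.argsort(distances, kind="stable")[:k]
            selected.update(pool_idx[nearest].tolist())
    return sorted(selected)


if __name__ == "__main__":
    # Toy, already-transformed data: five source instances, one target instance.
    src_X = [[0, 0, 0], [1, 1, 1], [0, 0, 1], [1, 1, 0], [0, 1, 0]]
    src_y = [0, 0, 1, 1, 0]
    tgt_X = [[0, 0, 0]]
    print(select_training_data(src_X, src_y, tgt_X, k=1))
```

Drawing neighbors from the defective and non-defective pools separately means every target instance contributes candidates from both classes, which is the separation the abstract credits for selecting relevant data. Collecting indices in a set means a source instance chosen by several target instances appears only once in the filtered TDS; whether the original method keeps such duplicates is not stated in the abstract.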
Pages: 1939-1954
Number of pages: 15