An Improved Method for Training Data Selection for Cross-Project Defect Prediction

Cited by: 0
Authors
Nayeem Ahmad Bhat
Sheikh Umar Farooq
Affiliations
[1] University of Kashmir, Department of Computer Sciences, North Campus
Source
Arabian Journal for Science and Engineering | 2022, Vol. 47
Keywords
Cross-project defect prediction; Class imbalance learning; Distributional difference; Data normalization; Software quality assurance; Training data selection;
DOI: not available
Abstract
The selection of relevant training data significantly improves the quality of the cross-project defect prediction (CPDP) process. We propose a training data selection approach, BurakMHD, and compare its performance against the Burak filter and the Peter filter on the Bug Prediction Dataset. In our approach, a data transformation is first applied to the datasets. Then, for each instance of the target project, the k instances at minimum Hamming distance from the transformed multi-source defective instances and the k instances at minimum Hamming distance from the transformed non-defective instances are added to the filtered training dataset (filtered TDS). Compared to using all the cross-project data, the false positive rate decreases by 10.6%, at the cost of a 2.6% decrease in the defect detection rate; the overall performance measures nMCC, Balance, and G-measure increase by 2.9%, 5.7%, and 6.6%, respectively. Compared to the Burak filter and the Peter filter, the defect detection rate increases by 1.5% and 1.8%, respectively, and the false positive rate decreases by 6.4%; nMCC, Balance, and G-measure increase by 3%, 5.3%, and 6.8% over the Burak filter and by 3.2%, 5.5%, and 7.1% over the Peter filter. Compared to within-project predictions, nMCC, Balance, and G-measure increase by 1.1%, 3.4%, and 4%, respectively, while the defect detection rate and false positive rate decrease by 9.2% and 13.1%, respectively. Overall, our approach significantly improves performance compared to the Burak filter, the Peter filter, cross-project prediction, and within-project prediction. We therefore conclude that applying a data transformation and filtering the training data separately from the defective and non-defective instances of the cross-project data helps select relevant training data for CPDP.
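The selection step described in the abstract can be sketched in code. This is a hedged illustration only, not the authors' implementation: the paper's exact data transformation is not specified here, so the sketch assumes a simple median binarization so that Hamming distance is well defined, and the function name `select_training_data` and parameter `k` are placeholders.

```python
import numpy as np

def select_training_data(target_X, source_X, source_y, k=10):
    """Sketch of a BurakMHD-style filter: for each target instance, add the
    k source instances at minimum Hamming distance, drawn separately from
    the defective (y=1) and non-defective (y=0) pools of cross-project data.
    The binarization below is an assumed stand-in for the paper's transform."""
    medians = np.median(source_X, axis=0)
    tb = (target_X > medians).astype(int)   # binarized target instances
    sb = (source_X > medians).astype(int)   # binarized source instances

    selected = set()
    for label in (0, 1):                    # non-defective, then defective pool
        pool_idx = np.where(source_y == label)[0]
        pool = sb[pool_idx]
        for t in tb:
            d = (pool != t).sum(axis=1)     # Hamming distance to each pool row
            selected.update(pool_idx[np.argsort(d)[:k]].tolist())

    idx = sorted(selected)                  # filtered TDS (duplicates merged)
    return source_X[idx], source_y[idx]
```

Because the pools are filtered separately, the resulting training set is guaranteed to contain both defective and non-defective instances, which is the mechanism the abstract credits for the improved balance between detection rate and false positive rate.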
Pages: 1939–1954 (15 pages)