An Improved Method for Training Data Selection for Cross-Project Defect Prediction

Cited by: 0
Authors
Nayeem Ahmad Bhat
Sheikh Umar Farooq
Affiliations
[1] University of Kashmir, Department of Computer Sciences, North Campus
Source
Arabian Journal for Science and Engineering | 2022 / Vol. 47
Keywords
Cross-project defect prediction; Class imbalance learning; Distributional difference; Data normalization; Software quality assurance; Training data selection
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
Selecting relevant training data significantly improves the quality of cross-project defect prediction (CPDP). We propose a training data selection approach, BurakMHD, and compare its performance against the Burak filter and the Peter filter on the Bug Prediction Dataset. In our approach, a data transformation is first applied to the datasets. Then, for each instance of the target project, the k instances at minimum Hamming distance are selected separately from the transformed multi-source defective data and from the transformed multi-source non-defective data, and added to the filtered training dataset (filtered TDS). Compared to using all the cross-project data, the false positive rate decreases by 10.6%, accompanied by a 2.6% decrease in defect detection rate, and the overall performance measures nMCC, Balance, and G-measure increase by 2.9%, 5.7%, and 6.6%, respectively. Compared to the Burak filter and the Peter filter, the defect detection rate increases by 1.5% and 1.8%, respectively, and the false positive rate decreases by 6.4%. The overall performance measures nMCC, Balance, and G-measure increase by 3%, 5.3%, and 6.8% compared to the Burak filter, and by 3.2%, 5.5%, and 7.1% compared to the Peter filter. Compared to within-project prediction, nMCC, Balance, and G-measure increase by 1.1%, 3.4%, and 4%, respectively, while the defect detection rate and false positive rate decrease by 9.2% and 13.1%, respectively. In general, our approach improves performance significantly compared to the Burak filter, the Peter filter, cross-project prediction, and within-project prediction. We therefore conclude that applying a data transformation and filtering training data separately from the defective and non-defective instances of cross-project data helps to select relevant data for CPDP.
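The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the data transformation, so the sketch assumes a hypothetical one (each metric is binarized against the per-metric median of the pooled source data), and the names `binarize` and `hamming_filter` are invented for illustration. For every target instance, it picks the k nearest source instances by Hamming distance, separately from the defective and the non-defective pools.

```python
import numpy as np


def binarize(X, thresholds):
    """Hypothetical transformation: 1 where a metric exceeds its threshold."""
    return (np.asarray(X, dtype=float) > thresholds).astype(np.uint8)


def hamming_filter(source_X, source_y, target_X, k=2):
    """Return sorted indices of source rows kept in the filtered training set.

    source_y uses 1 for defective and 0 for non-defective instances.
    """
    source_X = np.asarray(source_X, dtype=float)
    source_y = np.asarray(source_y)
    target_X = np.asarray(target_X, dtype=float)

    # Assumption: thresholds are the per-metric medians of the source data.
    thresholds = np.median(source_X, axis=0)
    src_bits = binarize(source_X, thresholds)
    tgt_bits = binarize(target_X, thresholds)

    selected = set()
    for label in (0, 1):  # filter each class pool separately
        class_idx = np.flatnonzero(source_y == label)
        if class_idx.size == 0:
            continue
        class_bits = src_bits[class_idx]
        for t in tgt_bits:
            # Hamming distance: number of positions where the bits differ.
            dist = np.count_nonzero(class_bits != t, axis=1)
            # Stable sort so ties resolve by original row order.
            nearest = np.argsort(dist, kind="stable")[:k]
            selected.update(class_idx[nearest].tolist())
    return sorted(selected)
```

The returned indices identify the filtered training set, which can then be passed to any classifier. Because both class pools contribute up to k neighbours per target instance, the filtered set tends to be less dominated by non-defective instances than a filter run over the pooled data, which is one plausible reading of why the abstract ties the approach to class imbalance.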
Pages: 1939 - 1954
Page count: 15
Related papers
50 in total
  • [1] An Improved Method for Training Data Selection for Cross-Project Defect Prediction
    Bhat, Nayeem Ahmad
    Farooq, Sheikh Umar
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2022, 47 (02) : 1939 - 1954
  • [2] A Cluster Based Feature Selection Method for Cross-Project Software Defect Prediction
    Chao Ni
    Wang-Shu Liu
    Xiang Chen
    Qing Gu
    Dao-Xu Chen
    Qi-Guo Huang
    Journal of Computer Science and Technology, 2017, 32 : 1090 - 1107
  • [3] DSSDPP: Data Selection and Sampling Based Domain Programming Predictor for Cross-Project Defect Prediction
    Li, Zhiqiang
    Zhang, Hongyu
    Jing, Xiao-Yuan
    Xie, Juanying
    Guo, Min
    Ren, Jie
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2023, 49 (04) : 1941 - 1963
  • [4] A Cluster Based Feature Selection Method for Cross-Project Software Defect Prediction
    Ni, Chao
    Liu, Wang-Shu
    Chen, Xiang
    Gu, Qing
    Chen, Dao-Xu
    Huang, Qi-Guo
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2017, 32 (06) : 1090 - 1107
  • [5] Local modeling approach for cross-project defect prediction
    Bhat, Nayeem Ahmad
    Farooq, Sheikh Umar
    INTELLIGENT DECISION TECHNOLOGIES-NETHERLANDS, 2021, 15 (04) : 623 - 637
  • [6] FeSCH: A Feature Selection Method using Clusters of Hybrid-data for Cross-Project Defect Prediction
    Ni, Chao
    Liu, Wangshu
    Gu, Qing
    Chen, Xiang
    Chen, Daoxu
    2017 IEEE 41ST ANNUAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE (COMPSAC), VOL 1, 2017 : 51 - 56
  • [7] Evaluating Data Filter on Cross-Project Defect Prediction: Comparison and Improvements
    Li, Yong
    Huang, Zhiqiu
    Wang, Yong
    Fang, Bingwu
    IEEE ACCESS, 2017, 5 : 25646 - 25656
  • [8] Correlation Metric Selection based Correlation Alignment for Cross-project Defect Prediction
    Niu, Jingwen
    Li, Zhiqiang
    Qi, Chao
    20TH INT CONF ON UBIQUITOUS COMP AND COMMUNICAT (IUCC) / 20TH INT CONF ON COMP AND INFORMATION TECHNOLOGY (CIT) / 4TH INT CONF ON DATA SCIENCE AND COMPUTATIONAL INTELLIGENCE (DSCI) / 11TH INT CONF ON SMART COMPUTING, NETWORKING, AND SERV (SMARTCNS), 2021 : 490 - 495
  • [9] COMPLEXFUZZY: NOVEL CLUSTERING METHOD FOR SELECTING TRAINING INSTANCES OF CROSS-PROJECT DEFECT PREDICTION
    Ozturk, Muhammed Maruf
    COMPUTER SCIENCE-AGH, 2021, 22 (01) : 3 - 37
  • [10] Source selection and transfer defect learning based cross-project defect prediction
    Wen, Wanzhi
    Zhu, Ningbo
    Ye, Bingqing
    Li, Xikai
    Wang, Chuyue
    Chu, Jiawei
    Li, Yuehua
    INTERNATIONAL JOURNAL OF COMPUTING SCIENCE AND MATHEMATICS, 2022, 16 (03) : 195 - 207