An Improved Method for Training Data Selection for Cross-Project Defect Prediction

Cited by: 0
Authors
Nayeem Ahmad Bhat
Sheikh Umar Farooq
Affiliations
[1] University of Kashmir, Department of Computer Sciences, North Campus
Source
Arabian Journal for Science and Engineering | 2022, Vol. 47
Keywords
Cross-project defect prediction; Class imbalance learning; Distributional difference; Data normalization; Software quality assurance; Training data selection;
DOI: not available
Abstract
The selection of relevant training data significantly improves the quality of the cross-project defect prediction (CPDP) process. We propose a training data selection approach, BurakMHD, and compare its performance against the Burak filter and the Peter filter on the Bug Prediction Dataset. In our approach, a data transformation is first applied to the datasets. Then, for each instance of the target project, the k instances at minimum Hamming distance from the transformed multi-source defective instances and the k instances at minimum Hamming distance from the transformed non-defective instances are added to the filtered training dataset (filtered TDS). Compared to using all the cross-project data, the false positive rate decreases by 10.6%, at the cost of a 2.6% decrease in the defect detection rate; the overall performance measures nMCC, Balance, and G-measure increase by 2.9%, 5.7%, and 6.6%, respectively. Compared to the Burak filter and the Peter filter, the defect detection rate increases by 1.5% and 1.8%, respectively, and the false positive rate decreases by 6.4%; nMCC, Balance, and G-measure increase by 3%, 5.3%, and 6.8% over the Burak filter and by 3.2%, 5.5%, and 7.1% over the Peter filter. Compared to within-project predictions, nMCC, Balance, and G-measure increase by 1.1%, 3.4%, and 4%, respectively, while the defect detection rate and false positive rate decrease by 9.2% and 13.1%, respectively. Overall, our approach significantly improves performance compared to the Burak filter, the Peter filter, cross-project prediction, and within-project prediction. We therefore conclude that applying a data transformation and filtering the training data separately from the defective and non-defective instances of the cross-project data helps select relevant training data for CPDP.
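The selection step described in the abstract can be sketched in code. This is a hedged illustration only, not the authors' implementation: the paper's exact data transformation is not specified here, so the sketch assumes a simple median binarization so that Hamming distance is well defined, and the function name `select_training_data` and parameter `k` are placeholders.

```python
import numpy as np

def select_training_data(target_X, source_X, source_y, k=10):
    """Sketch of a BurakMHD-style filter: for each target instance, add the
    k source instances at minimum Hamming distance, drawn separately from
    the defective (y=1) and non-defective (y=0) pools of cross-project data.
    The binarization below is an assumed stand-in for the paper's transform."""
    medians = np.median(source_X, axis=0)
    tb = (target_X > medians).astype(int)   # binarized target instances
    sb = (source_X > medians).astype(int)   # binarized source instances

    selected = set()
    for label in (0, 1):                    # non-defective, then defective pool
        pool_idx = np.where(source_y == label)[0]
        pool = sb[pool_idx]
        for t in tb:
            d = (pool != t).sum(axis=1)     # Hamming distance to each pool row
            selected.update(pool_idx[np.argsort(d)[:k]].tolist())

    idx = sorted(selected)                  # filtered TDS (duplicates merged)
    return source_X[idx], source_y[idx]
```

Because the pools are filtered separately, the resulting training set is guaranteed to contain both defective and non-defective instances, which is the mechanism the abstract credits for the improved balance between detection rate and false positive rate.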
Pages: 1939–1954 (15 pages)