A Software Defect Prediction Method That Simultaneously Addresses Class Overlap and Noise Issues after Oversampling

Cited by: 1
Authors
Wang, Renliang [1]
Liu, Feng [1]
Bai, Yanhui [1]
Affiliations
[1] Beijing Jiaotong Univ, Sch Comp Sci & Technol, 3 Shangyuancun Haidian Dist, Beijing 100044, Peoples R China
Keywords
software defect prediction; class overlap; data quality; noise filtering; imbalanced learning; class imbalance; SMOTE; classification
DOI
10.3390/electronics13203976
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Software defect prediction datasets often suffer from class imbalance, noise, and class overlap, which make it difficult for classifiers to identify defective instances. In response, researchers have proposed various techniques to mitigate the impact of these issues on classifier performance. Oversampling is a widely used remedy for class imbalance. However, beyond the noise and class overlap already present in the datasets, oversampling can itself introduce new noise and new class overlap while it corrects the imbalance. To tackle these challenges, we propose a software defect prediction method called AS-KDENN, which simultaneously mitigates the effects of class imbalance, noise, and class overlap on classification models. AS-KDENN first oversamples the data with the Adaptive Synthetic Sampling (ADASYN) method and then applies our proposed KDENN method to address noise and class overlap. Unlike traditional methods, KDENN takes into account both the distance and the local density information of overlapping samples, which allows noisy and overlapping instances to be eliminated more reasonably. To demonstrate the effectiveness of AS-KDENN, we conducted extensive experiments on 19 publicly available software defect prediction datasets. In comparisons with four commonly used oversampling techniques that also address class overlap or noise, AS-KDENN effectively alleviated class imbalance, noise, and class overlap, and thereby improved the performance of the classification models.
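The two-stage idea in the abstract (oversample first, then filter noisy and overlapping instances using both distance and local density) can be illustrated with a minimal sketch. The abstract does not specify how KDENN works internally, so everything below is an assumption made for illustration: `adasyn_like` is a simplified, stochastic variant of ADASYN, and `density_aware_enn` is a hypothetical filter that combines an edited-nearest-neighbour vote with a Gaussian kernel density comparison. Neither function is the authors' implementation, and the names, parameters (`k`, `bandwidth`), and toy data are invented here.

```python
import math
import random

def knn(points, query, k, skip=None):
    """Indices of the k points nearest to `query` (Euclidean), excluding `skip`."""
    order = sorted((i for i in range(len(points)) if i != skip),
                   key=lambda i: math.dist(points[i], query))
    return order[:k]

def adasyn_like(X, y, minority=1, k=3, seed=0):
    """Simplified ADASYN-style oversampling (assumption, not the exact algorithm):
    minority points with more majority neighbours seed more synthetic samples."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    need = (len(y) - len(min_idx)) - len(min_idx)  # samples needed to balance
    if need <= 0 or len(min_idx) < 2:
        return list(X), list(y)
    # Difficulty of each minority point: share of majority points among its k neighbours.
    ratio = [sum(y[j] != minority for j in knn(X, X[i], k, skip=i)) / k
             for i in min_idx]
    total = sum(ratio)
    weights = [r / total for r in ratio] if total > 0 else [1.0] * len(min_idx)
    min_pts = [X[i] for i in min_idx]
    X_new, y_new = list(X), list(y)
    for _ in range(need):
        a = rng.choices(range(len(min_pts)), weights=weights)[0]
        b = rng.choice(knn(min_pts, min_pts[a], min(k, len(min_pts) - 1), skip=a))
        t = rng.random()  # interpolate between two minority points
        X_new.append(tuple(p + t * (q - p) for p, q in zip(min_pts[a], min_pts[b])))
        y_new.append(minority)
    return X_new, y_new

def class_density(points, query, bandwidth):
    """Mean Gaussian kernel value of `points` at `query` (normalising constant dropped)."""
    if not points:
        return 0.0
    return sum(math.exp(-math.dist(p, query) ** 2 / (2 * bandwidth ** 2))
               for p in points) / len(points)

def density_aware_enn(X, y, k=3, bandwidth=1.0):
    """Hypothetical filter: drop a sample only if (a) most of its k nearest
    neighbours carry another label AND (b) the other class is locally denser
    around it than its own class."""
    keep = []
    for i, (x, label) in enumerate(zip(X, y)):
        disagree = sum(y[j] != label for j in knn(X, x, k, skip=i))
        same = [X[j] for j in range(len(X)) if j != i and y[j] == label]
        other = [X[j] for j in range(len(X)) if y[j] != label]
        if disagree > k / 2 and (class_density(other, x, bandwidth)
                                 > class_density(same, x, bandwidth)):
            continue  # treated as noise or overlap
        keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy data (invented): label 0 = non-defective, label 1 = defective.
X = [(0.0, 0.0), (0.3, 0.0), (0.0, 0.3), (0.3, 0.3), (-0.2, 0.1), (0.1, -0.2),
     (5.0, 5.0), (5.2, 4.9), (0.1, 0.1)]  # last point: mislabelled-looking minority sample
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

X_bal, y_bal = adasyn_like(X, y)                 # stage 1: balance the classes
X_clean, y_clean = density_aware_enn(X_bal, y_bal)  # stage 2: filter noise and overlap
```

On the toy data, a purely distance-based neighbour vote would also flag the two genuine minority points near (5, 5), because the minority class is too small to fill their neighbourhoods. The added density comparison keeps them, which is one plausible reading of why combining distance with local density could be more conservative than a distance-only rule.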
Pages: 20