A Software Defect Prediction Method That Simultaneously Addresses Class Overlap and Noise Issues after Oversampling

Cited by: 1
Authors
Wang, Renliang [1]
Liu, Feng [1]
Bai, Yanhui [1]
Affiliations
[1] Beijing Jiaotong Univ, Sch Comp Sci & Technol, 3 Shangyuancun Haidian Dist, Beijing 100044, Peoples R China
Keywords
software defect prediction; class overlap; data quality; noise filtering; imbalanced learning; class imbalance; SMOTE; classification
DOI
10.3390/electronics13203976
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Software defect prediction datasets often suffer from class imbalance, noise, and class overlap, which make it difficult for classifiers to identify defective instances. Researchers have proposed various techniques to mitigate the impact of these issues on classifier performance, and oversampling is a widely used remedy for class imbalance. However, beyond the noise and class overlap already present in the datasets, oversampling can itself introduce new noise and new overlap while it corrects the imbalance. To tackle these challenges, we propose a software defect prediction method called AS-KDENN, which simultaneously mitigates the effects of class imbalance, noise, and class overlap on classification models. AS-KDENN first oversamples the data with the Adaptive Synthetic Sampling (ADASYN) method and then applies our proposed KDENN method to address noise and class overlap. Unlike traditional methods, KDENN takes into account both the distance and the local density information of overlapping samples, which allows noisy and overlapping instances to be eliminated more reasonably. To demonstrate the effectiveness of AS-KDENN, we conducted extensive experiments on 19 publicly available software defect prediction datasets. In comparison with four commonly used oversampling techniques that also address class overlap or noise, AS-KDENN effectively alleviates class imbalance, noise, and class overlap, and in turn improves the performance of the classifier models.
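The abstract does not give the KDENN algorithm itself, so the following is only a minimal sketch of the general idea it describes: a nearest-neighbour cleaning step, applied after oversampling, that weighs neighbours by distance and local density instead of counting them equally. The function names (gaussian_kernel, density_weighted_enn), the Gaussian kernel, the parameters k and bandwidth, and the removal rule are all assumptions made for illustration and are not taken from the paper. The oversampling step is omitted; in practice it could be performed beforehand, for example with the ADASYN class in the imbalanced-learn package.

```python
import math


def gaussian_kernel(distance, bandwidth):
    """Gaussian weight: close neighbours count more than distant ones."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def density_weighted_enn(X, y, k=3, bandwidth=1.0):
    """Return the indices of the samples that are kept.

    Hypothetical filter (not the authors' KDENN): for each sample, take its
    k nearest neighbours and sum kernel weights per class. The sample is
    dropped when the other classes outweigh its own class, which is treated
    here as a sign of noise or class overlap.
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Distances from sample i to every other sample, nearest first.
        neighbours = sorted(
            (math.dist(xi, xj), yj)
            for j, (xj, yj) in enumerate(zip(X, y))
            if j != i
        )[:k]
        same = sum(gaussian_kernel(d, bandwidth) for d, lab in neighbours if lab == yi)
        other = sum(gaussian_kernel(d, bandwidth) for d, lab in neighbours if lab != yi)
        if same >= other:
            keep.append(i)
    return keep


# Toy data: two tight clusters, plus one class-1 point (index 4)
# that lies in the middle of the class-0 cluster.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
     (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]

kept = density_weighted_enn(X, y, k=3, bandwidth=0.1)
print(kept)
```

On this toy data the point at index 4 is the only one whose neighbourhood is dominated by the opposite class, so it is the only one filtered out. A plain majority vote would reach the same decision here; the kernel weighting matters in less clear-cut neighbourhoods, where a few very close neighbours can outweigh more numerous but distant ones.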
Pages: 20