Software defect prediction oversampling technique with generalization and difficulty-aware

Cited by: 0
Authors
Fan, Hongqi [1, 2]
Yan, Yuanting [1, 2]
Zhang, Yiwen [1, 2]
Zhang, Yanping [1, 2]
Affiliations
[1] Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Anhui University, Hefei
[2] School of Computer Science and Technology, Anhui University, Hefei
Source
Jisuanji Jicheng Zhizao Xitong/Computer Integrated Manufacturing Systems, CIMS | 2024, Vol. 30, No. 08
Funding
National Natural Science Foundation of China
Keywords
class imbalance; overgeneralization; oversampling; software defect prediction
DOI
10.13196/j.cims.2023.BPM02
Abstract
The class-imbalanced distribution of software defect data poses great challenges to software defect prediction. Synthetic oversampling is the most popular technique for addressing this problem, but designing a sampling strategy that avoids the risk of over-generalization caused by introducing abnormal samples remains an open challenge in software defect prediction. To solve this problem, a Generalization and Difficulty-aware Oversampling (GDOS) method is proposed, which combines the influence of sample learning difficulty and synthetic generalization in minority oversampling. For each oversampling seed sample, GDOS evaluates the selection weights of its assistant minority samples by simultaneously measuring a safe factor and a generalization factor, based on local prior probability and on the distribution of samples along each potential synthesis direction. By suppressing synthesis in potential over-generalization regions and favoring synthesis along relatively safe directions, GDOS ensures that high-quality samples are synthesized. Numerical comparisons with nine state-of-the-art methods on twenty-six datasets from the PROMISE repository demonstrate the superiority of GDOS in terms of MCC (Matthews correlation coefficient), pd (probability of detection), pf (probability of false alarm), and F-measure. © 2024 CIMS. All rights reserved.
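The abstract describes GDOS only at a high level, so the following is a minimal Python sketch of the general idea (weighting a seed's assistant samples by a safety score and a direction score before SMOTE-style interpolation), not the authors' algorithm. The scoring functions `safe_factor` and `generalization_factor` are hypothetical stand-ins chosen for illustration: the first is the share of minority samples among an assistant's k nearest neighbours, and the second is the share of minority samples around the midpoint of the seed-to-assistant segment. The sketch assumes numeric feature vectors, binary labels with 1 = defective (minority), and at least two minority samples.

```python
import math
import random


def knn_indices(point, data, k, exclude=None):
    """Indices of the k rows of `data` closest to `point` (Euclidean distance)."""
    candidates = [i for i in range(len(data)) if i != exclude]
    candidates.sort(key=lambda i: math.dist(point, data[i]))
    return candidates[:k]


def safe_factor(idx, X, y, k):
    """Hypothetical safety score: share of minority (label 1) samples among the
    k nearest neighbours of X[idx], a crude estimate of its local prior."""
    nbrs = knn_indices(X[idx], X, k, exclude=idx)
    return sum(y[j] for j in nbrs) / len(nbrs)


def generalization_factor(seed, assist, X, y, k):
    """Hypothetical direction score: share of minority samples among the k
    nearest neighbours of the midpoint of the seed->assistant segment.
    A low value means the synthesis direction crosses majority territory."""
    mid = [(a + b) / 2 for a, b in zip(X[seed], X[assist])]
    nbrs = knn_indices(mid, X, k)
    return sum(y[j] for j in nbrs) / len(nbrs)


def weighted_oversample(X, y, n_new, k=5, eps=1e-6, rng_seed=0):
    """SMOTE-style interpolation in which each seed's assistant is drawn with
    probability proportional to safe_factor * generalization_factor (+ eps)."""
    rng = random.Random(rng_seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    synthetic = []
    for t in range(n_new):
        s = minority[t % len(minority)]  # cycle through the seed samples
        # Assistant candidates: the k nearest minority neighbours of the seed.
        pool = sorted((j for j in minority if j != s),
                      key=lambda j: math.dist(X[s], X[j]))[:k]
        # eps keeps the total weight positive even if every direction scores 0.
        weights = [safe_factor(j, X, y, k) * generalization_factor(s, j, X, y, k) + eps
                   for j in pool]
        a = rng.choices(pool, weights=weights, k=1)[0]
        gap = rng.random()  # uniform position on the segment, as in SMOTE
        synthetic.append([xs + gap * (xa - xs) for xs, xa in zip(X[s], X[a])])
    return synthetic


# Toy data: a minority cluster near the origin, one isolated minority point
# at (10, 10), and a majority cluster around (5, 5) lying between them.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10],
     [5, 5], [5, 6], [6, 5], [6, 6], [4, 5]]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
new_points = weighted_oversample(X, y, n_new=6, k=3)
print(generalization_factor(0, 4, X, y, 3))  # direction through the majority cluster
print(generalization_factor(0, 1, X, y, 3))  # direction inside the minority cluster
```

In this toy setting the direction from (0, 0) toward the isolated point (10, 10) passes through the majority cluster and scores 0, while the direction toward (0, 1) scores 1, which is the kind of contrast the abstract's safe and generalization factors are meant to capture. A faithful reimplementation would need the paper's actual definitions of both factors and of how learning difficulty enters the weights.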
Pages: 2663-2671
Number of pages: 8