Diversity based imbalance learning approach for software fault prediction using machine learning models

Cited by: 26
Authors
Manchala, Pravali [1]
Bisi, Manjubala [1]
Affiliations
[1] Natl Inst Technol Warangal, Dept Comp Sci & Engn, Hanamkonda, Telangana, India
Keywords
Imbalance learning; Software fault prediction; Oversampling; Machine Learning Model; Deep Neural Network; SUPPORT VECTOR MACHINE; DEFECT PREDICTION; SAMPLING METHOD; SMOTE; CLASSIFICATION;
DOI
10.1016/j.asoc.2022.109069
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The goal of software fault prediction (SFP) is to distinguish between faulty and non-faulty modules. The performance of a prediction model is vulnerable to the class imbalance issue in SFP. Existing oversampling approaches generate nearly identical synthetic data, which results in overgeneralization and less diverse data. Moreover, many undesirable noisy modules are introduced while generating synthetic data. In this study, we propose the Weighted Average Centroid based Imbalance Learning Approach (WACIL), an effective synthetic oversampling technique to mitigate the imbalance issue. WACIL first finds borderline instances, then generates pseudo-data from them through a weighted average centroid concept, and finally filters out inappropriate noisy data through a filtration process. We conducted experiments on 24 PROMISE and NASA projects and compared WACIL with several existing sampling approaches, using K-Nearest Neighbors (KNN), Logistic Regression (LR), Naive Bayes (NB), Support Vector Machine (SVM), Decision Tree (DT) and Deep Neural Network (DNN) as classification models. WACIL achieves superior results in terms of Fall Out Rate (FOR), F-measure and Area Under Curve (AUC), and obtains comparable results in terms of Recall and G-mean, compared to the competing approaches. The statistical analysis indicates that WACIL's improvement over the other oversampling techniques is significant under the Wilcoxon signed rank test, with effect size measured by the matched pairs rank biserial correlation coefficient. Hence, WACIL is advisable as a competent choice to deal with the imbalance issue in SFP. (c) 2022 Elsevier B.V. All rights reserved.
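The abstract names three stages (borderline detection, weighted-average-centroid generation, noise filtration) without giving their details. The following is a minimal Python sketch of that general shape only, not the authors' WACIL algorithm. Everything specific in it is an assumption made for illustration: the function names `nearest` and `oversample_sketch`, the borderline rule (at least half, but not all, of the k nearest neighbours belong to the majority class, as in Borderline-SMOTE-style methods), the random positive weights, and the nearest-neighbour filter.

```python
import math
import random


def nearest(point, candidates, k):
    """Return the k (features, label) pairs closest to `point` (Euclidean)."""
    return sorted(candidates, key=lambda c: math.dist(point, c[0]))[:k]


def oversample_sketch(X, y, k=3, n_new=10, seed=0):
    """Illustrative three-stage oversampler; label 1 = faulty (minority).

    Hypothetical helper: assumes at least one minority instance exists.
    """
    rng = random.Random(seed)
    data = list(zip(X, y))
    minority = [(x, lab) for x, lab in data if lab == 1]

    # Stage 1 (assumed rule): a minority instance is "borderline" when at
    # least half, but not all, of its k nearest neighbours are majority.
    borderline = []
    for i, (x, lab) in enumerate(data):
        if lab != 1:
            continue
        others = data[:i] + data[i + 1:]
        n_major = sum(1 for _, l in nearest(x, others, k) if l == 0)
        if k / 2 <= n_major < k:
            borderline.append(x)
    if not borderline:  # fall back to every minority instance
        borderline = [x for x, _ in minority]

    synthetic = []
    for _ in range(n_new):
        # Stage 2 (assumed weights): weighted average of a borderline seed
        # and its nearest minority neighbours, with random positive weights.
        seed_point = rng.choice(borderline)
        neighbours = [
            p for p, _ in nearest(seed_point, minority, k + 1)
            if p is not seed_point
        ][:k]
        group = [seed_point] + neighbours
        weights = [rng.uniform(0.1, 1.0) for _ in group]
        total = sum(weights)
        centroid = tuple(
            sum(w * p[d] for w, p in zip(weights, group)) / total
            for d in range(len(seed_point))
        )
        # Stage 3 (assumed filter): keep the point only if its nearest
        # original instance is also a minority instance.
        if nearest(centroid, data, 1)[0][1] == 1:
            synthetic.append(centroid)
    return synthetic


# Toy data: four faulty modules (label 1) and five non-faulty ones (label 0).
X = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (2.0, 2.0),
     (3.0, 3.0), (3.2, 2.9), (2.9, 3.1), (3.1, 3.3), (2.2, 2.1)]
y = [1, 1, 1, 1, 0, 0, 0, 0, 0]
new_points = oversample_sketch(X, y)
```

Because each synthetic point is a positive-weight average of minority instances, it lies inside their convex hull. Averaging over several neighbours rather than interpolating along a single line segment is one plausible reading of how a centroid-based scheme could yield more diverse samples; the paper itself should be consulted for the actual weighting and filtration rules.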
Pages: 17