Diversity based imbalance learning approach for software fault prediction using machine learning models

Cited by: 26
Authors
Manchala, Pravali [1]
Bisi, Manjubala [1]
Affiliations
[1] National Institute of Technology Warangal, Department of Computer Science and Engineering, Hanamkonda, Telangana, India
Keywords
Imbalance learning; Software fault prediction; Oversampling; Machine learning model; Deep neural network; Support vector machine; Defect prediction; Sampling method; SMOTE; Classification
DOI
10.1016/j.asoc.2022.109069
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Software fault prediction (SFP) aims to distinguish faulty modules from non-faulty ones. The performance of prediction models is vulnerable to the class imbalance problem in SFP. Existing oversampling approaches generate nearly identical synthetic data, which leads to overgeneralization and low data diversity. Moreover, many undesirable noisy modules are introduced while generating synthetic data. In this study, we propose the Weighted Average Centroid based Imbalance Learning Approach (WACIL), a synthetic oversampling technique for mitigating the imbalance problem. WACIL first finds borderline instances, then generates pseudo-data from them using a weighted average centroid concept, and finally removes inappropriate noisy data through a filtration process. We conducted experiments on 24 PROMISE and NASA projects and compared WACIL with several existing sampling approaches, using K-Nearest Neighbors (KNN), Logistic Regression (LR), Naive Bayes (NB), Support Vector Machine (SVM), Decision Tree (DT) and Deep Neural Network (DNN) as classification models. WACIL achieves superior results in terms of Fall Out Rate (FOR), F-measure and Area Under Curve (AUC), and comparable results in terms of Recall and G-mean, relative to the competing approaches. Statistical analysis indicates that WACIL's improvement over the other oversampling techniques is significant under the Wilcoxon signed-rank test, with effect size measured by the matched-pairs rank-biserial correlation coefficient. Hence, WACIL is a competent choice for dealing with the imbalance problem in SFP. © 2022 Elsevier B.V. All rights reserved.
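The three stages named in the abstract (find borderline instances, generate pseudo-data from a weighted average centroid, filter noisy data) can be illustrated with a rough sketch. This is not the authors' WACIL implementation: the abstract gives no formulas, so every concrete choice here is an assumption made for illustration. The borderline rule is borrowed from Borderline-SMOTE, the centroid weights are random Dirichlet draws, and the filter keeps a synthetic point only if its nearest original neighbour is faulty. The function name `oversample_sketch` and its parameters are hypothetical.

```python
import numpy as np


def oversample_sketch(X, y, k=5, seed=0):
    """Illustrative three-stage oversampler (label 1 = faulty/minority).

    Assumed stand-in for the pipeline described in the abstract,
    not the published WACIL algorithm.
    """
    rng = np.random.default_rng(seed)
    min_idx = np.flatnonzero(y == 1)
    X_min = X[min_idx]
    n_needed = int(np.sum(y == 0)) - len(X_min)
    if n_needed <= 0 or len(X_min) < 2:
        return X, y

    # Stage 1 (assumed rule): a minority point is "borderline" when at least
    # half, but not all, of its k nearest neighbours are majority points.
    k_all = min(k, len(X) - 1)
    d_all = np.linalg.norm(X_min[:, None, :] - X[None, :, :], axis=2)
    d_all[np.arange(len(min_idx)), min_idx] = np.inf  # ignore self
    nn = np.argsort(d_all, axis=1)[:, :k_all]
    maj_frac = (y[nn] == 0).mean(axis=1)
    border = (maj_frac >= 0.5) & (maj_frac < 1.0)
    seeds = X_min[border] if border.any() else X_min  # fallback: all minority

    # Stage 2 (assumed rule): each synthetic point is a weighted average
    # (centroid) of a seed and its nearest minority neighbours, with random
    # weights that are non-negative and sum to one.
    m = min(k, len(X_min) - 1)
    d_min = np.linalg.norm(seeds[:, None, :] - X_min[None, :, :], axis=2)
    groups = np.argsort(d_min, axis=1)[:, : m + 1]  # seed itself + m neighbours
    pick = rng.integers(len(seeds), size=n_needed)
    weights = rng.dirichlet(np.ones(m + 1), size=n_needed)
    synth = np.einsum("ij,ijd->id", weights, X_min[groups[pick]])

    # Stage 3 (assumed rule): discard synthetic points whose nearest
    # original neighbour is a majority point.
    d_syn = np.linalg.norm(synth[:, None, :] - X[None, :, :], axis=2)
    keep = y[np.argmin(d_syn, axis=1)] == 1
    synth = synth[keep]

    X_new = np.vstack([X, synth])
    y_new = np.concatenate([y, np.ones(len(synth), dtype=y.dtype)])
    return X_new, y_new
```

Because the filter runs after generation, the result may stay slightly imbalanced. A production version would more likely resample until the target count is reached, or use a maintained library such as imbalanced-learn.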
Pages: 17