Class Imbalance Issue in Software Defect Prediction Models by various Machine Learning Techniques: An Empirical Study

Cited by: 8
Authors
Pandey, Sushant Kumar [1 ]
Tripathi, Anil Kumar [1 ]
Affiliations
[1] Banaras Hindu Univ, Indian Inst Technol, Dept Comp Sci & Engn, Varanasi, Uttar Pradesh, India
Source
2021 8TH INTERNATIONAL CONFERENCE ON SMART COMPUTING AND COMMUNICATIONS (ICSCC) | 2021
Keywords
Software fault prediction; Class imbalance; Machine learning; Software metrics; Statistical methods;
DOI
10.1109/ICSCC51209.2021.9528170
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
Software practitioners continue to build advanced software defect prediction (SDP) models to help testers find fault-prone modules. However, the class imbalance (CI) problem, in which defective instances are far fewer than non-defective ones, causes inconsistent performance. We conducted 880 experiments to analyze how the performance of 10 SDP models varies with respect to the class imbalance problem. In our experiments, we used 22 public datasets consisting of 41 software metrics, 10 baseline SDP methods, and 4 sampling techniques. We used the Matthews Correlation Coefficient (MCC), which is more useful when a dataset is highly imbalanced. We also compared the predictive performance of various ML models after applying the 4 sampling techniques. To examine the performance of the different SDP models, we used the F-measure. We found that the performance of the learning models is unsatisfactory and that this needs to be mitigated. We also found a few surprising results and some logical patterns between classifier and sampling technique. The study provides a connection between sampling technique, software metrics, and classifier.
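The two evaluation measures named in the abstract, MCC and F-measure, can both be computed from the counts of a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions; it is not taken from the paper, and the function names and the example counts (a hypothetical dataset of 10 defective and 90 non-defective modules) are assumptions made only for illustration.

```python
import math


def f_measure(tp, fp, fn):
    """F-measure (F1): harmonic mean of precision and recall
    for the positive (defective) class."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient, in [-1, 1].
    Returns 0.0 when any marginal total is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts: 10 defective modules, 90 non-defective.
    # The classifier finds 5 of the 10 defects and raises 10 false alarms.
    tp, fn, fp, tn = 5, 5, 10, 80
    print("F-measure:", round(f_measure(tp, fp, fn), 4))
    print("MCC:      ", round(mcc(tp, tn, fp, fn), 4))
```

Unlike the F-measure, MCC also takes the true negatives into account, which is one reason it is often preferred when one class heavily outnumbers the other.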
Pages: 58-63
Page count: 6
Related papers
28 in total
[1]  
[Anonymous], 2007, The promise repository of empirical software engineering data
[2]  
[Anonymous], 2008, P 4 INT WORKSHOP PRE
[3]  
Bailey C. T., 1981, Performance Evaluation Review, V10, P189, DOI 10.1145/1010627.807928
[4]  
Bekkar M., 2013, J Informa Eng Appl, V3, DOI 10.5121/IJDKP.2013.3402
[5]   MAHAKIL: Diversity Based Oversampling Approach to Alleviate the Class Imbalance Issue in Software Defect Prediction [J].
Bennin, Kwabena Ebo ;
Keung, Jacky ;
Phannachitta, Passakorn ;
Monden, Akito ;
Mensah, Solomon .
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2018, 44 (06) :534-550
[6]  
Breiman L, 1996, MACH LEARN, V24, P123, DOI 10.1007/BF00058655
[7]   Imbalanced evolving self-organizing learning [J].
Cai, Qiao ;
He, Haibo ;
Man, Hong .
NEUROCOMPUTING, 2014, 133 :258-270
[8]   Tackling class overlap and imbalance problems in software defect prediction [J].
Chen, Lin ;
Fang, Bin ;
Shang, Zhaowei ;
Tang, Yuanyan .
SOFTWARE QUALITY JOURNAL, 2018, 26 (01) :97-125
[9]   A METRICS SUITE FOR OBJECT-ORIENTED DESIGN [J].
CHIDAMBER, SR ;
KEMERER, CF .
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 1994, 20 (06) :476-493
[10]   MEASURING THE PSYCHOLOGICAL COMPLEXITY OF SOFTWARE MAINTENANCE TASKS WITH THE HALSTEAD AND MCCABE METRICS [J].
CURTIS, B ;
SHEPPARD, SB ;
MILLIMAN, P ;
BORST, MA ;
LOVE, T .
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 1979, 5 (02) :96-104