SMOTEFRIS-INFFC: Handling the challenge of borderline and noisy examples in imbalanced learning for software defect prediction

Cited by: 11
Authors
Bashir, Kamal [1, 3]
Li, Tianrui [1 ]
Yohannese, Chubato Wondaferaw [1 ]
Yahaya, Mahama [2 ]
Affiliations
[1] Southwest Jiaotong Univ, Sch Informat Sci & Technol, Chengdu, Peoples R China
[2] Southwest Jiaotong Univ, Sch Transport & Logist Engn, Chengdu, Peoples R China
[3] Karary Univ, Coll Comp Sci & Informat Technol, Dept Informat Technol, Omdurman, Sudan
Funding
National Science Foundation (USA);
Keywords
Software defect prediction; data sampling; fuzzy rough set; noise filtering; SAMPLING METHOD; CLASSIFICATION; FRAMEWORK; SETS;
DOI
10.3233/JIFS-179459
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The objective of Software Defect Prediction (SDP) is to identify modules that are prone to defects. This is achieved by training prediction models on datasets obtained by mining historical software repositories. Data acquired this way is often class-imbalanced, that is, the classes are unequally represented among the examples. We hypothesize that class imbalance is not a problem in itself and that the decrease in performance is also influenced by other factors related to the class distribution of the data. One of these is the existence of noisy and borderline examples. Thus, the objective of our research is to propose a novel preprocessing method that combines the Synthetic Minority Over-Sampling Technique (SMOTE), Fuzzy-Rough Instance Selection type II (FRIS-II), and the Iterative Noise Filter based on the Fusion of Classifiers (INFFC) to overcome these problems. The experimental results show that the new proposal significantly outperformed all the methods compared in this study.
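As a rough illustration of the oversampling idea named in the abstract, the sketch below generates synthetic minority examples by interpolating between a minority example and one of its nearest minority neighbours, which is the core step of SMOTE. It is a simplified sketch and not the authors' implementation: the function name `smote`, its parameters, and the toy feature values are assumptions made for this example, and the FRIS-II and INFFC stages are not shown.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    example and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: (lines_of_code, cyclomatic_complexity)
# for modules labelled defective (the minority class).
defective = [(120.0, 9.0), (150.0, 12.0), (135.0, 10.0), (160.0, 14.0)]
new_points = smote(defective, n_new=6)
```

Because every synthetic point lies on a segment between two real minority examples, plain interpolation can also amplify noisy or borderline examples, which is the kind of problem the abstract says the filtering stages are meant to address.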
Pages: 917-933
Number of pages: 17