Software fault prediction using evolving populations with mathematical diversification

Times cited: 2
Author
Goyal, Somya [1]
Affiliation
[1] Manipal Univ Jaipur, Jaipur 303007, Rajasthan, India
Keywords
Software fault prediction (SFP); Feature selection (FS); Search-based software engineering (SBSE); Genetic evolution (GE); Mathematical operator algorithm; Artificial neural network (ANN); DEFECT PREDICTION; FEATURE-SELECTION; OPTIMIZATION; METRICS; ALGORITHM; QUALITY
DOI
10.1007/s00500-022-07445-6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Software fault prediction (SFP) plays a vital role in fostering high quality throughout the software development process. It identifies fault-prone modules in early development phases, so that testing can be focused effectively on those modules. Machine learning (ML)-based classifiers are widely used for fault prediction in the software industry. The accuracy of an ML model depends on its training data and the quality of that data. The curse of dimensionality adversely affects the classification power of an ML model: inter-correlated, insignificant and/or redundant features (or attributes) in the training data degrade the performance of ML classifiers. Feature preprocessing, in the form of feature selection (FS), addresses this issue, and meta-heuristics are a key approach to finding the most significant feature subset. In this paper, a novel feature selection method is devised using mathematical diversification for genetic evolution. It avoids local optima by applying arithmetic diversification among the candidate solutions (populations). Survival of the fittest is the working principle of the evolving populations, which are updated through crossover and mutation operations. The selected feature subset is fed to five classification algorithms, namely artificial neural network, support vector machine, decision tree, k-nearest neighbor and naive Bayes. The proposed model is trained and tested on five datasets from the NASA corpus, namely CM1, JM1, KC1, KC2 and PC1. In total, 100 SFP models are implemented (4 feature selection methods × 5 datasets × 5 classification algorithms). The experiments show that the SFP models using the proposed feature selection technique of evolving populations with mathematical diversification (FS-EPwMD) outperform the other models.
It can be concluded that the SFP model built with the proposed FS-EPwMD and an artificial neural network performs statistically best among all 100 competing SFP models, irrespective of the dataset used.
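The abstract describes FS-EPwMD only at a high level (a genetic algorithm over feature subsets, plus arithmetic diversification to escape local optima). The Python sketch below is not the author's implementation; it is a minimal illustration under stated assumptions: each candidate is a binary feature mask; the genetic part uses tournament selection, single-point crossover, bit-flip mutation and elitism; and the diversification step borrows the MOA/MOP division, multiplication, subtraction and addition operators of the Arithmetic Optimization Algorithm (Abualigah et al., 2021), mapped back to bits with an S-shaped transfer function. The names `aoa_diversify`, `evolve` and `toy_fitness`, every parameter value, and the toy fitness (a stand-in for the accuracy of one of the five classifiers) are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

Mask = List[int]  # one gene per feature: 1 = keep the feature, 0 = drop it


def aoa_diversify(best: Mask, t: int, t_max: int, rng: random.Random,
                  alpha: float = 5.0, mu: float = 0.499,
                  moa_min: float = 0.2, moa_max: float = 1.0) -> Mask:
    """Sample a new mask around `best` with AOA-style arithmetic operators.

    Assumptions (hypothetical): each gene lives in [LB, UB] = [0, 1], and an
    S-shaped transfer function turns the continuous value back into a bit.
    """
    eps = 1e-12
    moa = moa_min + t * (moa_max - moa_min) / t_max        # Math Optimizer Accelerated
    mop = 1 - t ** (1 / alpha) / t_max ** (1 / alpha)      # Math Optimizer Probability
    step = (1.0 - 0.0) * mu + 0.0                          # (UB - LB) * mu + LB
    child = []
    for b in best:
        if rng.random() > moa:  # exploration: division or multiplication
            x = b / (mop + eps) * step if rng.random() < 0.5 else b * mop * step
        else:                   # exploitation: subtraction or addition
            x = b - mop * step if rng.random() < 0.5 else b + mop * step
        prob = 1 / (1 + math.exp(-10 * (x - 0.5)))         # S-shaped transfer to [0, 1]
        child.append(1 if rng.random() < prob else 0)
    return child


def tournament(pop: List[Mask], fitness: Callable[[Mask], float],
               rng: random.Random) -> Mask:
    """Binary tournament: the fitter of two random masks survives."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def evolve(n_features: int, fitness: Callable[[Mask], float],
           pop_size: int = 20, generations: int = 30, p_mut: float = 0.05,
           div_frac: float = 0.2, seed: int = 0) -> Tuple[Mask, List[float]]:
    """GA wrapper feature selection with an arithmetic diversification step.

    Returns the best mask and the best fitness after each generation.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = [fitness(best)]
    for t in range(1, generations + 1):
        children = [best[:]]                               # elitism: keep the best mask
        n_div = int(div_frac * pop_size)                   # diversified candidates
        children += [aoa_diversify(best, t, generations, rng) for _ in range(n_div)]
        while len(children) < pop_size:
            p1, p2 = tournament(pop, fitness, rng), tournament(pop, fitness, rng)
            cut = rng.randrange(1, n_features)             # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history


RELEVANT = {0, 3, 5}  # hypothetical "truly informative" features for the demo


def toy_fitness(mask: Mask) -> float:
    # Hypothetical stand-in for classifier accuracy: reward informative
    # features and penalise each irrelevant feature that is kept.
    chosen = {i for i, g in enumerate(mask) if g}
    return len(chosen & RELEVANT) - 0.3 * len(chosen - RELEVANT)


if __name__ == "__main__":
    mask, hist = evolve(n_features=8, fitness=toy_fitness)
    print("selected features:", [i for i, g in enumerate(mask) if g])
    print("best fitness per generation:", hist)
```

In a real SFP pipeline, `toy_fitness` would be replaced by something like the cross-validated accuracy of a classifier trained only on the kept columns of a NASA dataset such as CM1. Because the same mask can reappear across generations, caching fitness values per mask would avoid retraining the classifier unnecessarily. Elitism guarantees that the best fitness never decreases from one generation to the next.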
Pages: 13999-14020
Number of pages: 22