Software fault prediction using evolving populations with mathematical diversification

Cited by: 2
Authors
Goyal, Somya [1]
Affiliations
[1] Manipal Univ Jaipur, Jaipur 303007, Rajasthan, India
Keywords
Software fault prediction (SFP); Feature selection (FS); Search-based software engineering (SBSE); Genetic evolution (GE); Mathematical operator algorithm; Artificial neural network (ANN); DEFECT PREDICTION; FEATURE-SELECTION; OPTIMIZATION; METRICS; ALGORITHM; QUALITY;
DOI
10.1007/s00500-022-07445-6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Software fault prediction (SFP) plays a vital role in fostering high quality throughout the software development process. It identifies fault-prone modules in the early development phases and so enables focused, effective testing of those modules. Machine learning (ML)-based classifiers are widely used for fault prediction in the software industry. The accuracy of an ML model depends on the quality of its training data. The curse of dimensionality degrades the classification power of an ML model, and inter-correlated, insignificant and/or redundant features (or attributes) in the training data hinder classifier performance. Feature selection (FS), a form of feature preprocessing, addresses this issue, and meta-heuristic search is a key method for finding the most significant feature subset. In this paper, a novel feature selection method is devised using mathematical diversification for genetic evolution. It avoids local optima by applying arithmetic diversification among the candidate solutions (or populations). Survival of the fittest is the working principle of evolving populations with crossover and mutation operations. The selected feature subset is fed to five classification algorithms, namely artificial neural network, support vector machine, decision tree, k-nearest neighbor and naive Bayes. The proposed model is trained and tested over five datasets from the NASA corpus, namely CM1, JM1, KC1, KC2 and PC1. In total, 100 SFP models are implemented (4 feature selection methods × 5 datasets × 5 classification algorithms). The experiments show that the SFP models using the proposed feature selection technique, evolving populations with mathematical diversification (FS-EPwMD), outperform the other models. It can be concluded that the SFP model built using the proposed FS-EPwMD with artificial neural networks performs statistically best among all 100 competing SFP models, irrespective of the dataset used.
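The abstract names the ingredients of the method (evolving populations, crossover, mutation, arithmetic diversification) but not their exact form, so the following is only a minimal sketch of genetic feature selection under stated assumptions, not the paper's FS-EPwMD algorithm. The `fitness` function is a toy stand-in for a wrapper score such as classifier accuracy, `INFORMATIVE` is a made-up ground truth, and the `diversify` step (random arithmetic operators applied gene by gene, then thresholded to a bit) is a hypothetical reading of "mathematical diversification".

```python
import random

N_FEATURES = 12
INFORMATIVE = {0, 3, 7}  # hypothetical ground truth used only by the toy fitness


def fitness(mask):
    """Toy stand-in for a wrapper fitness (e.g. classifier accuracy on the subset)."""
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return hits - 0.1 * sum(mask)  # reward useful features, penalize subset size


def crossover(a, b, rng):
    """One-point crossover of two parent masks."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(mask, rng, rate=0.05):
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in mask]


def diversify(mask, best, rng):
    """Assumed diversification step (not taken from the paper): combine each gene
    with the best mask's gene via a random arithmetic operator, then threshold."""
    ops = (
        lambda x, y: x + y,
        lambda x, y: x - y,
        lambda x, y: x * y,
        lambda x, y: x / (y + 1.0),  # y >= 0, so the divisor is never zero
    )
    out = []
    for g, b in zip(mask, best):
        value = rng.choice(ops)(g + rng.random(), b + rng.random())
        out.append(1 if value > 1.0 else 0)
    return out


def select_features(generations=60, pop_size=30, seed=0):
    """Evolve binary feature masks and return the best one found."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # survival of the fittest
        children = [best]  # elitism: the best mask so far always survives
        while len(children) < pop_size:
            child = mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            if rng.random() < 0.2:  # occasionally diversify to escape local optima
                child = diversify(child, best, rng)
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


if __name__ == "__main__":
    mask = select_features()
    print("selected feature indices:", [i for i, g in enumerate(mask) if g])
```

In a real SFP pipeline the toy `fitness` would presumably be replaced by a cross-validated score of one of the five classifiers on the masked dataset; the elitism used here is a design choice of this sketch that keeps the best fitness from decreasing between generations.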
Pages: 13999-14020
Page count: 22