Classification framework for faulty-software using enhanced exploratory whale optimizer-based feature selection scheme and random forest ensemble learning

Cited by: 49
Authors
Mafarja, Majdi [1]
Thaher, Thaer [2, 3]
Al-Betar, Mohammed Azmi [4]
Too, Jingwei [5]
Awadallah, Mohammed A. [6, 7]
Abu Doush, Iyad [8, 9]
Turabieh, Hamza [10]
Affiliations
[1] Birzeit Univ, Dept Comp Sci, Birzeit, Palestine
[2] Arab Amer Univ, Dept Comp Syst Engn, Jenin, Palestine
[3] Al Quds Univ, Informat Technol Engn, Jerusalem, Palestine
[4] Ajman Univ, Coll Engn & Informat Technol, Artificial Intelligence Res Ctr AIRC, Irbid, Jordan
[5] Univ Teknikal Malaysia Melaka, Fac Elect Engn, Durian Tunggal 76100, Melaka, Malaysia
[6] Al Aqsa Univ, Dept Comp Sci, POB 4051, Gaza, Palestine
[7] Ajman Univ, Artificial Intelligence Res Ctr AIRC, Ajman, U Arab Emirates
[8] Amer Univ Kuwait, Coll Engn & Appl Sci, Dept Comp, Salmiya, Kuwait
[9] Yarmouk Univ, Comp Sci Dept, Irbid, Jordan
[10] Univ Missouri, Dept Hlth Management & Informat, 5 Hosp Dr, Columbia, MO 65212 USA
Keywords
Software fault prediction; Machine learning; SMOTE; Dimension reduction; Meta-heuristics; Imbalanced data; ALGORITHM; PREDICTION; METRICS; IDENTIFICATION; SYSTEM; MODEL;
DOI
10.1007/s10489-022-04427-x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Software Fault Prediction (SFP) is an important process for detecting faulty software components, such as faulty classes or modules, early in the software development life cycle. In this paper, a machine learning (ML) framework is proposed for SFP. Initially, pre-processing and re-sampling techniques are applied to make the SFP datasets ready to be used by ML techniques. Thereafter, seven classifiers are compared, namely K-Nearest Neighbors (KNN), Naive Bayes (NB), Linear Discriminant Analysis (LDA), Linear Regression (LR), Decision Tree (DT), Support Vector Machine (SVM), and Random Forest (RF). The RF classifier outperforms all other classifiers in terms of eliminating irrelevant/redundant features. The performance of RF is improved further using a dimensionality reduction method called the binary whale optimization algorithm (BWOA) to eliminate the irrelevant/redundant features. Finally, the performance of BWOA is enhanced by hybridizing the exploration strategies of the grey wolf optimizer (GWO) and Harris hawks optimization (HHO) algorithms. The proposed method is called SBEWOA. Sixteen SFP datasets from the PROMISE repository are used, covering software projects of different sizes and complexity. The algorithms are compared in terms of accuracy, number of selected features, and fitness value. A comparative evaluation against nine well-established feature selection methods shows that the proposed SBEWOA produces significantly superior results on several of the evaluated datasets, a finding supported by the two-tailed p-values of the Wilcoxon signed-rank test. In conclusion, the proposed method is an efficient alternative ML method for SFP that can be used for similar problems in the software engineering domain.
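The abstract evaluates feature subsets by accuracy, number of features, and a fitness function, but does not state the fitness formula. A form commonly used in wrapper-based feature selection is a weighted sum of the classification error and the fraction of features kept. The Python sketch below illustrates that general idea only; the function name `wrapper_fitness`, the weight `alpha = 0.99`, and the example numbers are assumptions for illustration and are not taken from the paper.

```python
from typing import Sequence


def wrapper_fitness(error_rate: float, mask: Sequence[int], alpha: float = 0.99) -> float:
    """Score a candidate feature subset; lower values are better.

    error_rate: classification error (1 - accuracy) obtained with the subset.
    mask:       binary flags, 1 = feature kept, 0 = feature dropped.
    alpha:      assumed trade-off weight (hypothetical, not from the abstract).
    """
    if not mask:
        raise ValueError("mask must contain at least one feature flag")
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    # Fraction of the original features that the subset retains.
    selected_ratio = sum(1 for bit in mask if bit) / len(mask)
    return alpha * error_rate + (1.0 - alpha) * selected_ratio


# Two hypothetical subsets over 20 features that reach the same error rate.
full = [1] * 20
reduced = [1] * 5 + [0] * 15
print(wrapper_fitness(0.20, full))
print(wrapper_fitness(0.20, reduced))
```

With equal error rates, the smaller subset receives the lower (better) score, which is the behaviour a binary optimizer such as BWOA would exploit when searching over feature masks.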
Pages: 18715-18757
Number of pages: 43
Related papers
14 in total
  • [1] Classification framework for faulty-software using enhanced exploratory whale optimizer-based feature selection scheme and random forest ensemble learning
    Majdi Mafarja
    Thaer Thaher
    Mohammed Azmi Al-Betar
    Jingwei Too
    Mohammed A. Awadallah
    Iyad Abu Doush
    Hamza Turabieh
    Applied Intelligence, 2023, 53 : 18715 - 18757
  • [2] An adaptive and enhanced framework for daily stock market prediction using feature selection and ensemble learning algorithms
    Sivri, Mahmut Sami
    Ustundag, Alp
    JOURNAL OF BUSINESS ANALYTICS, 2024, 7 (01) : 42 - 62
  • [3] An Enhanced Evolutionary Based Feature Selection Approach Using Grey Wolf Optimizer for the Classification of High-dimensional Biological Data
    Thaher, Thaer
    Awad, Mohammed
    Aldasht, Mohammed
    Sheta, Alaa
    Turabieh, Hamza
    Chantar, Hamouda
    JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2022, 28 (05) : 499 - 539
  • [4] Classification of lung cancer using ensemble-based feature selection and machine learning methods
    Cai, Zhihua
    Xu, Dong
    Zhang, Qing
    Zhang, Jiexia
    Ngai, Sai-Ming
    Shao, Jianlin
    MOLECULAR BIOSYSTEMS, 2015, 11 (03) : 791 - 800
  • [5] Effective Prediction of Software Defects using Random-tree Entropy based Feature Selection Framework
    Alhumam, Abdulaziz
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (05) : 348 - 354
  • [6] An ensemble deep learning framework for energy demand forecasting using genetic algorithm-based feature selection
    Sakib, Mohd
    Siddiqui, Tamanna
    Mustajab, Suhel
    Alotaibi, Reemiah Muneer
    Alshareef, Nouf Mohammad
    Khan, Mohammad Zunnun
    PLOS ONE, 2025, 20 (01):
  • [7] Optimized feature selection method using particle swarm intelligence with ensemble learning for cancer classification based on microarray datasets
    Alrefai, Nashat
    Ibrahim, Othman
    NEURAL COMPUTING & APPLICATIONS, 2022, 34 (16) : 13513 - 13528
  • [8] A novel two-stage feature selection method based on random forest and improved genetic algorithm for enhancing classification in machine learning
    Junyao Ding
    Jianchao Du
    Hejie Wang
    Song Xiao
    Scientific Reports, 15 (1)
  • [9] Enhancing the Security of SDN in 5G: A Hybrid Feature Selection Based Ensemble Machine Learning Framework for Classification of Cyber-Attacks
    Mahendra Pratap Singh
    Virendra Pratap Haimashreelakshmi
    Maanak Singh
    Gupta
    SN Computer Science, 6 (3)
  • [10] A Feature Selection Based on Improved Artificial Hummingbird Algorithm Using Random Opposition-Based Learning for Solving Waste Classification Problem
    Ali, Mona A. S.
    Rajeena, Fathimathul P. P.
    Abd Elminaam, Diaa Salama
    MATHEMATICS, 2022, 10 (15)