Depth linear discrimination-oriented feature selection method based on adaptive sine cosine algorithm for software defect prediction

Cited by: 4
Authors
Nasser, Abdullah B. [1, 5]
Ghanem, Waheed Ali H. M. [2, 6]
Saad, Abdul-Malik H. Y. [3, 5]
Abdul-Qawy, Antar Shaddad Hamed [7]
Ghaleb, Sanaa A. A. [4, 6]
Alduais, Nayef Abdulwahab Mohammed [8]
Din, Fakhrud [9]
Ghetas, Mohamed [10]
Affiliations
[1] Univ Vaasa, Sch Technol & Innovat, Vaasa, Finland
[2] Univ Malaysia Terengganu, Fac Comp Sci & Math, Kuala Terengganu, Malaysia
[3] Univ Buraimi, Coll Engn, Buraimi, Oman
[4] Univ Sultan Zainal Abidin, Fac Informat & Comp, Kuala Terengganu, Terengganu, Malaysia
[5] Hodeidah Univ, Fac Comp Sci & Engn, Al Hudaydah, Yemen
[6] Univ Aden, Fac Engn, Aden, Yemen
[7] Abdulrahman Al Sumait Univ, Fac Sci, Zanzibar, Tanzania
[8] Univ Tun Hussein Onn Malaysia, Fac Comp Sci & Informat Technol, Parit Raja Batu Pahat, Johor, Malaysia
[9] Univ Malakand, Dept Comp Sci & IT, Totakan, Pakistan
[10] Galala Univ, Dept Comp Sci, Suez, Egypt
Keywords
Software defect prediction; Machine learning; Feature selection; Metaheuristic algorithms; Sine cosine algorithm; Linear discriminant analysis; OPTIMIZATION; OPERATORS
DOI
10.1016/j.eswa.2024.124266
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Software Defect Prediction (SDP) plays a vital role in the software development life cycle because it helps identify and fix software defects. However, predicting software defects in the presence of irrelevant features and overlapping classes is challenging and can lead to lengthy training and low model accuracy. To address these challenges, this research introduces a novel Depth Linear Discrimination-Oriented Feature Selection Method based on the Adaptive Sine Cosine Algorithm, named Depth Adaptive Sine Cosine Feature Selection (DASC-FS). DASC-FS uses the Adaptive Sine Cosine Algorithm (ASCA) as a search algorithm to determine the relevant features and adopts Depth Linear Discriminant Analysis (D-LDA) to identify the discriminative features that maximize class separation. ASCA, proposed in this paper, is a metaheuristic algorithm designed to enhance the search capabilities of the standard Sine Cosine Algorithm (SCA). By combining the simplicity of SCA with multiple mutation operators inspired by Genetic Algorithms (GA), ASCA increases the diversity of the solutions and adapts to a variety of situations. Furthermore, this study introduces a novel linear discriminant method, D-LDA, to enhance the robustness of the original LDA. D-LDA integrates the matrix depth concept into LDA, offering a systematic approach to the challenges of scatter matrix estimation. Because matrix depth measures how central or deep a particular matrix is within a distribution with respect to different directions, it is an efficient tool for computing a robust scatter matrix estimator that can handle outliers and complex data structures. The experimental results show that, by integrating ASCA and D-LDA and thereby considering both accuracy optimization and class separation, DASC-FS consistently achieves higher accuracy than most existing methods. The results also show that the multiple mutation operators in ASCA improve the search process, and that D-LDA's capacity to reduce data dimensionality and increase class separation yields highly competitive results compared with other LDA variants. Finally, features related to code size and complexity emerge as key factors for SDP because they consistently rank as important across different classifiers and datasets. DASC-FS thus offers a valuable solution for enhancing predictive accuracy and for understanding the factors that contribute to software defects, through enhanced search capabilities, robust scatter matrix estimation, and the ability to reduce data dimensionality.
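The abstract names its building blocks but does not give ASCA's update rules or D-LDA's estimator. As a rough illustration only, the sketch below pairs the standard SCA position update with the classical LDA scatter-matrix criterion, trace(S_W^+ S_B), in a wrapper-style feature search. It is not the authors' method: it has no mutation operators and no matrix depth, and the function names, the 0.5 threshold for turning a position into a feature mask, the feature-count penalty, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def fisher_score(X, y, mask):
    """Classical LDA separability, trace(pinv(S_W) @ S_B), on the selected columns."""
    if not mask.any():
        return 0.0
    Xs = X[:, mask]
    overall_mean = Xs.mean(axis=0)
    d = Xs.shape[1]
    s_w = np.zeros((d, d))  # within-class scatter
    s_b = np.zeros((d, d))  # between-class scatter
    for label in np.unique(y):
        Xc = Xs[y == label]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        s_w += centered.T @ centered
        diff = (mean_c - overall_mean)[:, None]
        s_b += len(Xc) * (diff @ diff.T)
    return float(np.trace(np.linalg.pinv(s_w) @ s_b))


def sca_feature_selection(X, y, n_agents=10, n_iter=30, a=2.0, penalty=0.01, seed=0):
    """Standard SCA over [0, 1]^d; a coordinate above 0.5 means 'keep the feature'."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]

    def fitness(position):
        mask = position > 0.5
        # Hypothetical objective: separability minus a small cost per kept feature.
        return fisher_score(X, y, mask) - penalty * mask.sum()

    positions = rng.random((n_agents, d))
    scores = np.array([fitness(p) for p in positions])
    best_pos = positions[scores.argmax()].copy()
    best_score = scores.max()

    for t in range(n_iter):
        r1 = a - t * a / n_iter  # shrinks linearly: exploration -> exploitation
        r2 = rng.uniform(0.0, 2.0 * np.pi, (n_agents, d))
        r3 = rng.uniform(0.0, 2.0, (n_agents, d))
        r4 = rng.random((n_agents, d))
        step = np.abs(r3 * best_pos - positions)
        # Sine branch when r4 < 0.5, cosine branch otherwise.
        move = np.where(r4 < 0.5, r1 * np.sin(r2) * step, r1 * np.cos(r2) * step)
        positions = np.clip(positions + move, 0.0, 1.0)
        scores = np.array([fitness(p) for p in positions])
        if scores.max() > best_score:
            best_score = scores.max()
            best_pos = positions[scores.argmax()].copy()

    return best_pos > 0.5, float(best_score)


if __name__ == "__main__":
    # Synthetic data: two informative columns followed by four noise columns.
    rng = np.random.default_rng(1)
    y = np.repeat([0, 1], 100)
    informative = rng.normal(loc=y[:, None] * 3.0, scale=1.0, size=(200, 2))
    noise = rng.normal(size=(200, 4))
    X = np.hstack([informative, noise])
    mask, score = sca_feature_selection(X, y)
    print("selected columns:", np.flatnonzero(mask), "score:", round(score, 3))
```

The per-feature penalty is needed here because the scatter criterion never decreases when a column is added; without it the search would tend to keep every feature. A depth-based estimator such as the one the abstract describes would presumably replace the plain scatter matrices in `fisher_score`, but the abstract gives no details that would allow it to be reproduced.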
Pages: 27
Related papers
67 entries in total
[1]   A Novel Tournament Selection Based Differential Evolution Variant for Continuous Optimization Problems [J].
Abbas, Qamar ;
Ahmad, Jamil ;
Jabeen, Hajira .
MATHEMATICAL PROBLEMS IN ENGINEERING, 2015, 2015
[2]   Principal component analysis [J].
Abdi, Herve ;
Williams, Lynne J. .
WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS, 2010, 2 (04) :433-459
[3]   Advances in Sine Cosine Algorithm: A comprehensive survey [J].
Abualigah, Laith ;
Diabat, Ali .
ARTIFICIAL INTELLIGENCE REVIEW, 2021, 54 (04) :2567-2608
[4]  
Adorada A., 2020, 2020 4 INT C INF COM
[5]   Metaheuristic Algorithms on Feature Selection: A Survey of One Decade of Research (2009-2019) [J].
Agrawal, Prachi ;
Abutarboush, Hattan F. ;
Ganesh, Talari ;
Mohamed, Ali Wagdy .
IEEE ACCESS, 2021, 9 :26766-26791
[6]   Enhancing software defect prediction: a framework with improved feature selection and ensemble machine learning [J].
Ali, Misbah ;
Mazhar, Tehseen ;
Al-Rasheed, Amal ;
Shahzad, Tariq ;
Ghadi, Yazeed Yasin ;
Khan, Muhammad Amir .
PEERJ COMPUTER SCIENCE, 2024, 10
[7]  
Alkhasawneh M. S, 2022, Applied Computational Intelligence and Soft Computing: Software defect prediction through neural network and feature selections
[8]   Software fault prediction using particle swarm algorithm with genetic algorithm and support vector machine classifier [J].
Alsghaier, Hiba ;
Akour, Mohammed .
SOFTWARE-PRACTICE & EXPERIENCE, 2020, 50 (04) :407-427
[9]   Feature selection using firefly algorithm in software defect prediction [J].
Anbu, M. ;
Mala, G. S. Anandha .
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2019, 22 (Suppl 5) :10925-10934
[10]   A Novel Rank Aggregation-Based Hybrid Multifilter Wrapper Feature Selection Method in Software Defect Prediction [J].
Balogun, Abdullateef O. ;
Basri, Shuib ;
Mahamad, Saipunidzam ;
Capretz, Luiz Fernando ;
Imam, Abdullahi Abubakar ;
Almomani, Malek A. ;
Adeyemo, Victor E. ;
Kumar, Ganesh .
COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2021, 2021