Depth linear discrimination-oriented feature selection method based on adaptive sine cosine algorithm for software defect prediction

Times cited: 4
Authors
Nasser, Abdullah B. [1, 5]
Ghanem, Waheed Ali H. M. [2, 6]
Saad, Abdul-Malik H. Y. [3, 5]
Abdul-Qawy, Antar Shaddad Hamed [7]
Ghaleb, Sanaa A. A. [4, 6]
Alduais, Nayef Abdulwahab Mohammed [8]
Din, Fakhrud [9]
Ghetas, Mohamed [10]
Affiliations
[1] Univ Vaasa, Sch Technol & Innovat, Vaasa, Finland
[2] Univ Malaysia Terengganu, Fac Comp Sci & Math, Kuala Terengganu, Malaysia
[3] Univ Buraimi, Coll Engn, Buraimi, Oman
[4] Univ Sultan Zainal Abidin, Fac Informat & Comp, Kuala Terengganu, Terengganu, Malaysia
[5] Hodeidah Univ, Fac Comp Sci & Engn, Al Hudaydah, Yemen
[6] Univ Aden, Fac Engn, Aden, Yemen
[7] Abdulrahman Al Sumait Univ, Fac Sci, Zanzibar, Tanzania
[8] Univ Tun Hussein Onn Malaysia, Fac Comp Sci & Informat Technol, Parit Raja Batu Pahat, Johor, Malaysia
[9] Univ Malakand, Dept Comp Sci & IT, Totakan, Pakistan
[10] Galala Univ, Dept Comp Sci, Suez, Egypt
Keywords
Software defect prediction; Machine learning; Feature selection; Metaheuristic algorithms; Sine cosine algorithm; Linear discriminant analysis; Optimization; Operators
DOI
10.1016/j.eswa.2024.124266
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Software Defect Prediction (SDP) plays a vital role in the software development life cycle because it helps identify and fix software defects. However, predicting defects from data with irrelevant features and overlapping classes is challenging and can lead to long training times and low model accuracy. To address these challenges, this research introduces a novel Depth Linear Discrimination-Oriented Feature Selection Method based on the Adaptive Sine Cosine Algorithm, named Depth Adaptive Sine Cosine Feature Selection (DASC-FS). DASC-FS uses the Adaptive Sine Cosine Algorithm (ASCA) as a search algorithm to determine the relevant features and adopts Depth Linear Discriminant Analysis (D-LDA) to identify the discriminative features that maximize class separation. ASCA, proposed in this paper, is a metaheuristic algorithm designed to enhance the search capabilities of the standard Sine Cosine Algorithm (SCA). By combining the simplicity of SCA with multiple mutation operators inspired by Genetic Algorithms (GA), ASCA increases the diversity of the solutions and adapts to a variety of situations. Furthermore, this study introduces a novel linear discriminant method, D-LDA, to enhance the robustness of the original LDA. D-LDA integrates the concept of matrix depth into LDA, offering a systematic approach to the challenges of scatter matrix estimation. Because matrix depth measures how central, or deep, a particular matrix is within a distribution with respect to different directions, it is an efficient tool for computing a robust scatter matrix estimator that can handle outliers and complex data structures. The experimental results show that DASC-FS, by integrating ASCA and D-LDA and thereby considering both accuracy optimization and class separation, consistently achieves higher accuracy than most existing methods. The results also show that the use of multiple mutation operators in ASCA improves the search process, and that D-LDA's capacity to reduce data dimensionality and increase class separation yields highly competitive results compared to other LDA variants. Finally, features related to code size and complexity emerge as key factors for SDP because they consistently rank as important across different classifiers and datasets. Through enhanced search capabilities, robust scatter matrix estimation, and the ability to reduce data dimensionality, DASC-FS offers a valuable solution for enhancing predictive accuracy and for understanding the factors that contribute to software defects.
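For orientation, the following is a minimal sketch of the position update used by the standard Sine Cosine Algorithm that ASCA builds on. It is not the paper's ASCA: the abstract does not specify the mutation operators or how D-LDA enters the fitness, so neither is modeled here. The function names `sca_step` and `to_feature_mask`, the sigmoid-threshold binarization, and the default `a=2.0` are assumptions made for illustration only.

```python
import math
import random


def sca_step(positions, best, t, max_iter, a=2.0, rng=random):
    """One iteration of the standard SCA update (illustrative, not ASCA).

    positions: list of candidate solutions (lists of floats)
    best:      best solution found so far (the "destination" point)
    """
    # r1 shrinks linearly from a to 0, moving from exploration to exploitation.
    r1 = a * (1.0 - t / max_iter)
    new_positions = []
    for x in positions:
        new_x = []
        for xj, pj in zip(x, best):
            r2 = rng.uniform(0.0, 2.0 * math.pi)  # angle of the move
            r3 = rng.uniform(0.0, 2.0)            # random weight on the destination
            r4 = rng.random()                     # chooses the sine or cosine branch
            trig = math.sin(r2) if r4 < 0.5 else math.cos(r2)
            new_x.append(xj + r1 * trig * abs(r3 * pj - xj))
        new_positions.append(new_x)
    return new_positions


def to_feature_mask(position, threshold=0.5):
    """Hypothetical binarization: keep feature j if sigmoid(x_j) > threshold."""
    mask = []
    for v in position:
        v = max(-60.0, min(60.0, v))  # clamp to avoid overflow in exp
        mask.append(1 if 1.0 / (1.0 + math.exp(-v)) > threshold else 0)
    return mask
```

In a feature selection setting, each mask would be scored by a fitness function (for example, classifier accuracy on the selected columns), and `best` would be updated with the highest-scoring candidate after every step.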
Pages: 27