Improving performance with hybrid feature selection and ensemble machine learning techniques for code smell detection

Cited by: 38
Authors
Jain, Shivani [1]
Saha, Anju [1]
Affiliations
[1] GGS Indraprastha Univ, USIC&T, Sect 16 C, Delhi 110078, India
Keywords
Code smell; Machine learning; Ensemble machine learning; Hybrid feature selection; Stacking; CLASSIFIER; REGRESSION; DESIGN;
DOI
10.1016/j.scico.2021.102713
Chinese Library Classification number
TP31 [Computer Software];
Subject classification code
081202 ; 0835 ;
Abstract
Maintaining large and complex software is a significant task in the IT industry. One reason is the development of code smells, which are design flaws that lead to future bugs and errors. Code smells can be treated with regular refactoring, and their detection is the first step in the software maintenance process. Detecting code smells with machine learning algorithms eliminates the need for extensive knowledge of code smell properties and threshold values. Ensemble machine learning algorithms combine several classifiers of the same or different types to further improve performance and reduce variance. In our study, three hybrid feature selection techniques are employed with ensemble machine learning algorithms to improve performance in detecting code smells. Seven machine learning classifiers with different kernel variations, along with three boosting designs, two stacking methods, and bagging, were implemented. For feature selection, combinations of filter-wrapper, filter-embedded, and wrapper-embedded methods were executed. Performance measures for detecting four code smells are evaluated and compared with the performance obtained when feature selection is not employed. After hybrid feature selection was applied, the performance measures increased: accuracy by 21.43%, AUC value by 53.24%, and F-measure by 76.06%. Univariate ROC with Lasso is the best hybrid feature selection technique, with 90.48% accuracy and a 94.5% ROC AUC value. Random Forest and Logistic Regression are the best-performing machine learning classifiers. Data class is the most detectable code smell. Stacking always gave better results than individual classifiers. (C) 2021 Elsevier B.V. All rights reserved.
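The filter-embedded combination named in the abstract (a univariate ROC filter followed by Lasso) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.7 AUC cutoff, the 0.05 penalty, and the toy metric columns are all hypothetical, and the embedded step uses a plain squared-loss Lasso on 0/1 labels as a simplifying assumption.

```python
from typing import List, Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def soft_threshold(value: float, penalty: float) -> float:
    """Shrink value toward zero by penalty; the source of Lasso sparsity."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coefficients(columns: Sequence[Sequence[float]],
                       target: Sequence[float],
                       alpha: float, sweeps: int = 200) -> List[float]:
    """Coordinate descent for (1/2n)*||y - Xw||^2 + alpha*||w||_1 on centered data."""
    n = len(target)
    y_mean = sum(target) / n
    residual = [t - y_mean for t in target]
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    weights = [0.0] * len(centered)
    for _ in range(sweeps):
        for j, col in enumerate(centered):
            scale = sum(v * v for v in col) / n
            if scale == 0.0:
                continue  # constant column carries no information
            rho = sum(v * (r + weights[j] * v) for v, r in zip(col, residual)) / n
            new_weight = soft_threshold(rho, alpha) / scale
            delta = new_weight - weights[j]
            if delta != 0.0:
                residual = [r - delta * v for r, v in zip(residual, col)]
                weights[j] = new_weight
    return weights


def hybrid_select(columns: Sequence[Sequence[float]], labels: Sequence[int],
                  auc_threshold: float = 0.7, alpha: float = 0.05) -> List[int]:
    """Filter step (univariate AUC), then embedded step (nonzero Lasso weights)."""
    kept = []
    for j, col in enumerate(columns):
        auc = roc_auc(col, labels)
        if max(auc, 1.0 - auc) >= auc_threshold:  # direction-agnostic filter
            kept.append(j)
    weights = lasso_coefficients([columns[j] for j in kept], labels, alpha)
    return [j for j, w in zip(kept, weights) if w != 0.0]


# Hypothetical toy data: 8 classes, 1 = smelly, 0 = clean.
TOY_LABELS = [0, 0, 0, 0, 1, 1, 1, 1]
TOY_COLUMNS = [
    [1, 2, 3, 4, 5, 6, 7, 8],  # informative metric
    [5, 1, 7, 3, 6, 2, 8, 4],  # near-random metric, dropped by the filter
    [1, 1, 2, 2, 3, 3, 4, 4],  # redundant with column 0, dropped by Lasso
]

if __name__ == "__main__":
    print(hybrid_select(TOY_COLUMNS, TOY_LABELS))
```

On this toy data the filter discards the near-random column and the L1 penalty then zeroes the redundant one, so only column 0 survives. The surviving columns would then feed the classifiers or stacking ensemble.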
Pages: 34
Related papers
50 in total
  • [41] Improving Machine Learning Models for Malware Detection Using Embedded Feature Selection Method
    Chemmakha, Mohammed
    Habibi, Omar
    Lazaar, Mohamed
    IFAC PAPERSONLINE, 2022, 55 (12): : 771 - 776
  • [42] Hybrid Ensemble Based Machine Learning for Smart Building Fire Detection Using Multi Modal Sensor Data
    Jana, Sandip
    Shome, Saikat Kumar
    FIRE TECHNOLOGY, 2023, 59 (02) : 473 - 496
  • [43] Network Intrusion Detection with Two-Phased Hybrid Ensemble Learning and Automatic Feature Selection
    Mananayaka, Asanka Kavinda
    Chung, Sun Sunnie
    IEEE ACCESS, 2023, 11 : 45154 - 45167
  • [44] Reviewing various feature selection techniques in machine learning-based botnet detection
    Baruah, Sangita
    Borah, Dhruba Jyoti
    Deka, Vaskar
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2024, 36 (12)
  • [45] Improving Alzheimer's Disease Prediction with Different Machine Learning Approaches and Feature Selection Techniques
    Alshamlan, Hala
    Alwassel, Arwa
    Banafa, Atheer
    Alsaleem, Layan
    DIAGNOSTICS, 2024, 14 (19)
  • [46] Code change and smell techniques for regression test selection
    Mori, Allan
    Paiva, Ana C. R.
    Souza, Simone R. S.
    SOFTWARE QUALITY JOURNAL, 2025, 33 (01)
  • [47] Enhancing malware detection with feature selection and scaling techniques using machine learning models
    Hasan, Rakibul
    Biswas, Barna
    Samiun, Md
    Saleh, Mohammad Abu
    Prabha, Mani
    Akter, Jahanara
    Joya, Fatema Haque
    Abdullah, Masuk
    SCIENTIFIC REPORTS, 2025, 15 (01):
  • [48] IDENTIFICATION OF CODE SMELL USING MACHINE LEARNING
    Jesudoss, A.
    Maneesha, S.
    Durga, T. Lakshmi Naga
    PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND CONTROL SYSTEMS (ICCS), 2019, : 54 - 58
  • [49] Comparison of Machine Learning Methods for Code Smell Detection Using Reduced Features
    Karaduzovic-Hadziabdic, Kanita
    Spahic, Rialda
    2018 3RD INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENGINEERING (UBMK), 2018, : 670 - 672
  • [50] A Machine Learning Method with Hybrid Feature Selection for Improved Credit Card Fraud Detection
    Mienye, Ibomoiye Domor
    Sun, Yanxia
    APPLIED SCIENCES-BASEL, 2023, 13 (12):