Comparing and experimenting machine learning techniques for code smell detection

Cited by: 257
Authors
Fontana, Francesca Arcelli [1]
Mantyla, Mika V. [4, 5]
Zanoni, Marco [2]
Marino, Alessandro [3]
Affiliations
[1] Univ Milano Bicocca, Dept Comp Sci, Milan, Italy
[2] Univ Milano Bicocca, Dept Informat Syst & Commun, Milan, Italy
[3] Univ Milano Bicocca, Milan, Italy
[4] Univ Oulu, Software Engn, Oulu, Finland
[5] Aalto Univ, Helsinki, Finland
Keywords
Code smells detection; Machine learning techniques; Benchmark for code smell detection; BAD SMELLS; QUALITY; CLASSIFICATION
DOI
10.1007/s10664-015-9378-4
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Several code smell detection tools have been developed, and they provide different results because smells can be subjectively interpreted, and hence detected, in different ways. In this paper, we perform what is, to the best of our knowledge, the largest experiment applying machine learning algorithms to code smell detection. We experiment with 16 different machine learning algorithms on four code smells (Data Class, Large Class, Feature Envy, Long Method) and 74 software systems, with 1986 manually validated code smell samples. We found that all algorithms achieved high performance on the cross-validation data set; the highest performance was obtained by J48 and Random Forest, while the worst was obtained by support vector machines. However, the lower prevalence of code smells in the entire data set, i.e., imbalanced data, caused varying performance that needs to be addressed in future studies. We conclude that the application of machine learning to the detection of these code smells can provide high accuracy (>96%), and that only a hundred training examples are needed to reach at least 95% accuracy.
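The abstract's caveat about imbalanced data can be illustrated with a small sketch. It is not taken from the paper: the 1000 code elements and the 2% smell prevalence are hypothetical numbers chosen for illustration, and `ConfusionMatrix` and `confusion` are names invented here. The sketch only shows why accuracy by itself can look high when smells are rare, which is why precision, recall, and F-measure are usually reported alongside it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # smelly elements correctly flagged
    fp: int  # clean elements wrongly flagged
    fn: int  # smelly elements missed
    tn: int  # clean elements correctly left alone

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else 0.0

    @property
    def precision(self) -> float:
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    @property
    def recall(self) -> float:
        smelly = self.tp + self.fn
        return self.tp / smelly if smelly else 0.0

    @property
    def f_measure(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


def confusion(actual: list[bool], predicted: list[bool]) -> ConfusionMatrix:
    """Count outcomes, where True means 'affected by the smell'."""
    pairs = list(zip(actual, predicted))
    return ConfusionMatrix(
        tp=sum(a and p for a, p in pairs),
        fp=sum((not a) and p for a, p in pairs),
        fn=sum(a and (not p) for a, p in pairs),
        tn=sum((not a) and (not p) for a, p in pairs),
    )


# Hypothetical data set: 1000 elements, 20 of them smelly (2% prevalence).
actual = [True] * 20 + [False] * 980

# A trivial "detector" that never reports a smell.
always_clean = [False] * len(actual)

baseline = confusion(actual, always_clean)
print(baseline.accuracy, baseline.recall, baseline.f_measure)
```

Under these assumed numbers, the trivial detector is 98% accurate while finding none of the smelly elements (recall and F-measure are both 0), so a high accuracy figure on imbalanced data needs to be read together with the other measures.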
Pages: 1143-1191
Page count: 49