Automatic detection of Long Method and God Class code smells through neural source code embeddings

Cited by: 29
Authors
Kovacevic, Aleksandar [1]
Slivka, Jelena [1]
Vidakovic, Dragan [1]
Grujic, Katarina-Glorija [1]
Luburic, Nikola [1]
Prokic, Simona [1]
Sladic, Goran [1]
Affiliations
[1] Univ Novi Sad, Fac Tech Sci, Trg Dositeja Obradovica 6, Novi Sad 21000, Serbia
Keywords
Code smell detection; Neural source code embeddings; Code metrics; Machine learning; Software engineering; IMPACT;
DOI
10.1016/j.eswa.2022.117607
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Code smells are structures in code that often harm its quality. Manually detecting code smells is challenging, so researchers have proposed many automatic detectors. Traditional code smell detectors employ metric-based heuristics, but researchers have recently adopted Machine Learning (ML) based approaches. This paper compares the performance of multiple ML-based code smell detection models against multiple metric-based heuristics for detecting the God Class and Long Method code smells. We assess the effectiveness of different source code representations for ML by comparing traditionally used code metrics against code embeddings (code2vec, code2seq, and CuBERT). To the best of our knowledge, this study is the first to evaluate the effectiveness of pre-trained neural source code embeddings for code smell detection. This approach helped us leverage the power of transfer learning: our study is the first to explore whether the knowledge mined from code understanding models can be transferred to code smell detection. A secondary contribution of our research is the systematic evaluation of code smell detection approaches on the same large-scale, manually labeled MLCQ dataset. Almost every study that proposes a detection approach tests it on a dataset unique to that study. Consequently, the reported performances cannot be compared directly to determine the best-performing approach.
Pages: 18