An Empirical Study of Code Smells in Transformer-based Code Generation Techniques

Cited by: 25
Authors
Siddiq, Mohammed Latif [1 ]
Majumder, Shafayat H. [2 ]
Mim, Maisha R. [2 ]
Jajodia, Sourov [2 ]
Santos, Joanna C. S. [1 ]
Affiliations
[1] Univ Notre Dame, Dept Comp Sci & Engn, Notre Dame, IN 46556 USA
[2] Bangladesh Univ Engn & Technol, Dept Comp Sci, Dhaka, Bangladesh
Source
2022 IEEE 22ND INTERNATIONAL WORKING CONFERENCE ON SOURCE CODE ANALYSIS AND MANIPULATION (SCAM 2022) | 2022
Keywords
code generation; code smell; security smell; transformer; pre-trained model; GitHub Copilot; SOFTWARE
DOI
10.1109/SCAM55253.2022.00014
Chinese Library Classification
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Prior works have developed transformer-based language learning models to automatically generate source code for a task without compilation errors. The datasets used to train these techniques include samples from open-source projects, which may not be free of security flaws, code smells, and violations of standard coding practices. Therefore, we investigate to what extent code smells are present in the datasets of code generation techniques and verify whether they leak into the output of these techniques. To conduct this study, we used Pylint and Bandit to detect code smells and security smells in three widely used training sets (CodeXGlue, APPS, and Code Clippy). We observed that Pylint caught 264 code smell types, whereas Bandit located 44 security smell types, in these three datasets used for training code generation techniques. By analyzing the output of ten different configurations of the open-source, fine-tuned, transformer-based GPT-Neo model with 125M parameters, we observed that this model leaked the smells and non-standard practices into the generated source code. When analyzing the suggestions of GitHub Copilot, a closed-source code generation tool, we observed that they contained 18 code smell types, including substandard coding patterns, and 2 security smell types.
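The abstract describes screening training data and generated code with Pylint and Bandit. Below is a minimal, illustrative Python sketch of how such a per-sample check could be wired up; it is a sketch under stated assumptions, not the authors' actual pipeline. It assumes both linters are installed and invoked through their command-line interfaces, and the SNIPPET string is a hypothetical stand-in for one generated sample.

import json
import subprocess
import tempfile

# Hypothetical generated sample; calling subprocess with shell=True is a known Bandit finding (B602).
SNIPPET = '''
import subprocess

def run(cmd):
    return subprocess.call(cmd, shell=True)
'''

def pylint_smells(path):
    # Pylint exits non-zero whenever it reports messages, so the return code is ignored here.
    proc = subprocess.run(["pylint", "--output-format=json", path],
                          capture_output=True, text=True)
    return sorted({msg["symbol"] for msg in json.loads(proc.stdout or "[]")})

def bandit_smells(path):
    # Bandit's JSON report lists each finding under "results" with a "test_id" such as B602.
    proc = subprocess.run(["bandit", "-f", "json", path],
                          capture_output=True, text=True)
    report = json.loads(proc.stdout or "{}")
    return sorted({result["test_id"] for result in report.get("results", [])})

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
        tmp.write(SNIPPET)
        sample_path = tmp.name
    print("Pylint code smells:    ", pylint_smells(sample_path))
    print("Bandit security smells:", bandit_smells(sample_path))

Applying this to dataset samples or model outputs would simply mean pointing the two helpers at each file and aggregating the returned smell identifiers, which mirrors the per-sample counting reported in the abstract.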
Pages: 71 - 82
Number of pages: 12