Multilingual code refactoring detection based on deep learning

Cited by: 0

Authors:

Li, Tao [1]

Zhang, Yang [1]

Affiliations:

[1] Hebei Univ Sci & Technol, Sch Informat Sci & Engn, Shijiazhuang 050018, Peoples R China

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2024 / Volume 258

Keywords:

Refactoring detection; Deep learning; Code change; Multilingual code; Edit sequence;

DOI:

10.1016/j.eswa.2024.125164

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Refactoring is a critical process that improves the internal structure of source code without altering its external behavior. Existing deep learning-based refactoring detection approaches rely on commit messages to extract features. However, commit messages are not sufficiently trustworthy, since some developers do not consistently record refactoring activities. Furthermore, current approaches are designed for a single programming language and lack multilingual refactoring support. To this end, this paper proposes RefT5, a multilingual code refactoring detection approach based on deep learning. First, we select 110 real-world projects written in Java and Python as a corpus to construct the dataset. Second, we extract features including commit messages, code changes, and refactoring types from these projects. RefT5 generates edit sequences from code changes and takes refactoring types as labels. Third, we employ CodeT5 and BiLSTM-attention to extract semantic and structural features and generate feature vectors. Finally, the feature vectors are input into a classification layer to detect the refactoring type. The experimental results show that RefT5 obtains 98.05% precision and 97.77% recall. Furthermore, compared with existing approaches, it improves precision by 51.61% and recall by 52.9% on average, demonstrating its effectiveness.
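
The step of turning a code change into an edit sequence can be illustrated roughly as follows. This is a minimal sketch, assuming whitespace tokenization and Python's standard `difflib` module; the function name `edit_sequence` and the `KEEP`/`DEL`/`ADD` labels are hypothetical and are not taken from the paper, whose actual edit-sequence format is not given in this record.

```python
import difflib


def edit_sequence(before: str, after: str) -> list[tuple[str, str]]:
    """Return a token-level edit sequence as (operation, token) pairs.

    Assumption: tokens are separated by whitespace. The KEEP/DEL/ADD
    labels are illustrative only, not the paper's format.
    """
    old_tokens = before.split()
    new_tokens = after.split()
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)

    sequence: list[tuple[str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Tokens present in both versions are kept unchanged.
            sequence.extend(("KEEP", tok) for tok in old_tokens[i1:i2])
        else:
            # "delete", "insert", and "replace" are all expressed as
            # removed old tokens followed by added new tokens.
            sequence.extend(("DEL", tok) for tok in old_tokens[i1:i2])
            sequence.extend(("ADD", tok) for tok in new_tokens[j1:j2])
    return sequence


if __name__ == "__main__":
    # A rename-style change: the function name differs, the rest is the same.
    old_code = "def area ( w , h ) :"
    new_code = "def compute_area ( w , h ) :"
    for operation, token in edit_sequence(old_code, new_code):
        print(operation, token)
```

A sequence like this, paired with a refactoring-type label, is the kind of input a sequence model could consume; a real pipeline would more likely use a language-aware tokenizer than whitespace splitting.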


Pages: 10

