Multilingual code refactoring detection based on deep learning

Cited by: 0
Authors
Li, Tao [1]
Zhang, Yang [1]
Affiliations
[1] Hebei Univ Sci & Technol, Sch Informat Sci & Engn, Shijiazhuang 050018, Peoples R China
Keywords
Refactoring detection; Deep learning; Code change; Multilingual code; Edit sequence
DOI
10.1016/j.eswa.2024.125164
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Refactoring is the process of improving the internal structure of source code without altering its external behavior. Existing deep learning-based refactoring detection relies on commit messages to extract features. However, commit messages are not fully trustworthy, since some developers do not consistently record refactoring activities. Furthermore, current approaches are designed for a single programming language and lack multilingual refactoring support. To this end, this paper proposes RefT5, a multilingual code refactoring detection approach based on deep learning. First, we select 110 real-world Java and Python projects as a corpus to construct the dataset. Second, we extract features including commit messages, code changes, and refactoring types from these projects. RefT5 generates edit sequences from code changes and takes refactoring types as labels. Third, we employ CodeT5 and BiLSTM-attention to extract semantic and structural features and generate feature vectors. Finally, the feature vectors are fed into a classification layer to detect the refactoring type. The experimental results show that RefT5 obtains 98.05% precision and 97.77% recall. Furthermore, compared with existing approaches, it improves precision by 51.61% and recall by 52.9% on average, demonstrating its effectiveness.
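The abstract states that RefT5 generates edit sequences from code changes but does not say how they are built. As an illustration only, the following minimal sketch shows one common way to turn a before/after code pair into a token-level edit sequence, using Python's standard `difflib`. The tokenizer, the `KEEP`/`DEL`/`ADD` labels, and the function names `tokenize` and `edit_sequence` are assumptions made for this example, not the authors' implementation.

```python
import difflib
import re


def tokenize(code):
    # Crude, language-agnostic tokenizer (an assumption for this sketch):
    # identifiers, integer literals, or any single non-space symbol.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def edit_sequence(before, after):
    """Return a list of (operation, token) pairs describing the change.

    Operations are hypothetical labels: KEEP (unchanged token),
    DEL (token only in the old code), ADD (token only in the new code).
    """
    old_tokens, new_tokens = tokenize(before), tokenize(after)
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.extend(("KEEP", tok) for tok in old_tokens[i1:i2])
        else:
            # "delete" has an empty new span and "insert" an empty old span,
            # so "replace" is handled as deletions followed by additions.
            ops.extend(("DEL", tok) for tok in old_tokens[i1:i2])
            ops.extend(("ADD", tok) for tok in new_tokens[j1:j2])
    return ops


if __name__ == "__main__":
    # A rename-variable style change, written as a Java-like statement.
    for op, tok in edit_sequence("int total = compute(x);", "int sum = compute(x);"):
        print(op, tok)
```

A sequence of this kind keeps both the old and the new tokens in one stream, which is what would let a sequence model such as the CodeT5 encoder named in the abstract see the change itself rather than only the resulting code.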
Pages: 10
Related papers
56 in total
  • [1] Can Refactoring be Self-Affirmed? An Exploratory Study on How Developers Document their Refactoring Activities in Commit Messages
    AlOmar, Eman Abdullah
    Mkaouer, Mohamed Wiem
    Ouni, Ali
    [J]. 2019 IEEE/ACM 3RD INTERNATIONAL WORKSHOP ON REFACTORING (IWOR 2019), 2019, : 51 - 58
  • [2] The Effectiveness of Supervised Machine Learning Algorithms in Predicting Software Refactoring
    Aniche, Mauricio
    Maziero, Erick
    Durelli, Rafael
    Durelli, Vinicius H. S.
    [J]. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2022, 48 (04) : 1432 - 1450
  • [3] SATT: Tailoring Code Metric Thresholds for Different Software Architectures
    Aniche, Mauricio
    Treude, Christoph
    Zaidman, Andy
    van Deursen, Arie
    Gerosa, Marco Aurelio
    [J]. 2016 IEEE 16TH INTERNATIONAL WORKING CONFERENCE ON SOURCE CODE ANALYSIS AND MANIPULATION (SCAM), 2016, : 41 - 50
  • [4] PYREF: Refactoring Detection in Python Projects
    Atwi, Hassan
    Lin, Bin
    Tsantalis, Nikolaos
    Kashiwa, Yutaro
    Kamei, Yasutaka
    Ubayashi, Naoyasu
    Bavota, Gabriele
    Lanza, Michele
    [J]. IEEE 21ST INTERNATIONAL WORKING CONFERENCE ON SOURCE CODE ANALYSIS AND MANIPULATION (SCAM 2021), 2021, : 136 - 141
  • [5] Charbuty B., 2021, J. Appl. Sci. Technol. Trends, V2, P20, DOI 10.38094/JASTT20165
  • [6] A novel selective naive Bayes algorithm
    Chen, Shenglei
    Webb, Geoffrey I.
    Liu, Linyuan
    Ma, Xin
    [J]. KNOWLEDGE-BASED SYSTEMS, 2020, 192
  • [7] Fast Changeset-based Bug Localization with BERT
    Ciborowska, Agnieszka
    Damevski, Kostadin
    [J]. 2022 ACM/IEEE 44TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2022), 2022, : 946 - 957
  • [8] Dig D., 2006, COMP 21 ACM SIGPLAN, P675
  • [9] Effective software merging in the presence of object-oriented refactorings
    Dig, Danny
    Manzoor, Kashif
    Johnson, Ralph
    Nguyen, Tien N.
    [J]. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2008, 34 (03) : 321 - 335
  • [10] Discovering Repetitive Code Changes in Python ML Systems
    Dilhara, Malinda
    Ketkar, Ameya
    Sannidhi, Nikhith
    Dig, Danny
    [J]. 2022 ACM/IEEE 44TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2022), 2022, : 736 - 748