Understanding the Robustness of Transformer-Based Code Intelligence via Code Transformation: Challenges and Opportunities

Cited: 0
Authors
Li, Yaoxian [1 ]
Qi, Shiyi [1 ]
Gao, Cuiyun [1 ]
Peng, Yun [2 ]
Lo, David [3 ]
Lyu, Michael R. [2 ]
Xu, Zenglin [1 ]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518055, Peoples R China
[2] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong 999077, Peoples R China
[3] Singapore Management Univ, Sch Comp & Informat Syst, Singapore 188065, Singapore
Funding
National Natural Science Foundation of China;
Keywords
Code intelligence; code transformation; Transformer; robustness
DOI
10.1109/TSE.2024.3524461
Chinese Library Classification
TP31 [Computer software];
Discipline code
081202; 0835;
Abstract
Transformer-based models have demonstrated state-of-the-art performance in various intelligent coding tasks such as code comment generation and code completion. Previous studies show that deep learning models are sensitive to input variations, but few have systematically studied the robustness of Transformers under perturbed input code. In this work, we empirically study the effect of semantic-preserving code transformations on the performance of Transformers. Specifically, 27 and 24 code transformation strategies are implemented for two popular programming languages, Java and Python, respectively. To facilitate analysis, the strategies are grouped into five categories: block transformation, insertion/deletion transformation, grammatical statement transformation, grammatical token transformation, and identifier transformation. Experiments on three popular code intelligence tasks, including code completion, code summarization, and code search, demonstrate that insertion/deletion transformation and identifier transformation have the greatest impact on the performance of Transformers. Our results also suggest that Transformers based on abstract syntax trees (ASTs) show more robust performance than models based only on code sequences under most code transformations. In addition, the design of positional encoding can impact the robustness of Transformers under code transformations. We also investigate substantial code transformations at the strategy level to expand our study and explore other factors influencing the robustness of Transformers. Furthermore, we explore applications of code transformations. Based on our findings, we distill insights about the challenges and opportunities for Transformer-based code intelligence from various perspectives.
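To make the taxonomy concrete, the following is a minimal sketch of one identifier transformation for Python, built on the standard ast module. It is illustrative only and not the authors' implementation; the class name RenameLocals and the opaque v0, v1, ... naming scheme are assumptions made for this example.

import ast

class RenameLocals(ast.NodeTransformer):
    # Toy identifier transformation: rename function parameters and local
    # variables to opaque names (v0, v1, ...) while preserving program semantics.
    def __init__(self):
        self.mapping = {}

    def _rename(self, name):
        return self.mapping.setdefault(name, f"v{len(self.mapping)}")

    def visit_arg(self, node):
        node.arg = self._rename(node.arg)
        return node

    def visit_Name(self, node):
        # A Store registers a new opaque name; a Load reuses an existing mapping,
        # so globals and builtins that were never assigned locally stay untouched.
        if isinstance(node.ctx, ast.Store):
            node.id = self._rename(node.id)
        elif node.id in self.mapping:
            node.id = self.mapping[node.id]
        return node

src = "def add(first, second):\n    total = first + second\n    return total\n"
tree = RenameLocals().visit(ast.parse(src))
ast.fix_missing_locations(tree)
print(ast.unparse(tree))
# def add(v0, v1):
#     v2 = v0 + v1
#     return v2

Because the rewritten function computes the same result for every input, a robust model should behave identically on both versions; the paper reports that identifier transformations of this kind are among the most damaging to Transformer performance.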
Pages: 521-547
Number of pages: 27