Automated Program Repair in the Era of Large Pre-trained Language Models

被引：106

作者：

Xia, Chunqiu Steven ^{[1
]}

Wei, Yuxiang ^{[1
]}

Zhang, Lingming ^{[1
]}

机构：

[1] Univ Illinois, Champaign, IL 61820 USA

来源：

2023 IEEE/ACM 45TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ICSE | 2023年

关键词：

CODE;

D O I：

10.1109/ICSE48619.2023.00129

中图分类号：

TP31 [计算机软件];

学科分类号：

081202 ; 0835 ;

摘要：

Automated Program Repair (APR) aims to help developers automatically patch software bugs. However, current state-of-the-art traditional and learning-based APR techniques face the problem of limited patch variety, failing to fix complicated bugs. This is mainly due to the reliance on bug-fixing datasets to craft fix templates (traditional) or directly predict potential patches (learning-based). Large Pre-Trained Language Models (LLMs), trained using billions of text/code tokens, can potentially help avoid this issue. Very recently, researchers have directly leveraged LLMs for APR without relying on any bugfixing datasets. Meanwhile, such existing work either failed to include state-of-the-art LLMs or was not evaluated on realistic datasets. Thus, the true power of modern LLMs on the important APR problem is yet to be revealed. In this work, we perform the first extensive study on directly applying LLMs for APR. We select 9 recent state-of-the-art LLMs, including both generative and infilling models, ranging from 125M to 20B in size. We designed 3 different repair settings to evaluate the different ways we can use LLMs to generate patches: 1) generate the entire patch function, 2) fill in a chunk of code given the prefix and suffix 3) output a single line fix. We apply the LLMs under these repair settings on 5 datasets across 3 different languages and compare different LLMs in the number of bugs fixed, generation speed and compilation rate. We also compare the LLMs against recent state-of-the-art APR tools. Our study demonstrates that directly applying state-ofthe-art LLMs can already substantially outperform all existing APR techniques on all our datasets. Among the studied LLMs, the scaling effect exists for APR where larger models tend to achieve better performance. Also, we show for the first time that suffix code after the buggy line (adopted in infilling-style APR) is important in not only generating more fixes but more patches with higher compilation rate. Besides patch generation, the LLMs consider correct patches to be more natural than other ones, and can even be leveraged for effective patch ranking or patch correctness checking. Lastly, we show that LLM-based APR can be further substantially boosted via: 1) increasing the sample size, and 2) incorporating fix template information.

引用

页码：1482 / 1494

页数：13

共 50 条

[31] From Cloze to Comprehension: Retrofitting Pre-trained Masked Language Models to Pre-trained Machine Reader
Xu, Weiwen
Li, Xin
Zhang, Wenxuan
Zhou, Meng
Lam, Wai
Si, Luo
Bing, Lidong
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
[32] Pre-trained models for natural language processing: A survey
Qiu XiPeng
Sun TianXiang
Xu YiGe
Shao YunFan
Dai Ning
Huang XuanJing
SCIENCE CHINA-TECHNOLOGICAL SCIENCES, 2020, 63 (10) : 1872 - 1897
[33] Analyzing Individual Neurons in Pre-trained Language Models
Durrani, Nadir
Sajjad, Hassan
Dalvi, Fahim
Belinkov, Yonatan
PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 4865 - 4880
[34] Probing Pre-Trained Language Models for Disease Knowledge
Alghanmi, Israa
Espinosa-Anke, Luis
Schockaert, Steven
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 3023 - 3033
[35] Emotional Paraphrasing Using Pre-trained Language Models
Casas, Jacky
Torche, Samuel
Daher, Karl
Mugellini, Elena
Abou Khaled, Omar
2021 9TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS (ACIIW), 2021,
[36] Impact of Morphological Segmentation on Pre-trained Language Models
Westhelle, Matheus
Bencke, Luciana
Moreira, Viviane P.
INTELLIGENT SYSTEMS, PT II, 2022, 13654 : 402 - 416
[37] Dynamic Knowledge Distillation for Pre-trained Language Models
Li, Lei
Lin, Yankai
Ren, Shuhuai
Li, Peng
Zhou, Jie
Sun, Xu
2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 379 - 389
[38] Prompt Tuning for Discriminative Pre-trained Language Models
Yao, Yuan
Dong, Bowen
Zhang, Ao
Zhang, Zhengyan
Xie, Ruobing
Liu, Zhiyuan
Lin, Leyu
Sun, Maosong
Wang, Jianyong
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 3468 - 3473
[39] A Close Look into the Calibration of Pre-trained Language Models
Chen, Yangyi
Yuan, Lifan
Cui, Ganqu
Liu, Zhiyuan
Ji, Heng
PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 1343 - 1367
[40] Deep Entity Matching with Pre-Trained Language Models
Li, Yuliang
Li, Jinfeng
Suhara, Yoshihiko
Doan, AnHai
Tan, Wang-Chiew
PROCEEDINGS OF THE VLDB ENDOWMENT, 2020, 14 (01): : 50 - 60

← 1 2 3 4 5 →