ReAPR: Automatic program repair via retrieval-augmented large language models

Times cited: 0
Authors
Liu, Zixin [1 ]
Du, Xiaozhi [2 ]
Liu, Hairui [1 ]
Affiliations
[1] Xi'an Jiaotong Univ, Sch Elect Engn, 28 Xianning West Rd, Xi'an 710049, Shaanxi, Peoples R China
[2] Xi'an Jiaotong Univ, Sch Software Engn, Shaanxi Joint Key Lab Artificial Intelligence, 28 Xianning West Rd, Xi'an 710049, Shaanxi, Peoples R China
Keywords
Automated Program Repair; Retrieval-Augmented Generation; Large Language Models; Prompt Learning
DOI
10.1007/s11219-025-09728-1
Chinese Library Classification
TP31 [Computer Software]
Discipline Classification Code
081202; 0835
Abstract
Automatic Program Repair (APR) aims to automatically fix software defects, significantly reducing the effort of manual debugging. Recent studies have demonstrated impressive results in utilizing Large Language Models (LLMs) for software bug fixing. However, current LLM-based approaches depend solely on the pre-trained knowledge of LLMs, overlooking the prior knowledge contained in historical bug-repair records, which increases the likelihood of hallucinations. To address this challenge, this paper proposes ReAPR, a retrieval-augmented framework for APR. We first curate a high-quality retrieval database by carefully compiling and filtering existing APR datasets. ReAPR then uses a retriever to fetch bug-fix pairs similar to the target bug from this database, providing contextual hints to guide the LLM during repair. We investigate two techniques for retrieving bug-fix pairs associated with the function to be fixed: BM25 and Dense Passage Retrieval (DPR). After retrieving the most relevant bug-fix pair, we construct a prompt that integrates the retrieved pair. We also compare the proposed RAG-based approach with parameter-efficient fine-tuning (PEFT) approaches on repair performance. To validate the effectiveness of ReAPR, we conduct extensive experiments on the widely used benchmark Defects4J 2.0 as well as the more recent benchmark GitBug-Java. The results show that ReAPR, built on the CodeLlama (7B) backbone, successfully fixes 68 and 59 bugs on Defects4J 2.0 in the DPR and BM25 settings, respectively, outperforming the best baseline approach by 18 and 9 bugs under the same repair settings.
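The retrieve-then-prompt pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's actual code: the from-scratch BM25 scorer, the toy two-entry bug-fix database, and the prompt template are all assumptions made for the example.

```python
import math
from collections import Counter

class BM25:
    """Minimal Okapi BM25 over whitespace-tokenized documents."""
    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [d.split() for d in docs]
        self.N = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.N
        self.tf = [Counter(d) for d in self.docs]
        df = Counter()
        for d in self.docs:
            df.update(set(d))  # document frequency: count each term once per doc
        self.idf = {t: math.log((self.N - n + 0.5) / (n + 0.5) + 1)
                    for t, n in df.items()}

    def score(self, query, i):
        s, dl = 0.0, len(self.docs[i])
        for t in query.split():
            if t not in self.idf:
                continue
            f = self.tf[i][t]
            s += self.idf[t] * f * (self.k1 + 1) / (
                f + self.k1 * (1 - self.b + self.b * dl / self.avgdl))
        return s

    def top_k(self, query, k=1):
        return sorted(range(self.N), key=lambda i: self.score(query, i),
                      reverse=True)[:k]

# Hypothetical retrieval database of (buggy snippet, fixed snippet) pairs.
bugfix_db = [
    ("if (i <= arr.length) return arr[i];", "if (i < arr.length) return arr[i];"),
    ("while (node != null) { node = node.next }", "while (node != null) { node = node.next; }"),
]
bm25 = BM25([bug for bug, _ in bugfix_db])

# Retrieve the most similar historical bug and build the repair prompt.
target_bug = "if (idx <= list.length) return list[idx];"
best = bm25.top_k(target_bug, k=1)[0]
retrieved_bug, retrieved_fix = bugfix_db[best]

prompt = (
    "You are an automatic program repair assistant.\n"
    f"Similar historical bug:\n{retrieved_bug}\n"
    f"Its fix:\n{retrieved_fix}\n"
    f"Now fix this bug:\n{target_bug}\n"
)
```

The prompt would then be sent to the backbone LLM (CodeLlama in the paper); swapping `BM25` for a dense encoder with cosine similarity over embeddings gives the DPR variant.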
Pages: 31