FlexFL: Flexible and Effective Fault Localization With Open-Source Large Language Models

Times cited: 0
Authors
Xu, Chuyang [1]
Liu, Zhongxin [1,2]
Ren, Xiaoxue [1,2]
Zhang, Gehao [3]
Liang, Ming
Lo, David [4]
Affiliations
[1] Zhejiang Univ, State Key Lab Blockchain & Data Secur, Hangzhou 310027, Peoples R China
[2] Hangzhou High Tech Zone Binjiang Inst Blockchain &, Hangzhou 310052, Peoples R China
[3] Ant Grp, Hangzhou 310013, Peoples R China
[4] Singapore Management Univ, Sch Comp & Informat Syst, Singapore 188065, Singapore
Funding
National Natural Science Foundation of China; National Research Foundation, Singapore;
Keywords
Computer bugs; Location awareness; Codes; Debugging; Pipelines; Large language models; Training; Data privacy; Source coding; Software systems; Fault localization; large language model; LLM-based agent; BUG LOCALIZATION;
DOI
10.1109/TSE.2025.3553363
Chinese Library Classification
TP31 [Computer Software];
Discipline Codes
081202; 0835;
Abstract
Fault localization (FL) aims to identify the locations of bugs in a software system, which can improve debugging efficiency and software quality. Owing to the strong code comprehension ability of Large Language Models (LLMs), a few studies have proposed using LLMs to locate bugs, i.e., LLM-based FL, with promising results. However, these methods have two limitations. First, they lack flexibility: they rely on bug-triggering test cases to perform FL and cannot make use of other available bug-related information, such as bug reports. Second, they are built on proprietary LLMs, which, although powerful, pose data privacy risks. To address these limitations, we propose FlexFL, a novel LLM-based FL framework that can flexibly leverage different types of bug-related information and works effectively with open-source LLMs. FlexFL consists of two stages. In the first stage, FlexFL narrows the search space of buggy code using state-of-the-art FL techniques from different families and produces a candidate list of bug-related methods. In the second stage, FlexFL uses LLMs to examine the code snippets of the methods suggested by the first stage in greater depth and to refine the fault localization results. In each stage, FlexFL builds agents on open-source LLMs; these agents share the same pipeline, which assumes no particular type of bug-related information and lets the LLMs interact through function calls even when they lack out-of-the-box function-calling capability. Extensive experiments on Defects4J show that FlexFL outperforms the baselines and works with different open-source LLMs. Specifically, FlexFL with the lightweight open-source LLM Llama3-8B locates 42 and 63 more bugs than two state-of-the-art LLM-based FL approaches, AutoFL and AgentFL respectively, both of which use GPT-3.5. In addition, FlexFL localizes 93 bugs at Top-1 that non-LLM-based FL techniques cannot.
Furthermore, to mitigate potential data contamination, we conduct experiments on a dataset that Llama3-8B has not seen before, and the results show that FlexFL also achieves good performance on it.
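The abstract describes FlexFL's two stages only at a high level. The following is a minimal sketch of that flow under stated assumptions, not FlexFL's implementation. Stage 1 merges ranked method lists from several FL techniques with a simple round-robin interleave (an assumed merge rule; the abstract does not say how FlexFL combines techniques). Stage 2 runs an agent loop in which a model without native function calling writes calls as plain text, such as `get_code_snippet(Foo.bar)`, and the loop parses and dispatches them. The function names `merge_candidates`, `parse_call`, and `refine`, the tool names `get_code_snippet` and `final_answer`, and method names like `Foo.bar` are all hypothetical.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

def merge_candidates(rankings: List[List[str]], k: int) -> List[str]:
    """Stage 1 (hypothetical merge rule): interleave ranked method lists from
    several FL techniques round-robin, skipping duplicates, keeping the top k."""
    merged: List[str] = []
    depth = max((len(r) for r in rankings), default=0)
    for i in range(depth):
        for ranking in rankings:
            if i < len(ranking) and ranking[i] not in merged:
                merged.append(ranking[i])
                if len(merged) == k:
                    return merged
    return merged

# Matches a call written as plain text, e.g. "get_code_snippet(Foo.bar)".
CALL_RE = re.compile(r"(\w+)\(([^()]*)\)")

def parse_call(reply: str) -> Optional[Tuple[str, List[str]]]:
    """Extract the first name(args) call from free-form model output."""
    match = CALL_RE.search(reply)
    if match is None:
        return None
    name, raw_args = match.groups()
    args = [a.strip() for a in raw_args.split(",") if a.strip()]
    return name, args

def refine(candidates: List[str],
           llm: Callable[[str], str],
           code_base: Dict[str, str],
           max_steps: int = 5) -> List[str]:
    """Stage 2 (hypothetical): let a text-only model inspect candidate methods
    through text-encoded function calls until it returns a final ranking."""
    history = ["Candidates: " + ", ".join(candidates)]
    for _ in range(max_steps):
        reply = llm("\n".join(history))
        history.append(reply)
        call = parse_call(reply)
        if call is None:
            history.append("Error: reply contained no function call.")
            continue
        name, args = call
        if name == "final_answer":
            return args
        if name == "get_code_snippet" and args:
            history.append(code_base.get(args[0], "Error: unknown method."))
        else:
            history.append(f"Error: unsupported function {name}.")
    return candidates  # fall back to the stage-1 ranking
```

Usage with a scripted stand-in for the LLM:

```python
replies = iter(["get_code_snippet(Foo.bar)", "final_answer(Foo.bar, Baz.qux)"])
code = {"Foo.bar": "int bar() { return 1 / 0; }"}
print(refine(["Baz.qux", "Foo.bar"], lambda prompt: next(replies), code))
# ['Foo.bar', 'Baz.qux']
```

Parsing calls out of plain text keeps the loop independent of any model's built-in tool-calling format. Feeding errors back into the history gives the model a chance to correct a malformed reply instead of aborting.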
Pages: 1455–1471
Number of pages: 17