Benchmarking Automated Program Repair: An Extensive Study on Both Real-World and Artificial Bugs

Cited by: 0
Authors
Ouyang, Yicheng [1]
Yang, Jun [1]
Zhang, Lingming [1]
Affiliations
[1] Univ Illinois, Champaign, IL 61820 USA
Source
PROCEEDINGS OF THE 33RD ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2024 | 2024
Keywords
Program repair; Mutation testing; Empirical assessment
DOI
10.1145/3650212.3652140
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As bugs are inevitable and prevalent in real-world programs, many Automated Program Repair (APR) techniques have been proposed to generate patches for them. However, because there is no standard for evaluating APR techniques, prior works tend to use different settings and benchmarks in evaluation, which threatens the trustworthiness of the evaluation results. Additionally, they typically adopt only plausibility and genuineness as evaluation metrics, which may mask underlying issues in APR techniques. To overcome these issues, in this paper, we conduct an extensive and multi-dimensional evaluation of nine learning-based and three traditional state-of-the-art APR techniques under the same environment and settings. We employ the widely studied Defects4J V2.0.0 benchmark and a newly constructed large-scale mutation-based benchmark named MuBench, derived from Defects4J and including 1,700 artificial bugs generated by various mutators, to uncover potential limitations in these APR techniques. We also apply multi-dimensional metrics, including compilability/plausibility/genuineness metrics, as well as SYE (SYntactic Equivalence) and TCE (Trivial Compiler Equivalence) metrics, to thoroughly analyze the 1,814,652 generated patches. This paper presents noteworthy findings from the extensive evaluation: Firstly, Large Language Model (LLM) based APR is less susceptible to overfitting on the Defects4J V1.2.0 dataset and fixes the largest number of bugs. Secondly, the study suggests a promising future for combining traditional and learning-based APR techniques, as they exhibit complementary advantages in fixing different types of bugs. Additionally, this work highlights the need to further improve the patch compilability of learning-based APR techniques, despite the various existing strategies that attempt to improve it. The study also reveals other guidelines for enhancing APR techniques, including the need to handle unresolvable-symbol compilability issues and to reduce duplicate/no-op patch generation. Finally, our study uncovers seven implementation issues in the studied techniques, five of which have been confirmed and fixed by the corresponding authors.
Pages: 440-452
Page count: 13