Benchmarking and Categorizing the Performance of Neural Program Repair Systems for Java

Cited: 0
Authors
Zhong, Wenkang [1 ]
Li, Chuanyi [1 ]
Liu, Kui [2 ]
Ge, Jidong [1 ]
Luo, Bin [1 ]
Bissyandé, Tegawendé F. [3]
Ng, Vincent [4 ]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software & Technol, Nanjing, Peoples R China
[2] Huawei Software Engn Applicat Technol Lab, Hangzhou, Peoples R China
[3] Univ Luxembourg, Luxembourg, Luxembourg
[4] Univ Texas Dallas, Richardson, TX USA
Funding
European Research Council; National Natural Science Foundation of China;
Keywords
datasets; program repair; benchmark; empirical study;
DOI
10.1145/3688834
Chinese Library Classification
TP31 [Computer Software];
Subject Classification Codes
081202; 0835;
Abstract
Recent years have seen a rise in Neural Program Repair (NPR) systems in the software engineering community, which adopt advanced deep learning techniques to automatically fix bugs. A comprehensive understanding of existing systems can facilitate new improvements in this area and provide practical guidance for users. However, we observe two weaknesses in the current evaluation of NPR systems: (1) published systems are trained with varying data, and (2) NPR systems are evaluated only coarsely, by the total number of bugs they fix. Questions such as which types of bugs current systems can repair remain unanswered. Consequently, researchers cannot make targeted improvements in this area, and users have no clear picture of the actual capabilities of existing systems. In this article, we perform a systematic evaluation of nine existing state-of-the-art NPR systems. To enable a fair and detailed comparison, we (1) build a new benchmark and framework that supports training and validating the nine systems with unified data and (2) evaluate the re-trained systems with detailed performance analysis, focusing in particular on effectiveness and efficiency. We believe our benchmark tool and evaluation results offer practitioners an accurate picture of current NPR systems and implications for further improving NPR.
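The abstract describes a per-category evaluation protocol: retrain the nine systems on unified data, validate their candidate patches on a shared bug set, and break results down by bug type rather than reporting only a total fix count. The following is a minimal sketch of that kind of aggregation, not the paper's actual framework; the RepairResult record, its field names, and the system/bug identifiers below are hypothetical, invented for illustration.

    from collections import defaultdict
    from dataclasses import dataclass

    @dataclass
    class RepairResult:
        """One validation record: did a system fix a given benchmark bug?"""
        system: str            # NPR system name (illustrative)
        bug_id: str            # benchmark bug identifier (illustrative)
        category: str          # bug type, e.g. "NullPointer", "OffByOne"
        fixed: bool            # True if a candidate patch passed all tests
        candidates_tried: int  # patches validated before success or giving up

    def per_category_fix_rates(results):
        """Aggregate the fix rate for each (system, bug category) pair."""
        totals = defaultdict(int)
        fixes = defaultdict(int)
        for r in results:
            key = (r.system, r.category)
            totals[key] += 1
            if r.fixed:
                fixes[key] += 1
        return {k: fixes[k] / totals[k] for k in totals}

    # Toy usage: two systems evaluated on the same unified bug set.
    results = [
        RepairResult("SystemA", "Chart-1", "NullPointer", True, 12),
        RepairResult("SystemA", "Math-5", "OffByOne", False, 500),
        RepairResult("SystemB", "Chart-1", "NullPointer", False, 500),
        RepairResult("SystemB", "Math-5", "OffByOne", True, 3),
    ]
    for (system, category), rate in sorted(per_category_fix_rates(results).items()):
        print(f"{system:8s} {category:12s} fix rate = {rate:.0%}")

Keying every record to a shared, unified bug set is what makes cross-system comparisons fair: each rate is computed over the same denominator, so differences reflect the systems rather than their training or evaluation data.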
Pages: 35