Is the Ground Truth Really Accurate? Dataset Purification for Automated Program Repair

Cited by: 7

Authors:

Yang, Deheng [1]

Lei, Yan [2]

Mao, Xiaoguang [1]

Lo, David [3]

Xie, Huan [2]

Yan, Meng [2]

Affiliations:

[1] National University of Defense Technology, Changsha, People's Republic of China

[2] Chongqing University, Chongqing, People's Republic of China

[3] Singapore Management University, Singapore, Singapore

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING (SANER 2021) | 2021

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China;

Keywords:

bug dataset; automated program repair; dataset purification; CODE; GENERATION;

DOI:

10.1109/SANER50967.2021.00018

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Datasets of real-world bugs shipped with human-written patches are intensively used in the evaluation of existing automated program repair (APR) techniques, wherein the human-written patches always serve as the ground truth, for manual or automated assessment approaches, to evaluate the correctness of test-suite adequate patches. An inaccurate human-written patch tangled with other code changes will pose threats to the reliability of the assessment results. Therefore, the construction of such datasets always requires much manual effort on isolating real bug fixes from bug fixing commits. However, the manual work is time-consuming and prone to mistakes, and little has been known on whether the ground truth in such datasets is really accurate. In this paper, we propose DEPTEST, an automated DatasEt Purification technique from the perspective of triggering Tests. Leveraging coverage analysis and delta debugging, DEPTEST can automatically identify and filter out the code changes irrelevant to the bug exposed by triggering tests. To measure the strength of DEPTEST, we run it on the most extensively used dataset (i.e., Defects4J) that claims to already exclude all irrelevant code changes for each bug fix via manual purification. Our experiment indicates that even in a dataset where the bug fix is claimed to be well isolated, 41.01% of human-written patches can be further reduced by 4.3 lines on average, with the largest reduction reaching up to 53 lines. This indicates its great potential in assisting in the construction of datasets of accurate bug fixes. Furthermore, based on the purified patches, we re-dissect Defects4J and systematically revisit the APR of multi-chunk bugs to provide insights for future research targeting such bugs.


Pages: 96-107

Number of pages: 12

