Is the Ground Truth Really Accurate? Dataset Purification for Automated Program Repair

Cited by: 7
Authors
Yang, Deheng [1]
Lei, Yan [2]
Mao, Xiaoguang [1]
Lo, David [3]
Xie, Huan [2]
Yan, Meng [2]
Affiliations
[1] National University of Defense Technology, Changsha, People's Republic of China
[2] Chongqing University, Chongqing, People's Republic of China
[3] Singapore Management University, Singapore, Singapore
Source
2021 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING (SANER 2021) | 2021
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
bug dataset; automated program repair; dataset purification; CODE; GENERATION;
DOI
10.1109/SANER50967.2021.00018
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Datasets of real-world bugs shipped with human-written patches are widely used to evaluate automated program repair (APR) techniques. In these evaluations, the human-written patches serve as the ground truth against which manual or automated assessment approaches judge the correctness of test-suite adequate patches. An inaccurate human-written patch tangled with other code changes threatens the reliability of the assessment results. Constructing such datasets therefore requires considerable manual effort to isolate real bug fixes from bug-fixing commits. However, this manual work is time-consuming and error-prone, and little is known about whether the ground truth in such datasets is really accurate. In this paper, we propose DEPTEST, an automated DatasEt Purification technique from the perspective of triggering Tests. Leveraging coverage analysis and delta debugging, DEPTEST automatically identifies and filters out the code changes irrelevant to the bug exposed by triggering tests. To measure the strength of DEPTEST, we run it on the most extensively used dataset (i.e., Defects4J), which claims to already exclude all irrelevant code changes for each bug fix via manual purification. Our experiment indicates that even in a dataset where the bug fix is claimed to be well isolated, 41.01% of human-written patches can be further reduced by 4.3 lines on average, with the largest reduction reaching 53 lines. This indicates DEPTEST's great potential to assist in constructing datasets of accurate bug fixes. Furthermore, based on the purified patches, we re-dissect Defects4J and systematically revisit the APR of multi-chunk bugs to provide insights for future research targeting such bugs.
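The abstract names delta debugging as one ingredient of the purification. As a rough illustration only, and not DEPTEST's actual implementation, the following Python sketch applies a generic ddmin-style reduction to a list of patch changes. The function name `ddmin`, the oracle `triggering_tests_pass`, and the sample change labels are all hypothetical; a real oracle would apply the candidate changes to the buggy program and run the triggering tests, and the coverage-analysis step is omitted here.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def ddmin(changes: Sequence[T], still_fixes: Callable[[List[T]], bool]) -> List[T]:
    """Shrink `changes` to a subset that still satisfies `still_fixes`.

    On return, removing any single remaining change makes `still_fixes` fail
    (assuming the empty patch does not fix the bug).
    """
    current = list(changes)
    if not still_fixes(current):
        raise ValueError("the full patch must make the triggering tests pass")

    n = 2  # granularity: number of chunks the change list is split into
    while len(current) >= 2:
        size = max(1, len(current) // n)
        chunks = [current[i:i + size] for i in range(0, len(current), size)]
        reduced = False

        # 1) Try to keep a single chunk only.
        for chunk in chunks:
            if still_fixes(chunk):
                current, n, reduced = chunk, 2, True
                break

        # 2) Otherwise try to drop a single chunk (keep its complement).
        if not reduced:
            for i in range(len(chunks)):
                rest = [c for j, chunk in enumerate(chunks) if j != i for c in chunk]
                if still_fixes(rest):
                    current, n, reduced = rest, max(n - 1, 2), True
                    break

        # 3) Otherwise refine the granularity, or stop at single changes.
        if not reduced:
            if n >= len(current):
                break
            n = min(len(current), n * 2)
    return current


# Hypothetical human-written patch: two changes fix the bug, two are unrelated.
patch_changes = ["add null guard", "rename variable", "reformat comment", "fix loop bound"]


def triggering_tests_pass(subset: List[str]) -> bool:
    # Stand-in oracle (assumption): the triggering tests pass only when both
    # bug-relevant changes are applied.
    return {"add null guard", "fix loop bound"} <= set(subset)


purified = ddmin(patch_changes, triggering_tests_pass)
print(purified)
```

In this toy setting the reduction keeps only the two changes the oracle depends on and preserves their original order. The cost is dominated by oracle calls, which in practice means test executions.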
Pages: 96-107
Page count: 12