An Empirical Investigation into Learning Bug-Fixing Patches in the Wild via Neural Machine Translation

Cited by: 91
Authors
Tufano, Michele [1]
Watson, Cody [1]
Bavota, Gabriele [2]
Di Penta, Massimiliano [3]
White, Martin [1]
Poshyvanyk, Denys [1]
Affiliations
[1] Coll William & Mary, Williamsburg, VA 23185 USA
[2] Univ Svizzera Italiana USI, Lugano, Switzerland
[3] Univ Sannio, Benevento, Italy
Source
PROCEEDINGS OF THE 2018 33RD IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING (ASE '18) | 2018
Keywords
neural machine translation; bug-fixes; COMMIT;
DOI
10.1145/3238147.3240732
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Millions of open-source projects with numerous bug fixes are available in code repositories. This proliferation of software development histories can be leveraged to learn how to fix common programming bugs. To explore such a potential, we perform an empirical study to assess the feasibility of using Neural Machine Translation techniques for learning bug-fixing patches for real defects. We mine millions of bug-fixes from the change histories of GitHub repositories to extract meaningful examples of such bug-fixes. Then, we abstract the buggy and corresponding fixed code, and use them to train an Encoder-Decoder model able to translate buggy code into its fixed version. Our model is able to fix hundreds of unique buggy methods in the wild. Overall, this model is capable of predicting fixed patches generated by developers in 9% of the cases.
Pages: 832-837
Page count: 6