SynShine: Improved Fixing of Syntax Errors

Cited by: 11
Authors
Ahmed, Toufique [1]
Ledesma, Noah Rose [1]
Devanbu, Premkumar [1]
Affiliations
[1] Univ Calif Davis, Dept Comp Sci, Davis, CA 95616 USA
Funding
National Science Foundation (USA);
Keywords
Deep learning; program repair; naturalness; GENERATION;
DOI
10.1109/TSE.2022.3212635
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
Novice programmers struggle with the complex syntax of modern programming languages like Java, and make a lot of syntax errors. The diagnostic syntax error messages from compilers and IDEs are sometimes useful, but often the messages are cryptic and puzzling. Novices could be helped, and instructors' time saved, by automated repair suggestions when dealing with syntax errors. Large samples of novice errors and fixes are now available, offering the possibility of data-driven machine-learning approaches to help novices fix syntax errors. Current machine-learning approaches do a reasonable job fixing syntax errors in shorter programs, but don't work as well even for moderately longer programs. We introduce SYNSHINE, a machine-learning-based tool that substantially improves on the state of the art by learning to use compiler diagnostics, employing a very large neural model that leverages unsupervised pre-training, and relying on multi-label classification rather than autoregressive synthesis to generate the (repaired) output. We describe SYNSHINE's architecture in detail, and provide a detailed evaluation. We have built SYNSHINE into a free, open-source version of Visual Studio Code (VSCode); we make all our source code and models freely available.
Pages: 2169-2181
Page count: 13
Related papers
51 in total
  • [1] Ahmad WU, 2021, 2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), P2655
  • [2] Learning to Find Usages of Library Functions in Optimized Binaries
    Ahmed, Toufique
    Devanbu, Premkumar
    Sawant, Anand Ashok
    [J]. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2022, 48 (10) : 3862 - 3876
  • [3] Learning lenient parsing & typing via indirect supervision
    Ahmed, Toufique
    Devanbu, Premkumar
    Hellendoorn, Vincent J.
    [J]. EMPIRICAL SOFTWARE ENGINEERING, 2021, 26 (02)
  • [4] Compilation Error Repair: For the Student Programs, From the Student Programs
    Ahmed, Umair Z.
    Kumar, Pawan
    Karkare, Amey
    Kar, Purushottam
    Gulwani, Sumit
    [J]. 2018 IEEE/ACM 40TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: SOFTWARE ENGINEERING EDUCATION AND TRAINING (ICSE-SEET), 2018, : 78 - 87
  • [5] Do Developers Read Compiler Error Messages?
    Barik, Titus
    Smith, Justin
    Lubick, Kevin
    Holmes, Elisabeth
    Feng, Jing
    Murphy-Hill, Emerson
    Parnin, Chris
    [J]. 2017 IEEE/ACM 39TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE), 2017, : 575 - 585
  • [6] Compiler Error Messages Considered Unhelpful: The Landscape of Text-Based Programming Error Message Research
    Becker, Brett A.
    Denny, Paul
    Pettit, Raymond
    Bouchard, Durell
    Bouvier, Dennis J.
    Harrington, Brian
    Kamil, Amir
    Karkare, Amey
    McDonald, Chris
    Osera, Peter-Michael
    Pearce, Janice L.
    Prather, James
    [J]. PROCEEDINGS OF THE WORKING GROUP REPORTS ON INNOVATION AND TECHNOLOGY IN COMPUTER SCIENCE EDUCATION (ITICSE-WGR '19), 2019, : 177 - 210
  • [7] Bennedsen J., 2007, SIGCSE Bulletin, V39, P32, DOI 10.1145/1272848.1272879
  • [8] Achieving Reliable Sentiment Analysis in the Software Engineering Domain using BERT
    Biswas, Eeshita
    Karabulut, Mehmet Efruz
    Pollock, Lori
    Vijay-Shanker, K.
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME 2020), 2020, : 162 - 173
  • [9] Novice Java Programming Mistakes: Large-Scale Data vs. Educator Beliefs
    Brown, Neil C. C.
    Altadmri, Amjad
    [J]. ACM TRANSACTIONS ON COMPUTING EDUCATION, 2017, 17 (02):
  • [10] Blackbox: A Large Scale Repository of Novice Programmers' Activity
    Brown, Neil C. C.
    Kolling, Michael
    McCall, Davin
    Utting, Ian
    [J]. PROCEEDINGS OF THE 45TH ACM TECHNICAL SYMPOSIUM ON COMPUTER SCIENCE EDUCATION (SIGCSE'14), 2014, : 223 - 228