Jigsaw: Large Language Models meet Program Synthesis

Cited by: 68
Authors
Jain, Naman [1]
Vaidyanath, Skanda [1, 2]
Iyer, Arun [1]
Natarajan, Nagarajan [1]
Parthasarathy, Suresh [1]
Rajamani, Sriram [1]
Sharma, Rahul [1]
Affiliations
[1] Microsoft Res, Bangalore, Karnataka, India
[2] Stanford Univ, Stanford, CA 94305 USA
Source
2022 ACM/IEEE 44th International Conference on Software Engineering (ICSE 2022) | 2022
DOI
10.1145/3510003.3510203
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Large pre-trained language models such as GPT-3 [10], Codex [11], and Google's language model [7] are now capable of generating code from natural-language specifications of programmer intent. We view these developments with a mixture of optimism and caution. On the optimistic side, such large language models have the potential to improve productivity by providing an automated AI pair programmer for every programmer in the world. On the cautionary side, since these large language models do not understand program semantics, they offer no guarantees about the quality of the suggested code. In this paper, we present an approach that augments these large language models with post-processing steps based on program analysis and synthesis techniques that understand the syntax and semantics of programs. Further, we show that such techniques can make use of user feedback and improve with usage. We present our experiences from building and evaluating such a tool, Jigsaw, which targets synthesizing code that uses the Python Pandas API from multi-modal inputs. Our experience suggests that as these large language models evolve for synthesizing code from intent, Jigsaw has an important role to play in improving the accuracy of these systems.
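The abstract describes post-processing LLM suggestions with program analysis, driven by multi-modal inputs (for example, a natural-language description together with input-output examples). One simple instance of such a check is to execute each candidate snippet on a user-supplied example input and keep only the candidates that reproduce the expected output. The sketch below illustrates that idea under stated assumptions and is not Jigsaw's implementation: the candidate strings are hand-written stand-ins for model suggestions, `filter_candidates` and the `df`/`out` naming convention are hypothetical, and plain Python lists of dicts stand in for Pandas DataFrames so the example runs without extra dependencies.

```python
import copy
from typing import Any, Callable, List


def filter_candidates(
    candidates: List[str],
    example_input: Any,
    expected_output: Any,
    equal: Callable[[Any, Any], bool] = lambda a, b: a == b,
) -> List[str]:
    """Return the candidate snippets that reproduce the expected output.

    Hypothetical convention: each snippet reads its input from the
    variable `df` and stores its result in the variable `out`.
    """
    accepted = []
    for code in candidates:
        # Fresh copy per candidate, so an in-place mutation by one
        # snippet cannot change the input seen by the next one.
        env = {"df": copy.deepcopy(example_input)}
        try:
            exec(code, env)  # no sandboxing or timeout: illustration only
        except Exception:
            continue  # snippets that crash (e.g. a misspelled key) are rejected
        if "out" in env and equal(env["out"], expected_output):
            accepted.append(code)
    return accepted


# Usage: rows of a small table stand in for a DataFrame.
rows = [
    {"name": "a", "score": 3},
    {"name": "b", "score": 1},
    {"name": "c", "score": 2},
]
expected = [
    {"name": "b", "score": 1},
    {"name": "c", "score": 2},
    {"name": "a", "score": 3},
]
# Hand-written stand-ins for model suggestions ("sort the rows by score").
candidates = [
    "out = sorted(df, key=lambda r: r['score'])",  # correct
    "out = sorted(df, key=lambda r: r['name'])",   # runs, but wrong order
    "out = sorted(df, key=lambda r: r['scor'])",   # raises KeyError
    "out = df.sort(key=lambda r: r['score'])",     # list.sort returns None
]
accepted = filter_candidates(candidates, rows, expected)
print(accepted)  # only the first candidate survives
```

With real Pandas DataFrames, passing `equal=lambda a, b: a.equals(b)` would be the natural choice, because `==` between DataFrames compares element-wise and does not yield a single boolean. Since `exec` runs arbitrary model output, a practical tool would also need sandboxing and time limits.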
Pages: 1219-1231
Page count: 13
References
45 entries in total
  • [1] [Anonymous], PARENTHESIS BLOG
  • [2] [Anonymous], YOUR AI PAIR PROGR
  • [3] [Anonymous], About Us
  • [4] [Anonymous], PARENTHESIS STACKOVE
  • [5] AutoPandas: Neural-Backed Generators for Program Synthesis
    Bavishi, Rohan; Lemieux, Caroline; Fox, Roy; Sen, Koushik; Stoica, Ion
    Proceedings of the ACM on Programming Languages (PACMPL), 2019, 3 (OOPSLA)
  • [6] Brown TB, 2020, ADV NEUR IN, V33
  • [7] Chen M., 2021, arXiv
  • [8] Web Question Answering with Neurosymbolic Program Synthesis
    Chen, Qiaochu; Lamoreaux, Aaron; Wang, Xinyu; Durrett, Greg; Bastani, Osbert; Dillig, Isil
    Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI '21), 2021, pp. 328-343
  • [9] Multi-modal Synthesis of Regular Expressions
    Chen, Qiaochu; Wang, Xinyu; Ye, Xi; Durrett, Greg; Dillig, Isil
    Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '20), 2020, pp. 487-502
  • [10] Maximal Multi-layer Specification Synthesis
    Chen, Yanju; Martins, Ruben; Feng, Yu
    ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2019, pp. 602-612