Maximal Multi-layer Specification Synthesis

Cited by: 22
Authors
Chen, Yanju [1]
Martins, Ruben [2]
Feng, Yu [1]
Affiliations
[1] Univ Calif Santa Barbara, Santa Barbara, CA 93106 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Source
ESEC/FSE'2019: PROCEEDINGS OF THE 2019 27TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING | 2019
Funding
U.S. National Science Foundation
Keywords
program synthesis; machine learning; neural networks; Max-SMT; SQL QUERIES; TRANSFORMATIONS
DOI
10.1145/3338906.3338951
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
There has been significant interest in applying programming-by-example to automate repetitive and tedious tasks. However, because input-output examples are inherently incomplete, a synthesizer may generate programs that pass the examples but do not match the user intent. In this paper, we propose Mars, a novel synthesis framework that takes as input a multi-layer specification composed of input-output examples, a textual description, and partial code snippets that capture the user intent. To accurately capture the user intent from the noisy and ambiguous description, we propose a hybrid model that combines the power of an LSTM-based sequence-to-sequence model with the Apriori algorithm for mining association rules through unsupervised learning. We reduce the problem of multi-layer specification synthesis to a Max-SMT problem, where hard constraints encode well-typed concrete programs and soft constraints encode the user intent learned by the hybrid model. We instantiate our hybrid model in the data wrangling domain and compare its performance against MORPHEUS, a state-of-the-art synthesizer for data wrangling tasks. Our experiments demonstrate that our approach outperforms MORPHEUS in terms of running time and number of solved benchmarks. For challenging benchmarks, our approach can suggest candidates with rankings that are an order of magnitude better than those of MORPHEUS, which leads to running times that are 15x faster than MORPHEUS.
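The Max-SMT reduction described in the abstract can be illustrated with a small sketch. This is not the Mars implementation: it is a toy, brute-force weighted search in plain Python, in which every hard constraint must hold and each satisfied soft constraint contributes its weight. The component names (`select`, `gather`, `spread`), the stand-in typing rule, and the weights are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical DSL components for a two-slot program (illustrative only).
COMPONENTS = ["select", "gather", "spread"]

def hard_constraints(program):
    """Stand-in for well-typedness: this must hold for every candidate."""
    first, second = program
    # Assumed rule: the same component may not be applied twice in a row.
    return first != second

# Soft constraints as (predicate, weight) pairs. The weights stand in for
# the confidence a learned model might assign to a guess about user intent.
SOFT_CONSTRAINTS = [
    (lambda p: "gather" in p, 5),      # description seems to mention reshaping
    (lambda p: p[1] == "select", 3),   # projection is likely the last step
    (lambda p: "spread" in p, 1),      # weak evidence for spread
]

def score(program):
    """Total weight of the soft constraints that the program satisfies."""
    return sum(weight for pred, weight in SOFT_CONSTRAINTS if pred(program))

def max_smt_brute_force():
    """Enumerate candidates, keep those that pass the hard constraints,
    and rank them by satisfied soft-constraint weight (highest first)."""
    candidates = [p for p in product(COMPONENTS, repeat=2) if hard_constraints(p)]
    return sorted(candidates, key=score, reverse=True)

ranking = max_smt_brute_force()
best = ranking[0]
print(best, score(best))
```

Enumeration is used here only to keep the example self-contained; a practical system would pass such constraints to a Max-SMT solver instead of scoring every candidate.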
Pages: 602-612
Page count: 11