FIDEX: Filtering Spreadsheet Data using Examples

被引:24
作者
Wang, Xinyu [1 ]
Gulwani, Sumit [2 ]
Singh, Rishabh [2 ]
机构
[1] UT Austin, Austin, TX 78712 USA
[2] Microsoft Res, Redmond, WA USA
关键词
Program Synthesis; Data Filtering; Regular Expressions; Programming By Examples; Algorithms; Human Factor; QUERIES;
D O I
10.1145/3022671.2984030
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
Data filtering in spreadsheets is a common problem faced by millions of end-users. The task of data filtering requires a computational model that can separate intended positive and negative string instances. We present a system, FIDEX, that can efficiently learn desired data filtering expressions from a small set of positive and negative string examples. There are two key ideas of our approach. First, we design an expressive DSL to represent disjunctive filter expressions needed for several real-world data filtering tasks. Second, we develop an efficient synthesis algorithm for incrementally learning consistent filter expressions in the DSL from very few positive and negative examples. A DAG-based data structure is used to succinctly represent a large number of filter expressions, and two corresponding operators are defined for algorithmically handling positive and negative examples, namely, the intersection and subtraction operators. FIDEX is able to learn data filters for 452 out of 460 real-world data filtering tasks in real time (0.22s), using only 2.2 positive string instances and 2:7 negative string instances on average.
引用
收藏
页码:195 / 213
页数:19
相关论文
共 42 条
[1]  
Alquezar R., 1994, P ACL 02 WORKSH UNS, P291
[2]   Syntax-Guided Synthesis [J].
Alur, Rajeev ;
Bodik, Rastislav ;
Dallal, Eric ;
Fisman, Dana ;
Garg, Pranav ;
Juniwal, Garvit ;
Kress-Gazit, Hadas ;
Madhusudan, P. ;
Martin, Milo M. K. ;
Raghothaman, Mukund ;
Saha, Shamwaditya ;
Seshia, Sanjit A. ;
Singh, Rishabh ;
Solar-Lezama, Armando ;
Torlak, Emina ;
Udupa, Abhishek .
DEPENDABLE SOFTWARE SYSTEMS ENGINEERING, 2015, 40 :1-25
[3]   A NOTE ON THE NUMBER OF QUERIES NEEDED TO IDENTIFY REGULAR LANGUAGES [J].
ANGLUIN, D .
INFORMATION AND CONTROL, 1981, 51 (01) :76-87
[4]   COMPLEXITY OF MINIMUM INFERENCE OF REGULAR SETS [J].
ANGLUIN, D .
INFORMATION AND CONTROL, 1978, 39 (03) :337-350
[5]   LEARNING REGULAR SETS FROM QUERIES AND COUNTEREXAMPLES [J].
ANGLUIN, D .
INFORMATION AND COMPUTATION, 1987, 75 (02) :87-106
[6]  
[Anonymous], THESIS
[7]  
[Anonymous], 2003, Exploratory Data Mining and Data Cleaning
[8]  
[Anonymous], PLDI
[9]  
[Anonymous], 1992, S MACH PERC
[10]  
Barowy DW, 2015, ACM SIGPLAN NOTICES, V50, P218, DOI [10.1145/2737924.2737952, 10.1145/2813885.2737952]