Optimizing Interactive Development of Data-Intensive Applications

Cited by: 10
Authors
Interlandi, Matteo [1]
Tetali, Sai Deep [1, 2]
Gulzar, Muhammad Ali [1]
Noor, Joseph [1]
Condie, Tyson [1]
Kim, Miryung [1]
Millstein, Todd [1]
Affiliations
[1] Univ Calif Los Angeles, Los Angeles, CA 90024 USA
[2] Google Inc, Menlo Pk, CA USA
Source
PROCEEDINGS OF THE SEVENTH ACM SYMPOSIUM ON CLOUD COMPUTING (SOCC 2016) | 2016
Keywords
Query Rewriting; Incremental Evaluation; Spark; Interactive Development; Big Data
DOI
10.1145/2987550.2987565
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Modern Data-Intensive Scalable Computing (DISC) systems are designed to process data through batch jobs that execute programs (e.g., queries) compiled from a high-level language. These programs are often developed interactively by posing ad-hoc queries over the base data until a desired result is generated. We observe that there can be significant overlap in the structure of these queries used to derive the final program. Yet, each successive execution of a slightly modified query is performed anew, which can significantly increase the development cycle. VEGA is an Apache Spark framework that we have implemented for optimizing a series of similar Spark programs, likely originating from a development or exploratory data analysis session. Spark developers (e.g., data scientists) can leverage VEGA to significantly reduce the amount of time it takes to re-execute a modified Spark program, reducing the overall time to market for their Big Data applications.
Pages: 510-522
Page count: 13