Cut-and-Rewind: Extending Query Engine for Continuous Stream Analytics

Cited by: 6
Authors
Chen, Qiming [1]
Hsu, Meichun [1]
Affiliations
[1] Hewlett Packard Corp, HP Labs, Palo Alto, CA 94304 USA
Source
TRANSACTIONS ON LARGE-SCALE DATA- AND KNOWLEDGE-CENTERED SYSTEMS XXI | 2015 / Vol. 9260
DOI
10.1007/978-3-662-47804-2_5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Combining data warehousing and stream processing technologies has great potential for offering low-latency, data-intensive analytics. Unfortunately, such convergence has not been properly addressed so far. The current generation of stream processing systems is generally built separately from the data warehouse and query engine, which can cause significant overhead in data access and data movement and prevents these systems from taking advantage of the functionality already offered by existing data warehouse systems. In this work we tackle some hard problems in integrating stream analytics capability into an existing query engine. We define an extended SQL query model that unifies queries over both static relations and dynamic streaming data, and develop techniques to extend query engines to support the unified model. We propose the cut-and-rewind query execution model, which allows a query with full SQL expressive power to be applied to stream data by converting the stream into a sequence of "chunks" and executing the query over each chunk sequentially, without shutting the query instance down between chunks, so that the application context is continuously maintained across execution cycles, as required by sliding-window operators. We also propose the cycle-based transaction model to support Continuous Querying with Continuous Persisting (CQCP) with cycle-based isolation and visibility. We have prototyped our approach by extending PostgreSQL. This work has resulted in a new kind of tightly integrated, highly efficient system with advanced stream processing capability as well as full DBMS functionality. We demonstrate the system with the popular Linear Road benchmark and report its performance. By leveraging the mature code base of a query engine to the maximal extent, we can significantly reduce the engineering investment needed to develop the streaming technology. Providing this capability on a proprietary parallel analytics engine is work in progress.
Pages: 94-114
Page count: 21
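
The cut-and-rewind idea described in the abstract can be illustrated with a small sketch. This is not the authors' PostgreSQL extension; it is a minimal Python analogy, assuming a stream of `(timestamp_seconds, value)` tuples ordered by time. The function name `cut_and_rewind` and the parameters `chunk_seconds` and `window_cycles` are hypothetical names chosen for illustration. The stream is cut into chunks by time bucket, the same aggregation runs over each chunk with fresh per-cycle state, and a small buffer survives between cycles to feed a sliding-window average.

```python
from collections import deque
from itertools import groupby


def cut_and_rewind(stream, chunk_seconds, window_cycles):
    """Yield (cycle, chunk_average, sliding_window_average) for each chunk.

    stream: iterable of (timestamp_seconds, value), ordered by timestamp.
    This is an illustrative analogy, not the paper's implementation.
    """
    # State kept alive across cycles, standing in for the query instance
    # that is never shut down between chunks.
    window = deque(maxlen=window_cycles)  # (sum, count) of recent chunks

    # "Cut": split the stream into chunks by the time bucket of each tuple.
    for cycle, chunk in groupby(stream, key=lambda t: int(t[0] // chunk_seconds)):
        # "Rewind": per-cycle state starts fresh for every chunk.
        total, count = 0.0, 0
        for _, value in chunk:
            total += value
            count += 1

        # Cross-cycle state feeds the sliding-window result.
        window.append((total, count))
        win_total = sum(s for s, _ in window)
        win_count = sum(c for _, c in window)
        yield cycle, total / count, win_total / win_count


if __name__ == "__main__":
    readings = [(0, 10.0), (30, 20.0), (60, 30.0), (90, 50.0), (120, 60.0)]
    for row in cut_and_rewind(readings, chunk_seconds=60, window_cycles=2):
        print(row)
```

Because the generator is consumed lazily, each chunk's result becomes available as soon as the first tuple of the next chunk arrives, which loosely mirrors per-cycle result delivery on an unbounded stream.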