Top-k spatial-keyword publish/subscribe over sliding window

Times cited: 23
Authors
Wang, Xiang [1]
Zhang, Wenjie [1]
Zhang, Ying [2]
Lin, Xuemin [1]
Huang, Zengfeng [1]
Affiliations
[1] Univ New South Wales, Sch Comp Sci & Engn, Sydney, NSW, Australia
[2] Univ Technol, Ctr Artificial Intelligence, Sydney, NSW, Australia
Funding
Australian Research Council;
Keywords
Publish/subscribe system; Top-k spatial-keyword queries; Stream; Sliding window; Distributed processing;
DOI
10.1007/s00778-016-0453-2
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
With the prevalence of social media and GPS-enabled devices, a massive amount of geo-textual data has been generated in a streaming fashion, leading to a variety of applications such as location-based recommendation and information dissemination. In this paper, we investigate a novel real-time top-k monitoring problem over a sliding window of streaming data; that is, we continuously maintain the top-k most relevant geo-textual messages (e.g., geo-tagged tweets) for a large number of spatial-keyword subscriptions (e.g., registered users interested in local events) simultaneously. To provide the most recent information under controllable memory cost, the sliding window model is employed on the streaming geo-textual data. To the best of our knowledge, this is the first work to study top-k spatial-keyword publish/subscribe over a sliding window. A novel centralized system, called Skype (Top-k Spatial-keyword Publish/Subscribe), is proposed in this paper. In Skype, to continuously maintain top-k results for massive subscriptions, we devise a novel indexing structure over subscriptions such that each incoming message can be delivered immediately on its arrival. To lower the expensive top-k re-evaluation cost triggered by message expiration, we develop a novel cost-based k-skyband technique that reduces the number of re-evaluations in a cost-effective way. Extensive experiments verify the efficiency and effectiveness of our proposed techniques. Furthermore, to support better scalability and higher throughput, we propose a distributed version of Skype, namely DSkype, on top of Storm, a popular distributed stream processing system. With the help of fine-tuned subscription/message distribution mechanisms, DSkype can achieve an orders-of-magnitude speed-up over its centralized version.
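The general k-skyband idea referred to in the abstract can be illustrated with a minimal Python sketch for a single subscription. This is not the paper's cost-based technique, subscription index, or scoring function: the linear spatial/Jaccard score, the `alpha`, `slack`, and `tau` parameters, and the names `Message`, `Subscription`, `score`, and `SkybandMonitor` are all hypothetical. A message is buffered only while fewer than k newer messages score at least as high, because only such a message can still enter the top-k after older results expire. Messages scoring below the threshold `tau` are skipped, and the whole window is rescanned (a top-k re-evaluation) only when the buffered set can no longer certify k results.

```python
# Hypothetical sketch of k-skyband top-k monitoring for ONE subscription;
# not the Skype implementation, its subscription index, or its cost model.
import math
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    msg_id: int
    x: float
    y: float
    keywords: frozenset
    timestamp: int


@dataclass(frozen=True)
class Subscription:
    x: float
    y: float
    keywords: frozenset
    k: int
    alpha: float = 0.5  # hypothetical spatial vs. textual weight


def score(sub: Subscription, msg: Message, max_dist: float = 1.0) -> float:
    """Assumed relevance: alpha * spatial proximity + (1 - alpha) * keyword Jaccard."""
    spatial = max(0.0, 1.0 - math.hypot(sub.x - msg.x, sub.y - msg.y) / max_dist)
    union = sub.keywords | msg.keywords
    textual = len(sub.keywords & msg.keywords) / len(union) if union else 0.0
    return sub.alpha * spatial + (1.0 - sub.alpha) * textual


class SkybandMonitor:
    """Top-k of one subscription over a time-based sliding window.

    `band` keeps unexpired messages scoring >= tau that are dominated by fewer
    than k messages (a dominator is newer and scores at least as high).
    With >= k entries the band contains the window's top-k; otherwise the
    whole window is rescanned, i.e. a top-k re-evaluation.
    """

    def __init__(self, sub, window, slack=0, tau=-math.inf):
        self.sub, self.window, self.slack, self.tau = sub, window, slack, tau
        self.messages = deque()  # all unexpired messages, oldest first
        self.band = []  # entries: [score, timestamp, msg_id, dominance_count]
        self.reevaluations = 0

    def _add(self, s, msg):
        # msg is newer than every band entry, so it dominates those it ties or beats
        for entry in self.band:
            if s >= entry[0]:
                entry[3] += 1
        self.band = [e for e in self.band if e[3] < self.sub.k]
        self.band.append([s, msg.timestamp, msg.msg_id, 0])

    def insert(self, msg):
        self.messages.append(msg)
        s = score(self.sub, msg)
        if s >= self.tau:  # below tau: ignored until the next re-evaluation
            self._add(s, msg)

    def expire(self, now):
        cutoff = now - self.window
        while self.messages and self.messages[0].timestamp <= cutoff:
            self.messages.popleft()
        self.band = [e for e in self.band if e[1] > cutoff]

    def _reevaluate(self):
        self.reevaluations += 1
        self.band = []
        for msg in self.messages:  # oldest first, so each message is the newest so far
            self._add(score(self.sub, msg), msg)
        scores = sorted((e[0] for e in self.band), reverse=True)
        if scores:
            # hypothetical knob: larger slack -> lower (or equal) tau -> more candidates kept
            self.tau = scores[min(self.sub.k + self.slack, len(scores)) - 1]
            self.band = [e for e in self.band if e[0] >= self.tau]

    def top_k(self):
        # rescan only if the band holds < k entries and some window message is missing
        if len(self.band) < self.sub.k and len(self.messages) > len(self.band):
            self._reevaluate()
        ranked = sorted(self.band, key=lambda e: (-e[0], -e[1]))
        return [e[2] for e in ranked[: self.sub.k]]
```

In this sketch a larger `slack` can lower `tau` after each rescan, so more candidates are buffered and rescans tend to become rarer at the price of more memory and per-message work. Picking that balance from a cost model is, roughly, what the abstract's cost-based k-skyband is described as doing; the paper's actual model is not reproduced here.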
Pages: 301-326
Number of pages: 26