SWARM: Adaptive Load Balancing in Distributed Streaming Systems for Big Spatial Data

Cited by: 4
Authors
Daghistani, Anas [1, 2]
Aref, Walid G. [2, 3]
Ghafoor, Arif [2]
Mahmood, Ahmed R. [2]
Affiliations
[1] Umm Al Qura Univ, Mecca, Saudi Arabia
[2] Purdue Univ, W Lafayette, IN 47907 USA
[3] Alexandria Univ, Alexandria, Egypt
Funding
US National Science Foundation
Keywords
Load balancing; distributed streaming systems; spatial stream processing; cluster utilization; spatial continuous queries; selectivity estimation
DOI
10.1145/3460013
Chinese Library Classification number
TP7 [Remote sensing technology]
Discipline classification codes
081102 ; 0816 ; 081602 ; 083002 ; 1404 ;
Abstract
The proliferation of GPS-enabled devices has led to the development of numerous location-based services. These services need to process massive amounts of streamed spatial data in real time. Centralized systems cannot handle spatial data at its current scale, which has led to the development of distributed spatial streaming systems. Existing systems use static spatial partitioning to distribute the workload. However, real-time streamed spatial data follows non-uniform spatial distributions that change continuously over time, so distributed spatial streaming systems need to react to changes in the distribution of spatial data and queries. This article introduces SWARM, a lightweight adaptivity protocol that continuously monitors the data and query workloads across the distributed processes of the spatial data streaming system and redistributes and rebalances the workloads as soon as performance bottlenecks are detected. SWARM can handle multiple query-execution and data-persistence models. A distributed streaming system can use SWARM directly to adaptively rebalance its workload among its machines with minimal changes to the original code of the underlying spatial application. Extensive experimental evaluation using real and synthetic datasets illustrates that, on average, SWARM achieves a 2x improvement in throughput over a static grid partitioning that is determined by observing a limited history of the data and query workloads. Moreover, SWARM reduces execution latency by 4x on average compared with the same static grid partitioning.
Pages: 43