Spatial-Aware Approximate Big Data Stream Processing

Cited by: 9
Authors
Al Jawarneh, Isam Mashhour [1]
Bellavista, Paolo [1]
Foschini, Luca [1]
Montanari, Rebecca [1]
Affiliations
[1] Univ Bologna, Dipartimento Informat Sci & Ingn, Viale Risorgimento 2, I-40136 Bologna, Italy
Source
2019 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM) | 2019
Keywords
Spatial Sampling; Spark Streaming; Z-order curves; stratification; dimension reduction;
DOI
10.1109/globecom38437.2019.9014291
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The widespread adoption of ubiquitous IoT edge devices and modern telemetry has generated an unprecedented avalanche of spatially tagged datasets which, if they could be explored interactively, would offer relevant insights into interesting natural phenomena. Applying spatial queries online is expensive, a problem compounded by the fact that, more often than not, we do not have access to the full dataset population in non-stationary settings. To cope, sampling stands out as a natural solution for approximating estimators such as averages and totals of correlated parameters of interest. In any sampling design, representativeness remains the main criterion by which a method is judged good or bad. Loosely speaking, in a spatial context this means sampling quantities fairly, in a way that preserves spatial characteristics, so as to provide more accurate approximations for spatial query responses. Current big data management systems either do not offer off-the-shelf spatial-aware online sampling solutions or, at best, rely on randomness, which introduces too many imponderables into the overall estimation. We have designed a QoS- and spatial-aware online sampling method that outperforms vanilla baselines by statistically significant margins. Our method sits atop the Apache Spark Structured Streaming codebase and has been tested against a benchmark consisting of a spatially augmented dataset with millions of records.
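The abstract and keywords mention Z-order curves and stratification but give no algorithmic detail. The following is a minimal sketch of that general idea only, not the authors' method: points are mapped to grid cells indexed by a Z-order (Morton) code, each cell is treated as a stratum, and the same fraction is drawn from every stratum so that sparse regions are not lost. The function names, the grid resolution `bits`, the `(lon, lat, value)` tuple layout, and the batch (non-streaming) setting are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def interleave_bits(x: int, y: int, bits: int) -> int:
    """Interleave the low `bits` bits of x and y into one Morton (Z-order) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # x bits occupy even positions
        code |= ((y >> i) & 1) << (2 * i + 1)  # y bits occupy odd positions
    return code


def z_cell(lon: float, lat: float, bits: int = 4) -> int:
    """Map a lon/lat point to the Z-order index of its grid cell.

    The grid has 2**bits cells per axis (hypothetical resolution).
    """
    n = 1 << bits
    x = min(int((lon + 180.0) / 360.0 * n), n - 1)
    y = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return interleave_bits(x, y, bits)


def stratified_sample(points, fraction, bits=4, seed=0):
    """Draw roughly `fraction` of the points from every Z-order cell.

    `points` is an iterable of (lon, lat, value) tuples (assumed layout).
    Every non-empty cell contributes at least one point.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for lon, lat, value in points:
        strata[z_cell(lon, lat, bits)].append((lon, lat, value))

    sample = []
    for cell in sorted(strata):
        members = strata[cell]
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample


# Illustrative data: a dense cluster and a sparse cluster in different cells.
dense = [(11.34 + i * 0.001, 44.49 + i * 0.001, i) for i in range(100)]
sparse = [(-74.0 + i * 0.001, 40.7 + i * 0.001, i) for i in range(10)]
picked = stratified_sample(dense + sparse, fraction=0.1)
```

With a plain random 10% sample the sparse cluster could be missed entirely; sampling per cell guarantees it is represented. A streaming variant would apply the same per-cell draw to each micro-batch, which this sketch does not attempt.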
Pages: 6
Related papers
15 in total
  • [1] Al Jawarneh IM, 2018, IEEE INT WORKSH COMP, P86
  • [2] Al Jawarneh IM, 2018, IEEE SYMP COMP COMMU, P1227, DOI 10.1109/ISCC.2018.8538616
  • [3] Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark
    Armbrust, Michael
    Das, Tathagata
    Torres, Joseph
    Yavuz, Burak
    Zhu, Shixiong
    Xin, Reynold
    Ghodsi, Ali
    Stoica, Ion
    Zaharia, Matei
    [J]. SIGMOD'18: PROCEEDINGS OF THE 2018 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2018, : 601 - 613
  • [4] Dean J, 2004, USENIX ASSOCIATION PROCEEDINGS OF THE SIXTH SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDI '04), P137
  • [5] Elkins BV, 1997, GEOTECH SP, P161
  • [6] A new kernel density estimator for accurate home-range and species-range area estimation
    Fleming, Christen H.
    Calabrese, Justin M.
    [J]. METHODS IN ECOLOGY AND EVOLUTION, 2017, 8 (05): : 571 - 579
  • [7] Spatially Balanced Sampling through the Pivotal Method
    Grafstrom, Anton
    Lundstrom, Niklas L. P.
    Schelin, Lina
    [J]. BIOMETRICS, 2012, 68 (02) : 514 - 520
  • [8] Lehman A., 2013, JMP for basic univariate and multivariate statistics: methods for researchers and social scientists
  • [9] Approximate Query Processing: What is New and Where to Go?: A Survey on Approximate Query Processing
    Li, Kaiyu
    Li, Guoliang
    [J]. DATA SCIENCE AND ENGINEERING, 2018, 3 (04) : 379 - 397
  • [10] Use of space-filling curves to select sample locations in natural resource monitoring studies
    Lister, Andrew J.
    Scott, Charles T.
    [J]. ENVIRONMENTAL MONITORING AND ASSESSMENT, 2009, 149 (1-4) : 71 - 80