Scalable Spatial Analytics and In Situ Query Processing in DaskDB

Cited by: 0
Authors
Das, Suvam Kumar [1]
Peter, Ronnit [1]
Ray, Suprio [1]
Affiliations
[1] Univ New Brunswick, Fredericton, NB, Canada
Source
PROCEEDINGS OF 2023 18TH INTERNATIONAL SYMPOSIUM ON SPATIAL AND TEMPORAL DATA, SSTD 2023 | 2023
Keywords
data science; data analytics; in situ query processing; learned index
DOI
10.1145/3609956.3609978
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Vast amounts of data are stored in raw data files. Data scientists and practitioners typically use data science frameworks to analyze raw data. Among them, the Python Pandas library is one of the most popular language-based frameworks. On the other hand, relational databases (RDBMSs) are still widely used for SQL query execution. Before querying, raw data must be loaded into an RDBMS through an ETL process. Conversely, data stored in an RDBMS may need to be exported or converted into a suitable format before complex data analysis can be performed. This data movement adversely affects the time-to-insight. Recently, a scalable system called DaskDB was introduced, which supports unified data analytics and in situ SQL query processing without requiring any data movement. It supports invoking existing Python APIs as User-Defined Functions (UDFs) within SQL queries, so that they can be easily integrated with most existing Python applications. Due to the importance of supporting spatial analytics and spatial SQL queries, we have extended DaskDB to support spatial functionality. In this paper, we present our enhanced DaskDB system. Using two real-world spatial datasets, we demonstrate the scalability of DaskDB's spatial features.
Pages: 189-193
Page count: 5
Related papers
10 in total
  • [1] A Performance Study of Big Spatial Data Systems
    Alam, Md Mahbub
    Ray, Suprio
    Bhavsar, Virendra C.
    [J]. BIGSPATIAL 2018: PROCEEDINGS OF THE 7TH ACM SIGSPATIAL INTERNATIONAL WORKSHOP ON ANALYTICS FOR BIG GEOSPATIAL DATA (BIGSPATIAL-2018), 2018: 1-9
  • [2] Dask-GeoPandas, Dask-GeoPandas
  • [3] GeoNB, NB Dataset
  • [4] GeoPandas, GeoPandas
  • [5] Ray S, 2011, PROC INT CONF DATA, P1139, DOI 10.1109/ICDE.2011.5767929
  • [6] Rocklin M., 2015, P 14 PYTH SCI C, P130, DOI 10.25080/Majora-7b98e3ed-013
  • [7] Sedona, Apache Sedona
  • [8] SQLAlchemy, SQLAlchemy
  • [9] TIGER, TIGER Dataset
  • [10] DaskDB: Scalable Data Science with Unified Data Analytics and In Situ Query Processing
    Watson, Alex
    Das, Suvam Kumar
    Ray, Suprio
    [J]. 2021 IEEE 8TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA), 2021