Efficient astronomical query processing using Spark

Cited by: 7
Authors
Brahem, Mariem [1]
Yeh, Laurent [1]
Zeitouni, Karine [1]
Affiliation
[1] Paris Saclay Univ, Univ Versailles St Quentin, DAVID Lab, Versailles, France
Source
26TH ACM SIGSPATIAL INTERNATIONAL CONFERENCE ON ADVANCES IN GEOGRAPHIC INFORMATION SYSTEMS (ACM SIGSPATIAL GIS 2018) | 2018
Funding
European Union Horizon 2020;
Keywords
Astronomical Survey Data Management; Big Data; Query Processing; Spark Framework;
DOI
10.1145/3274895.3274942
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Sky surveys represent a fundamental data source in astronomy. Today, these surveys are moving into a petascale regime produced by modern telescopes. Due to the exponential growth of astronomical data, there is a pressing need to provide efficient astronomical query processing. Our goal is to bridge the gap between existing distributed systems and high-level languages for astronomers. In this paper, we present efficient techniques for query processing of astronomical data using ASTROIDE. Our framework helps astronomers to take advantage of the richness of the astronomical data. The proposed model supports complex astronomical operators expressed using ADQL (Astronomical Data Query Language), an extension of SQL commonly used by astronomers. ASTROIDE proposes spatial indexing and partitioning techniques to better filter the data access. It also implements a query optimizer that injects spatial-aware optimization rules and strategies. Experimental evaluation based on real datasets demonstrates that the present framework is scalable and efficient.
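As an illustration of the kind of spatial filtering the abstract refers to, the following is a minimal sketch of a cone search, the predicate that ADQL expresses with CONTAINS, POINT and CIRCLE. It is not ASTROIDE's implementation or API: the function names, the dictionary layout of a source, and the declination-band pre-filter are all assumptions made for this example, and it uses plain Python rather than Spark.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form, which stays numerically stable for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(sources, ra0, dec0, radius_deg):
    """Return the sources lying within radius_deg of (ra0, dec0).

    `sources` is a hypothetical list of dicts with "ra" and "dec" keys in degrees.
    """
    # Cheap pre-filter (an assumption of this sketch): a source inside the cone
    # cannot differ from the centre by more than the radius in declination.
    dec_lo, dec_hi = dec0 - radius_deg, dec0 + radius_deg
    return [
        s for s in sources
        if dec_lo <= s["dec"] <= dec_hi
        and angular_separation_deg(ra0, dec0, s["ra"], s["dec"]) <= radius_deg
    ]


if __name__ == "__main__":
    catalog = [
        {"id": 1, "ra": 10.00, "dec": 20.0},
        {"id": 2, "ra": 10.05, "dec": 20.0},
        {"id": 3, "ra": 50.00, "dec": -30.0},
    ]
    print([s["id"] for s in cone_search(catalog, 10.0, 20.0, 0.1)])
```

In a distributed setting the pre-filter step would typically be replaced by pruning whole partitions through a spatial index, which is the role the abstract assigns to ASTROIDE's indexing and partitioning techniques.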
Pages: 229-238
Page count: 10