SQL QUERY OPTIMIZATION FOR HIGHLY NORMALIZED BIG DATA

Authors
Golov, Nikolay I. [1]
Ronnback, Lars [2]
Affiliations
[1] Natl Res Univ, Fac Business & Management, Sch Business Informat, Higher Sch Econ, Dept Business Analyt, 20 Myasnitskaya St, Moscow 101000, Russia
[2] Stockholm Univ, Dept Comp Sci, SE-10691 Stockholm, Sweden
Source
BIZNES INFORMATIKA-BUSINESS INFORMATICS | 2015 / Volume 33 / Issue 03
Keywords
Big Data; massively parallel processing (MPP); database; normalization; analytics; ad-hoc; querying; modeling; performance
DOI
Not available
Chinese Library Classification number
F [Economics]
Discipline classification code
02
Abstract
This paper describes an approach for fast ad-hoc analysis of Big Data inside a relational data model. The approach strives to achieve maximal utilization of highly normalized temporary tables through the merge join algorithm. It is designed for the Anchor modeling technique, which requires a very high level of table normalization. Anchor modeling is a novel data warehouse modeling technique, designed for classical databases and adapted by the authors of the article for a Big Data environment and a massively parallel processing (MPP) database. Anchor modeling provides flexibility and high-speed data loading, while the presented approach adds support for fast ad-hoc analysis of Big Data sets (tens of terabytes). Different approaches to query plan optimization are described and assessed for row-based and column-based databases. Theoretical estimates and the results of experiments on real data, carried out in a column-based MPP environment (HP Vertica), are presented and compared. The results show that the approach is particularly favorable when available RAM is scarce, so that ad-hoc query execution switches from pure in-memory processing to spilling to hard disk. Scaling is also investigated by running the same analysis on different numbers of nodes in the MPP cluster. Configurations of five, ten, and twelve nodes were tested, using clickstream data from Avito, the largest classified-ads site in Russia.
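The abstract centers on the merge join algorithm applied to highly normalized tables. As a rough illustration only, the following Python sketch shows a generic inner merge join over two inputs that are already sorted by the join key. It is not the authors' implementation and not how HP Vertica executes joins; the table names `user_city` and `user_device` and their contents are hypothetical, chosen to resemble one-attribute-per-table layouts keyed by a surrogate id.

```python
from typing import Iterator, List, Tuple

Row = Tuple[int, str]  # (surrogate key, attribute value)


def merge_join(left: List[Row], right: List[Row]) -> Iterator[Tuple[int, str, str]]:
    """Inner-join two row lists that are both sorted by key in ascending order."""
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1          # left key has no partner yet; advance left
        elif lk > rk:
            j += 1          # right key has no partner yet; advance right
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Emit every pairing within the two runs.
            for a in range(i, i_end):
                for b in range(j, j_end):
                    yield (lk, left[a][1], right[b][1])
            i, j = i_end, j_end


# Hypothetical attribute tables, each pre-sorted by the surrogate key.
user_city = [(1, "Moscow"), (2, "Kazan"), (4, "Omsk")]
user_device = [(1, "mobile"), (1, "desktop"), (3, "tablet"), (4, "mobile")]

joined = list(merge_join(user_city, user_device))
```

Because both inputs are consumed in a single forward pass, the join keeps only the current key runs in memory rather than a hash table of one whole input, which is one plausible reading of why the abstract reports the approach as favorable when RAM is scarce.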
Pages: 7-14
Page count: 8