SQL QUERY OPTIMIZATION FOR HIGHLY NORMALIZED BIG DATA

Times cited: 0
Authors
Golov, Nikolay I. [1]
Ronnback, Lars [2]
Affiliations
[1] National Research University Higher School of Economics, Faculty of Business and Management, School of Business Informatics, Department of Business Analytics, 20 Myasnitskaya St., Moscow 101000, Russia
[2] Stockholm University, Department of Computer Science, SE-10691 Stockholm, Sweden
Source
BIZNES INFORMATIKA-BUSINESS INFORMATICS | 2015 / Vol. 33 / No. 03
Keywords
Big Data; massively parallel processing (MPP); database; normalization; analytics; ad-hoc; querying; modeling; performance
DOI
Not available
Chinese Library Classification (CLC) number
F [Economics]
Subject classification code
02
Abstract
This paper describes an approach for fast ad-hoc analysis of Big Data inside a relational data model. The approach strives to make maximal use of highly normalized temporary tables by means of the merge join algorithm. It is designed for the Anchor modeling technique, which requires a very high level of table normalization. Anchor modeling is a novel data warehouse modeling technique, designed for classical databases and adapted by the authors of this article to the Big Data environment and to a massively parallel processing (MPP) database. Anchor modeling provides flexibility and high data loading speed, while the presented approach adds support for fast ad-hoc analysis of Big Data sets (tens of terabytes). Different approaches to query plan optimization are described and evaluated for both row-based and column-based databases. Theoretical estimates and the results of experiments on real data, carried out in a column-based MPP environment (HP Vertica), are presented and compared. The results show that the approach is particularly favorable when available RAM is scarce, so that ad-hoc query execution switches from pure in-memory processing to spilling over to hard disk. Scaling is also investigated by running the same analysis on different numbers of nodes in the MPP cluster. Configurations of five, ten and twelve nodes were tested, using click-stream data from Avito, the largest classified ads site in Russia.
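The core idea in the abstract (narrow, highly normalized tables joined on a shared key with merge join) can be illustrated with a toy in-memory sketch. This is not the authors' Vertica implementation. It assumes each attribute table holds at most one value per anchor key (no attribute history), and the names `decompose`, `merge_join`, `query` and the sample click data are hypothetical. Because every input is pre-sorted on the anchor key, each join is a single linear pass that never builds a hash table, which is the property that makes merge join attractive when RAM is scarce.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical Anchor-style attribute table: (anchor_id, value) pairs sorted by anchor_id.
AttributeTable = List[Tuple[int, Any]]
# Intermediate join result: (anchor_id, tuple of attribute values gathered so far).
JoinedRows = List[Tuple[int, Tuple[Any, ...]]]


def decompose(rows: Iterable[Dict[str, Any]], key: str) -> Dict[str, AttributeTable]:
    """Split wide rows into one narrow table per attribute, each sorted by the key.

    A None value is simply absent from its attribute table, so no NULLs are stored.
    """
    tables: Dict[str, AttributeTable] = {}
    for row in rows:
        for name, value in row.items():
            if name != key and value is not None:
                tables.setdefault(name, []).append((row[key], value))
    for table in tables.values():
        table.sort(key=lambda pair: pair[0])
    return tables


def merge_join(left: JoinedRows, right: AttributeTable) -> JoinedRows:
    """Inner merge join of two inputs sorted by unique integer keys (assumption).

    Advances two cursors in one linear pass; memory use beyond the output is constant.
    """
    i, j = 0, 0
    out: JoinedRows = []
    while i < len(left) and j < len(right):
        left_key, left_values = left[i]
        right_key, right_value = right[j]
        if left_key == right_key:
            out.append((left_key, left_values + (right_value,)))
            i += 1
            j += 1
        elif left_key < right_key:
            i += 1
        else:
            j += 1
    return out


def query(tables: Dict[str, AttributeTable], attributes: List[str]) -> JoinedRows:
    """Reassemble the requested attributes by chaining merge joins on the anchor key."""
    first, *rest = attributes
    result: JoinedRows = [(k, (v,)) for k, v in tables[first]]
    for name in rest:
        result = merge_join(result, tables[name])
    return result


# Hypothetical click-stream rows; the click with no known user drops out of the inner join.
clicks = [
    {"id": 3, "page": "/item/42", "user": "u1"},
    {"id": 1, "page": "/search", "user": "u2"},
    {"id": 2, "page": "/item/7", "user": None},
]
tables = decompose(clicks, key="id")
print(query(tables, ["page", "user"]))
```

In a real MPP database the sort order would typically come from how the tables (or Vertica projections) are physically stored rather than from an in-memory `sort`; the sketch only shows why sorted, narrow inputs let the join stream instead of buffering.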
Pages: 7-14
Number of pages: 8