SQL QUERY OPTIMIZATION FOR HIGHLY NORMALIZED BIG DATA

Cited by: 0
Authors
Golov, Nikolay I. [1 ]
Ronnback, Lars [2 ]
Affiliations
[1] Natl Res Univ, Fac Business & Management, Sch Business Informat, Higher Sch Econ,Dept Business Analyt, 20 Myasnitskaya St, Moscow 101000, Russia
[2] Stockholm Univ, Dept Comp Sci, SE-10691 Stockholm, Sweden
Source
BIZNES INFORMATIKA-BUSINESS INFORMATICS | 2015 / Volume 33 / Issue 03
Keywords
Big Data; massively parallel processing (MPP); database; normalization; analytics; ad-hoc; querying; modeling; performance;
DOI
Not available
Chinese Library Classification
F [Economics];
Discipline classification code
02;
Abstract
This paper describes an approach for fast ad-hoc analysis of Big Data inside a relational data model. The approach strives to achieve maximal utilization of highly normalized temporary tables through the merge join algorithm. It is designed for the Anchor modeling technique, which requires a very high level of table normalization. Anchor modeling is a novel data warehouse modeling technique, designed for classical databases and adapted by the authors of the article for a Big Data environment and a massively parallel processing (MPP) database. Anchor modeling provides flexibility and high-speed data loading, while the presented approach adds support for fast ad-hoc analysis of Big Data sets (tens of terabytes). Different approaches to query plan optimization are described and estimated for row-based and column-based databases. Theoretical estimates and the results of real-data experiments carried out in a column-based MPP environment (HP Vertica) are presented and compared. The results show that the approach is particularly favorable when available RAM is scarce, so that the execution of ad-hoc queries switches from pure in-memory processing to spilling to hard disk. Scaling is also investigated by running the same analysis on different numbers of nodes in the MPP cluster. Configurations of five, ten and twelve nodes were tested, using clickstream data from Avito, the biggest classifieds site in Russia.
Pages: 7-14
Page count: 8