SQL QUERY OPTIMIZATION FOR HIGHLY NORMALIZED BIG DATA

Cited by: 0

Authors:

Golov, Nikolay I. [1]

Ronnback, Lars [2]

Affiliations:

[1] Natl Res Univ, Fac Business & Management, Sch Business Informat, Higher Sch Econ, Dept Business Analyt, 20 Myasnitskaya St, Moscow 101000, Russia

[2] Stockholm Univ, Dept Comp Sci, SE-10691 Stockholm, Sweden

Source:

BIZNES INFORMATIKA-BUSINESS INFORMATICS | 2015 / Vol. 33 / Issue 03

Keywords:

Big Data; massively parallel processing (MPP); database; normalization; analytics; ad-hoc; querying; modeling; performance;

DOI:

Not available

Chinese Library Classification (CLC) number:

F [Economics];

Subject classification code:

02;

Abstract:

This paper describes an approach for fast ad-hoc analysis of Big Data inside a relational data model. The approach strives to achieve maximal utilization of highly normalized temporary tables through the merge join algorithm. It is designed for the Anchor modeling technique, which requires a very high level of table normalization. Anchor modeling is a novel data warehouse modeling technique, designed for classical databases and adapted by the authors of the article for a Big Data environment and a massively parallel processing (MPP) database. Anchor modeling provides flexibility and high speed of data loading, while the presented approach adds support for fast ad-hoc analysis of Big Data sets (tens of terabytes). Different approaches to query plan optimization are described and estimated for row-based and column-based databases. Theoretical estimates and the results of experiments on real data, carried out in a column-based MPP environment (HP Vertica), are presented and compared. The results show that the approach is particularly favorable when available RAM is scarce, so that executing ad-hoc queries forces a switch from pure in-memory processing to spilling over to hard disk. Scaling is also investigated by running the same analysis on different numbers of nodes in the MPP cluster. Configurations of five, ten and twelve nodes were tested, using clickstream data from Avito, the biggest classifieds site in Russia.
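
The abstract names the merge join algorithm as the core of the approach but this record does not include the authors' implementation. As a rough illustration only, the following is a minimal sketch of a generic textbook merge join over two inputs already sorted by a shared key, which is the situation the abstract implies for highly normalized tables keyed by the same identifier. The row layout, the table names `visit_url` and `visit_device`, and the sample values are hypothetical and are not taken from the paper.

```python
from typing import Iterator, List, Tuple

# Hypothetical row layout: (anchor_id, attribute_value).
Row = Tuple[int, str]


def merge_join(left: List[Row], right: List[Row]) -> Iterator[Tuple[int, str, str]]:
    """Inner-join two inputs that are already sorted by their first column."""
    i, j = 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1  # left key has no partner yet; advance left
        elif lk > rk:
            j += 1  # right key has no partner yet; advance right
        else:
            # Find the end of the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row carrying this key with the whole right run.
            while i < len(left) and left[i][0] == lk:
                for r in right[j:j_end]:
                    yield (lk, left[i][1], r[1])
                i += 1
            j = j_end


# Two hypothetical attribute tables, both sorted by anchor_id.
visit_url = [(1, "/cars"), (2, "/flats"), (4, "/jobs")]
visit_device = [(1, "mobile"), (3, "desktop"), (4, "tablet")]

joined = list(merge_join(visit_url, visit_device))
print(joined)
```

Because both inputs are read once in key order, the join needs no hash table in memory, which is consistent with the abstract's remark that the approach helps most when RAM is scarce.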


Pages: 7-14

Page count: 8
