SQL QUERY OPTIMIZATION FOR HIGHLY NORMALIZED BIG DATA

Times cited: 0

Authors:

Golov, Nikolay I. [1]

Ronnback, Lars [2]

Affiliations:

[1] Natl Res Univ, Fac Business & Management, Sch Business Informat, Higher Sch Econ, Dept Business Analyt, 20 Myasnitskaya St, Moscow 101000, Russia

[2] Stockholm Univ, Dept Comp Sci, SE-10691 Stockholm, Sweden

Source:

BIZNES INFORMATIKA-BUSINESS INFORMATICS | 2015 / Vol. 33 / No. 03

Keywords:

Big Data; massively parallel processing (MPP); database; normalization; analytics; ad-hoc; querying; modeling; performance

DOI:

Not available

CLC classification number:

F [Economics]

Discipline classification code:

02

Abstract:

This paper describes an approach for fast ad-hoc analysis of Big Data inside a relational data model. The approach strives to make maximal use of highly normalized temporary tables through the merge join algorithm. It is designed for the Anchor modeling technique, which requires a very high level of table normalization. Anchor modeling is a novel data warehouse modeling technique, originally designed for classical databases and adapted by the authors for the Big Data environment and a massively parallel processing (MPP) database. Anchor modeling provides flexibility and high-speed data loading, while the presented approach adds support for fast ad-hoc analysis of Big Data sets (tens of terabytes). Different approaches to query plan optimization are described and evaluated for row-based and column-based databases. Theoretical estimates are presented and compared with the results of experiments on real data carried out in a column-based MPP environment (HP Vertica). The results show that the approach is particularly favorable when available RAM is scarce, so that ad-hoc query execution switches from pure in-memory processing to spilling over to hard disk. Scaling is also investigated by running the same analysis on different numbers of nodes in the MPP cluster. Configurations of five, ten and twelve nodes were tested, using click stream data from Avito, the largest classifieds site in Russia.
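The approach in the abstract relies on the merge join algorithm applied to highly normalized tables. As an illustration only, and not the authors' implementation, the following Python sketch joins two hypothetical Anchor-style attribute tables (one attribute per table, each row keyed by an anchor ID). It assumes both inputs are already sorted by key and that each key occurs at most once per table; the names `merge_join`, `user_city` and `user_device` are invented for this example.

```python
from typing import Iterator, List, Tuple

# Hypothetical Anchor-style attribute table row: (anchor_id, attribute_value).
# Assumption: each table is sorted by anchor_id and has unique anchor_ids.
Row = Tuple[int, str]


def merge_join(left: List[Row], right: List[Row]) -> Iterator[Tuple[int, str, str]]:
    """Inner-join two key-sorted tables on anchor_id in one forward pass."""
    i, j = 0, 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey == rkey:
            # Keys match: emit the joined row and advance both inputs.
            yield (lkey, left[i][1], right[j][1])
            i += 1
            j += 1
        elif lkey < rkey:
            # Left key is smaller, so it cannot match anything further right.
            i += 1
        else:
            # Right key is smaller, so it cannot match anything further left.
            j += 1


# Usage with two tiny hypothetical attribute tables.
user_city = [(1, "Moscow"), (2, "Kazan"), (4, "Omsk")]
user_device = [(1, "mobile"), (3, "desktop"), (4, "tablet")]
print(list(merge_join(user_city, user_device)))
# [(1, 'Moscow', 'mobile'), (4, 'Omsk', 'tablet')]
```

Because the join only walks both sorted inputs forward, it needs almost no working memory beyond the current pair of rows. A hash join, by contrast, must hold a hash table built on one input and spills to disk when that table does not fit in RAM. This is a plausible reading of why the abstract reports the approach to be most favorable when RAM is scarce.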


Pages: 7-14

Page count: 8
