A comparative study of cluster-based Big Data Cube implementations

Cited by: 3
Authors
Morielo Caetano, Andre Francisco [1]
Hirata, Celso Massaki [1]
Silva, Rodrigo Rocha [2, 3, 4]
Affiliations
[1] Inst Tecnol Aeronaut, Marechal Eduardo Gomes Sq 50, Sao Jose Dos Campos, Brazil
[2] Fac Tecnol Estado Sao Paulo, Carlos Barattino St 908, Mogi Das Cruzes, SP, Brazil
[3] Univ Coimbra, Paula Souza Ctr, Polo 2 Pinhal Marrocos, Coimbra, Portugal
[4] Univ Coimbra, Ctr Informat & Syst, Dept Informat Engn, Polo 2 Pinhal Marrocos, Coimbra, Portugal
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2022 / Vol. 133
Keywords
Datacube; OLAP; Cloud; Big Data; Survey; Distributed; Parallel; COMPUTATION; SPARK; MPI
DOI
10.1016/j.future.2022.03.024
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Research on Data Cube scalability is extensive, yet scattered. Scalable design patterns for Data Cube implementations are a trend as the technology shifts from centralized and fully materialized models to distributed and partially materialized ones. These implementations exploit cheaper and upgraded hardware in clusters of computer nodes. It is commonly understood that parallel and distributed hardware makes it possible to handle large amounts of multidimensional data for online analytical processing, up to billions of tuples or more, with increased performance and fault tolerance. However, the number of research works and their heterogeneity may overwhelm new initiatives in this field, as there is little discussion of the state of the art and of ways to improve on it. Moreover, the baseline for comparison in most works is often too limited and requires the reader to cross-check information across many articles to identify possible gaps. To help identify these gaps, we analyzed the works on Data Cube scalability and prepared a comparative study that provides directions for new research on parallel and distributed implementations of data cubes. We identified features for comparison that include cube function, implementation technology, cube storage type, and details of the experiments. We expect that the features and comparisons will help researchers identify research gaps and pave the way for future work in the field. (C) 2022 Elsevier B.V. All rights reserved.
Pages: 240-253
Page count: 14