Atrak: a MapReduce-based data warehouse for big data

Cited by: 0
Authors
Mohammadhossein Barkhordari
Mahdi Niamanesh
Affiliation
[1] Advance Information System Research Group for Information and Communication Technology Research Centre,
Source
The Journal of Supercomputing | 2017 / Vol. 73
Keywords
Big data; MapReduce; Data warehouse; Data locality
DOI
Not available
Abstract
As data warehouse volumes grow, single-node solutions can no longer analyze the data. It is therefore necessary to use shared-nothing architectures such as MapReduce. However, segmenting data across nodes in MapReduce creates node connectivity issues, network congestion, poor use of node memory capacity, and inefficient use of processing power. In addition, dimensions and measures cannot be changed without modifying previously stored data, and big dimensions are hard to manage. This paper proposes a method called Atrak, which uses a unified data format to make Mapper nodes independent of one another and thereby addresses these data management problems. The method applies to star schema data warehouse models with distributive measures. Atrak increases query execution speed through node independence and proper use of MapReduce. It was compared with established systems such as Hive, Spark-SQL, HadoopDB and Flink, and simulation results confirm its improved query execution speed. Data unification in MapReduce can also be applied in other fields, such as data mining and graph processing.
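The role of distributive measures in the abstract can be illustrated with a minimal sketch. This is not Atrak's implementation. It assumes hypothetical denormalized records in which each fact row already carries its dimension attributes, and the names `PARTITIONS`, `map_partition`, `reduce_partials` and `run_query` are invented for illustration. Under that assumption, each mapper aggregates its own partition without contacting other nodes, and the partial results merge correctly because SUM and COUNT are distributive.

```python
from collections import defaultdict

# Hypothetical "unified" records: every row carries its own dimension
# attributes, so a mapper never needs data held by another node.
PARTITIONS = [
    [{"region": "north", "year": 2016, "sales": 10.0},
     {"region": "south", "year": 2016, "sales": 4.0}],
    [{"region": "north", "year": 2016, "sales": 6.0},
     {"region": "north", "year": 2017, "sales": 5.0}],
]


def map_partition(rows, group_by, measure):
    """Compute partial (sum, count) per group using only local rows."""
    partial = defaultdict(lambda: [0.0, 0])
    for row in rows:
        key = tuple(row[d] for d in group_by)
        partial[key][0] += row[measure]
        partial[key][1] += 1
    return {k: tuple(v) for k, v in partial.items()}


def reduce_partials(partials):
    """Merge per-partition results; valid because SUM and COUNT are distributive."""
    total = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (s, c) in partial.items():
            total[key][0] += s
            total[key][1] += c
    return {k: tuple(v) for k, v in total.items()}


def run_query(partitions, group_by, measure):
    """Run the map step on each partition independently, then merge."""
    return reduce_partials(map_partition(p, group_by, measure) for p in partitions)


result = run_query(PARTITIONS, ["region", "year"], "sales")
```

A non-distributive measure such as a median could not be merged this way from per-partition results, which is consistent with the abstract restricting the method to distributive measures.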
Pages: 4596-4610
Number of pages: 14