Atrak: a MapReduce-based data warehouse for big data

Cited by: 7
Authors
Barkhordari, Mohammadhossein [1]
Niamanesh, Mahdi [1]
Affiliation
[1] Coll Intersect, Adv Informat Syst Res Grp Informat & Commun Techn, Res Ctr, 5 Saeedialley, Enghelab St, Tehran 1599616313, Iran
Keywords
Big data; MapReduce; Data warehouse; Data locality
DOI
10.1007/s11227-017-2037-3
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
As data warehouse volumes grow, single-node solutions can no longer analyze the immense volume of data, so shared-nothing architectures such as MapReduce become necessary. However, inter-node data segmentation in MapReduce causes node connectivity issues, network congestion, poor use of node memory capacity, and inefficient use of processing power. In addition, dimensions and measures cannot be changed without modifying previously stored data, and managing big dimensions remains a problem. This paper proposes a method called Atrak, which uses a unified data format to make Mapper nodes independent of one another and thereby solve the data management problems described above. The proposed method applies to star schema data warehouse models with distributive measures. Atrak increases query execution speed by exploiting node independence and making proper use of MapReduce. The proposed method was compared with established systems such as Hive, Spark-SQL, HadoopDB and Flink, and simulation results confirm that it executes queries faster. Data unification in MapReduce can also be applied in other fields, such as data mining and graph processing.
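To illustrate why distributive measures suit independent Mapper nodes, the following minimal Python sketch simulates one map/reduce pass. It assumes a hypothetical denormalized row format (region, year, sales amount) in which each fact row already carries the dimension attributes the query groups by. This is not Atrak's actual unified format, and the function names `map_split` and `reduce_partials` are hypothetical. Each simulated mapper aggregates only its own split, and the reducer merges the partial SUM and COUNT values, which is valid because these measures are distributive.

```python
# Illustrative sketch only: the row layout and function names are
# hypothetical and do not reproduce Atrak's actual data format.

# Hypothetical denormalized fact row: (region, year, sales_amount).
# Dimension attributes travel with the measure, so a mapper can aggregate
# its split without reading dimension tables stored on other nodes.
Row = tuple[str, str, float]
Key = tuple[str, str]
Partial = dict[Key, tuple[float, int]]  # key -> (SUM, COUNT)


def map_split(split: list[Row]) -> Partial:
    """Compute local partial aggregates (SUM, COUNT) for one input split."""
    partial: Partial = {}
    for region, year, amount in split:
        s, c = partial.get((region, year), (0.0, 0))
        partial[(region, year)] = (s + amount, c + 1)
    return partial


def reduce_partials(partials: list[Partial]) -> Partial:
    """Merge per-split results. This is correct because SUM and COUNT are
    distributive: the aggregate over all data equals the aggregate of the
    per-split aggregates."""
    merged: Partial = {}
    for partial in partials:
        for key, (s, c) in partial.items():
            ms, mc = merged.get(key, (0.0, 0))
            merged[key] = (ms + s, mc + c)
    return merged


# Two hypothetical input splits, each processed by an independent mapper.
splits = [
    [("north", "2016", 10.0), ("south", "2016", 4.0)],
    [("north", "2016", 5.0), ("north", "2017", 7.0)],
]
result = reduce_partials([map_split(s) for s in splits])
print(result)
```

The merged result matches a single pass over all rows. By contrast, a holistic measure such as an exact median cannot be computed from per-split medians alone.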
Pages: 4596-4610
Number of pages: 15