Aras: A method with uniform distributed dataset to solve data warehouse problems for big data

Cited by: 3
Authors
Barkhordari M. [1]
Niamanesh M. [1]
Affiliation
[1] Information and Communication Technology Research Center, Advance Information System Research Group, Tehran
Keywords
Big data; Data locality; Data warehouse; MapReduce
DOI
10.4018/IJDST.2017040104
Abstract
Because of the high rate of data growth and the need for data analysis, data warehouse management for big data is an important issue. Single-node solutions cannot manage such large amounts of information, so the data must be distributed over multiple hardware nodes. However, distributing data over nodes means that each node needs data from other nodes to execute a query. This data exchange among nodes creates problems, such as joins between data segments that reside on different nodes, network congestion, and hardware nodes waiting to receive data. In this paper, the Aras method is proposed. Aras is a MapReduce-based method that provides each mapper with its own dataset. With this method, each mapper node can execute its query independently, without exchanging data with other nodes. This node independence solves the data distribution problems mentioned above. The proposed method was compared with prominent data warehouses for big data, and its query execution time was much lower than that of the other methods. © 2017, IGI Global.
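The abstract does not describe how the per-mapper dataset is constructed. The Python sketch below only illustrates the general idea of mapper independence under explicitly assumed conditions: a star schema in which the small dimension table is replicated to every mapper (a replicated, or broadcast, join; not necessarily what Aras does) and fact rows are spread evenly in round-robin order. Each simulated mapper joins and pre-aggregates only its local rows, so the only data combined afterwards are small partial aggregates. All names (`distribute`, `map_task`, `reduce_task`, `run_query`, and the sample tables) are hypothetical, and the mappers run sequentially in one process rather than on a real cluster.

```python
from collections import defaultdict


def distribute(fact_rows, num_mappers):
    """Round-robin the fact rows so each mapper receives a near-equal share.

    Assumption for illustration: fact rows are spread evenly, while the
    dimension table is replicated in full to every mapper.
    """
    partitions = [[] for _ in range(num_mappers)]
    for i, row in enumerate(fact_rows):
        partitions[i % num_mappers].append(row)
    return partitions


def map_task(fact_partition, dim_products):
    """Join and pre-aggregate locally; no rows are fetched from other mappers."""
    partial = defaultdict(float)
    for product_id, amount in fact_partition:
        category = dim_products[product_id]  # lookup in the local replicated dimension
        partial[category] += amount
    return dict(partial)


def reduce_task(partials):
    """Merge the small per-mapper partial aggregates into the final result."""
    total = defaultdict(float)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return dict(total)


def run_query(fact_rows, dim_products, num_mappers=3):
    """Simulate: SELECT category, SUM(amount) ... GROUP BY category."""
    partitions = distribute(fact_rows, num_mappers)
    partials = [map_task(p, dim_products) for p in partitions]  # independent mappers
    return reduce_task(partials)


if __name__ == "__main__":
    # Hypothetical sample data: product dimension and sales fact table.
    dim_products = {1: "books", 2: "music", 3: "books"}
    sales = [(1, 10.0), (2, 5.0), (3, 2.5), (1, 1.0), (2, 4.0)]
    print(run_query(sales, dim_products))
```

Running the script prints `{'books': 13.5, 'music': 9.0}`. Replicating the dimension table trades extra storage on each node for the absence of any shuffle during the join, which is the kind of node independence the abstract describes.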
Pages: 47-60
Page count: 13