Analyzing performance of Apache Tez and MapReduce with hadoop multinode cluster on Amazon cloud

Cited by: 12
Authors
Singh R. [1]
Kaur P.J. [1]
Affiliation
[1] Department of I.T, U.I.E.T, Panjab University, Chandigarh
Keywords
Apache Hive; Apache Pig; Apache Tez; Big Data; Hadoop; HDFS; MapReduce
DOI
10.1186/s40537-016-0051-6
Abstract
Big Data refers to data sets that are too large and complex to be processed easily by traditional systems, so new technology is needed to process them. Apache Hadoop is a good option: its many components work together to make the Hadoop ecosystem robust and efficient. Apache Pig is a core component of the Hadoop ecosystem and accepts tasks in the form of scripts. To run these scripts, Apache Pig may use either the MapReduce or the Apache Tez framework. In our previous paper, we analyzed how these two frameworks differ from each other on the basis of selected parameters, comparing them both theoretically and empirically on a single-node cluster. In this paper, we perform the analysis on a multinode cluster installed on the Amazon cloud. © 2016, The Author(s).