An Investigation of Hadoop Parameters in SDN-enabled Clusters

Cited by: 0
Authors
Tariq, Hassan [1 ]
Welch, Ian [1 ]
Al-Sahaf, Harith [1 ]
Affiliations
[1] Victoria Univ Wellington, Sch Engn & Comp Sci, POB 600, Wellington 6140, New Zealand
Source
2018 12TH INTERNATIONAL CONFERENCE ON MATHEMATICS, ACTUARIAL SCIENCE, COMPUTER SCIENCE AND STATISTICS (MACS) | 2018
Keywords
distributed computing; big data; software defined networking; optimization
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Apache Hadoop is an open-source framework for distributed and parallel processing of big data jobs. It has its own distributed file system, which facilitates local storage and processing on commodity hardware. The Hadoop Distributed File System (HDFS) is a core part of the Hadoop ecosystem and exposes a large number of configuration parameters. Customizing these parameters to improve the throughput of the system for a particular job may require considerable experience and skill. During the execution of a Hadoop job in a multi-node cluster, communication among nodes takes place through switches. These switches use vendor-specific protocols to direct the flow of traffic. Software-defined networking (SDN) has made networks more programmable and configurable. In this paper, we analysed the impact of HDFS parameters such as block size and replication factor, MapReduce parameters such as the number of mappers, and Hive query structure. We used Faucet, an OpenFlow switch, to monitor the packets transferred into and out of the system, to see whether network traffic information can be used to predict the impact of Hadoop parameters. We also monitored CPU usage, disk usage, memory usage and overall execution time during the execution of Hadoop jobs. Our investigation showed that customizing these Hadoop configuration parameters does have an impact on network traffic, system resource usage and execution time.
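As a rough illustration of why the block size and replication factor named in the abstract matter, the sketch below estimates how many HDFS blocks a file occupies and how much raw storage its replicas consume. It is not taken from the paper: the function name `hdfs_footprint`, the 10 GiB file size and the candidate block sizes are assumptions chosen for illustration. In a real cluster these values would typically be set through the `dfs.blocksize` and `dfs.replication` properties.

```python
def hdfs_footprint(file_bytes: int, block_bytes: int, replication: int) -> dict:
    """Estimate block count and raw storage for one file stored in HDFS.

    Hypothetical helper for illustration; not part of any Hadoop API.
    """
    # A file is split into fixed-size blocks; the last block may be partial.
    blocks = (file_bytes + block_bytes - 1) // block_bytes
    return {
        "blocks": blocks,
        # Every block is stored `replication` times across the cluster.
        "block_replicas": blocks * replication,
        # HDFS does not pad a partial last block, so raw usage scales with file size.
        "raw_bytes": file_bytes * replication,
    }


MiB = 1024 ** 2
GiB = 1024 ** 3

if __name__ == "__main__":
    # Assumed example: a 10 GiB input file with replication factor 3.
    for block_mib in (64, 128, 256):
        estimate = hdfs_footprint(10 * GiB, block_mib * MiB, replication=3)
        print(f"block size {block_mib} MiB -> {estimate}")
```

With the default input format, a splittable file usually yields about one map task per block, so a smaller block size tends to mean more mappers and more replica traffic between nodes. That relationship is an approximation and depends on the job's input format and split settings.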
Pages: 9