Hadoop Acceleration in an OpenFlow-based cluster

Cited by: 27
Authors
Narayan, Sandhya [1]
Bailey, Stu [1]
Daga, Anand [2]
Affiliations
[1] InfoBlox Inc, Santa Clara, CA USA
[2] Univ Houston, Dept Comp Sci, Houston, TX USA
Source
2012 SC COMPANION: HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SCC) | 2012
Keywords
Hadoop; BigData; OpenFlow; Software Defined Networks (SDN)
DOI
10.1109/SC.Companion.2012.76
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This paper presents details of our preliminary study of how Hadoop can control its network resources using OpenFlow in order to improve performance. Hadoop's distributed compute framework, MapReduce, exploits the distributed storage architecture of Hadoop's distributed file system, HDFS, to deliver scalable, reliable parallel processing services for arbitrary algorithms. The shuffle phase of Hadoop's MapReduce computation involves the movement of intermediate data from Mappers to Reducers. Reducers are often delayed by inadequate bandwidth between them and the Mappers, which lowers the performance of the cluster. OpenFlow is a popular example of software-defined network (SDN) technology. Our study explores the use of OpenFlow to provide better link bandwidth for shuffle traffic, and thereby decrease the time that Reducers have to wait to gather data from Mappers. Our experiments show a decrease in execution time for a Hadoop job when the shuffle traffic can use more of the available bandwidth on a link. Our approach illustrates how high performance computing applications can improve performance by controlling their underlying network resources. The work presented in this paper is a starting point for experiments being done as part of the SC12 SCinet Research Sandbox, which will quantify the performance advantages of a version of Hadoop that uses OpenFlow to dynamically adjust the network topology of local and wide area Hadoop clusters.
Pages: 535-538
Page count: 4