Resource Scheduling and Data Locality for Virtualized Hadoop on IaaS Cloud Platform

Cited by: 3
Authors
Tao, Dan [1]
Wang, Bingxu [1]
Lin, Zhaowen [2,3,4]
Wu, Tin-Yu [5]
Affiliations
[1] Beijing Jiaotong Univ, Sch Elect & Informat Engn, Beijing 100044, Peoples R China
[2] Beijing Univ Posts & Telecommun, Network & Informat Ctr, Inst Network Technol, Beijing 100876, Peoples R China
[3] Beijing Univ Posts & Telecommun, Sci & Technol Informat Transmiss & Disseminat Com, Beijing 100876, Peoples R China
[4] Beijing Univ Posts & Telecommun, Natl Engn Lab Mobile Network Secur 2013 2685, Beijing 100876, Peoples R China
[5] Natl Ilan Univ, Dept Comp Sci & Informat Engn, Yilan 26041, Taiwan
Source
BIG DATA COMPUTING AND COMMUNICATIONS (BIGCOM 2016) | 2016 / Vol. 9784
Keywords
Hadoop; Resource scheduling; Data locality; IaaS; MapReduce
DOI
10.1007/978-3-319-42553-5_28
CLC number (Chinese Library Classification)
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
As cloud computing technology becomes increasingly mature, there is an urgent need to combine the big data processing tool Hadoop with IaaS cloud platforms. In this paper, we first propose a new Dynamic Hadoop Cluster on IaaS (DHCI) architecture, which includes four key modules: a monitoring module, a scheduling module, a virtual machine management module and a virtual machine migration module. The monitoring module collects the load of both physical hosts and virtual machines, and this load information is used to design the resource scheduling and data locality solutions. Second, we present a load feedback based resource scheduling scheme, which avoids allocating resources on overburdened physical hosts and gives the virtualized cluster strong scalability by varying the number of virtual machines (VMs). Third, we reuse the VM migration method and propose a dynamic migration based data locality scheme: computation nodes are migrated to the host(s) or rack(s) where the corresponding storage nodes are deployed, so that the data locality requirement is satisfied. We evaluate our solutions in a realistic scenario based on OpenStack. Extensive experimental results demonstrate the effectiveness of our solutions, which contribute to workload balancing and performance improvement even under heavily loaded cloud system conditions.
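The abstract describes the load feedback scheduling and migration-based data locality schemes only in outline and gives no concrete rules. As an illustration only, the following minimal Python sketch shows one way such decisions could be made from monitored loads. Everything in it is an assumption, not the authors' method: the classes `Host` and `VM`, the functions `pick_host`, `scaling_decision` and `locality_migrations`, and all three threshold values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical thresholds: the abstract does not state the paper's values.
HOST_OVERLOAD = 0.8    # host utilization at or above which nothing is placed there
SCALE_OUT_LOAD = 0.75  # mean worker-VM load above which one VM is added
SCALE_IN_LOAD = 0.25   # mean worker-VM load below which one VM is removed


@dataclass
class Host:
    name: str
    rack: str
    load: float  # utilization in [0, 1], as a monitoring module might report it


@dataclass
class VM:
    name: str
    host: str                        # host currently running this compute VM
    load: float                      # utilization in [0, 1]
    data_host: Optional[str] = None  # host of the storage node holding its data


def pick_host(hosts: List[Host]) -> Optional[Host]:
    """Load feedback placement: least-loaded host that is not overburdened."""
    candidates = [h for h in hosts if h.load < HOST_OVERLOAD]
    return min(candidates, key=lambda h: h.load, default=None)


def scaling_decision(vms: List[VM]) -> int:
    """Return +1 to add a worker VM, -1 to remove one, 0 to keep the size."""
    if not vms:
        return 1
    mean = sum(vm.load for vm in vms) / len(vms)
    if mean > SCALE_OUT_LOAD:
        return 1
    if mean < SCALE_IN_LOAD and len(vms) > 1:
        return -1
    return 0


def locality_migrations(vms: List[VM], hosts: List[Host]) -> List[Tuple[str, str]]:
    """Plan (vm, target host) moves that bring compute VMs next to their data.

    Prefer the storage node's own host; if that host is overburdened, fall
    back to the least-loaded acceptable host in the same rack.
    """
    by_name = {h.name: h for h in hosts}
    plan = []
    for vm in vms:
        data = by_name.get(vm.data_host) if vm.data_host else None
        current = by_name.get(vm.host)
        if data is None or vm.host == data.name:
            continue  # data location unknown, or already host-local
        if data.load < HOST_OVERLOAD:
            plan.append((vm.name, data.name))
        elif current is None or current.rack != data.rack:
            peers = [h for h in hosts if h.rack == data.rack and h.name != data.name]
            target = pick_host(peers)
            if target is not None:
                plan.append((vm.name, target.name))  # rack-local fallback
    return plan


hosts = [Host("h1", "r1", 0.9), Host("h2", "r1", 0.3), Host("h3", "r2", 0.5)]
vms = [VM("w1", "h3", 0.8, "h1"), VM("w2", "h2", 0.9, "h2"), VM("w3", "h3", 0.7, "h3")]
print(pick_host(hosts).name)             # h2
print(scaling_decision(vms))             # 1
print(locality_migrations(vms, hosts))   # [('w1', 'h2')]
```

In this example, `w1` reads data stored on `h1`, which is overburdened, so the sketch falls back to `h2` in the same rack. The sketch plans moves from a single load snapshot and does not update host loads as it goes; a real scheduler would re-read monitoring data before each placement or migration.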
Pages: 332-341
Number of pages: 10