Skew-Aware Task Scheduling in Clouds

Times cited: 1
Authors
Li, Dongsheng [1]
Chen, Yixing [1]
Hai, Richard Hu [2]
Affiliations
[1] National Laboratory for Parallel and Distributed Processing, School of Computer, National University of Defense Technology, Changsha, Hunan, People's Republic of China
[2] Raffles Business Institute, Singapore, Singapore
Source
2013 IEEE Seventh International Symposium on Service-Oriented System Engineering (SOSE 2013) | 2013
Funding
National Natural Science Foundation of China
Keywords
Data skew; Task scheduling; Cloud; Load balancing
DOI
10.1109/SOSE.2013.64
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Data skew is a major cause of stragglers in MapReduce-like cloud systems. In this paper, we propose a Skew-Aware Task Scheduling (SATS) mechanism for iterative applications in MapReduce-like systems. The mechanism exploits the similarity of data distributions in adjacent iterations of an iterative application to mitigate the straggler problem caused by data skew: it collects data distribution information while the tasks of the current iteration execute, and uses that information to guide data partitioning for the tasks of the next iteration. We implement the mechanism in the HaLoop system and deploy it on a cluster. Experiments show that the proposed mechanism effectively handles data skew and improves load balancing.
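The abstract does not say how the collected distribution is turned into a partitioning plan. The following is a minimal Python sketch under stated assumptions, not the SATS algorithm itself: per-key record counts observed in iteration i are assumed to be good load estimates for iteration i+1, and keys are assigned to reducers with a greedy largest-first (list-scheduling) heuristic chosen here for illustration. All function names (`collect_key_distribution`, `plan_partitions`, `reducer_loads`, `partition`) are hypothetical and are not part of HaLoop's API.

```python
import heapq
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Tuple


def collect_key_distribution(records: Iterable[Tuple[Hashable, object]]) -> Counter:
    """Count intermediate records per key during the current iteration (hypothetical helper)."""
    return Counter(key for key, _ in records)


def plan_partitions(freq: Counter, num_reducers: int) -> Dict[Hashable, int]:
    """Assign keys to reducers for the NEXT iteration, heaviest key first,
    always to the currently least-loaded reducer (greedy largest-first heuristic)."""
    heap = [(0, r) for r in range(num_reducers)]  # min-heap of (load, reducer_id)
    heapq.heapify(heap)
    assignment: Dict[Hashable, int] = {}
    # Descending frequency; ties broken by repr(key) so the plan is deterministic.
    for key, count in sorted(freq.items(), key=lambda kv: (-kv[1], repr(kv[0]))):
        load, reducer = heapq.heappop(heap)
        assignment[key] = reducer
        heapq.heappush(heap, (load + count, reducer))
    return assignment


def reducer_loads(freq: Counter, assignment: Dict[Hashable, int], num_reducers: int) -> List[int]:
    """Estimated number of records each reducer receives under the plan."""
    loads = [0] * num_reducers
    for key, count in freq.items():
        loads[assignment[key]] += count
    return loads


def partition(key: Hashable, plan: Dict[Hashable, int], num_reducers: int) -> int:
    """Partition function for the next iteration (assumption: keys not seen
    in the previous iteration fall back to ordinary hash partitioning)."""
    return plan.get(key, hash(key) % num_reducers)


if __name__ == "__main__":
    # Skewed output of iteration i: key "a" dominates.
    records = [("a", None)] * 6 + [("b", None)] * 3 + [("c", None)] * 2 + [("d", None)]
    freq = collect_key_distribution(records)
    plan = plan_partitions(freq, num_reducers=2)
    print(plan, reducer_loads(freq, plan, 2))
```

With two reducers, the heavy key "a" gets a reducer to itself and "b", "c", "d" share the other, so both reducers receive 6 records. Because assignment is made per key, a single key heavier than the average reducer load would still produce a straggler; splitting such a key is outside the scope of this sketch.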
Pages: 341-346
Number of pages: 6