Optimization for Speculative Execution in a MapReduce-like Cluster

Cited by: 0
Authors
Xu, Huanle [1]
Lau, Wing Cheong [1]
Affiliations
[1] The Chinese University of Hong Kong, Department of Information Engineering, Hong Kong, People's Republic of China
Source
2015 IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (INFOCOM) | 2015
Keywords
Job scheduling; speculative execution; cloning; straggler detection; optimization;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
A parallel processing job can be delayed substantially if even one of its many tasks is assigned to an unreliable machine. To tackle this so-called straggler problem, most parallel processing frameworks, such as MapReduce, have adopted strategies under which the system may speculatively launch additional copies of a task if its progress is abnormally slow, or simply because extra idle resources are available. In this paper, we focus on the design of speculative execution schemes for a parallel processing cluster under different loading conditions. For the lightly loaded case, we analyze and propose two optimization-based schemes, including the Smart Cloning Algorithm (SCA), which is based on maximizing the job utility. We also derive the workload threshold under which SCA should be used for speculative execution. Our simulation results show that SCA can reduce the total job flowtime by nearly 22% compared with the speculative execution strategy of Microsoft Mantri. For the heavily loaded case, we propose the Enhanced Speculative Execution (ESE) algorithm, which is an extension of the Microsoft Mantri scheme. We show that the ESE algorithm can outperform the Mantri baseline scheme by 35% in terms of job flowtime while consuming the same amount of resources.
Pages: 9