Resource-Constrained Replication Strategies for Hierarchical and Heterogeneous Tasks

被引:12
作者
Ao, Weng Chon [1 ]
Psounis, Konstantinos [1 ]
机构
[1] Univ Southern Calif, Ming Hsieh Dept Elect & Comp Engn, Los Angeles, CA 90089 USA
关键词
Task analysis; Resource management; Optimization; Cloud computing; Power system reliability; Probability; Time factors; Task replication; latency; resource allocation; task graph; parallel computing; TRADE-OFF;
D O I
10.1109/TPDS.2019.2945294
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
In large-scale cloud computing systems, a task is often divided into multiple subtasks which can be executed in parallel in different machines. As a result, the task completion time is constrained by the completion time of the slowest subtask. To reduce the task completion time, the strategy of replicating the straggling subtasks has been employed in cloud computing frameworks such as MapReduce and Hadoop. Analyzing mathematically the performance of such replication strategies has recently received great attention. However, most of the analytical work focuses on the case where the completion times of the subtasks are identically distributed. This assumption may not hold in practice due to the modularization and encapsulation of the computation of a task, resulting in different service requirements for different subtasks. In this paper, we consider the case where the completion times of the subtasks of a task are drawn from heterogeneous/empirical distributions. Furthermore, we consider the case where jobs consist of hierarchical tasks that are required to be executed in a specific order described by a task precedence graph. We propose a novel framework to investigate how to allocate replication resources among the subtasks such that the overall task completion time is minimized. Specifically, we devise a Lagrange multiplier-based method and a water-filling-like algorithm for integer programs. We show via analysis and simulations the optimality and efficiency of our proposed algorithms, and explore the tradeoff between cost and latency from introducing replications in a task graph.
引用
收藏
页码:793 / 804
页数:12
相关论文
共 25 条
[1]  
Ananthanarayanan Ganesh, 2013, Proceedings of NSDI '13: 10th USENIX Symposium on Networked Systems Design and Implementation. NSDI '13, P185
[2]  
[Anonymous], 2014, Convex Optimiza- tion
[3]  
[Anonymous], P IEEE INFOCOM
[4]  
[Anonymous], 2011, Advances in Neural Information Processing Systems
[5]  
Arnold B.C., 2008, CLASS APPL MATH, V54
[6]   Irnproving scheduling of tasks in a heterogeneous environment [J].
Bajaj, R ;
Agrawal, DP .
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2004, 15 (02) :107-118
[7]   Distributed optimization and statistical learning via the alternating direction method of multipliers [J].
Boyd S. ;
Parikh N. ;
Chu E. ;
Peleato B. ;
Eckstein J. .
Foundations and Trends in Machine Learning, 2010, 3 (01) :1-122
[8]  
Charras-Garrido M, 2013, J SFDS, V154, P66
[9]  
Da Wang, 2015, ACM SIGMETRICS Performance Evaluation Review, V43, P7, DOI 10.1145/2847220.2847223
[10]  
Da Wang, 2014, ACM SIGMETRICS Performance Evaluation Review, V42, P599, DOI 10.1145/2591971.2592042