Optimal Server Selection for Straggler Mitigation

Times cited: 17
Authors
Badita, Ajay [1]
Parag, Parimal [1]
Aggarwal, Vaneet [2,3]
Affiliations
[1] Indian Inst Sci, Dept Elect & Commun Engn, Bengaluru 560012, India
[2] Purdue Univ, Sch Ind Engn, W Lafayette, IN 47907 USA
[3] Purdue Univ, Sch Elect & Comp Engn, W Lafayette, IN 47907 USA
Funding
U.S. National Science Foundation
Keywords
Servers; Task analysis; Job shop scheduling; Redundancy; Processor scheduling; IEEE transactions; Straggler mitigation; distributed computing; shifted exponential distribution; completion time; scheduling; forking points; redundant requests
DOI
10.1109/TNET.2020.2973224
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
The performance of large-scale distributed computing systems is adversely affected by stragglers when the execution time of a job is uncertain. To mitigate stragglers, we consider a multi-fork approach to job scheduling, in which additional parallel servers are started at forking instants. Assuming that task processing times follow a shifted exponential distribution, we compute the job completion time and the cost of server utilization as functions of the forking instants and the number of additional servers started at each. We use this analysis to provide insights into how to design the forking instants and choose the number of additional servers to start at each one. Numerical results demonstrate orders-of-magnitude improvement in cost over prior work in the regime of low completion times.
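The multi-fork model described in the abstract can be explored numerically. The sketch below is a Monte Carlo simulation under stated assumptions, not the authors' analytical method: a job finishes when any k of the started servers finish (k is a hypothetical parameter here), a server started at forking instant t_i runs for shift + Exp(rate) time, servers still running at job completion are cancelled, and the cost is the total busy time of all started servers. The names `Stage`, `simulate_once`, and `estimate` are illustrative only.

```python
import random
from typing import List, Tuple

# Hypothetical representation of one forking stage:
# (forking instant t_i, number of servers started at t_i).
Stage = Tuple[float, int]


def simulate_once(stages: List[Stage], k: int, shift: float, rate: float,
                  rng: random.Random) -> Tuple[float, float]:
    """Draw one sample path; return (completion_time, server_cost).

    Assumed model: the job completes when any k started servers finish;
    unfinished servers are cancelled at that instant.
    """
    starts: List[float] = []
    finishes: List[float] = []
    for t_fork, n_new in stages:
        for _ in range(n_new):
            starts.append(t_fork)
            # Shifted exponential service time: shift + Exp(rate).
            finishes.append(t_fork + shift + rng.expovariate(rate))
    if len(finishes) < k:
        raise ValueError("the stages must start at least k servers in total")
    # Completion is the k-th earliest finish. A server whose forking instant
    # falls after completion would finish even later, so including it in the
    # sort cannot change the k-th order statistic.
    completion = sorted(finishes)[k - 1]
    # Assumed cost: busy time of each server started before completion,
    # counted until it finishes or is cancelled, whichever comes first.
    cost = sum(min(f, completion) - s
               for s, f in zip(starts, finishes) if s < completion)
    return completion, cost


def estimate(stages: List[Stage], k: int, shift: float, rate: float,
             trials: int = 100_000, seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo estimates of E[completion time] and E[server cost]."""
    rng = random.Random(seed)
    total_time = total_cost = 0.0
    for _ in range(trials):
        t, c = simulate_once(stages, k, shift, rate, rng)
        total_time += t
        total_cost += c
    return total_time / trials, total_cost / trials


if __name__ == "__main__":
    # Hypothetical comparison: one server at t=0, versus one server at t=0
    # plus three more forked at t=1.5 (shift 1.0, rate 1.0, k=1).
    for stages in ([(0.0, 1)], [(0.0, 1), (1.5, 3)]):
        mean_t, mean_c = estimate(stages, k=1, shift=1.0, rate=1.0)
        print(stages, f"E[T] ~ {mean_t:.3f}", f"E[cost] ~ {mean_c:.3f}")
```

As a sanity check under these assumptions, a single stage of n servers at t = 0 with k = 1 has E[T] = shift + 1/(n·rate) and E[cost] = n·shift + 1/rate, since all n servers run until the first one finishes.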
Pages: 709-721
Number of pages: 13