Tackling Latency via Replication in Distributed Systems

被引:5
作者
Qiu, Zhan [1 ]
Perez, Juan F. [2 ]
Harrison, Peter G. [1 ]
机构
[1] Imperial Coll London, Dept Comp, London, England
[2] Univ Melbourne, Sch Math & Stat, Melbourne, Vic, Australia
来源
PROCEEDINGS OF THE 2016 ACM/SPEC INTERNATIONAL CONFERENCE ON PERFORMANCE ENGINEERING (ICPE'16) | 2016年
基金
英国工程与自然科学研究理事会;
关键词
Latency-tolerance; Fault-tolerance; Matrix-analytic methods; Response time distribution; Distributed system; TIME;
D O I
10.1145/2851553.2851562
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
Consistently high reliability and low latency are twin requirements common to many forms of distributed processing; for example, server farms and mirrored storage access. To address them, we consider replication of requests with canceling - i.e. initiate multiple concurrent replicas of a request and use the first successful result returned, canceling all outstanding replicas. This scheme has been studied recently, but mostly for systems with a single central queue, while server farms exploit distributed resources for scalability and robustness. We develop an approximate stochastic model to determine the response-time distribution in a system with distributed queues, and compare its performance against its centralized counterpart. Validation against simulation indicates that our model is accurate for not only the mean response time but also its percentiles, which are particularly relevant for deadline-driven applications. Further, we show that in the distributed set-up, replication with canceling has the potential to reduce response times, even at relatively high utilization. We also find that it offers response times close to those of the centralized system, especially at medium-to-high request reliability. These findings support the use of replication with canceling as an effective mechanism for both fault-and delay-tolerance.
引用
收藏
页码:197 / 208
页数:12
相关论文
共 23 条
[1]  
Ananthanarayanan G., 2012, MEMORY, V40, P80
[2]   Calculation of the steady state waiting time distribution in GI/PH/c and MAP/PH/c queues [J].
Asmussen, S ;
Moller, JR .
QUEUEING SYSTEMS, 2001, 37 (1-3) :9-29
[3]   The Tail at Scale [J].
Dean, Jeffrey ;
Barroso, Luiz Andre .
COMMUNICATIONS OF THE ACM, 2013, 56 (02) :74-80
[4]  
Gandhi A, 2009, PERF E R SI, V37, P157
[5]   The Cost of a Cloud: Research Problems in Data Center Networks [J].
Greenberg, Albert ;
Hamilton, James ;
Maltz, David A. ;
Patel, Parveen .
ACM SIGCOMM COMPUTER COMMUNICATION REVIEW, 2009, 39 (01) :68-73
[6]  
Guenter B, 2011, IEEE INFOCOM SER, P1332, DOI 10.1109/INFCOM.2011.5934917
[7]  
Harrison P. G., 2013, EPEW
[8]   ANALYSIS OF A CONTINUOUS TIME SM[K]/PH[K]/1/FCFS QUEUE: AGE PROCESS, SOJOURN TIMES, AND QUEUE LENGTHS [J].
He, Qiming .
JOURNAL OF SYSTEMS SCIENCE & COMPLEXITY, 2012, 25 (01) :133-155
[9]  
Heindl A., 2007, EPEW
[10]  
Joshi G., 2015, ARXIV150803599