On the quality of service of failure detectors

Cited by: 139
Authors
Chen, W
Toueg, S
Aguilera, MK
Affiliations
[1] Oracle Corp, Nashua, NH 03062 USA
[2] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 3H5, Canada
[3] Compaq Syst Res Ctr, Palo Alto, CA 94301 USA
Keywords
failure detectors; quality of service; fault tolerance; distributed algorithm; probabilistic analysis
DOI
10.1109/TC.2002.1004595
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We study the quality of service (QoS) of failure detectors. By QoS, we mean a specification that quantifies 1) how fast the failure detector detects actual failures and 2) how well it avoids false detections. We first propose a set of QoS metrics to specify failure detectors for systems with probabilistic behaviors, i.e., for systems where message delays and message losses follow some probability distributions. We then give a new failure detector algorithm and analyze its QoS in terms of the proposed metrics. We show that, among a large class of failure detectors, the new algorithm is optimal with respect to some of these QoS metrics. Given a set of failure detector QoS requirements, we show how to compute the parameters of our algorithm so that it satisfies these requirements and we show how this can be done even if the probabilistic behavior of the system is not known. We then present some simulation results that show that the new failure detector algorithm provides a better QoS than an algorithm that is commonly used in practice. Finally, we suggest some ways to make our failure detector adaptive to changes in the probabilistic behavior of the network.
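The two QoS concerns named in the abstract (speed of detection and avoidance of false detections) can be illustrated with a minimal heartbeat sketch. This record does not reproduce the paper's algorithm or its metric definitions, so everything below is an assumption made for illustration: the monitored process p sends heartbeat i at time i * eta, the monitor q sets a deadline ("freshness point") at i * eta + delta, and the function name `trusts` and parameters `eta` and `delta` are hypothetical.

```python
import math


def trusts(arrivals, eta, delta, t):
    """Return True if monitor q trusts process p at time t.

    arrivals maps a heartbeat sequence number i (sent at time i * eta)
    to the time q received it; lost heartbeats are simply absent.
    This is an illustrative sketch, not the algorithm from the paper.
    """
    # Index of the latest deadline i * eta + delta that is <= t.
    i = math.floor((t - delta) / eta)
    if i < 1:
        # Assumption: before the first deadline nothing is overdue yet.
        return True
    # q trusts p iff heartbeat i or a later one has arrived by time t.
    return any(seq >= i and at <= t for seq, at in arrivals.items())


# p sends heartbeats 1 and 2, then crashes just after time 2.0.
crash_run = {1: 1.2, 2: 2.2}
# Heartbeat 1 is delayed past its deadline of 1.5 although p is alive.
slow_run = {1: 1.7}
```

In `crash_run` with eta = 1.0 and delta = 0.5, q still trusts p at time 3.4 and suspects it from time 3.5 onward, so the crash is detected about eta + delta after it happens. In `slow_run`, q wrongly suspects p between times 1.5 and 1.7, which is the kind of false detection that a larger `delta` would avoid at the cost of slower detection.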
Pages: 561-580
Page count: 20
Related papers
50 in total
  • [31] Quality of service in the internet
    Prashant Bharadwaj
    Resonance, 2005, 10 (3) : 57 - 70
  • [32] A quality of service interconnect
    de Wit, M
    Mackenzie, L
    Ould-Khaoua, M
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICATIONS, VOLS I-V, 2000, : 985 - 990
  • [33] Quality of Service in the Internet
    Bharadwaj, Prashant
    RESONANCE-JOURNAL OF SCIENCE EDUCATION, 2005, 10 (03): : 57 - 70
  • [34] Probabilistic and temporal failure detectors for solving distributed problems
    Guerraoui, Rachid
    Kozhaya, David
    Pignolet, Yvonne-Anne
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2021, 158 : 1 - 15
  • [35] Improving the Robustness of Distributed Failure Detectors in Adverse Conditions
    Lemos, F. T. C.
    Sato, L. M.
    IEEE LATIN AMERICA TRANSACTIONS, 2012, 10 (01) : 1364 - 1369
  • [36] Eventually perfect failure detectors using ADD channels
    Sastry, Srikanth
    Pike, Scott M.
    PARALLEL AND DISTRIBUTED PROCESSING AND APPLICATIONS, PROCEEDINGS, 2007, 4742 : 483 - 496
  • [37] Failure detectors for large-scale distributed systems
    Hayashibara, N
    Cherif, A
    Katayama, T
    21ST IEEE SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS, PROCEEDINGS, 2002, : 404 - 409
  • [38] Quality of Service management for Web service compositions
    Guimaraes Garcia, Diego Zuquim
    Felgar de Toledo, Maria Beatriz
    CSE 2008:11TH IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING, PROCEEDINGS, 2008, : 189 - 196
  • [39] A Self-tuning Failure Detection Scheme for Cloud Computing Service
    Xiong, Naixue
    Vasilakos, Athanasios V.
    Wu, Jie
    Yang, Y. Richard
    Rindos, Andy
    Zhou, Yuezhi
    Song, Wen-Zhan
    Pan, Yi
    2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2012, : 668 - 679
  • [40] Quality of Service, Quality of Experience and Online Learning
    Kist, Alexander A.
    Brodie, Lyn
    2012 FRONTIERS IN EDUCATION CONFERENCE (FIE), 2012,