On the quality of service of failure detectors

Cited by: 139
Authors
Chen, W
Toueg, S
Aguilera, MK
Affiliations
[1] Oracle Corp, Nashua, NH 03062 USA
[2] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 3H5, Canada
[3] Compaq Syst Res Ctr, Palo Alto, CA 94301 USA
Keywords
failure detectors; quality of service; fault tolerance; distributed algorithm; probabilistic analysis
DOI
10.1109/TC.2002.1004595
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
We study the quality of service (QoS) of failure detectors. By QoS, we mean a specification that quantifies 1) how fast the failure detector detects actual failures and 2) how well it avoids false detections. We first propose a set of QoS metrics to specify failure detectors for systems with probabilistic behaviors, i.e., for systems where message delays and message losses follow some probability distributions. We then give a new failure detector algorithm and analyze its QoS in terms of the proposed metrics. We show that, among a large class of failure detectors, the new algorithm is optimal with respect to some of these QoS metrics. Given a set of failure detector QoS requirements, we show how to compute the parameters of our algorithm so that it satisfies these requirements and we show how this can be done even if the probabilistic behavior of the system is not known. We then present some simulation results that show that the new failure detector algorithm provides a better QoS than an algorithm that is commonly used in practice. Finally, we suggest some ways to make our failure detector adaptive to changes in the probabilistic behavior of the network.
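The two aspects of QoS named in the abstract, detection speed and avoidance of false detections, can be illustrated with a minimal sketch. It assumes a generic fixed-timeout heartbeat detector, which is not the algorithm proposed in the paper, and the function name `timeout_detector_qos` and its simplified metrics are hypothetical stand-ins for the paper's formal QoS metrics.

```python
from typing import List, Tuple


def timeout_detector_qos(
    arrivals: List[float], timeout: float, crash_time: float
) -> Tuple[int, float]:
    """Return (false suspicions, detection time) for a fixed-timeout detector.

    Assumptions (illustrative only, not taken from the paper):
    - `arrivals` holds heartbeat arrival times, all at or before `crash_time`;
    - after each arrival the monitor trusts the process for `timeout` time
      units, then suspects it until the next heartbeat arrives.
    """
    arrivals = sorted(arrivals)
    if not arrivals or arrivals[-1] > crash_time:
        raise ValueError("expected at least one arrival, none after the crash")

    # A gap longer than the timeout means the monitor suspected a live process.
    mistakes = sum(
        1 for prev, nxt in zip(arrivals, arrivals[1:]) if nxt - prev > timeout
    )

    # After the last heartbeat, suspicion starts at this time and never ends.
    permanent_suspicion_start = arrivals[-1] + timeout
    if permanent_suspicion_start < crash_time:
        # The monitor was already (wrongly) suspecting when the crash happened.
        mistakes += 1
        detection_time = 0.0
    else:
        detection_time = permanent_suspicion_start - crash_time
    return mistakes, detection_time


if __name__ == "__main__":
    heartbeats = [1.0, 2.0, 3.0, 5.5, 6.0]
    for timeout in (2.0, 3.0):
        print(timeout, timeout_detector_qos(heartbeats, timeout, crash_time=6.5))
```

With the sample arrivals, the shorter timeout detects the crash sooner but makes one false suspicion, while the longer timeout makes none at the cost of slower detection. This is the kind of trade-off that a QoS specification for a failure detector has to quantify.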
Pages: 561 - 580
Page count: 20
Related papers
50 results in total
  • [1] On the quality of service of failure detectors
    Chen, W
    Toueg, S
    Aguilera, MK
    IEEE TRANSACTIONS ON COMPUTERS, 2002, 51 (01) : 13 - 32
  • [2] On the Quality of Service of Crash-Recovery Failure Detectors
    Ma, Tiejun
    Hillston, Jane
    Anderson, Stuart
    IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2010, 7 (03) : 271 - 283
  • [3] On the quality of service of crash-recovery failure detectors
    Ma, Tiejun
    Hillston, Jane
    Anderson, Stuart
    37TH ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS, PROCEEDINGS, 2007, : 739 - +
  • [4] Comparative Analysis of Quality of Service and Memory Usage for Adaptive Failure Detectors in Healthcare Systems
    Xiong, Naixue
    Vasilakos, Athanasios V.
    Yang, Laurence T.
    Song, Lingyang
    Pan, Yi
    Kannan, Rajgopal
    Li, Yingshu
    IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2009, 27 (04) : 495 - 509
  • [5] Failure detectors encapsulate fairness
    Pike, Scott M.
    Sastry, Srikanth
    Welch, Jennifer L.
    DISTRIBUTED COMPUTING, 2012, 25 (04) : 313 - 333
  • [6] Failure detectors encapsulate fairness
    Scott M. Pike
    Srikanth Sastry
    Jennifer L. Welch
    Distributed Computing, 2012, 25 : 313 - 333
  • [7] On the implementation of communication-optimal failure detectors
    Larrea, Mikel
    Lafuente, Alberto
    Soraluze, Iratxe
    Cortinas, Roberto
    Wieland, Joachim
    DEPENDABLE COMPUTING, PROCEEDINGS, 2007, 4746 : 25 - +
  • [8] Implementing unreliable failure detectors with unknown membership
    Jimenez, Ernesto
    Arevalo, Sergio
    Fernandez, Antonio
    INFORMATION PROCESSING LETTERS, 2006, 100 (02) : 60 - 63
  • [9] Mutual exclusion in asynchronous systems with failure detectors
    Delporte-Gallet, C
    Fauconnier, H
    Guerraoui, R
    Kouznetsov, P
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2005, 65 (04) : 492 - 505
  • [10] Solvability-Based Comparison of Failure Detectors
    Sastry, Srikanth
    Widder, Josef
    2014 IEEE 13TH INTERNATIONAL SYMPOSIUM ON NETWORK COMPUTING AND APPLICATIONS (NCA 2014), 2014, : 269 - 276