On the quality of service of failure detectors

Cited by: 139
Authors
Chen, W
Toueg, S
Aguilera, MK
Affiliations
[1] Oracle Corp, Nashua, NH 03062 USA
[2] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 3H5, Canada
[3] Compaq Syst Res Ctr, Palo Alto, CA 94301 USA
Keywords
failure detectors; quality of service; fault tolerance; distributed algorithm; probabilistic analysis
DOI
10.1109/TC.2002.1004595
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We study the quality of service (QoS) of failure detectors. By QoS, we mean a specification that quantifies 1) how fast the failure detector detects actual failures and 2) how well it avoids false detections. We first propose a set of QoS metrics to specify failure detectors for systems with probabilistic behaviors, i.e., for systems where message delays and message losses follow some probability distributions. We then give a new failure detector algorithm and analyze its QoS in terms of the proposed metrics. We show that, among a large class of failure detectors, the new algorithm is optimal with respect to some of these QoS metrics. Given a set of failure detector QoS requirements, we show how to compute the parameters of our algorithm so that it satisfies these requirements and we show how this can be done even if the probabilistic behavior of the system is not known. We then present some simulation results that show that the new failure detector algorithm provides a better QoS than an algorithm that is commonly used in practice. Finally, we suggest some ways to make our failure detector adaptive to changes in the probabilistic behavior of the network.
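To make the two QoS questions in the abstract concrete (how fast a crash is detected, and how well false detections are avoided), here is a minimal Python sketch of a plain timeout-based heartbeat detector. It is not the algorithm proposed in the paper, whose details the abstract does not give. The names `TimeoutDetector`, `false_suspicions`, and `detection_time` are hypothetical, and the code assumes heartbeat arrival times are given in seconds and that every recorded heartbeat arrived while the monitored process was still alive.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimeoutDetector:
    """Hypothetical simple detector: suspect the monitored process when
    no heartbeat has arrived within `timeout` seconds."""
    timeout: float
    last_heartbeat: Optional[float] = None

    def on_heartbeat(self, now: float) -> None:
        # Record the arrival time of the most recent heartbeat.
        self.last_heartbeat = now

    def suspects(self, now: float) -> bool:
        # Before any heartbeat has arrived, the process is suspected.
        if self.last_heartbeat is None:
            return True
        return now - self.last_heartbeat > self.timeout


def false_suspicions(arrivals: List[float], timeout: float) -> int:
    """Count gaps between consecutive heartbeat arrivals that exceed the
    timeout. Under the stated assumption that all arrivals happen while
    the process is alive, each such gap is one false detection."""
    ordered = sorted(arrivals)
    return sum(1 for a, b in zip(ordered, ordered[1:]) if b - a > timeout)


def detection_time(arrivals: List[float], crash_time: float, timeout: float) -> float:
    """Time from the crash until the detector suspects permanently.
    Returns 0.0 if the detector was already suspecting at the crash."""
    return max(max(arrivals) + timeout - crash_time, 0.0)
```

With this kind of detector, a shorter timeout reduces detection time but tends to increase false suspicions when messages are delayed or lost. That tension is what a QoS specification of the kind described in the abstract has to quantify.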
Pages: 561-580
Page count: 20