Unreliable failure detectors for reliable distributed systems

Cited by: 1303
Authors
Chandra, TD [1]
Toueg, S [1]
Affiliations
[1] CORNELL UNIV, DEPT COMP SCI, ITHACA, NY 14853
Keywords
agreement problem; asynchronous systems; atomic broadcast; Byzantine Generals' problem; commit problem; consensus problem; crash failures; failure detection; fault-tolerance; message passing; partial synchrony; processor failures;
DOI
10.1145/226643.226647
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We introduce the concept of unreliable failure detectors and study how they can be used to solve Consensus in asynchronous systems with crash failures. We characterise unreliable failure detectors in terms of two properties: completeness and accuracy. We show that Consensus can be solved even with unreliable failure detectors that make an infinite number of mistakes, and determine which ones can be used to solve Consensus despite any number of crashes, and which ones require a majority of correct processes. We prove that Consensus and Atomic Broadcast are reducible to each other in asynchronous systems with crash failures; thus, the above results also apply to Atomic Broadcast. A companion paper shows that one of the failure detectors introduced here is the weakest failure detector for solving Consensus [Chandra et al. 1992].
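As an informal illustration of the two properties named in the abstract, the following is a minimal sketch of a timeout-based detector, not the construction from the paper. The class name `HeartbeatDetector`, its methods, and the timeout-doubling rule are all assumptions made for this example. A process that stops sending heartbeats is eventually suspected (a rough analogue of completeness), while a wrongly suspected process is removed from the suspect list as soon as it is heard from again and is then given a longer timeout (a rough analogue of accuracy holding only eventually, so the detector may make mistakes).

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatDetector:
    """Hypothetical timeout-based failure detector run by one monitoring process."""
    peers: list
    initial_timeout: float = 1.0
    last_seen: dict = field(default_factory=dict)
    timeout: dict = field(default_factory=dict)
    suspected: set = field(default_factory=set)

    def __post_init__(self):
        # Assumption: every peer counts as "heard from" at time 0.
        for p in self.peers:
            self.last_seen[p] = 0.0
            self.timeout[p] = self.initial_timeout

    def on_heartbeat(self, peer, now):
        """Record a heartbeat; retract a suspicion that turned out to be wrong."""
        self.last_seen[peer] = now
        if peer in self.suspected:
            # The detector made a mistake: stop suspecting the peer and
            # wait longer before suspecting it again.
            self.suspected.discard(peer)
            self.timeout[peer] *= 2

    def check(self, now):
        """Suspect every peer whose last heartbeat is older than its timeout."""
        for p in self.peers:
            if now - self.last_seen[p] > self.timeout[p]:
                self.suspected.add(p)
        return set(self.suspected)


if __name__ == "__main__":
    d = HeartbeatDetector(["p", "q"])
    d.on_heartbeat("p", 0.5)
    d.on_heartbeat("q", 0.5)
    print(d.check(1.0))   # nobody suspected yet
    print(d.check(2.0))   # both silent for 1.5 > 1.0, so both are suspected
    d.on_heartbeat("q", 2.5)  # q was only slow: suspicion retracted, timeout doubled
    print(d.check(4.0))   # only p (silent since 0.5) is still suspected
```

In this sketch a slow but correct process can be suspected any number of times before its timeout grows large enough, which is the sense in which such a detector is unreliable.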
Pages: 225-267
Page count: 43
Related papers
50 in total
  • [1] Unreliable failure detectors for reliable distributed systems
    IBM Thomas J. Watson Research Center, Hawthorne, United States
    J Assoc Comput Mach, 2 (225-267):
  • [2] Leader election in asynchronous distributed systems with unreliable failure detectors
    Park, SH
    Yamashita, M
    PDPTA '04: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICATIONS, VOLS 1-3, 2004 : 687 - 693
  • [3] Quorum-based mutual exclusion in asynchronous distributed systems with unreliable failure detectors
    Sung-Hoon Park
    Seon-Hyong Lee
    The Journal of Supercomputing, 2014, 67 : 469 - 484
  • [4] Quorum-based mutual exclusion in asynchronous distributed systems with unreliable failure detectors
    Park, Sung-Hoon
    Lee, Seon-Hyong
    JOURNAL OF SUPERCOMPUTING, 2014, 67 (02) : 469 - 484
  • [5] Non-Blocking Atomic Commitment Algorithm in Asynchronous Distributed Systems with Unreliable Failure Detectors
    Park, Sung-Hoon
    Lee, Jea-Yep
    Yu, Su-Chang
    PROCEEDINGS OF THE 2013 10TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: NEW GENERATIONS, 2013 : 33 - 38
  • [6] On the implementation of unreliable failure detectors in partially synchronous systems
    Larrea, M
    Fernández, A
    Arévalo, S
    IEEE TRANSACTIONS ON COMPUTERS, 2004, 53 (07) : 815 - 828
  • [7] Solving Non-Blocking Atomic Commitment Problem in Asynchronous Distributed Systems with Unreliable Failure Detectors
    Park, Sung-Hoon
    Lee, Seon-Hyong
    CONVERGENCE AND HYBRID INFORMATION TECHNOLOGY, 2012, 310 : 94 - 102
  • [8] Quorum Based Mutual Exclusion in Asynchronous Systems with Unreliable Failure Detectors
    Park, Sung-Hoon
    Lee, Seon-Hyong
    GRID AND DISTRIBUTED COMPUTING, 2011, 261 : 25 - 34
  • [9] Efficient algorithms to implement unreliable failure detectors in partially synchronous systems
    Larrea, M
    Arévalo, S
    Fernández, A
    DISTRIBUTED COMPUTING, 1999, 1693 : 34 - 48
  • [10] Reliable systems on unreliable fabrics
    Austin, Todd
    Bertacco, Valeria
    Mahlke, Scott
    Cao, Yu
    IEEE DESIGN & TEST OF COMPUTERS, 2008, 25 (04) : 322 - 332