Unreliable failure detectors for reliable distributed systems

Cited by: 1303
Authors
Chandra, TD [1]
Toueg, S [1]
Affiliation
[1] CORNELL UNIV,DEPT COMP SCI,ITHACA,NY 14853
Keywords
agreement problem; asynchronous systems; atomic broadcast; Byzantine Generals' problem; commit problem; consensus problem; crash failures; failure detection; fault-tolerance; message passing; partial synchrony; processor failures;
DOI
10.1145/226643.226647
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
We introduce the concept of unreliable failure detectors and study how they can be used to solve Consensus in asynchronous systems with crash failures. We characterise unreliable failure detectors in terms of two properties: completeness and accuracy. We show that Consensus can be solved even with unreliable failure detectors that make an infinite number of mistakes, and determine which ones can be used to solve Consensus despite any number of crashes, and which ones require a majority of correct processes. We prove that Consensus and Atomic Broadcast are reducible to each other in asynchronous systems with crash failures; thus, the above results also apply to Atomic Broadcast. A companion paper shows that one of the failure detectors introduced here is the weakest failure detector for solving Consensus [Chandra et al. 1992].
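As an illustration of what "unreliable" can mean here, the following is a minimal sketch of a timeout-based failure detector. It is not taken from this record or presented as the paper's algorithm; the class name `HeartbeatDetector`, its methods, and the timeout-doubling rule are all hypothetical choices made for the example. The sketch aims at completeness (a process that stops sending heartbeats is eventually suspected) and tolerates accuracy mistakes (a slow process may be wrongly suspected, then cleared when its heartbeat arrives).

```python
class HeartbeatDetector:
    """Hypothetical timeout-based detector: suspects a process whose heartbeat is overdue."""

    def __init__(self, processes, initial_timeout=1.0):
        # Per-process timeout, so patience can grow independently for each peer.
        self.timeout = {p: initial_timeout for p in processes}
        self.last_heartbeat = {p: 0.0 for p in processes}
        self.suspected = set()

    def on_heartbeat(self, p, now):
        """Record a heartbeat from p received at time `now`."""
        self.last_heartbeat[p] = now
        if p in self.suspected:
            # The earlier suspicion was a mistake: retract it and
            # double the timeout so the same mistake is less likely.
            self.suspected.discard(p)
            self.timeout[p] *= 2

    def check(self, now):
        """Suspect every process whose last heartbeat is older than its timeout."""
        for p, last in self.last_heartbeat.items():
            if now - last > self.timeout[p]:
                self.suspected.add(p)
        return set(self.suspected)
```

A caller would invoke `on_heartbeat` whenever a heartbeat message arrives and `check` periodically; time is passed in explicitly so the behaviour is deterministic and easy to test. The detector can be wrong at any moment, which is the sense in which it is unreliable.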
Pages: 225-267
Page count: 43
Related papers
50 in total
  • [31] RELIABLE OR UNRELIABLE ICS
    REINER, WG
    ELECTRONICS, 1967, 40 (03): : 7 - &
  • [32] Token-based atomic broadcast using unreliable failure detectors
    Ekwall, R
    Schiper, A
    Urbán, P
    23RD IEEE INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS, PROCEEDINGS, 2004, : 52 - 65
  • [33] Reliable synchronization in distributed systems
    Roosta, SH
    INTERNATIONAL JOURNAL OF COMPUTER MATHEMATICS, 2004, 81 (06) : 661 - 673
  • [34] An Autonomic Hierarchical Reliable Broadcast Protocol for Asynchronous Distributed Systems with Failure Detector
    Jeanneau, Elise
    Rodrigues, Luiz A.
    Arantes, Luciana
    Duarte, Elias P., Jr.
    2016 SEVENTH LATIN-AMERICAN SYMPOSIUM ON DEPENDABLE COMPUTING (LADC), 2016, : 91 - 98
  • [36] Corrected Gossip Algorithms for Fast Reliable Broadcast on Unreliable Systems
    Hoefler, Torsten
    Barak, Amnon
    Shiloh, Amnon
    Drezner, Zvi
    2017 31ST IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2017, : 357 - 366
  • [37] RELIABLE COMPUTATION IN COMPUTING SYSTEMS DESIGNED FROM UNRELIABLE COMPONENTS
    TAYLOR, MG
    BELL SYSTEM TECHNICAL JOURNAL, 1968, 47 (10): : 2339 - +
  • [38] Organizing unreliable decision makers into reliable decision support systems
    Camponogara, E
    Talukdar, SN
    INTERNATIONAL JOURNAL OF INDUSTRIAL ENGINEERING-THEORY APPLICATIONS AND PRACTICE, 2003, 10 (02): : 136 - 146
  • [39] Asynchronous Communication under Reliable and Unreliable Network Topologies in Distributed Multiagent Systems: A Robust Technique for Computing Average Consensus
    Mustafa, Ali
    ul Islam, Muhammad Najam
    Ahmed, Salman
    Tufail, Muhammad Ahsan
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2018, 2018