A distributed fault-tolerant asynchronous algorithm for performing N tasks

被引:0
作者
Weerasinghe, GM [1 ]
Lipsky, L [1 ]
机构
[1] Univ Connecticut, Dept Comp Sci & Engn, Storrs, CT 06269 USA
来源
COMPUTERS AND THEIR APPLICATIONS | 2001年
关键词
Networks of Workstations; message passing; performance evaluation; fault-tolerance; asynchronous; communication; dynamic load balancing;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper is a performance study of a fault-tolerant asynchronous algorithm for performing N independent and idempotent tasks on P processes. It is designed for the programming model Single Program Multiple Data (SPMD) and the failure model Fail-Stop failures without restarts. Our algorithm tolerates up to P - 1 process failures. That is, at least one process must survive for the lifetime of the application. The algorithm is structured in terms of a Symmetric Task Model in which each process is responsible for scheduling tasks dynamically, and distributing progress information. A parameter called Periodicity controls how often progress information is distributed to the rest of the processes. A process can fail while distributing its progress information, causing inconsistencies between task partitions of different processes. Therefore, the major design goals are: to optimize the scheduling phase such that in the presence of failures and communication time-outs, the number of tasks redone is minimized; to minimize the allocation of resources. In our study we avoid the use of checkpointing. Lost tasks are simply redone. Processes communicate only through asynchronous message passing. We present preliminary results of performance tests of this algorithm that we have implemented.
引用
收藏
页码:69 / 73
页数:5
相关论文
共 50 条
  • [21] Fault-tolerant nanocomputers based on asynchronous Cellular Automata
    Isokawa, T
    Abo, F
    Peper, F
    Adachi, S
    Lee, J
    Matsui, N
    Mashiko, S
    [J]. INTERNATIONAL JOURNAL OF MODERN PHYSICS C, 2004, 15 (06): : 893 - 915
  • [22] A LINEAR FAULT-TOLERANT NAMING ALGORITHM
    BEAUQUIER, J
    GASTIN, P
    VILLAIN, V
    [J]. LECTURE NOTES IN COMPUTER SCIENCE, 1991, 486 : 57 - 70
  • [23] A fault-tolerant broadcasting algorithm for hypercubes
    Chiu, GM
    [J]. INFORMATION PROCESSING LETTERS, 1998, 66 (02) : 93 - 99
  • [24] A novel fault-tolerant parallel algorithm
    Wang, Panfeng
    Du, Yunfei
    Fu, Hongyi
    Zhou, Haifang
    Yang, Xuejun
    Yang, Wenjing
    [J]. ADVANCED PARALLEL PROCESSING TECHNOLOGIES, PROCEEDINGS, 2007, 4847 : 18 - 29
  • [25] OPTIMAL-DESIGN OF FAULT-TOLERANT DISTRIBUTED SYSTEMS BASED ON A RECURSIVE ALGORITHM
    PHAM, H
    UPADHYAYA, SJ
    [J]. IEEE TRANSACTIONS ON RELIABILITY, 1991, 40 (03) : 375 - 379
  • [26] A Distributed Fault-Tolerant Topology Control Algorithm for Heterogeneous Wireless Sensor Networks
    Bagci, Hakki
    Korpeoglu, Ibrahim
    Yazici, Adnan
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2015, 26 (04) : 914 - 923
  • [27] A Novel Fault-Tolerant Scheme for Distributed Systems
    Zhang, Xiaoqin
    Wei, Zhidong
    Zhang, Fenggui
    Liu, Guoliang
    [J]. CEIS 2011, 2011, 15
  • [28] Fault-tolerant configuration of distributed discrete controllers
    Fujimoto, Y
    Sekiguchi, T
    [J]. IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2003, 50 (01) : 86 - 93
  • [29] Cyclic storage for fault-tolerant distributed executions
    Marcelin-Jimenez, Ricardo
    Rajsbaum, Sergio
    Stevens, Brett
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2006, 17 (09) : 1028 - 1036
  • [30] DETECTING UNREALIZABILITY OF DISTRIBUTED FAULT-TOLERANT SYSTEMS
    Finkbeiner, Bernd
    Tentrup, Leander
    [J]. LOGICAL METHODS IN COMPUTER SCIENCE, 2015, 11 (03)