A decentralized and fault tolerant convergence detection algorithm for asynchronous iterative algorithms

Cited by: 1
Authors
Charr, Jean-Claude [1]
Couturier, Raphael [1]
Laiymani, David [1]
Affiliations
[1] Univ Franche Comte, Lab Comp Sci Franche Comte, IUT Belfort Montbeliard, F-90016 Belfort, France
Keywords
Decentralized global convergence detection mechanism; Peer-to-Peer environment; Distributed clusters; Fault tolerance
DOI
10.1007/s11227-009-0293-6
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This article presents an algorithm that detects, in a decentralized way, the global convergence of parallel asynchronous iterative applications. The algorithm is fault tolerant: it runs a decentralized saving procedure so that, after a node crashes, the dead node can be replaced by a new one that continues the computation from the last checkpoint. Combined with the advantages of the asynchronous iteration model, this method makes it possible to solve very large-scale problems on highly volatile parallel architectures such as Peer-to-Peer networks and distributed clusters. We also present the implementation of this algorithm in the JaceP2P platform, which is dedicated to designing and executing parallel asynchronous iterative applications in volatile environments. Numerous experiments show the robustness and efficiency of our algorithm.
Pages: 269 - 292
Page count: 24
Related papers
50 in total
  • [1] A decentralized and fault tolerant convergence detection algorithm for asynchronous iterative algorithms
    Jean-Claude Charr
    Raphaël Couturier
    David Laiymani
    The Journal of Supercomputing, 2010, 53 : 269 - 292
  • [2] ASYNCHRONOUS FAULT-TOLERANT TOTAL ORDERING ALGORITHMS
    MOSER, LE
    MELLIARSMITH, PM
    AGRAWALA, V
    SIAM JOURNAL ON COMPUTING, 1993, 22 (04) : 727 - 750
  • [3] A Fault-Tolerant Distributed Framework for Asynchronous Iterative Computations
    Zhou, Tian
    Gao, Lixin
    Guan, Xiaohong
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2021, 32 (08) : 2062 - 2073
  • [4] A Fault-Tolerant Framework for Asynchronous Iterative Computations in Cloud Environments
    Wang, Zhigang
    Gao, Lixin
    Gu, Yu
    Bao, Yubin
    Yu, Ge
    PROCEEDINGS OF THE SEVENTH ACM SYMPOSIUM ON CLOUD COMPUTING (SOCC 2016), 2016, : 71 - 83
  • [5] A Fault-Tolerant Framework for Asynchronous Iterative Computations in Cloud Environments
    Wang, Zhigang
    Gao, Lixin
    Gu, Yu
    Bao, Yubin
    Yu, Ge
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2018, 29 (08) : 1678 - 1692
  • [6] JACEP2P-V2: A fully decentralized and fault tolerant environment for executing parallel iterative asynchronous applications on volatile distributed architectures
    Charr, Jean-Claude
    Couturier, Raphael
    Laiymani, David
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2011, 27 (05) : 606 - 613
  • [7] Fault Tolerant Implementation of Peer-to-Peer Distributed Iterative Algorithms
    The Tung Nguyen
    El-Baz, Didier
    15TH IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING (CSE 2012) / 10TH IEEE/IFIP INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING (EUC 2012), 2012, : 137 - 145
  • [8] A distributed fault-tolerant asynchronous algorithm for performing N tasks
    Weerasinghe, GM
    Lipsky, L
    COMPUTERS AND THEIR APPLICATIONS, 2001, : 69 - 73
  • [9] JACEP2P-V2: A Fully Decentralized and Fault Tolerant Environment for Executing Parallel Iterative Asynchronous Applications on Volatile Distributed Architectures
    Charr, Jean-Claude
    Couturier, Raphael
    Laiymani, David
    ADVANCES IN GRID AND PERVASIVE COMPUTING, PROCEEDINGS, 2009, 5529 : 446 - 458
  • [10] Fault Tolerant Decentralized Scheduling Algorithm for P2P Grid
    Chauhan, Piyush
    Nitin
    2ND INTERNATIONAL CONFERENCE ON COMMUNICATION, COMPUTING & SECURITY [ICCCS-2012], 2012, 1 : 698 - 707