Fault Tolerance Properties of Gossip-Based Distributed Orthogonal Iteration Methods

Cited by: 10
Authors
Strakova, Hana [1]
Niederbrucker, Gerhard [1]
Gansterer, Wilfried N. [1]
Affiliations
[1] Univ Vienna, Res Grp Theory & Applicat Algorithms, A-1010 Vienna, Austria
Source
2013 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE | 2013 / Vol. 18
Funding
Austrian Science Fund;
Keywords
fault tolerance; self-healing algorithm; gossip algorithms; distributed algorithm; randomized communication schedule; orthogonal iteration; ALGORITHMS;
DOI
10.1016/j.procs.2013.05.182
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
In this paper, we investigate and compare the fault tolerance properties and resilience of gossip-based distributed orthogonal iteration algorithms for the in-network computation of the extreme eigenpairs of a matrix. Gossip-based algorithms have many attractive properties, especially for loosely coupled distributed and decentralized systems such as P2P networks or sensor networks. Because their communication schedule is randomized and communication happens only between nearest neighbors, they are highly flexible with respect to the topology of the underlying system. Moreover, such algorithms have great potential for high resilience against various types of failures. Recently, several gossip-based distributed eigensolvers based on the orthogonal iteration method have been introduced. However, the performance of these algorithms in the presence of failures has not yet been analyzed. We illustrate that the convergence properties, the numerical accuracy achieved, and the resilience properties of gossip-based distributed orthogonal iteration are largely determined by the choice of the distributed data aggregation algorithm (DDAA), which is required within the algorithm for performing distributed reduction operations (such as summation or averaging) across the system. In particular, we illustrate that with a proper combination of DDAA and distributed orthogonal iteration method, high accuracy can be achieved and even silent message loss can be tolerated without any loss in numerical accuracy.
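The distributed reduction step mentioned in the abstract can be illustrated with a minimal sketch of push-sum, one well-known gossip averaging scheme. This is an illustration only: the abstract does not name the DDAAs that the paper compares, and the function name `push_sum_average`, the fully connected topology, the synchronous rounds, and the failure-free delivery are all assumptions made here for brevity.

```python
import random


def push_sum_average(values, rounds=100, seed=0):
    """Estimate the average of `values` at every node via push-sum gossip.

    Assumptions (not taken from the paper): a fully connected network with
    at least two nodes, synchronous rounds, and no message loss.
    """
    rng = random.Random(seed)
    n = len(values)
    s = [float(v) for v in values]  # per-node running sum
    w = [1.0] * n                   # per-node running weight
    for _ in range(rounds):
        next_s = [0.0] * n
        next_w = [0.0] * n
        for i in range(n):
            # Pick a random peer j different from i.
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            # Keep half of the local (sum, weight) pair, send the other half.
            half_s, half_w = s[i] / 2.0, w[i] / 2.0
            next_s[i] += half_s
            next_w[i] += half_w
            next_s[j] += half_s
            next_w[j] += half_w
        s, w = next_s, next_w
    # Each node's local estimate is the ratio of its sum to its weight.
    return [si / wi for si, wi in zip(s, w)]


if __name__ == "__main__":
    local_values = [1.0, 2.0, 3.0, 4.0, 10.0, 0.0, 7.0, 5.0]
    print(push_sum_average(local_values, rounds=100, seed=1))
```

In this plain form, the total of the sums and the total of the weights are conserved in every round, which is what makes each local ratio approach the global average. A silently lost message would remove part of that total, so the sketch itself does not tolerate message loss.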
Pages: 189-198
Number of pages: 10