Byzantine fault tolerance in distributed machine learning: a survey

Cited by: 0
Authors
Bouhata, Djamila [1, 2]
Moumen, Hamouma [1, 2]
Mazari, Jocelyn Ahmed [3, 4]
Bounceur, Ahcene [5]
Affiliations
[1] Univ Batna, Comp Sci Dept, 2 53 Constantine Rd, Batna 05078, Algeria
[2] Lab Applicat Math Comp & Elect, Comp Sci Dept, Batna, Algeria
[3] Sorbonne Univ, CNRS, ISIR, Paris, France
[4] Extrality, Paris, France
[5] Univ Sharjah, Informat Syst Dept, Sharjah, U Arab Emirates
Keywords
Byzantine fault tolerance; distributed machine learning; stochastic gradient descent; communication; optimisation; SUBGRADIENT METHODS; COORDINATE DESCENT; GRADIENT DESCENT; AGREEMENT; GENERALS;
DOI
10.1080/0952813X.2024.2391778
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Byzantine Fault Tolerance (BFT) is crucial for ensuring the resilience of Distributed Machine Learning (DML) systems during training under adversarial conditions. Within the growing body of research on BFT in DML, there is no comprehensive classification of techniques or broad analysis of different approaches. This paper provides an in-depth survey of recent advancements in BFT for DML, with a focus on first-order optimisation methods during the training phase, particularly the widely used Stochastic Gradient Descent (SGD). We offer a novel classification of BFT approaches based on characteristics such as the communication process, optimisation method, and topology setting. This classification aims to enhance the understanding of various BFT methods and guide future research in addressing open challenges in the field. This work provides the foundations for developing robust BFT systems, using a variety of optimisation methods to strengthen resilience.
Pages: 59
Related papers
50 in total
  • [41] Application of Distributed Machine Learning Model in Fault Diagnosis of Air Preheater
    Lei, Haokun
    Liu, Jian
    Xian, Chun
    2019 4TH INTERNATIONAL CONFERENCE ON SYSTEM RELIABILITY AND SAFETY (ICSRS 2019), 2019, : 312 - 317
  • [42] Byzantine Fault-Tolerant Consensus Algorithms: A Survey
    Zhong, Weiyu
    Yang, Ce
    Liang, Wei
    Cai, Jiahong
    Chen, Lin
    Liao, Jing
    Xiong, Naixue
    ELECTRONICS, 2023, 12 (18)
  • [43] BigBFT: A Multileader Byzantine Fault Tolerance Protocol for High Throughput
    Alqahtani, Salem
    Demirbas, Murat
    2021 IEEE INTERNATIONAL PERFORMANCE, COMPUTING, AND COMMUNICATIONS CONFERENCE (IPCCC), 2021
  • [44] A Formally Verified Protocol for Log Replication with Byzantine Fault Tolerance
    Wanner, Joel
    Chuat, Laurent
    Perrig, Adrian
    2020 INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS (SRDS 2020), 2020, : 101 - 112
  • [45] GARFIELD: System Support for Byzantine Machine Learning (Regular Paper)
    Guerraoui, Rachid
    Guirguis, Arsany
    Plassmann, Jeremy
    Ragot, Anton
    Rouault, Sebastien
    51ST ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS (DSN 2021), 2021, : 39 - 51
  • [46] A Cycle-Time-Analysis Model for Byzantine Fault Tolerance
    Chen, Liu
    Zhou, Wei
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2015, 2015, 9532 : 659 - 668
  • [47] Practical Byzantine fault tolerance consensus based on comprehensive reputation
    Qi, Jiamou
    Guan, Yepeng
    PEER-TO-PEER NETWORKING AND APPLICATIONS, 2023, 16 (01) : 420 - 430
  • [48] DBFT: A Byzantine Fault Tolerance Protocol With Graceful Performance Degradation
    Zhang, Jingjing
    Rong, Yingyao
    Cao, Jiannong
    Rong, Chunming
    Bian, Jing
    Wu, Weigang
    IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2022, 19 (05) : 3387 - 3400
  • [49] Practical Byzantine fault tolerance consensus based on comprehensive reputation
    Jiamou Qi
    Yepeng Guan
    Peer-to-Peer Networking and Applications, 2023, 16 : 420 - 430
  • [50] Design and implementation of a Byzantine fault tolerance framework for Web services
    Zhao, Wenbing
    JOURNAL OF SYSTEMS AND SOFTWARE, 2009, 82 (06) : 1004 - 1015