Byzantine fault tolerance in distributed machine learning: a survey

Authors
Bouhata, Djamila [1, 2]
Moumen, Hamouma [1, 2]
Mazari, Jocelyn Ahmed [3, 4]
Bounceur, Ahcene [5]
Affiliations
[1] Univ Batna, Comp Sci Dept, 2 53 Constantine Rd, Batna 05078, Algeria
[2] Lab Applicat Math Comp & Elect, Comp Sci Dept, Batna, Algeria
[3] Sorbonne Univ, CNRS, ISIR, Paris, France
[4] Extrality, Paris, France
[5] Univ Sharjah, Informat Syst Dept, Sharjah, U Arab Emirates
Keywords
Byzantine fault tolerance; distributed machine learning; stochastic gradient descent; communication; optimisation; subgradient methods; coordinate descent; gradient descent; agreement; generals
DOI
10.1080/0952813X.2024.2391778
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Byzantine Fault Tolerance (BFT) is crucial for ensuring the resilience of Distributed Machine Learning (DML) systems during training under adversarial conditions. Despite the growing body of research on BFT in DML, there is no comprehensive classification of techniques or broad analysis of the different approaches. This paper provides an in-depth survey of recent advances in BFT for DML, focusing on first-order optimisation methods, particularly the widely used Stochastic Gradient Descent (SGD), during the training phase. We offer a novel classification of BFT approaches based on characteristics such as the communication process, optimisation method, and topology setting. This classification aims to improve the understanding of the various BFT methods and to guide future research on open challenges in the field. This work provides the foundations for developing robust BFT systems, using a variety of optimisation methods to strengthen resilience.
Pages: 59