Byzantine fault tolerance in distributed machine learning: a survey

Times cited: 0
Authors
Bouhata, Djamila [1, 2]
Moumen, Hamouma [1, 2]
Mazari, Jocelyn Ahmed [3, 4]
Bounceur, Ahcene [5]
Affiliations
[1] Univ Batna 2, Comp Sci Dept, 53 Constantine Rd, Batna 05078, Algeria
[2] Lab Applicat Math Comp & Elect, Comp Sci Dept, Batna, Algeria
[3] Sorbonne Univ, CNRS, ISIR, Paris, France
[4] Extrality, Paris, France
[5] Univ Sharjah, Informat Syst Dept, Sharjah, U Arab Emirates
Keywords
Byzantine fault tolerance; distributed machine learning; stochastic gradient descent; communication; optimisation; subgradient methods; coordinate descent; gradient descent; agreement; generals
DOI
10.1080/0952813X.2024.2391778
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Byzantine Fault Tolerance (BFT) is crucial for ensuring the resilience of Distributed Machine Learning (DML) systems during training under adversarial conditions. Despite the growing body of research on BFT in DML, no comprehensive classification of techniques or broad analysis of the different approaches exists. This paper provides an in-depth survey of recent advances in BFT for DML, focusing on first-order optimisation methods used during the training phase, particularly the popular Stochastic Gradient Descent (SGD). We offer a novel classification of BFT approaches based on characteristics such as the communication process, the optimisation method, and the topology setting. This classification aims to improve understanding of the various BFT methods and to guide future research on the open challenges in the field. This work lays the foundations for developing robust BFT systems that draw on a variety of optimisation methods to strengthen resilience.
Pages: 59