Byzantine fault tolerance in distributed machine learning: a survey

Times cited: 0
Authors
Bouhata, Djamila [1, 2]
Moumen, Hamouma [1, 2]
Mazari, Jocelyn Ahmed [3, 4]
Bounceur, Ahcene [5]
Affiliations
[1] Univ Batna, Comp Sci Dept, 2 53 Constantine Rd, Batna 05078, Algeria
[2] Lab Applicat Math Comp & Elect, Comp Sci Dept, Batna, Algeria
[3] Sorbonne Univ, CNRS, ISIR, Paris, France
[4] Extrality, Paris, France
[5] Univ Sharjah, Informat Syst Dept, Sharjah, U Arab Emirates
Keywords
Byzantine fault tolerance; distributed machine learning; stochastic gradient descent; communication; optimisation; subgradient methods; coordinate descent; gradient descent; agreement; generals
DOI
10.1080/0952813X.2024.2391778
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Byzantine Fault Tolerance (BFT) is crucial for keeping Distributed Machine Learning (DML) systems resilient during training under adversarial conditions. Despite the growing body of research on BFT in DML, there is no comprehensive classification of techniques or broad analysis of the different approaches. This paper provides an in-depth survey of recent advances in BFT for DML, focusing on first-order optimisation methods used during the training phase, particularly the popular Stochastic Gradient Descent (SGD). We offer a novel classification of BFT approaches based on characteristics such as the communication process, the optimisation method, and the topology setting. This classification aims to improve the understanding of the various BFT methods and to guide future research on the open challenges in the field. This work lays the foundations for developing robust BFT systems that draw on a variety of optimisation methods to strengthen resilience.
Pages: 59