HADFL: Heterogeneity-aware Decentralized Federated Learning Framework

Cited by: 16
Authors
Cao, Jing [1 ]
Lian, Zirui [1 ]
Liu, Weihong [1 ]
Zhu, Zongwei [1 ]
Ji, Cheng [2 ]
Affiliations
[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
[2] Nanjing Univ Sci & Technol, Nanjing, Peoples R China
Source
2021 58TH ACM/IEEE DESIGN AUTOMATION CONFERENCE (DAC) | 2021
Funding
China Postdoctoral Science Foundation
Keywords
Distributed Training; Machine Learning; Federated Learning; Heterogeneous Computing;
DOI
10.1109/DAC18074.2021.9586101
CLC number
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
Federated learning (FL) supports training models on geographically distributed devices. However, traditional FL systems adopt a centralized synchronous strategy, which imposes heavy communication pressure on the central server and poses model generalization challenges. Existing optimizations of FL either fail to speed up training on heterogeneous devices or suffer from poor communication efficiency. In this paper, we propose HADFL, a framework that supports decentralized asynchronous training on heterogeneous devices. Devices train models locally on their own data with heterogeneity-aware local steps. In each aggregation cycle, devices are selected based on probability to perform model synchronization and aggregation. Compared with a traditional FL system, HADFL can relieve the central server's communication pressure, efficiently utilize heterogeneous computing power, and achieve maximum speedups of 3.15x over decentralized-FedAvg and 4.68x over the PyTorch distributed training scheme, respectively, with almost no loss of convergence accuracy.
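The abstract names two mechanisms: heterogeneity-aware local steps and probability-based selection of devices for aggregation. The toy Python sketch below illustrates how such a scheme could work in principle; it is not the authors' implementation, and the step budget, the speed-proportional selection rule, and the scalar "model" are all illustrative assumptions.

```python
import random

def local_steps(speed, budget=100):
    """Heterogeneity-aware step count: faster devices fit more local
    updates into a shared wall-clock budget, so none stalls on stragglers."""
    return max(1, int(speed * budget))

def aggregate(models):
    """Average a list of scalar 'models' (stand-ins for weight vectors)."""
    return sum(models) / len(models)

def run_cycle(devices, rng):
    # Local training: each scalar 'model' drifts toward a target (1.0)
    # at a rate set by its device's local step count (a toy stand-in for SGD).
    for d in devices:
        for _ in range(local_steps(d["speed"])):
            d["model"] += 0.01 * (1.0 - d["model"])
    # Probabilistic aggregation: select devices with probability
    # proportional to their (assumed known) compute speed.
    total = sum(d["speed"] for d in devices)
    chosen = [d for d in devices
              if rng.random() < d["speed"] / total * len(devices)]
    if len(chosen) >= 2:
        avg = aggregate([d["model"] for d in chosen])
        for d in chosen:
            d["model"] = avg

rng = random.Random(0)
devices = [{"speed": s, "model": 0.0} for s in (0.5, 1.0, 2.0)]
for _ in range(5):
    run_cycle(devices, rng)
print(all(0.0 < d["model"] <= 1.0 for d in devices))
```

Because only the selected subset synchronizes in each cycle, no single node has to communicate with every device every round, which mirrors the communication-pressure relief the abstract claims.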
Pages: 1-6
Number of pages: 6