Communication-Efficient and Model-Heterogeneous Personalized Federated Learning via Clustered Knowledge Transfer

Cited by: 24
Authors
Cho, Yae Jee [1 ]
Wang, Jianyu [1 ]
Chirvolu, Tarun [2 ]
Joshi, Gauri [1 ]
Affiliations
[1] Carnegie Mellon Univ, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA
[2] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
Keywords
Federated learning; communication efficiency; model heterogeneity; knowledge transfer; clustering
DOI
10.1109/JSTSP.2022.3231527
CLC Classification
TM (Electrical Engineering); TN (Electronic Technology, Communication Technology)
Subject Classification Codes
0808; 0809
Abstract
Personalized federated learning (PFL) aims to train models that perform well on each edge device's own data, where the edge devices (clients) are typically IoT devices such as mobile phones. In cross-device settings, the participating clients generally have heterogeneous system capabilities and limited communication bandwidth. Many recent works in PFL, however, overlook these practical properties of edge devices: they use the same model architecture across all clients and incur high communication cost by directly communicating model parameters. In our work, we propose a novel and practical PFL framework named COMET, in which clients can use heterogeneous models of their own choice and do not directly communicate their model parameters to other parties. Instead, COMET uses clustered codistillation, where clients apply knowledge distillation to transfer their knowledge to other clients with similar data distributions. This yields a practical PFL framework for edge devices training over IoT networks, since it lifts the heavy burden of communicating large models. We theoretically show the convergence and generalization properties of COMET and empirically show that, compared to other state-of-the-art PFL methods, COMET achieves high test accuracy with several orders of magnitude lower communication cost while allowing client model heterogeneity.
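To make the clustered-codistillation idea from the abstract concrete, below is a minimal Python sketch of one plausible reading: clients never exchange model parameters; instead they share soft predictions on a common unlabeled reference set, the server groups clients whose predictions are similar (a proxy for similar data distributions), and each client distills from its cluster's average soft labels. The clustering criterion, the KL-based penalty, and all names here are illustrative assumptions, not the paper's exact formulation.

# Illustrative sketch only; hypothetical helper names, not COMET's actual code.
import numpy as np
from sklearn.cluster import KMeans

def cluster_clients(client_logits, n_clusters):
    """Group clients by the similarity of their predictions on the
    shared reference set; returns one cluster label per client."""
    flat = np.stack([z.reshape(-1) for z in client_logits])
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(flat)

def cluster_targets(client_logits, labels):
    """Average the soft predictions within each cluster; these become
    the distillation targets broadcast back to cluster members."""
    logits = np.stack(client_logits)
    return {c: logits[labels == c].mean(axis=0) for c in np.unique(labels)}

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distillation_penalty(own_logits, target_logits):
    """KL(cluster target || client), averaged over reference samples;
    each client would add this term to its local training loss so that
    knowledge flows only among clients with similar data."""
    p, q = softmax(target_logits), softmax(own_logits)
    return float(np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)))

# Example round: 6 clients, 32 reference samples, 10 classes.
rng = np.random.default_rng(0)
preds = [rng.normal(size=(32, 10)) for _ in range(6)]
labels = cluster_clients(preds, n_clusters=2)
targets = cluster_targets(preds, labels)
penalty = distillation_penalty(preds[0], targets[labels[0]])

Because only these reference-set predictions travel over the network, the per-round payload scales with the reference-set size and number of classes rather than with model size, which is where the communication savings claimed in the abstract would come from.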
Pages: 234-247
Number of pages: 14