Communication-Efficient and Model-Heterogeneous Personalized Federated Learning via Clustered Knowledge Transfer

Cited by: 24
Authors
Cho, Yae Jee [1 ]
Wang, Jianyu [1 ]
Chirvolu, Tarun [2 ]
Joshi, Gauri [1 ]
Affiliations
[1] Carnegie Mellon Univ, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA
[2] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
Keywords
Federated learning; communication efficiency; model heterogeneity; knowledge transfer; clustering
DOI
10.1109/JSTSP.2022.3231527
CLC Classification
TM (Electrical Engineering); TN (Electronic Technology, Communication Technology)
Subject Classification Codes
0808; 0809
Abstract
Personalized federated learning (PFL) aims to train models that perform well on each edge device's own data, where the edge devices (clients) are typically IoT devices such as mobile phones. In cross-device settings, the participating clients generally have heterogeneous system capabilities and limited communication bandwidth. Many recent works in PFL, however, overlook these practical properties of edge devices: they use the same model architecture across all clients and incur high communication cost by directly communicating model parameters. In our work, we propose a novel and practical PFL framework named COMET, in which clients can use heterogeneous models of their own choice and do not directly communicate their model parameters to other parties. Instead, COMET uses clustered codistillation, where clients apply knowledge distillation to transfer their knowledge to other clients with similar data distributions. This yields a practical PFL framework for edge devices training over IoT networks, since it lifts the heavy burden of communicating large models. We theoretically show the convergence and generalization properties of COMET and empirically show that, compared to other state-of-the-art PFL methods, COMET achieves high test accuracy with several orders of magnitude lower communication cost while allowing client model heterogeneity.
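To make the clustered-codistillation idea from the abstract concrete, below is a minimal Python sketch of one plausible reading: clients never exchange model parameters; instead they share soft predictions on a common unlabeled reference set, the server groups clients whose predictions are similar (a proxy for similar data distributions), and each client distills from its cluster's average soft labels. The clustering criterion, the KL-based penalty, and all names here are illustrative assumptions, not the paper's exact formulation.

# Illustrative sketch only; hypothetical helper names, not COMET's actual code.
import numpy as np
from sklearn.cluster import KMeans

def cluster_clients(client_logits, n_clusters):
    """Group clients by the similarity of their predictions on the
    shared reference set; returns one cluster label per client."""
    flat = np.stack([z.reshape(-1) for z in client_logits])
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(flat)

def cluster_targets(client_logits, labels):
    """Average the soft predictions within each cluster; these become
    the distillation targets broadcast back to cluster members."""
    logits = np.stack(client_logits)
    return {c: logits[labels == c].mean(axis=0) for c in np.unique(labels)}

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distillation_penalty(own_logits, target_logits):
    """KL(cluster target || client), averaged over reference samples;
    each client would add this term to its local training loss so that
    knowledge flows only among clients with similar data."""
    p, q = softmax(target_logits), softmax(own_logits)
    return float(np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)))

# Example round: 6 clients, 32 reference samples, 10 classes.
rng = np.random.default_rng(0)
preds = [rng.normal(size=(32, 10)) for _ in range(6)]
labels = cluster_clients(preds, n_clusters=2)
targets = cluster_targets(preds, labels)
penalty = distillation_penalty(preds[0], targets[labels[0]])

Because only these reference-set predictions travel over the network, the per-round payload scales with the reference-set size and number of classes rather than with model size, which is where the communication savings claimed in the abstract would come from.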
Pages: 234-247
Number of pages: 14