Towards Data-Independent Knowledge Transfer in Model-Heterogeneous Federated Learning

Cited by: 13
Authors
Zhang, Jie [1 ]
Guo, Song [1 ]
Guo, Jingcai [1 ]
Zeng, Deze [2 ]
Zhou, Jingren [3 ]
Zomaya, Albert Y. [4 ,5 ]
Affiliations
[1] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
[2] China Univ Geosci Wuhan, Sch Comp Sci & Technol, Wuhan 430079, Peoples R China
[3] Alibaba Grp, Hangzhou 311121, Peoples R China
[4] High Performance Comp & Networking, Sydney, NSW, Australia
[5] Univ Sydney, Sch Informat Technol, Australian Res Council, Sydney, NSW, Australia
Funding
National Natural Science Foundation of China
Keywords
Federated learning; model heterogeneity; GAN; knowledge transfer;
DOI
10.1109/TC.2023.3272801
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
Federated Distillation (FD) extends classic Federated Learning (FL) to a more general training framework that enables model-heterogeneous collaborative learning through Knowledge Distillation (KD) across multiple clients and the server. However, existing KD-based algorithms usually require a set of shared input samples on which each client produces soft predictions for distillation. Worse still, selecting such samples manually requires careful deliberation or prior knowledge of clients' private data distributions, which conflicts with the privacy-preserving principle of classic FL. In this paper, we propose a novel training framework that achieves data-independent knowledge transfer by properly designing a distributed generative adversarial network (GAN) between the server and clients, which synthesizes shared feature representations to facilitate FD training. Specifically, we deploy a generator on the server and reuse each local model as a federated discriminator, forming a lightweight and efficient distributed GAN that automatically synthesizes simulated global feature representations for distillation. Moreover, since the synthesized feature representations are more faithful to the global data distribution, faster and better training convergence can be obtained. Extensive experiments on different tasks and heterogeneous models demonstrate the effectiveness of the proposed framework in terms of model accuracy and communication overhead.
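The server-side loop described in the abstract can be illustrated with a minimal PyTorch sketch. Everything below is an illustrative assumption rather than the paper's reference implementation: the names (FeatureGenerator, client_heads, global_model), the feature dimensionality, the label-conditioned generator, and the simple cross-entropy/KL objectives are stand-ins; the paper's actual architectures and losses may differ.

```python
# Hypothetical sketch: a server-side generator synthesizes feature representations,
# client classifier heads act as federated discriminators/teachers, and a student
# model is distilled on the synthetic features (no shared real data required).
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM, NUM_CLASSES, NOISE_DIM = 128, 10, 64

class FeatureGenerator(nn.Module):
    """Maps noise plus a class label to a synthetic feature vector (assumed design)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NOISE_DIM + NUM_CLASSES, 256), nn.ReLU(),
            nn.Linear(256, FEAT_DIM),
        )
    def forward(self, z, y_onehot):
        return self.net(torch.cat([z, y_onehot], dim=1))

# Heterogeneous client models are assumed to expose classifier heads that accept
# FEAT_DIM-dimensional features; these heads double as the federated discriminators.
client_heads = [nn.Linear(FEAT_DIM, NUM_CLASSES) for _ in range(5)]
global_model = nn.Linear(FEAT_DIM, NUM_CLASSES)   # student distilled on the server
generator = FeatureGenerator()

gen_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
stu_opt = torch.optim.Adam(global_model.parameters(), lr=1e-3)

for step in range(100):
    z = torch.randn(32, NOISE_DIM)
    y = torch.randint(0, NUM_CLASSES, (32,))
    y_onehot = F.one_hot(y, NUM_CLASSES).float()

    # 1) Generator step: synthesize features that the ensemble of client heads
    #    classifies as the sampled labels (clients act as a joint critic).
    feats = generator(z, y_onehot)
    gen_loss = sum(F.cross_entropy(h(feats), y) for h in client_heads) / len(client_heads)
    gen_opt.zero_grad(); gen_loss.backward(); gen_opt.step()

    # 2) Distillation step: the student mimics the averaged client soft
    #    predictions on the same synthetic feature batch.
    with torch.no_grad():
        teacher = torch.stack([F.softmax(h(feats), dim=1) for h in client_heads]).mean(0)
    kd_loss = F.kl_div(F.log_softmax(global_model(feats.detach()), dim=1),
                       teacher, reduction="batchmean")
    stu_opt.zero_grad(); kd_loss.backward(); stu_opt.step()
```

In an actual federated deployment the client heads would stay on the clients, which would return soft predictions (not gradients on raw data) for the synthetic features; the single-process loop above only mirrors the information flow.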
Pages: 2888 - 2901
Page count: 14