Learning Critically: Selective Self-Distillation in Federated Learning on Non-IID Data

Cited by: 12
Authors
He, Yuting [1 ]
Chen, Yiqiang [2 ,3 ]
Yang, XiaoDong [2 ,3 ]
Yu, Hanchao [4 ]
Huang, Yi-Hua [1 ]
Gu, Yang [2 ]
Affiliations
[1] Univ Chinese Acad Sci, Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China
[3] Shandong Acad Intelligent Comp Technol, Jinan 250101, Peoples R China
[4] Chinese Acad Sci, Frontier Sci & Educ, Beijing 100864, Peoples R China
Keywords
Data models; Training; Servers; Collaborative work; Adaptation models; Convergence; Feature extraction; Federated learning; knowledge distillation; non-identically distributed; deep learning; catastrophic forgetting
DOI
10.1109/TBDATA.2022.3189703
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812 (Computer Science and Technology)
Abstract
Federated learning (FL) enables multiple clients to collaboratively train a global model while keeping their local data decentralized. Data heterogeneity (non-IID data) across clients poses significant challenges to FL: local models re-optimize towards their own local optima and forget the global knowledge, resulting in performance degradation and slower convergence. Many existing works address the non-IID issue by adding a global-model-based regularization term to local training, but without an adaptation scheme, which is not effective enough to achieve high performance with deep learning models. In this paper, we propose a Selective Self-Distillation method for Federated learning (FedSSD), which imposes adaptive constraints on local updates by self-distilling the global model's knowledge and selectively weighting it according to its credibility at both the class and sample level. The convergence of FedSSD is analyzed theoretically, and extensive experiments on three public benchmark datasets demonstrate that FedSSD achieves better generalization and robustness in fewer communication rounds than other state-of-the-art FL methods.
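To make the selective self-distillation idea described above concrete, the following is a minimal PyTorch sketch of a local-training loss that combines cross-entropy with a per-sample-weighted distillation term from the frozen global model. It is an illustration under stated assumptions, not the paper's exact method: the function name, the hyperparameters temperature and mu, and the credibility argument (a per-sample weight in [0, 1] supplied by the caller) are placeholders standing in for FedSSD's class- and sample-level credibility formulation.

import torch
import torch.nn.functional as F

def selective_self_distillation_loss(student_logits, teacher_logits, targets,
                                     credibility, temperature=2.0, mu=1.0):
    # Standard supervised loss on the client's local labels.
    ce = F.cross_entropy(student_logits, targets)
    # Softened distributions for knowledge distillation (Hinton et al., 2015).
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # Per-sample KL divergence from the frozen global (teacher) model,
    # scaled by T^2 as is conventional for distillation.
    kd_per_sample = F.kl_div(log_p_student, p_teacher,
                             reduction="none").sum(dim=1) * temperature ** 2
    # Selective weighting: retain the global knowledge only where the
    # teacher is judged credible; uncredible samples fall back to plain CE.
    kd = (credibility * kd_per_sample).mean()
    return ce + mu * kd

# Illustrative usage inside one local step (teacher = frozen global model):
# with torch.no_grad():
#     teacher_logits = global_model(x)
# loss = selective_self_distillation_loss(local_model(x), teacher_logits, y,
#                                         credibility=torch.ones(x.size(0)))

The design point this sketch captures is that the regularization toward the global model is adaptive rather than uniform: samples (or classes) where the global model is unreliable contribute little to the distillation term, so the local model is not forced to retain knowledge that is wrong for its data.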
Pages: 789-800
Page count: 12