Better Together: Data-Free Multi-Student Coevolved Distillation

Cited by: 1
Authors
Chen, Weijie [1 ,2 ]
Xuan, Yunyi [2 ]
Yang, Shicai [2 ]
Xie, Di [2 ]
Lin, Luojun [3 ]
Zhuang, Yueting [1 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
[2] Hikvision Res Inst, Hangzhou, Peoples R China
[3] Fuzhou Univ, Coll Comp & Data Sci, Fuzhou, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Knowledge distillation; Adversarial training; Model inversion; Surrogate images; Mutual learning;
DOI
10.1016/j.knosys.2023.111146
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Data-Free Knowledge Distillation (DFKD) aims to craft a customized student model from a pre-trained teacher model by synthesizing surrogate training images. However, a seldom-investigated scenario is distilling the knowledge into multiple heterogeneous students simultaneously. In this paper, we study how to improve performance by coevolving peer students, which we term Data-Free Multi-Student Coevolved Distillation (DF-MSCD). Building on previous DFKD methods, we advance DF-MSCD by improving data quality from the perspective of synthesizing unbiased, informative, and diverse surrogate samples: 1) Unbiased. Because image synthesis at different timestamps of DFKD is disconnected, an unnoticed class imbalance problem arises. To tackle this problem, we reform the prior art into an unbiased variant by bridging the label distributions of the synthesized data across timestamps. 2) Informative. Unlike single-student DFKD, we encourage interactions not only between teacher-student pairs but also among peer students, driving a more comprehensive knowledge distillation. To this end, we devise a novel Inter-Student Adversarial Learning method to coevolve peer students with mutual benefits. 3) Diverse. To further promote Inter-Student Adversarial Learning, we develop a Mixture-of-Generators, in which multiple generators are optimized to synthesize different yet complementary samples by playing min-max games with multiple students. Experiments validate the effectiveness and efficiency of the proposed DF-MSCD, which surpasses existing state-of-the-art methods on multiple popular benchmarks. Notably, our method obtains heterogeneous students with a single training run, making it superior to single-student DFKD methods in terms of both training time and testing accuracy.
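The abstract describes an adversarial training loop among a frozen teacher, several peer students, and a mixture of generators. The following is a minimal PyTorch sketch of such a loop, reconstructed from the abstract alone: the TinyGenerator architecture, the kd_loss and peer_disagreement helpers, the uniform class sampling used to approximate the "unbiased" label-distribution bridging, and all loss weightings are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of one DF-MSCD-style training step in PyTorch.
# The tiny architectures, helper names, and loss weights below are
# illustrative assumptions reconstructed from the abstract only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGenerator(nn.Module):
    """Maps class-conditioned noise to 32x32 surrogate images."""
    def __init__(self, nz=100, n_classes=10):
        super().__init__()
        self.embed = nn.Embedding(n_classes, nz)
        self.net = nn.Sequential(
            nn.Linear(nz, 8 * 8 * 64), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.Upsample(scale_factor=4),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, z, y):
        return self.net(z * self.embed(y))

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Temperature-scaled KL distillation loss (teacher -> student)."""
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T

def peer_disagreement(logits_list):
    """Mean pairwise KL divergence among peer students."""
    loss, pairs = 0.0, 0
    for i, li in enumerate(logits_list):
        for j, lj in enumerate(logits_list):
            if i != j:
                loss = loss + F.kl_div(F.log_softmax(li, dim=1),
                                       F.softmax(lj, dim=1),
                                       reduction="batchmean")
                pairs += 1
    return loss / max(pairs, 1)

def train_step(teacher, students, generators, opt_g, opt_s,
               batch_size=64, nz=100, n_classes=10, device="cpu"):
    teacher.eval()
    # Uniform class sampling is a crude stand-in for the paper's
    # "unbiased" bridging of label distributions across timestamps.
    y = torch.randint(0, n_classes, (batch_size,), device=device)
    z = torch.randn(batch_size, nz, device=device)

    # Generator step: each generator *maximizes* teacher-student and
    # inter-student discrepancy, i.e. minimizes the negated losses.
    opt_g.zero_grad()
    g_loss = 0.0
    for g in generators:
        x = g(z, y)
        with torch.no_grad():
            t_logits = teacher(x)
        s_logits = [s(x) for s in students]
        g_loss = g_loss - (sum(kd_loss(sl, t_logits) for sl in s_logits)
                           + peer_disagreement(s_logits))
    g_loss.backward()
    opt_g.step()

    # Student step: students minimize distillation loss to the teacher
    # plus a mutual-learning term among peers on frozen surrogate images.
    opt_s.zero_grad()
    s_loss = 0.0
    for g in generators:
        with torch.no_grad():
            x = g(z, y)
            t_logits = teacher(x)
        s_logits = [s(x) for s in students]
        s_loss = s_loss + (sum(kd_loss(sl, t_logits) for sl in s_logits)
                           + peer_disagreement(s_logits))
    s_loss.backward()
    opt_s.step()
    return g_loss.item(), s_loss.item()
```

In practice, opt_g would be an optimizer over the parameters of all generators and opt_s over those of all students, with the teacher kept frozen throughout; the paper's actual objectives, schedules, and class-balancing mechanism may differ from this sketch.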
Pages: 13