Better Together: Data-Free Multi-Student Coevolved Distillation

Times Cited: 1
Authors
Chen, Weijie [1 ,2 ]
Xuan, Yunyi [2 ]
Yang, Shicai [2 ]
Xie, Di [2 ]
Lin, Luojun [3 ]
Zhuang, Yueting [1 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
[2] Hikvis Res Inst, Hangzhou, Peoples R China
[3] Fuzhou Univ, Coll Comp & Data Sci, Fuzhou, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Knowledge distillation; Adversarial training; Model inversion; Surrogate images; Mutual learning;
DOI
10.1016/j.knosys.2023.111146
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Data-Free Knowledge Distillation (DFKD) aims to craft a customized student model from a pre-trained teacher model by synthesizing surrogate training images. However, a seldom-investigated scenario is distilling knowledge into multiple heterogeneous students simultaneously. In this paper, we study how to improve performance by coevolving peer students, termed Data-Free Multi-Student Coevolved Distillation (DF-MSCD). Building on previous DFKD methods, we advance DF-MSCD by improving data quality from the perspective of synthesizing unbiased, informative and diverse surrogate samples: 1) Unbiased. The disconnection of image synthesis across different timestamps during DFKD leads to an often-overlooked class imbalance problem. To tackle this problem, we reform the prior art into an unbiased variant by bridging the label distributions of the synthesized data across different timestamps. 2) Informative. Different from single-student DFKD, we encourage interactions not only between teacher-student pairs but also among peer students, driving a more comprehensive knowledge distillation. To this end, we devise a novel Inter-Student Adversarial Learning method to coevolve peer students with mutual benefits. 3) Diverse. To further promote Inter-Student Adversarial Learning, we develop a Mixture-of-Generators, in which multiple generators are optimized to synthesize different yet complementary samples by playing min-max games with multiple students. Experiments validate the effectiveness and efficiency of the proposed DF-MSCD, which surpasses existing state-of-the-art methods on multiple popular benchmarks. Notably, our method obtains multiple heterogeneous students in a single training run, making it superior to single-student DFKD methods in terms of both training time and test accuracy.
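As a rough illustration of the coevolved distillation loop sketched in the abstract, the minimal PyTorch-style code below pairs each generator with a student and alternates an adversarial generator step with a student step. The function names, loss weights, and the exact disagreement terms are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=1.0):
    # Standard soft-label distillation loss: KL(student || teacher)
    # on temperature-softened logits.
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)

def generator_step(generators, students, teacher, z_dim=100, batch=64):
    # Adversarial step (assumed form): each generator synthesizes images on
    # which its paired student disagrees with the teacher, so the surrogate
    # data stays informative; maximizing divergence = minimizing its negative.
    loss = 0.0
    for g, s in zip(generators, students):
        z = torch.randn(batch, z_dim)
        x = g(z)                           # surrogate images
        with torch.no_grad():
            t_logits = teacher(x.detach())
        loss = loss - kd_loss(s(x), teacher(x))
    return loss

def student_step(generators, students, teacher, z_dim=100, batch=64):
    # Distillation step: students imitate the teacher on the synthesized data
    # and additionally learn from one another (peer mutual learning).
    loss = 0.0
    for g in generators:
        with torch.no_grad():
            x = g(torch.randn(batch, z_dim))
            t_logits = teacher(x)
        s_logits = [s(x) for s in students]
        for i, si in enumerate(s_logits):
            loss = loss + kd_loss(si, t_logits)             # teacher -> student
            for j, sj in enumerate(s_logits):
                if j != i:
                    loss = loss + kd_loss(si, sj.detach())  # peer -> peer
    return loss

In an actual training loop one would alternate these two steps with separate optimizers for the generators and the students; the one-to-one generator-student pairing shown here is only one plausible way to realize the Mixture-of-Generators idea.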
Pages: 13