Better Together: Data-Free Multi-Student Coevolved Distillation

Times Cited: 1
Authors
Chen, Weijie [1 ,2 ]
Xuan, Yunyi [2 ]
Yang, Shicai [2 ]
Xie, Di [2 ]
Lin, Luojun [3 ]
Zhuang, Yueting [1 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
[2] Hikvis Res Inst, Hangzhou, Peoples R China
[3] Fuzhou Univ, Coll Comp & Data Sci, Fuzhou, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Knowledge distillation; Adversarial training; Model inversion; Surrogate images; Mutual learning;
DOI
10.1016/j.knosys.2023.111146
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Data-Free Knowledge Distillation (DFKD) aims to craft a customized student model from a pre-trained teacher model by synthesizing surrogate training images. However, a seldom-investigated scenario is distilling knowledge into multiple heterogeneous students simultaneously. In this paper, we study how to improve performance by coevolving peer students, termed Data-Free Multi-Student Coevolved Distillation (DF-MSCD). Building on previous DFKD methods, we advance DF-MSCD by improving data quality from the perspective of synthesizing unbiased, informative and diverse surrogate samples: 1) Unbiased. The disconnection of image synthesis across different timestamps during DFKD leads to an often-overlooked class imbalance problem. To tackle this problem, we reform the prior art into an unbiased variant by bridging the label distributions of the synthesized data across different timestamps. 2) Informative. Different from single-student DFKD, we encourage interactions not only between teacher-student pairs but also among peer students, driving a more comprehensive knowledge distillation. To this end, we devise a novel Inter-Student Adversarial Learning method to coevolve peer students with mutual benefits. 3) Diverse. To further promote Inter-Student Adversarial Learning, we develop a Mixture-of-Generators, in which multiple generators are optimized to synthesize different yet complementary samples by playing min-max games with multiple students. Experiments validate the effectiveness and efficiency of the proposed DF-MSCD, which surpasses existing state-of-the-art methods on multiple popular benchmarks. Notably, our method obtains multiple heterogeneous students in a single training run, making it superior to single-student DFKD methods in terms of both training time and test accuracy.
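As a rough illustration of the coevolved distillation loop sketched in the abstract, the minimal PyTorch-style code below pairs each generator with a student and alternates an adversarial generator step with a student step. The function names, loss weights, and the exact disagreement terms are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, T=1.0):
    # Standard soft-label distillation loss: KL(student || teacher)
    # on temperature-softened logits.
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)

def generator_step(generators, students, teacher, z_dim=100, batch=64):
    # Adversarial step (assumed form): each generator synthesizes images on
    # which its paired student disagrees with the teacher, so the surrogate
    # data stays informative; maximizing divergence = minimizing its negative.
    loss = 0.0
    for g, s in zip(generators, students):
        z = torch.randn(batch, z_dim)
        x = g(z)                           # surrogate images
        with torch.no_grad():
            t_logits = teacher(x.detach())
        loss = loss - kd_loss(s(x), teacher(x))
    return loss

def student_step(generators, students, teacher, z_dim=100, batch=64):
    # Distillation step: students imitate the teacher on the synthesized data
    # and additionally learn from one another (peer mutual learning).
    loss = 0.0
    for g in generators:
        with torch.no_grad():
            x = g(torch.randn(batch, z_dim))
            t_logits = teacher(x)
        s_logits = [s(x) for s in students]
        for i, si in enumerate(s_logits):
            loss = loss + kd_loss(si, t_logits)             # teacher -> student
            for j, sj in enumerate(s_logits):
                if j != i:
                    loss = loss + kd_loss(si, sj.detach())  # peer -> peer
    return loss

In an actual training loop one would alternate these two steps with separate optimizers for the generators and the students; the one-to-one generator-student pairing shown here is only one plausible way to realize the Mixture-of-Generators idea.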
Pages: 13