AdaDFKD: Exploring adaptive inter-sample relationship in data-free knowledge distillation

Times cited: 0
Authors
Li, Jingru [1 ]
Zhou, Sheng [1 ]
Li, Liangcheng [1 ]
Wang, Haishuai [1 ]
Bu, Jiajun [1 ]
Yu, Zhi [1 ,2 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Zheda Rd, Hangzhou 310027, Zhejiang, Peoples R China
[2] Zhejiang Univ, Zhejiang Prov Key Lab Serv Robot, Zheda Rd, Hangzhou 310027, Zhejiang, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Data-free knowledge distillation; Unsupervised representation learning; Knowledge distillation;
DOI
10.1016/j.neunet.2024.106386
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
In scenarios such as privacy protection or large-scale data transmission, data-free knowledge distillation (DFKD) methods perform Knowledge Distillation (KD) when the original data are not accessible. They generate pseudo samples by extracting knowledge from the teacher model and then use these pseudo samples for KD. The challenge in previous DFKD methods lies in the static nature of their target distributions and their focus on learning instance-level distributions, which makes them overly reliant on the pretrained teacher model. To address these concerns, our study introduces a novel DFKD approach, AdaDFKD, designed to establish and exploit relationships among pseudo samples that are adaptive to the student model, thereby effectively mitigating the aforementioned risk. We achieve this by progressing from "easy-to-discriminate" samples to "hard-to-discriminate" samples, as humans do. We design a relationship refinement module (R2M) to optimize the generation process, in which we learn a progressive conditional distribution of negative samples and maximize the log-likelihood of the inter-sample similarity of pseudo samples. Theoretically, we show that this design of AdaDFKD both minimizes the divergence and maximizes the mutual information between the distributions of the teacher and student models. Our results demonstrate the superiority of our approach over state-of-the-art (SOTA) DFKD methods across various benchmarks, teacher-student pairs, and evaluation metrics, as well as its robustness and fast convergence.
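The record does not reproduce the paper's equations; as a rough, hypothetical sketch of what "maximizing the log-likelihood of the inter-sample similarity of pseudo samples" can look like in practice, the snippet below implements a generic InfoNCE-style objective over a batch of pseudo-sample features from the teacher and student. The function name, the temperature tau, and the use of in-batch negatives are illustrative assumptions, not AdaDFKD's actual R2M loss or its progressive negative-sample schedule.

# Hypothetical illustration only: a generic contrastive (InfoNCE-style) objective
# over pseudo-sample features, not the R2M loss defined in the paper.
import torch
import torch.nn.functional as F

def inter_sample_relationship_loss(student_feats, teacher_feats, tau=0.1):
    # Normalize features so dot products become cosine similarities.
    s = F.normalize(student_feats, dim=1)   # (B, D) student embeddings of pseudo samples
    t = F.normalize(teacher_feats, dim=1)   # (B, D) teacher embeddings of pseudo samples
    logits = s @ t.t() / tau                # (B, B) pairwise similarity matrix
    targets = torch.arange(s.size(0), device=s.device)
    # Cross-entropy over each similarity row is the negative log-likelihood that a
    # student feature matches its own teacher feature against in-batch negatives.
    return F.cross_entropy(logits, targets)

# Minimal usage with random stand-ins for pseudo-sample features:
student_feats = torch.randn(64, 128)
teacher_feats = torch.randn(64, 128)
loss = inter_sample_relationship_loss(student_feats, teacher_feats)

Minimizing this loss maximizes the log-probability assigned to each matching teacher-student pair relative to the other samples in the batch, which is one standard way to operationalize an inter-sample similarity likelihood.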
Pages: 15