Dual discriminator adversarial distillation for data-free model compression

Times cited: 12
Authors
Zhao, Haoran [1 ]
Sun, Xin [1 ,2 ]
Dong, Junyu [1 ]
Manic, Milos [3 ]
Zhou, Huiyu [4 ]
Yu, Hui [5 ]
Affiliations
[1] Ocean Univ China, Coll Informat Sci & Engn, Qingdao, Peoples R China
[2] Tech Univ Munich, Dept Aerosp & Geodesy, Munich, Germany
[3] Virginia Commonwealth Univ, Coll Engn, Richmond, VA USA
[4] Univ Leicester, Sch Informat, Leicester, Leics, England
[5] Univ Portsmouth, Sch Creat Technol, Portsmouth, Hants, England
Funding
National Natural Science Foundation of China;
Keywords
Deep neural networks; Image classification; Model compression; Knowledge distillation; Data-free; KNOWLEDGE; NETWORK; RECOGNITION;
DOI
10.1007/s13042-021-01443-0
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge distillation has been widely used to produce portable and efficient neural networks that can be deployed on edge devices for computer vision tasks. However, almost all top-performing knowledge distillation methods require access to the original training data, which is usually very large and often unavailable. To tackle this problem, we propose a novel data-free approach in this paper, named Dual Discriminator Adversarial Distillation (DDAD), which distills a neural network without the need for any training data or meta-data. Specifically, we use a generator to create samples, through dual discriminator adversarial distillation, that mimic the original training data. The generator not only exploits the pre-trained teacher's intrinsic statistics stored in its batch normalization layers but also maximizes the discrepancy between the teacher and the student model. The generated samples are then used to train the compact student network under the supervision of the teacher. The proposed method yields an efficient student network that closely approximates its teacher network, without using the original training data. Extensive experiments demonstrate the effectiveness of the proposed approach on the CIFAR, Caltech101 and ImageNet datasets for classification tasks. Moreover, we extend our method to semantic segmentation tasks on several public datasets such as CamVid, NYUv2, Cityscapes and VOC 2012. To the best of our knowledge, this is the first work on generative-model-based data-free knowledge distillation for large-scale datasets such as ImageNet, Cityscapes and VOC 2012. Experiments show that our method outperforms all baselines for data-free knowledge distillation.
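The abstract outlines a concrete training scheme: a generator synthesizes surrogate images by matching the statistics stored in the teacher's batch normalization layers while maximizing the teacher-student discrepancy, and the student is then distilled on those images. The sketch below illustrates one such training step in PyTorch; the model interfaces, discrepancy measure, loss weighting and hyper-parameters are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of one DDAD-style training step (PyTorch), assuming a frozen
# teacher with BatchNorm layers, a trainable student, and a noise-to-image
# generator. Loss terms and hyper-parameters below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def attach_bn_hooks(teacher):
    """Record the input of every BatchNorm2d layer during a forward pass, so the
    generator can be pulled towards the teacher's stored batch statistics."""
    records = []
    handles = [m.register_forward_hook(
                   lambda mod, inp, out, rec=records: rec.append((mod, inp[0])))
               for m in teacher.modules() if isinstance(m, nn.BatchNorm2d)]
    return records, handles

def bn_statistics_loss(records):
    # Match the batch mean/variance of the generated images to the running
    # statistics stored in each of the teacher's BatchNorm layers.
    loss = 0.0
    for bn, x in records:
        loss = loss + F.mse_loss(x.mean(dim=(0, 2, 3)), bn.running_mean) \
                    + F.mse_loss(x.var(dim=(0, 2, 3), unbiased=False), bn.running_var)
    records.clear()
    return loss

def ddad_step(generator, teacher, student, g_opt, s_opt, records,
              batch_size=64, z_dim=100, temperature=4.0):
    z = torch.randn(batch_size, z_dim)

    # Generator update: synthesize images that fit the teacher's BN statistics
    # while maximizing the teacher/student disagreement (the two networks act
    # as the "dual discriminators").
    fake = generator(z)
    t_logits = teacher(fake)            # forward hooks fill `records`
    s_logits = student(fake)
    discrepancy = F.l1_loss(F.softmax(t_logits, dim=1), F.softmax(s_logits, dim=1))
    g_loss = bn_statistics_loss(records) - discrepancy   # maximize discrepancy
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

    # Student update: ordinary knowledge distillation on the generated batch,
    # minimizing the same teacher/student discrepancy via soft targets.
    with torch.no_grad():
        fake = generator(z)
        t_logits = teacher(fake)
        records.clear()                 # BN statistics not needed in this step
    s_logits = student(fake)
    s_loss = F.kl_div(F.log_softmax(s_logits / temperature, dim=1),
                      F.softmax(t_logits / temperature, dim=1),
                      reduction="batchmean") * temperature ** 2
    s_opt.zero_grad(); s_loss.backward(); s_opt.step()
    return g_loss.item(), s_loss.item()
```

A training loop would call attach_bn_hooks(teacher) once, keep the teacher in eval mode so its running statistics stay frozen, and then invoke ddad_step repeatedly with separate optimizers for the generator and the student.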
Pages: 1213-1230
Number of pages: 18
Related papers
50 records in total
  • [41] Model Selection - Knowledge Distillation Framework for Model Compression
    Chen, Renhai
    Yuan, Shimin
    Wang, Shaobo
    Li, Zhenghan
    Xing, Meng
    Feng, Zhiyong
    2021 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2021), 2021,
  • [42] Data-free fingerprinting technology for biometric classifiers
    Ren, Ziting
    Duan, Yucong
    Qi, Qi
    Luo, Lanhua
    COMPUTERS & SECURITY, 2025, 154
  • [43] Diverse Sample Generation: Pushing the Limit of Generative Data-Free Quantization
    Qin, Haotong
    Ding, Yifu
    Zhang, Xiangguo
    Wang, Jiakai
    Liu, Xianglong
    Lu, Jiwen
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (10) : 11689 - 11706
  • [44] Contrastive adversarial knowledge distillation for deep model compression in time-series regression tasks
    Xu, Qing
    Chen, Zhenghua
    Ragab, Mohamed
    Wang, Chao
    Wu, Min
    Li, Xiaoli
    NEUROCOMPUTING, 2022, 485 : 242 - 251
  • [45] Triplet Knowledge Distillation Networks for Model Compression
    Tang, Jialiang
    Jiang, Ning
    Yu, Wenxin
    Wu, Wenqin
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [46] Analysis of Model Compression Using Knowledge Distillation
    Hong, Yu-Wei
    Leu, Jenq-Shiou
    Faisal, Muhamad
    Prakosa, Setya Widyawan
    IEEE ACCESS, 2022, 10 : 85095 - 85105
  • [47] Dual Distillation Discriminator Networks for Domain Adaptive Few-Shot Learning
    Liu, Xiyao
    Ji, Zhong
    Pang, Yanwei
    Han, Zhi
    NEURAL NETWORKS, 2023, 165 : 625 - 633
  • [48] Compression of Acoustic Model via Knowledge Distillation and Pruning
    Li, Chenxing
    Zhu, Lei
    Xu, Shuang
    Gao, Peng
    Xu, Bo
    2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2018, : 2785 - 2790
  • [49] Enhancing Global Model Performance in Federated Learning With Non-IID Data Using a Data-Free Generative Diffusion Model
    Najafi, Mohammadreza
    Daneshtalab, Masoud
    Lee, Jeong-A
    Saadloonia, Ghazal
    Shin, Seokjoo
    IEEE ACCESS, 2024, 12 : 148230 - 148239
  • [50] DATA: Dynamic Adversarial Thermal Anti-distillation
    Zhang, Yao
    Li, Yang
    Pan, Zhisong
    KNOWLEDGE-BASED SYSTEMS, 2025, 309