Dual discriminator adversarial distillation for data-free model compression

Times cited: 12
Authors
Zhao, Haoran [1 ]
Sun, Xin [1 ,2 ]
Dong, Junyu [1 ]
Manic, Milos [3 ]
Zhou, Huiyu [4 ]
Yu, Hui [5 ]
Affiliations
[1] Ocean Univ China, Coll Informat Sci & Engn, Qingdao, Peoples R China
[2] Tech Univ Munich, Dept Aerosp & Geodesy, Munich, Germany
[3] Virginia Commonwealth Univ, Coll Engn, Richmond, VA USA
[4] Univ Leicester, Sch Informat, Leicester, Leics, England
[5] Univ Portsmouth, Sch Creat Technol, Portsmouth, Hants, England
Funding
National Natural Science Foundation of China;
Keywords
Deep neural networks; Image classification; Model compression; Knowledge distillation; Data-free; KNOWLEDGE; NETWORK; RECOGNITION;
DOI
10.1007/s13042-021-01443-0
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge distillation has been widely used to produce portable and efficient neural networks that can be deployed on edge devices for computer vision tasks. However, almost all top-performing knowledge distillation methods require access to the original training data, which is usually very large and often unavailable. To tackle this problem, we propose a novel data-free approach in this paper, named Dual Discriminator Adversarial Distillation (DDAD), which distills a neural network without the need for any training data or meta-data. Specifically, we use a generator to create samples, through dual discriminator adversarial distillation, that mimic the original training data. The generator not only exploits the pre-trained teacher's intrinsic statistics stored in its batch normalization layers but also maximizes the discrepancy between the teacher and the student model. The generated samples are then used to train the compact student network under the supervision of the teacher. The proposed method yields an efficient student network that closely approximates its teacher network, without using the original training data. Extensive experiments demonstrate the effectiveness of the proposed approach on the CIFAR, Caltech101 and ImageNet datasets for classification tasks. Moreover, we extend our method to semantic segmentation tasks on several public datasets such as CamVid, NYUv2, Cityscapes and VOC 2012. To the best of our knowledge, this is the first work on generative-model-based data-free knowledge distillation for large-scale datasets such as ImageNet, Cityscapes and VOC 2012. Experiments show that our method outperforms all baselines for data-free knowledge distillation.
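The abstract outlines a concrete training scheme: a generator synthesizes surrogate images by matching the statistics stored in the teacher's batch normalization layers while maximizing the teacher-student discrepancy, and the student is then distilled on those images. The sketch below illustrates one such training step in PyTorch; the model interfaces, discrepancy measure, loss weighting and hyper-parameters are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of one DDAD-style training step (PyTorch), assuming a frozen
# teacher with BatchNorm layers, a trainable student, and a noise-to-image
# generator. Loss terms and hyper-parameters below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def attach_bn_hooks(teacher):
    """Record the input of every BatchNorm2d layer during a forward pass, so the
    generator can be pulled towards the teacher's stored batch statistics."""
    records = []
    handles = [m.register_forward_hook(
                   lambda mod, inp, out, rec=records: rec.append((mod, inp[0])))
               for m in teacher.modules() if isinstance(m, nn.BatchNorm2d)]
    return records, handles

def bn_statistics_loss(records):
    # Match the batch mean/variance of the generated images to the running
    # statistics stored in each of the teacher's BatchNorm layers.
    loss = 0.0
    for bn, x in records:
        loss = loss + F.mse_loss(x.mean(dim=(0, 2, 3)), bn.running_mean) \
                    + F.mse_loss(x.var(dim=(0, 2, 3), unbiased=False), bn.running_var)
    records.clear()
    return loss

def ddad_step(generator, teacher, student, g_opt, s_opt, records,
              batch_size=64, z_dim=100, temperature=4.0):
    z = torch.randn(batch_size, z_dim)

    # Generator update: synthesize images that fit the teacher's BN statistics
    # while maximizing the teacher/student disagreement (the two networks act
    # as the "dual discriminators").
    fake = generator(z)
    t_logits = teacher(fake)            # forward hooks fill `records`
    s_logits = student(fake)
    discrepancy = F.l1_loss(F.softmax(t_logits, dim=1), F.softmax(s_logits, dim=1))
    g_loss = bn_statistics_loss(records) - discrepancy   # maximize discrepancy
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

    # Student update: ordinary knowledge distillation on the generated batch,
    # minimizing the same teacher/student discrepancy via soft targets.
    with torch.no_grad():
        fake = generator(z)
        t_logits = teacher(fake)
        records.clear()                 # BN statistics not needed in this step
    s_logits = student(fake)
    s_loss = F.kl_div(F.log_softmax(s_logits / temperature, dim=1),
                      F.softmax(t_logits / temperature, dim=1),
                      reduction="batchmean") * temperature ** 2
    s_opt.zero_grad(); s_loss.backward(); s_opt.step()
    return g_loss.item(), s_loss.item()
```

A training loop would call attach_bn_hooks(teacher) once, keep the teacher in eval mode so its running statistics stay frozen, and then invoke ddad_step repeatedly with separate optimizers for the generator and the student.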
Pages: 1213-1230
Number of pages: 18
Related papers
50 records in total
  • [41] Model Selection - Knowledge Distillation Framework for Model Compression
    Chen, Renhai
    Yuan, Shimin
    Wang, Shaobo
    Li, Zhenghan
    Xing, Meng
    Feng, Zhiyong
    2021 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2021), 2021,
  • [42] Data-free fingerprinting technology for biometric classifiers
    Ren, Ziting
    Duan, Yucong
    Qi, Qi
    Luo, Lanhua
    COMPUTERS & SECURITY, 2025, 154
  • [43] Diverse Sample Generation: Pushing the Limit of Generative Data-Free Quantization
    Qin, Haotong
    Ding, Yifu
    Zhang, Xiangguo
    Wang, Jiakai
    Liu, Xianglong
    Lu, Jiwen
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (10) : 11689 - 11706
  • [44] Contrastive adversarial knowledge distillation for deep model compression in time-series regression tasks
    Xu, Qing
    Chen, Zhenghua
    Ragab, Mohamed
    Wang, Chao
    Wu, Min
    Li, Xiaoli
    NEUROCOMPUTING, 2022, 485 : 242 - 251
  • [45] Triplet Knowledge Distillation Networks for Model Compression
    Tang, Jialiang
    Jiang, Ning
    Yu, Wenxin
    Wu, Wenqin
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [46] Analysis of Model Compression Using Knowledge Distillation
    Hong, Yu-Wei
    Leu, Jenq-Shiou
    Faisal, Muhamad
    Prakosa, Setya Widyawan
    IEEE ACCESS, 2022, 10 : 85095 - 85105
  • [47] Dual Distillation Discriminator Networks for Domain Adaptive Few-Shot Learning
    Liu, Xiyao
    Ji, Zhong
    Pang, Yanwei
    Han, Zhi
    NEURAL NETWORKS, 2023, 165 : 625 - 633
  • [48] Compression of Acoustic Model via Knowledge Distillation and Pruning
    Li, Chenxing
    Zhu, Lei
    Xu, Shuang
    Gao, Peng
    Xu, Bo
    2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2018, : 2785 - 2790
  • [49] Enhancing Global Model Performance in Federated Learning With Non-IID Data Using a Data-Free Generative Diffusion Model
    Najafi, Mohammadreza
    Daneshtalab, Masoud
    Lee, Jeong-A
    Saadloonia, Ghazal
    Shin, Seokjoo
    IEEE ACCESS, 2024, 12 : 148230 - 148239
  • [50] DATA: Dynamic Adversarial Thermal Anti-distillation
    Zhang, Yao
    Li, Yang
    Pan, Zhisong
    KNOWLEDGE-BASED SYSTEMS, 2025, 309