Pruning-and-distillation: One-stage joint compression framework for CNNs via clustering

Cited by: 4
Authors
Niu, Tao [1 ]
Teng, Yinglei [1 ]
Jin, Lei [1 ]
Zou, Panpan [1 ]
Liu, Yiding [1 ]
Affiliations
[1] Beijing University of Posts and Telecommunications, Beijing, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Filter pruning; Clustering; Knowledge distillation; Deep neural networks
DOI
10.1016/j.imavis.2023.104743
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Network pruning and knowledge distillation, as two effective network compression techniques, have drawn extensive attention due to their success in reducing model complexity. However, previous works regard them as two independent methods and combine them in an isolated rather than joint manner, leading to sub-optimal optimization. In this paper, we propose a collaborative compression scheme named Pruning-and-Distillation via Clustering (PDC), which integrates pruning and distillation into an end-to-end single-stage framework that takes advantage of both. Specifically, instead of directly deleting or zeroing out unimportant filters within each layer, we reconstruct them based on clustering, which preserves the learned features as much as possible. Guidance from the teacher is integrated into the pruning process to further improve the generalization of the pruned model, which alleviates the randomness caused by reconstruction to some extent. After convergence, we can equivalently remove the reconstructed filters within each cluster through the proposed channel addition operation. Benefiting from this equivalence, we no longer require a time-consuming fine-tuning step to regain accuracy. Extensive experiments on the CIFAR-10/100 and ImageNet datasets show that our method achieves the best trade-off between performance and complexity compared with other state-of-the-art algorithms. For example, for ResNet-110, we achieve a 61.5% FLOPs reduction with even a 0.14% top-1 accuracy increase on CIFAR-10, and remove 55.2% of FLOPs with only a 0.32% accuracy drop on CIFAR-100. © 2023 Elsevier B.V. All rights reserved.
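To give a concrete picture of the clustering-and-merging step the abstract describes, the following is a minimal sketch, not the authors' implementation. It assumes two back-to-back Conv2d layers without bias, batch norm, or residual connections, clusters the first layer's filters with scikit-learn's KMeans, keeps one representative filter per cluster, and folds the pruned filters into the next layer by summing the corresponding input-channel kernels (a simplified stand-in for the paper's channel addition operation). The helper name merge_clustered_filters and the use of the cluster mean as the representative are illustrative assumptions; the merge is only exactly equivalent when the filters within a cluster have been reconstructed to be identical, which the paper achieves during training with teacher guidance.

# Minimal illustrative sketch (assumptions stated above): cluster a conv layer's
# filters, keep one representative per cluster, and fold the removed filters
# into the following layer by summing the matching input-channel kernels.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans


def merge_clustered_filters(conv, next_conv, num_clusters):
    # conv.weight: (C_out, C_in, k, k); one row per filter after flattening.
    w = conv.weight.data
    flat = w.view(w.size(0), -1).cpu().numpy()
    labels = KMeans(n_clusters=num_clusters, n_init=10).fit_predict(flat)

    next_w = next_conv.weight.data            # (C_out2, C_out, k2, k2)
    kept_filters, merged_columns = [], []
    for c in range(num_clusters):
        idx = torch.tensor([i for i, lab in enumerate(labels) if lab == c])
        # Representative filter: here simply the cluster mean (an approximation;
        # exact equivalence requires identical filters within the cluster).
        kept_filters.append(w[idx].mean(dim=0))
        # "Channel addition" idea: identical feature maps let the next layer
        # sum the kernels that consumed the cluster's output channels.
        merged_columns.append(next_w[:, idx].sum(dim=1))

    pruned = nn.Conv2d(conv.in_channels, num_clusters, conv.kernel_size,
                       conv.stride, conv.padding, bias=False)
    pruned.weight.data = torch.stack(kept_filters)

    next_pruned = nn.Conv2d(num_clusters, next_conv.out_channels,
                            next_conv.kernel_size, next_conv.stride,
                            next_conv.padding, bias=False)
    next_pruned.weight.data = torch.stack(merged_columns, dim=1)
    return pruned, next_pruned


# Example: shrink a 64-filter layer to 32 clusters and check the output shape.
layer1 = nn.Conv2d(3, 64, 3, padding=1, bias=False)
layer2 = nn.Conv2d(64, 128, 3, padding=1, bias=False)
p1, p2 = merge_clustered_filters(layer1, layer2, num_clusters=32)
x = torch.randn(1, 3, 32, 32)
print(p2(p1(x)).shape)  # torch.Size([1, 128, 32, 32])

Because the two pruned layers are ordinary Conv2d modules, they can replace the originals directly, which mirrors the abstract's point that no separate fine-tuning stage is needed once the merge is exact.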
Pages: 11