An Automatically Layer-Wise Searching Strategy for Channel Pruning Based on Task-Driven Sparsity Optimization

Cited by: 27
Authors
Feng, Kai-Yuan [1 ]
Fei, Xia [1 ]
Gong, Maoguo [1 ]
Qin, A. K. [2 ]
Li, Hao [1 ]
Wu, Yue [3 ]
Affiliations
[1] Xidian Univ, Sch Elect Engn, Xian 710071, Peoples R China
[2] Swinburne Univ Technol, Dept Comp Technol, Hawthorn, Vic 3122, Australia
[3] Xidian Univ, Sch Comp Sci & Technol, Xian 710071, Peoples R China
Funding
National Natural Science Foundation of China; Australian Research Council;
Keywords
Task analysis; Knowledge engineering; Training; Cost function; Convolutional neural networks; Computational modeling; Tensors; Deep neural networks; channel pruning; knowledge distillation; compression; NEURAL-NETWORKS;
DOI
10.1109/TCSVT.2022.3156588
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology; Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Deep convolutional neural networks (CNNs) have achieved tremendous success but tend to suffer from high computational costs, mainly due to heavy over-parameterization, which makes it difficult to apply them directly to the ever-growing range of applications on low-end edge devices with strict power restrictions and real-time inference requirements. Recently, much research attention has been devoted to compressing networks via pruning to address this issue. Most existing methods rely on hand-designed pruning rules, which suffer from several limitations. First, manually designed rules are applicable only to limited application scenarios and can hardly generalize to a broader scope; moreover, such rules are typically designed from human experience via trial and error, and are thus highly subjective. Second, channels in different layers of a network may follow diverse distributions, so the same pruning rule is not appropriate for every layer. To address these limitations, we propose a novel channel pruning scheme in which task-irrelevant channels are removed in a task-driven manner. Specifically, an adaptively differentiable search module is proposed to automatically find the best pruning rule for each layer of a CNN under sparsity constraints. In addition, we employ knowledge distillation to alleviate excessive performance loss. Once training is finished, a compact network is obtained by removing channels according to the layer-wise pruning rules. We have evaluated the proposed method on well-known benchmark datasets, including CIFAR, MNIST, and ImageNet, in comparison to several state-of-the-art pruning methods. Experimental results demonstrate the superiority of our method over the compared ones in terms of both parameter and FLOPs reduction.
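The abstract gives no implementation details, but its core ingredients (differentiable per-channel gates trained under a sparsity penalty, combined with a knowledge-distillation loss) can be illustrated with a minimal PyTorch sketch. This is an assumption-laden illustration, not the authors' actual code: the names GatedConv and total_loss and the hyperparameters lam, T, and alpha are all hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConv(nn.Module):
    # Convolution with a learnable gate per output channel; gates driven
    # toward zero by the sparsity penalty mark channels as prunable.
    # (Illustrative stand-in for a differentiable channel-search module.)
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.gate = nn.Parameter(torch.ones(out_ch))

    def forward(self, x):
        # Scale each output channel by its gate.
        return self.conv(x) * self.gate.view(1, -1, 1, 1)

def total_loss(student_logits, teacher_logits, targets, gates,
               lam=1e-3, T=4.0, alpha=0.5):
    # Task loss + knowledge-distillation loss + L1 sparsity penalty on gates.
    ce = F.cross_entropy(student_logits, targets)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    sparsity = sum(g.abs().sum() for g in gates)
    return (1.0 - alpha) * ce + alpha * kd + lam * sparsity

After training, channels whose gate magnitude falls below a layer-wise threshold would be removed, yielding a compact network in the spirit the abstract describes; how the paper actually parameterizes and searches the per-layer rule is not specified here.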
Pages: 5790-5802
Page count: 13