Adding Before Pruning: Sparse Filter Fusion for Deep Convolutional Neural Networks via Auxiliary Attention

Cited by: 0
Authors
Tian, Guanzhong [1 ,2 ]
Sun, Yiran [1 ]
Liu, Yuang [1 ]
Zeng, Xianfang [1 ]
Wang, Mengmeng [1 ]
Liu, Yong [1 ]
Zhang, Jiangning [1 ]
Chen, Jun [1 ]
Affiliations
[1] Zhejiang Univ, Inst Cyber Syst & Control, Hangzhou 310027, Peoples R China
[2] Zhejiang Univ, Ningbo Res Inst, Ningbo 315000, Peoples R China
Keywords
Training; Fuses; Computational modeling; Sun; Optimization; Learning systems; Feature extraction; Deep neural networks (DNNs); effective feature fusion; feature selection; filter pruning
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Filter pruning is an important feature selection technique for shrinking existing feature fusion schemes (particularly the convolution computation and model size), which helps develop more efficient feature fusion models while maintaining state-of-the-art performance. It also reduces the storage and computation requirements of deep neural networks (DNNs) and dramatically accelerates inference. Existing methods mainly rely on manual criteria, such as normalization, to select filters, and a typical pipeline comprises two stages: first pruning the original neural network and then fine-tuning the pruned model. However, choosing a manual criterion can be tricky and somewhat arbitrary. Moreover, directly regularizing and modifying filters in such a pipeline is sensitive to the choice of hyperparameters, which makes the pruning procedure less robust. To address these challenges, we propose to handle filter pruning in a single stage, using an attention-based architecture that adaptively fuses filter selection with filter learning in a unified network. Specifically, we present a pruning method named adding before pruning (ABP) that makes the model focus on the more significant filters through training rather than handcrafted criteria such as norm or rank. First, we add an auxiliary attention layer to the original model and constrain the significance scores in this layer to be binary. Second, to propagate gradients through the auxiliary attention layer, we design a specific gradient estimator and prove, through mathematical derivation, that it preserves convergence in the graph flow. Finally, to relieve the dependence on complicated prior knowledge for designing a thresholding criterion, we prune and train the filters simultaneously, automatically eliminating network redundancy while keeping pruned filters recoverable. Extensive experiments on two standard image classification benchmarks, CIFAR-10 and ILSVRC-2012, show that the proposed approach performs favorably against previous state-of-the-art filter pruning algorithms.
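To make the abstract's mechanism concrete, below is a minimal PyTorch sketch of an auxiliary attention layer with binary per-filter significance scores trained jointly with the convolution weights. The paper's specific gradient estimator is not reproduced here; the sketch substitutes a plain straight-through estimator, and the names (`BinaryGate`, `GatedConv`, the 0.1 score initialization) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class BinaryGate(torch.autograd.Function):
    """Binarizes per-filter significance scores in the forward pass.

    The paper designs its own gradient estimator for the binary
    attention layer; as a stand-in, this sketch uses a plain
    straight-through estimator (identity gradient) in backward().
    """

    @staticmethod
    def forward(ctx, scores):
        # Gate is 1 for filters with positive significance, else 0.
        return (scores > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through: pass gradients to the real-valued scores.
        return grad_output


class GatedConv(nn.Module):
    """Convolution with an auxiliary attention gate over its filters.

    Gates are learned jointly with the weights ("adding before
    pruning"); a filter whose gate settles at 0 contributes nothing,
    yet stays recoverable during training because its weights and
    score are kept and the score can still flip sign.
    """

    def __init__(self, in_ch, out_ch, kernel_size, **kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, **kwargs)
        # One learnable significance score per output filter.
        self.scores = nn.Parameter(torch.full((out_ch,), 0.1))

    def forward(self, x):
        gates = BinaryGate.apply(self.scores)      # binary {0, 1} mask
        return self.conv(x) * gates.view(1, -1, 1, 1)


# Minimal usage: gradients reach both the conv weights and the scores.
layer = GatedConv(3, 16, 3, padding=1)
out = layer(torch.randn(1, 3, 32, 32))
out.abs().mean().backward()
print("filters currently gated off:", int((layer.scores <= 0).sum()))
```

After training, the zero-gated filters (together with the matching input channels of the following layer) would be physically removed to realize the storage and computation savings the abstract describes.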
Pages: 3930-3942
Page count: 13