Dynamical Channel Pruning by Conditional Accuracy Change for Deep Neural Networks

Cited by: 54
Authors
Chen, Zhiqiang [1 ,2 ]
Xu, Ting-Bing [2 ,3 ]
Du, Changde [1 ,2 ]
Liu, Cheng-Lin [2 ,3 ,4 ]
He, Huiguang [2 ,4 ,5 ]
Affiliations
[1] Chinese Acad Sci CASIA, Inst Automat, Res Ctr Brain Inspired Intelligence, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci UCAS, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[3] Chinese Acad Sci CASIA, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
[4] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Beijing 100190, Peoples R China
[5] Chinese Acad Sci CASIA, Res Ctr Brain Inspired Intelligence, Natl Lab Pattern Recognit, Inst Automat, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Training; Channel estimation; Logic gates; Computer architecture; Convolution; Biological neural networks; Automation; Conditional accuracy change (CAC); direct criterion; dynamical channel pruning; neural network compression; structure shaping;
DOI
10.1109/TNNLS.2020.2979517
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Channel pruning is an effective technique that has been widely applied to deep neural network compression. However, many existing methods prune from a pretrained model, thus resulting in repetitious pruning and fine-tuning processes. In this article, we propose a dynamical channel pruning method, which prunes unimportant channels at the early stage of training. Rather than utilizing indirect criteria (e.g., weight norm, absolute weight sum, and reconstruction error) to guide connection or channel pruning, we design criteria directly related to the final accuracy of a network to evaluate the importance of each channel. Specifically, a channelwise gate is designed to randomly enable or disable each channel so that the conditional accuracy change (CAC) can be estimated under the condition of each channel being disabled. Practically, we construct two effective and efficient criteria to dynamically estimate CAC at each iteration of training; thus, unimportant channels can be gradually pruned during the training process. Finally, extensive experiments on multiple data sets (i.e., ImageNet, CIFAR, and MNIST) with various networks (i.e., ResNet, VGG, and MLP) demonstrate that the proposed method effectively reduces the parameters and computations of the baseline network while yielding higher or competitive accuracy. Interestingly, if we Double the initial Channels and then Prune Half (DCPH) of them down to the baseline's counterpart, the network enjoys a remarkable performance improvement by shaping a more desirable structure.
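The channelwise gate described in the abstract can be illustrated with a minimal PyTorch-style sketch. This is not the authors' implementation: the class and method names (ChannelGate, update_cac, prune), the keep_prob parameter, and the running-loss heuristic used to stand in for the paper's CAC criteria are assumptions for illustration only.

```python
# Minimal sketch of a channel-wise stochastic gate for pruning during training.
# Assumptions: channels are randomly disabled each iteration, and a running score
# per channel approximates how much accuracy suffers when that channel is off.
import torch
import torch.nn as nn

class ChannelGate(nn.Module):
    """Randomly enables/disables each channel of the preceding layer's output."""
    def __init__(self, num_channels: int, keep_prob: float = 0.9):
        super().__init__()
        self.keep_prob = keep_prob
        # 1 = channel alive, 0 = channel permanently pruned.
        self.register_buffer("mask", torch.ones(num_channels))
        # Running per-channel score (higher = disabling this channel hurts more).
        self.register_buffer("cac_scores", torch.zeros(num_channels))
        self.last_gate = None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Sample an on/off pattern; never re-enable already pruned channels.
            gate = torch.bernoulli(torch.full_like(self.mask, self.keep_prob)) * self.mask
        else:
            gate = self.mask
        self.last_gate = gate
        return x * gate.view(1, -1, 1, 1)

    @torch.no_grad()
    def update_cac(self, batch_loss: float, momentum: float = 0.99):
        # Heuristic CAC estimate: attribute the observed batch loss to the channels
        # that were disabled in this iteration (a simplified stand-in for the paper's criteria).
        disabled = (self.last_gate == 0).float()
        self.cac_scores.mul_(momentum).add_((1.0 - momentum) * batch_loss * disabled)

    @torch.no_grad()
    def prune(self, num_to_prune: int):
        # Permanently disable the channels whose removal is estimated to matter least.
        alive = self.mask.nonzero().flatten()
        order = self.cac_scores[alive].argsort()  # smallest estimated impact first
        self.mask[alive[order[:num_to_prune]]] = 0.0
```

In a training loop, one would call the gate after a convolution, invoke update_cac with the batch loss after each iteration, and periodically call prune so that unimportant channels are removed gradually during training rather than from a pretrained model.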
Pages: 799-813
Number of pages: 15