Dynamical Channel Pruning by Conditional Accuracy Change for Deep Neural Networks

Cited by: 54
Authors
Chen, Zhiqiang [1 ,2 ]
Xu, Ting-Bing [2 ,3 ]
Du, Changde [1 ,2 ]
Liu, Cheng-Lin [2 ,3 ,4 ]
He, Huiguang [2 ,4 ,5 ]
Affiliations
[1] Chinese Acad Sci CASIA, Inst Automat, Res Ctr Brain Inspired Intelligence, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci UCAS, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[3] Chinese Acad Sci CASIA, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
[4] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Beijing 100190, Peoples R China
[5] Chinese Acad Sci CASIA, Res Ctr Brain Inspired Intelligence, Natl Lab Pattern Recognit, Inst Automat, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Training; Channel estimation; Logic gates; Computer architecture; Convolution; Biological neural networks; Automation; Conditional accuracy change (CAC); direct criterion; dynamical channel pruning; neural network compression; structure shaping;
DOI
10.1109/TNNLS.2020.2979517
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Channel pruning is an effective technique that has been widely applied to deep neural network compression. However, many existing methods prune from a pretrained model, thus resulting in repetitious pruning and fine-tuning processes. In this article, we propose a dynamical channel pruning method, which prunes unimportant channels at the early stage of training. Rather than utilizing indirect criteria (e.g., weight norm, absolute weight sum, and reconstruction error) to guide connection or channel pruning, we design criteria directly related to the final accuracy of a network to evaluate the importance of each channel. Specifically, a channelwise gate is designed to randomly enable or disable each channel so that the conditional accuracy change (CAC) can be estimated under the condition of each channel being disabled. Practically, we construct two effective and efficient criteria to dynamically estimate CAC at each iteration of training; thus, unimportant channels can be gradually pruned during the training process. Finally, extensive experiments on multiple data sets (i.e., ImageNet, CIFAR, and MNIST) with various networks (i.e., ResNet, VGG, and MLP) demonstrate that the proposed method effectively reduces the parameters and computations of the baseline network while yielding higher or competitive accuracy. Interestingly, if we Double the initial Channels and then Prune Half (DCPH) of them down to the baseline's counterpart, the network enjoys a remarkable performance improvement by shaping a more desirable structure.
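The channelwise gate described in the abstract can be illustrated with a minimal PyTorch-style sketch. This is not the authors' implementation: the class and method names (ChannelGate, update_cac, prune), the keep_prob parameter, and the running-loss heuristic used to stand in for the paper's CAC criteria are assumptions for illustration only.

```python
# Minimal sketch of a channel-wise stochastic gate for pruning during training.
# Assumptions: channels are randomly disabled each iteration, and a running score
# per channel approximates how much accuracy suffers when that channel is off.
import torch
import torch.nn as nn

class ChannelGate(nn.Module):
    """Randomly enables/disables each channel of the preceding layer's output."""
    def __init__(self, num_channels: int, keep_prob: float = 0.9):
        super().__init__()
        self.keep_prob = keep_prob
        # 1 = channel alive, 0 = channel permanently pruned.
        self.register_buffer("mask", torch.ones(num_channels))
        # Running per-channel score (higher = disabling this channel hurts more).
        self.register_buffer("cac_scores", torch.zeros(num_channels))
        self.last_gate = None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Sample an on/off pattern; never re-enable already pruned channels.
            gate = torch.bernoulli(torch.full_like(self.mask, self.keep_prob)) * self.mask
        else:
            gate = self.mask
        self.last_gate = gate
        return x * gate.view(1, -1, 1, 1)

    @torch.no_grad()
    def update_cac(self, batch_loss: float, momentum: float = 0.99):
        # Heuristic CAC estimate: attribute the observed batch loss to the channels
        # that were disabled in this iteration (a simplified stand-in for the paper's criteria).
        disabled = (self.last_gate == 0).float()
        self.cac_scores.mul_(momentum).add_((1.0 - momentum) * batch_loss * disabled)

    @torch.no_grad()
    def prune(self, num_to_prune: int):
        # Permanently disable the channels whose removal is estimated to matter least.
        alive = self.mask.nonzero().flatten()
        order = self.cac_scores[alive].argsort()  # smallest estimated impact first
        self.mask[alive[order[:num_to_prune]]] = 0.0
```

In a training loop, one would call the gate after a convolution, invoke update_cac with the batch loss after each iteration, and periodically call prune so that unimportant channels are removed gradually during training rather than from a pretrained model.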
Pages: 799-813
Number of pages: 15