Dynamical Channel Pruning by Conditional Accuracy Change for Deep Neural Networks

Cited by: 54
Authors
Chen, Zhiqiang [1 ,2 ]
Xu, Ting-Bing [2 ,3 ]
Du, Changde [1 ,2 ]
Liu, Cheng-Lin [2 ,3 ,4 ]
He, Huiguang [2 ,4 ,5 ]
Affiliations
[1] Chinese Acad Sci CASIA, Inst Automat, Res Ctr Brain Inspired Intelligence, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci UCAS, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[3] Chinese Acad Sci CASIA, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
[4] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Beijing 100190, Peoples R China
[5] Chinese Acad Sci CASIA, Res Ctr Brain Inspired Intelligence, Natl Lab Pattern Recognit, Inst Automat, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Training; Channel estimation; Logic gates; Computer architecture; Convolution; Biological neural networks; Automation; Conditional accuracy change (CAC); direct criterion; dynamical channel pruning; neural network compression; structure shaping;
DOI
10.1109/TNNLS.2020.2979517
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Channel pruning is an effective technique that has been widely applied to deep neural network compression. However, many existing methods prune from a pretrained model, which leads to repetitious pruning and fine-tuning processes. In this article, we propose a dynamical channel pruning method that prunes unimportant channels at an early stage of training. Rather than relying on indirect criteria (e.g., weight norm, absolute weight sum, or reconstruction error) to guide connection or channel pruning, we design criteria directly related to the final accuracy of the network to evaluate the importance of each channel. Specifically, a channelwise gate is designed to randomly enable or disable each channel so that the conditional accuracy change (CAC) can be estimated under the condition that a given channel is disabled. In practice, we construct two effective and efficient criteria to dynamically estimate the CAC at each training iteration; thus, unimportant channels can be gradually pruned during the training process. Finally, extensive experiments on multiple data sets (i.e., ImageNet, CIFAR, and MNIST) with various networks (i.e., ResNet, VGG, and MLP) demonstrate that the proposed method effectively reduces the parameters and computations of the baseline network while yielding higher or competitive accuracy. Interestingly, if we Double the initial Channels and then Prune Half (DCPH) of them down to the baseline's channel count, the network enjoys a remarkable performance improvement by shaping a more desirable structure.
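The abstract describes a channelwise gate that randomly disables channels during training and a per-channel estimate of the conditional accuracy change (CAC) used to decide which channels to prune. The following is a minimal PyTorch-style sketch of that idea only; the class and method names (ChannelGate, update_cac, prune) and the hyperparameters (keep_prob, momentum) are illustrative assumptions, not the authors' implementation or their exact CAC criteria.

```python
# Minimal sketch (not the paper's code) of a channelwise stochastic gate with a
# running conditional-accuracy-change (CAC) estimate per channel. Names such as
# ChannelGate, update_cac, prune, keep_prob, and momentum are illustrative only.
import torch
import torch.nn as nn


class ChannelGate(nn.Module):
    """Randomly enables/disables channels of a conv feature map during training."""

    def __init__(self, num_channels, keep_prob=0.9):
        super().__init__()
        self.keep_prob = keep_prob
        # Running CAC estimate per channel (larger = disabling it hurts accuracy more).
        self.register_buffer("cac", torch.zeros(num_channels))
        # 1 = channel still alive, 0 = permanently pruned.
        self.register_buffer("mask", torch.ones(num_channels))
        self.last_gate = None

    def forward(self, x):
        if self.training:
            # Bernoulli gate: each surviving channel stays on with probability keep_prob.
            gate = (torch.rand(x.size(1), device=x.device) < self.keep_prob).float()
            gate = gate * self.mask
            self.last_gate = gate
            return x * gate.view(1, -1, 1, 1)
        # At evaluation time only permanently pruned channels are zeroed out.
        return x * self.mask.view(1, -1, 1, 1)

    @torch.no_grad()
    def update_cac(self, batch_acc, baseline_acc, momentum=0.99):
        """Attribute this batch's accuracy drop to the channels that were disabled."""
        if self.last_gate is None:
            return
        disabled = (self.last_gate == 0) & (self.mask == 1)
        delta = baseline_acc - batch_acc  # accuracy lost while these channels were off
        self.cac[disabled] = momentum * self.cac[disabled] + (1.0 - momentum) * delta

    @torch.no_grad()
    def prune(self, num_to_prune):
        """Permanently disable the surviving channels with the smallest estimated CAC."""
        alive = self.mask.nonzero(as_tuple=True)[0]
        if alive.numel() <= num_to_prune:
            return
        order = self.cac[alive].argsort()  # least important first
        self.mask[alive[order[:num_to_prune]]] = 0.0
```

In such a setup, one gate could follow each convolutional layer; after every training iteration, the batch accuracy would be compared against a running baseline accuracy via update_cac, and prune would be called periodically so that unimportant channels disappear gradually during training, mirroring the dynamical pruning schedule the abstract describes.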
Pages: 799-813
Number of pages: 15