ResRep: Lossless CNN Pruning via Decoupling Remembering and Forgetting

Cited by: 130
Authors
Ding, Xiaohan [1 ,2 ]
Hao, Tianxiang [1 ,2 ]
Tan, Jianchao [3 ,4 ]
Liu, Ji [3 ,4 ]
Han, Jungong [5 ]
Guo, Yuchen [1 ]
Ding, Guiguang [1 ,2 ]
Affiliations
[1] Tsinghua Univ, Beijing Natl Res Ctr Informat Sci & Technol BNRis, Beijing, Peoples R China
[2] Tsinghua Univ, Sch Software, Beijing, Peoples R China
[3] Kwai Inc, AI Platform Dept, Seattle AI Lab, Beijing, Peoples R China
[4] Kwai Inc, FeDA Lab, Beijing, Peoples R China
[5] Aberystwyth Univ, Comp Sci Dept, Aberystwyth SY23 3FL, Dyfed, Wales
Source
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
DOI
10.1109/ICCV48922.2021.00447
CLC classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose ResRep, a novel method for lossless channel pruning (a.k.a. filter pruning), which slims down a CNN by reducing the width (number of output channels) of convolutional layers. Inspired by neurobiological research on the independence of remembering and forgetting, we propose to re-parameterize a CNN into remembering parts and forgetting parts, where the former learn to maintain performance and the latter learn to prune. By training the former with regular SGD but the latter with a novel update rule based on penalty gradients, we realize structured sparsity. We then equivalently merge the remembering and forgetting parts back into the original architecture with narrower layers. In this sense, ResRep can be viewed as a successful application of Structural Re-parameterization. This methodology distinguishes ResRep from the traditional learning-based pruning paradigm, which applies a penalty directly to the parameters to produce sparsity and may thereby suppress the parameters essential for remembering. ResRep slims down a standard ResNet-50 with 76.15% accuracy on ImageNet to a narrower model with only 45% of the FLOPs and no accuracy drop, making it the first method to achieve lossless pruning at such a high compression ratio.
Pages: 4490-4500
Number of pages: 11
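
The abstract outlines the key mechanism: train the forgetting parts with penalty gradients until some of their channels reach zero, then merge them back into the preceding layers so the network becomes narrower while producing mathematically identical outputs. Below is a minimal PyTorch sketch of that merge-and-prune step, assuming the forgetting part is a bias-free 1x1 pointwise convolution (a "compactor") appended after each conv layer; the function names and the omission of BN folding are illustrative simplifications, not the authors' released implementation.

import torch
import torch.nn as nn


def merge_conv_and_compactor(conv: nn.Conv2d, compactor: nn.Conv2d) -> nn.Conv2d:
    """Fold a 1x1 compactor (forgetting part) into the preceding KxK conv
    (remembering part). Since both ops are linear, compactor(conv(x)) is
    exactly equivalent to a single conv with the merged kernel.

    conv.weight:      (C, I, k, k), optional bias of shape (C,)
    compactor.weight: (D, C, 1, 1), assumed bias-free
    """
    w = conv.weight.data                                  # (C, I, k, k)
    q = compactor.weight.data.squeeze(-1).squeeze(-1)     # (D, C)

    # merged_w[d] = sum_c q[d, c] * w[c]
    merged_w = torch.einsum('dc,cikl->dikl', q, w)        # (D, I, k, k)
    merged_b = q @ conv.bias.data if conv.bias is not None else None

    merged = nn.Conv2d(conv.in_channels, q.shape[0], conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=merged_b is not None)
    merged.weight.data = merged_w
    if merged_b is not None:
        merged.bias.data = merged_b
    return merged


def prune_zero_channels(merged: nn.Conv2d, compactor: nn.Conv2d,
                        eps: float = 1e-5) -> nn.Conv2d:
    """Drop output channels whose compactor rows were driven to ~zero by the
    penalty gradients. The corresponding input channels of the *next* layer
    must be removed as well (omitted here for brevity)."""
    q = compactor.weight.data.squeeze(-1).squeeze(-1)     # (D, C)
    keep = q.norm(dim=1) > eps                            # surviving filters
    slim = nn.Conv2d(merged.in_channels, int(keep.sum()), merged.kernel_size,
                     stride=merged.stride, padding=merged.padding,
                     bias=merged.bias is not None)
    slim.weight.data = merged.weight.data[keep]
    if merged.bias is not None:
        slim.bias.data = merged.bias.data[keep]
    return slim


# Usage: after training, replace each (conv, compactor) pair with one slim conv.
conv = nn.Conv2d(64, 128, 3, padding=1)
compactor = nn.Conv2d(128, 128, 1, bias=False)
with torch.no_grad():
    compactor.weight[:40].zero_()   # pretend 40 filters were pruned during training
    x = torch.randn(1, 64, 32, 32)
    slim = prune_zero_channels(merge_conv_and_compactor(conv, compactor), compactor)
    full = compactor(conv(x))
    # Outputs match exactly on the surviving channels (here, channels 40..127).
    assert torch.allclose(full[:, 40:], slim(x), atol=1e-4)
print(slim)   # Conv2d(64, 88, kernel_size=(3, 3), ...)

The design choice emphasized in the abstract is visible here: the penalty only ever touches the compactor, so zeroing its rows never damages the "remembering" weights of the original conv, and the final merge is exact rather than approximate.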