Efficient Neural Network Compression Inspired by Compressive Sensing

Cited by: 12
Authors
Gao, Wei [1 ,2 ]
Guo, Yang [1 ,2 ]
Ma, Siwei [3 ]
Li, Ge [1 ,2 ]
Kwong, Sam [4 ]
Affiliations
[1] Peking Univ, Sch Elect & Comp Engn, Shenzhen 518055, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518066, Peoples R China
[3] Peking Univ, Inst Digital Media, Beijing 100084, Peoples R China
[4] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
Keywords
Transforms; Artificial neural networks; Matrix decomposition; Training; Neural networks; Sparse matrices; Redundancy; Compressive sensing (CS); deep neural networks (DNNs); neural network compression (NNC); two-step training; PURSUIT;
DOI
10.1109/TNNLS.2022.3186008
Chinese Library Classification (CLC) number
TP18 [Theory of Artificial Intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Traditional neural network compression (NNC) methods reduce model size and floating-point operations (FLOPs) by screening out unimportant weight parameters; however, the intrinsic sparsity characteristics of the parameters have not been fully exploited. In this article, treating network parameters from a signal processing and analysis perspective, we propose a compressive sensing (CS)-based method, namely NNCS, to improve compression performance. NNCS is motivated by the observation that weight parameters are sparser in a transform domain than in the original domain. First, to obtain sparse representations of the parameters in the transform domain during training, we incorporate a constrained CS model into the loss function. Second, we propose an effective two-step training process: the first step trains the raw weight parameters while inducing and reconstructing their sparse representations, and the second step trains the transform coefficients to improve network performance. Finally, the entire neural network is converted into a transform-domain representation whose sparser parameter distribution facilitates inference acceleration. Experimental results demonstrate that NNCS significantly outperforms existing state-of-the-art methods in terms of parameter and FLOP reductions. With VGGNet on CIFAR-10, NNCS removes 94.8% of the parameters and reduces FLOPs by 76.8%, with a 0.13% drop in Top-1 accuracy. With ResNet-50 on ImageNet, it removes 75.6% of the parameters and reduces FLOPs by 78.9%, with a 1.24% drop in Top-1 accuracy.
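The abstract describes adding a constrained CS sparsity term on transform-domain weight coefficients to the training loss, followed by a two-step training schedule. The sketch below is not the authors' implementation; it only illustrates the first ingredient in PyTorch, and it assumes a fixed random orthonormal basis as the transform, an L1 surrogate for sparsity, a toy model, and a hand-picked penalty strength lam, all of which are illustrative choices.

# Minimal sketch: task loss plus an L1 penalty on each layer's weights
# expressed in a fixed orthonormal transform domain (assumed stand-in for
# the CS sparsity constraint described in the abstract).
import torch
import torch.nn as nn

def make_orthonormal_transform(dim: int, seed: int = 0) -> torch.Tensor:
    """Fixed random orthonormal basis used here as the transform (an assumption)."""
    g = torch.Generator().manual_seed(seed)
    q, _ = torch.linalg.qr(torch.randn(dim, dim, generator=g))
    return q

def transform_domain_l1(weight: torch.Tensor, basis: torch.Tensor) -> torch.Tensor:
    """Mean absolute value of the weight's coefficients in the transform domain."""
    w = weight.reshape(weight.shape[0], -1)   # (out_channels, rest)
    coeffs = w @ basis                        # project each row onto the basis
    return coeffs.abs().mean()

# Toy model standing in for VGGNet/ResNet in the experiments.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(4), nn.Flatten(),
    nn.Linear(16 * 4 * 4, 10),
)
criterion = nn.CrossEntropyLoss()
lam = 1e-4  # sparsity strength (assumed value)

# One basis per weight tensor, sized to the flattened per-filter dimension.
bases = {name: make_orthonormal_transform(p.reshape(p.shape[0], -1).shape[1])
         for name, p in model.named_parameters() if p.dim() > 1}

x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
task_loss = criterion(model(x), y)
sparsity = sum(transform_domain_l1(p, bases[name])
               for name, p in model.named_parameters() if p.dim() > 1)
loss = task_loss + lam * sparsity  # would correspond to the first training step
loss.backward()

In the two-step scheme the abstract outlines, a later phase would train the transform coefficients themselves; that step is omitted from this sketch.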
Pages: 1965-1979 (15 pages)