Methodologies of Compressing a Stable Performance Convolutional Neural Networks in Image Classification

Cited by: 12
Authors
Al-Hami, Mo'taz [1]
Pietron, Marcin [2]
Casas, Raul [3]
Wielgosz, Maciej [2]
Affiliations
[1] Hashemite Univ, Dept Comp Informat Syst, Zarqa 13115, Jordan
[2] AGH Univ Sci & Technol, Dept Comp Sci & Elect Engn, Krakow, Poland
[3] Cadence Design Syst, San Jose, CA USA
Keywords
Convolutional neural network (CNN); Fixed-point; Quantization; Pruning; Clustering; K-means; Hybrid quantization; Incremental pruning; Partial quantization; Histogram
DOI
10.1007/s11063-019-10076-y
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning has brought about a real revolution in embedded computing. Convolutional neural networks (CNNs) have proven to be a reliable fit for many emerging problems. The next step is to strengthen the role of CNNs in embedded devices, in terms of both implementation details and performance. In embedded devices, storage and computational resources are limited and constrained, which raises key issues that must be considered. Compressing (i.e., quantizing) the CNN is a valuable solution. In this paper, our main goals are memory compression and complexity reduction (in both operations and cycles) of CNNs, using methods (including quantization and pruning) that do not require retraining, which allows them to be used in mobile systems or robots. We also explore additional quantization techniques for further complexity reduction. To achieve these goals, we compress the layers of a CNN model (i.e., its parameters and outputs) into suitable precision formats using several quantization methodologies. First, we describe a pruning approach that reduces the required storage and computation cycles in embedded devices; such an enhancement can drastically reduce power consumption and the required resources. Second, we present a hybrid quantization approach with automatic tuning for network compression. Third, we present a K-means quantization approach. With only a minor degradation relative to floating-point performance, the presented pruning and quantization methods produce reduced fixed-point networks with stable performance. Precise fixed-point calculations for coefficients, input/output signals, and accumulators are considered in the quantization process.
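The abstract names several post-training building blocks: fixed-point rounding of values, fixed-point accumulation, magnitude pruning, and K-means weight sharing. Below is a minimal pure-Python sketch of each, offered only as an illustration. The function names, the signed fixed-point convention (one sign bit plus `int_bits` integer bits and `frac_bits` fractional bits), the fixed pruning threshold, the saturating accumulator, and the evenly spaced K-means initialization are all hypothetical assumptions, not the authors' implementation. The paper's hybrid quantization with automatic tuning is not reproduced here.

```python
"""Hypothetical sketch of post-training compression steps (no retraining).

Assumptions, not taken from the paper: a signed fixed-point format with one
sign bit, `int_bits` integer bits and `frac_bits` fractional bits; magnitude
pruning with a fixed threshold; 1-D K-means (Lloyd's algorithm) on weights.
"""
from typing import List, Sequence, Tuple


def to_fixed_point(x: float, int_bits: int, frac_bits: int) -> float:
    """Round x onto the signed fixed-point grid, saturating on overflow."""
    scale = 1 << frac_bits
    max_code = (1 << (int_bits + frac_bits)) - 1  # largest positive code
    min_code = -(1 << (int_bits + frac_bits))     # most negative code
    code = round(x * scale)  # Python rounds exact halves to the even code
    return max(min_code, min(max_code, code)) / scale


def fixed_point_dot(xs: Sequence[float], ws: Sequence[float],
                    frac_bits: int, acc_bits: int) -> float:
    """Integer dot product with a saturating acc_bits-wide accumulator.

    Each product of two codes carries 2 * frac_bits fractional bits.
    """
    acc_max = (1 << (acc_bits - 1)) - 1
    acc_min = -(1 << (acc_bits - 1))
    acc = 0
    for x, w in zip(xs, ws):
        acc += round(x * (1 << frac_bits)) * round(w * (1 << frac_bits))
        acc = max(acc_min, min(acc_max, acc))  # clamp to accumulator range
    return acc / (1 << (2 * frac_bits))


def prune_by_magnitude(weights: Sequence[float], threshold: float) -> List[float]:
    """Zero every weight whose magnitude is below the threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]


def _nearest(v: float, centroids: Sequence[float]) -> int:
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))


def kmeans_1d(values: Sequence[float], k: int,
              iters: int = 50) -> Tuple[List[float], List[int]]:
    """Cluster scalar weights into k shared values (a codebook).

    Centroids start evenly spaced between the min and max weight.
    """
    lo, hi = min(values), max(values)
    step = (hi - lo) / (k - 1) if k > 1 else 0.0
    centroids = [lo + i * step for i in range(k)]
    for _ in range(iters):
        labels = [_nearest(v, centroids) for v in values]  # assignment step
        updated = []
        for j in range(k):  # update step: mean of each cluster's members
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    labels = [_nearest(v, centroids) for v in values]  # match final codebook
    return centroids, labels
```

For example, with k = 16 centroids each weight can be stored as a 4-bit index into a 16-entry codebook. Weights zeroed by pruning can be skipped during multiply-accumulate, which is one way pruning saves computation cycles.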
Pages: 105-127
Number of pages: 23