PalQuant: Accelerating High-Precision Networks on Low-Precision Accelerators

Cited by: 1
Authors
Hu, Qinghao [1 ]
Li, Gang [2 ]
Wu, Qiman [3 ]
Cheng, Jian [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[2] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[3] Baidu Inc, Beijing, Peoples R China
Source
COMPUTER VISION, ECCV 2022, PT XI | 2022, Vol. 13671
Funding
National Natural Science Foundation of China;
Keywords
Quantization; Network acceleration; CNNs;
DOI
10.1007/978-3-031-20083-0_19
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Recently, low-precision deep learning accelerators (DLAs) have become popular due to their advantages in chip area and energy consumption, yet low-precision quantized models on these DLAs suffer severe accuracy degradation. One way to achieve both high accuracy and efficient inference is to deploy high-precision neural networks on low-precision DLAs, which has rarely been studied. In this paper, we propose the PArallel Low-precision Quantization (PalQuant) method, which approximates high-precision computations by learning parallel low-precision representations from scratch. In addition, we present a novel cyclic shuffle module to boost cross-group information communication between the parallel low-precision groups. Extensive experiments demonstrate that PalQuant outperforms state-of-the-art quantization methods in both accuracy and inference speed; e.g., for ResNet-18 quantization, PalQuant obtains 0.52% higher accuracy and a 1.78x speedup simultaneously over its 4-bit counterpart on a state-of-the-art 2-bit accelerator. Code is available at https://github.com/huqinghao/PalQuant.
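The abstract's core idea, representing one high-precision computation as several parallel low-precision ones with cross-group communication, can be sketched roughly as follows. This is only an illustrative guess at the mechanics, not the paper's implementation: the base-(2^bits) digit decomposition, the per-group scales, and the channel-rotation `cyclic_shuffle` are all assumptions for illustration (PalQuant learns its parallel representations from scratch rather than deriving them analytically).

```python
import numpy as np

def parallel_low_precision(x, bits=2, groups=2):
    """Approximate values in [0, 1] by a sum of scaled low-precision digits,
    one digit per parallel group (a sketch, not PalQuant's learned scheme)."""
    base = 2 ** bits
    approx = np.zeros_like(x, dtype=np.float64)
    residual = np.clip(x, 0.0, 1.0).astype(np.float64)
    for g in range(groups):
        scale = 1.0 / base ** (g + 1)
        # Each group stores only a (2**bits)-level digit of the value.
        digit = np.minimum(np.floor(residual / scale), base - 1)
        approx += digit * scale
        residual = residual - digit * scale
    return approx

def cyclic_shuffle(features, groups=2):
    """Rotate channel groups by one position so each group receives channels
    from a neighboring group -- a guess at the kind of cross-group
    communication the paper's cyclic shuffle module provides."""
    chunks = np.split(features, groups, axis=0)
    return np.concatenate(chunks[1:] + chunks[:1], axis=0)
```

Under this toy decomposition, two parallel 2-bit groups cut the worst-case approximation error from roughly 1/4 (a single 2-bit digit) to 1/16, which loosely mirrors the abstract's claim that parallel low-precision groups can recover higher-precision accuracy on a 2-bit accelerator.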
Pages: 312-327
Page count: 16
References (40 in total)
[1] Abdel-Aziz, Hamzah. Rethinking Floating Point Overheads for Mixed Precision DNN Accelerators. 2021.
[2] Andri, Renzo; Cavigelli, Lukas; Rossi, Davide; Benini, Luca. YodaNN: An Architecture for Ultralow Power Binary-Weight CNN Acceleration. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018, 37(01): 48-60.
[3] Bengio, Y. 2013. arXiv:1308.3432.
[4] Bogart, K. P. Introductory Combinatorics. 1989.
[5] Cai, Zhaowei; He, Xiaodong; Sun, Jian; Vasconcelos, Nuno. Deep Learning with Low Precision by Half-wave Gaussian Quantization. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017: 5406-5414.
[6] Camus, Vincent; Mei, Linyan; Enz, Christian; Verhelst, Marian. Review and Benchmarking of Precision-Scalable Multiply-Accumulate Unit Architectures for Embedded Neural-Network Processing. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2019, 9(04): 697-711.
[7] Chen, Tianshi; Du, Zidong; Sun, Ninghui; Wang, Jia; Wu, Chengyong; Chen, Yunji; Temam, Olivier. DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning. ACM SIGPLAN Notices, 2014, 49(04): 269-283.
[8] Chen, Yu-Hsin; Krishna, Tushar; Emer, Joel S.; Sze, Vivienne. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. IEEE Journal of Solid-State Circuits, 2017, 52(01): 127-138.
[9] Choi, J. 2018. arXiv:1805.06085.
[10] Conti, F. 2018. arXiv:1807.03010; DOI: 10.1109/TCAD.2018.2857019.