A Low-Power Sparse Convolutional Neural Network Accelerator With Pre-Encoding Radix-4 Booth Multiplier

Cited by: 0
Authors
Cheng, Quan [1 ]
Dai, Liuyao [2 ]
Huang, Mingqiang [2 ]
Shen, Ao [2 ]
Mao, Wei [2 ]
Hashimoto, Masanori [1 ]
Yu, Hao [2 ]
Affiliations
[1] Kyoto Univ, Dept Commun & Comp Engn, Kyoto 6068501, Japan
[2] Southern Univ Sci & Technol, Sch Microelect, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Convolutional neural networks; Adders; Engines; Power demand; Inference algorithms; Hardware; Feature extraction; Accelerator; CNN; MAC; radix-4; Booth; low-power; sparsity; EFFICIENT; PROCESSOR;
DOI
10.1109/TCSII.2022.3231361
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Discipline Classification Code
0808; 0809
Abstract
Running on edge devices, convolutional neural network (CNN) inference applications demand low power consumption and high-performance computation. Therefore, designing energy-efficient multiply-and-accumulate (MAC) units and high-throughput sparse CNN accelerators is of great importance. In this brief, we develop a sparse CNN accelerator that achieves a high MAC-unit utilization ratio and high power efficiency. The accelerator includes a radix-4 Booth multiplier that pre-encodes weights to reduce both the number of partial products (PPs) and the encoder power consumption. The proposed accelerator has three features. First, we reduce the bit width of the PPs by exploiting properties of the radix-4 Booth algorithm together with offline weight pre-processing. Second, we extract the eight encoders from the corresponding multipliers and merge them into a single pre-encoding module to reduce area. Third, after encoding the non-zero weights offline, we design an activation-selector module that selects the activations corresponding to non-zero weights for the subsequent multiply-add operations. The design is written in Verilog HDL and implemented in a 28 nm process. It achieves 7.0325 TOPS/W at 50% sparsity and scales with sparsity up to 14.3720 TOPS/W at 87.5%.
Pages: 2246-2250
Page count: 5
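
As background for the pre-encoding scheme summarized in the abstract, below is a minimal Python sketch of radix-4 Booth recoding, not the authors' Verilog implementation. It recodes an 8-bit two's-complement weight offline into four signed digits in {-2, -1, 0, +1, +2}, so an 8x8 multiply needs only four partial products, each derivable from the activation by a shift and an optional negation; since CNN weights are static at inference time, this encoding can be computed once offline, as the brief describes. The function names and the 8-bit width are illustrative assumptions.

    def booth_radix4_encode(w, bits=8):
        """Radix-4 Booth recoding: split an n-bit two's-complement weight
        into n/2 digits from {-2, -1, 0, +1, +2}, so that an n x n multiply
        needs only n/2 partial products instead of n."""
        assert -(1 << (bits - 1)) <= w < (1 << (bits - 1)), "weight out of range"
        u = (w & ((1 << bits) - 1)) << 1   # two's-complement pattern, append w[-1] = 0
        digits = []
        for i in range(bits // 2):
            # Overlapping triplet (w[2i+1], w[2i], w[2i-1])
            triplet = (u >> (2 * i)) & 0b111
            d = -2 * ((triplet >> 2) & 1) + ((triplet >> 1) & 1) + (triplet & 1)
            digits.append(d)
        return digits                      # w == sum(d * 4**i for i, d in enumerate(digits))

    def booth_multiply(w, a, bits=8):
        """Multiply via pre-encoded digits: each partial product is 0, +/-a,
        or +/-2a, shifted left by 2i (a shift plus an optional negation)."""
        return sum(d * a << (2 * i) for i, d in enumerate(booth_radix4_encode(w, bits)))

    if __name__ == "__main__":
        # Sanity check against direct multiplication over signed 8-bit weights.
        for w in (-128, -37, 0, 19, 127):
            for a in (-100, 3, 77):
                assert booth_multiply(w, a) == w * a
        print("radix-4 Booth digits of -37:", booth_radix4_encode(-37))

Note that a zero digit contributes no partial product at all, which is one reason pre-encoding combines naturally with the weight sparsity the accelerator exploits.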