A Low-Power Sparse Convolutional Neural Network Accelerator With Pre-Encoding Radix-4 Booth Multiplier

Cited by: 0
Authors
Cheng, Quan [1 ]
Dai, Liuyao [2 ]
Huang, Mingqiang [2 ]
Shen, Ao [2 ]
Mao, Wei [2 ]
Hashimoto, Masanori [1 ]
Yu, Hao [2 ]
Affiliations
[1] Kyoto Univ, Dept Commun & Comp Engn, Kyoto 6068501, Japan
[2] Southern Univ Sci & Technol, Sch Microelect, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Convolutional neural networks; Adders; Engines; Power demand; Inference algorithms; Hardware; Feature extraction; Accelerator; CNN; MAC; radix-4; Booth; low-power; sparsity; EFFICIENT; PROCESSOR;
DOI
10.1109/TCSII.2022.3231361
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Discipline Classification Code
0808; 0809
Abstract
Running on edge devices, convolutional neural network (CNN) inference applications demand low power consumption and high-performance computation. Therefore, designing energy-efficient multiply-and-accumulate (MAC) units and high-throughput sparse CNN accelerators is of great importance. In this brief, we develop a sparse CNN accelerator that achieves a high MAC-unit utilization ratio and high power efficiency. The accelerator includes a radix-4 Booth multiplier that pre-encodes weights to reduce both the number of partial products (PPs) and the encoder power consumption. The proposed accelerator has three features. First, we reduce the bit width of the PPs by exploiting properties of the radix-4 Booth algorithm together with offline weight pre-processing. Second, we extract the eight encoders from the corresponding multipliers and merge them into a single pre-encoding module to reduce area. Third, after encoding the non-zero weights offline, we design an activation-selector module that selects the activations corresponding to non-zero weights for the subsequent multiply-add operations. The design is written in Verilog HDL and implemented in a 28 nm process. It achieves 7.0325 TOPS/W at 50% sparsity and scales with sparsity up to 14.3720 TOPS/W at 87.5%.
Pages: 2246-2250
Page count: 5
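
As background for the pre-encoding scheme summarized in the abstract, below is a minimal Python sketch of radix-4 Booth recoding, not the authors' Verilog implementation. It recodes an 8-bit two's-complement weight offline into four signed digits in {-2, -1, 0, +1, +2}, so an 8x8 multiply needs only four partial products, each derivable from the activation by a shift and an optional negation; since CNN weights are static at inference time, this encoding can be computed once offline, as the brief describes. The function names and the 8-bit width are illustrative assumptions.

    def booth_radix4_encode(w, bits=8):
        """Radix-4 Booth recoding: split an n-bit two's-complement weight
        into n/2 digits from {-2, -1, 0, +1, +2}, so that an n x n multiply
        needs only n/2 partial products instead of n."""
        assert -(1 << (bits - 1)) <= w < (1 << (bits - 1)), "weight out of range"
        u = (w & ((1 << bits) - 1)) << 1   # two's-complement pattern, append w[-1] = 0
        digits = []
        for i in range(bits // 2):
            # Overlapping triplet (w[2i+1], w[2i], w[2i-1])
            triplet = (u >> (2 * i)) & 0b111
            d = -2 * ((triplet >> 2) & 1) + ((triplet >> 1) & 1) + (triplet & 1)
            digits.append(d)
        return digits                      # w == sum(d * 4**i for i, d in enumerate(digits))

    def booth_multiply(w, a, bits=8):
        """Multiply via pre-encoded digits: each partial product is 0, +/-a,
        or +/-2a, shifted left by 2i (a shift plus an optional negation)."""
        return sum(d * a << (2 * i) for i, d in enumerate(booth_radix4_encode(w, bits)))

    if __name__ == "__main__":
        # Sanity check against direct multiplication over signed 8-bit weights.
        for w in (-128, -37, 0, 19, 127):
            for a in (-100, 3, 77):
                assert booth_multiply(w, a) == w * a
        print("radix-4 Booth digits of -37:", booth_radix4_encode(-37))

Note that a zero digit contributes no partial product at all, which is one reason pre-encoding combines naturally with the weight sparsity the accelerator exploits.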