Quantization aware approximate multiplier and hardware accelerator for edge computing of deep learning applications

Cited by: 7
Authors
Reddy, K. Manikantta [1 ]
Vasantha, M. H. [1 ]
Kumar, Y. B. Nithin [1 ]
Gopal, Ch. Keshava [2 ]
Dwivedi, Devesh [1 ]
Affiliations
[1] Natl Inst Technol Goa, Dept Elect & Commun Engn, Ponda 403401, Goa, India
[2] Xilinx India Technol Serv Pvt Ltd, Syst Integrat & Validat Grp, Hyderabad 500032, India
Keywords
Approximate computing; Approximate multiplier; Hardware accelerator; Edge computing; Matrix multiplication; Low power; Neural network; Compressors; Design; Adder
DOI
10.1016/j.vlsi.2021.08.001
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Approximate computing has emerged as an efficient design methodology for improving the performance and power efficiency of digital systems by allowing a negligible loss in output accuracy. Dedicated hardware accelerators built from approximate circuits can address the power-performance trade-off in computationally complex applications such as deep learning. This paper proposes an approximate radix-4 Booth multiplier and a hardware accelerator for deploying deep learning applications on power-restricted mobile/edge computing devices. The proposed accelerator uses parallel processing elements based on the approximate multiplier to accelerate its workloads. The accelerator is tested with matrix-vector multiplication (MVM) and matrix-matrix multiplication (MMM) workloads on a Zynq ZCU102 evaluation board. The experimental results show that the average power consumption of the proposed accelerator is reduced by 34% for MVM and 40% for MMM, compared to the conventional multiply-accumulate unit used in the literature for similar workloads. Moreover, the proposed accelerator achieves an average performance of 5 GOP/s for MVM and 42.5 GOP/s for MMM at 275 MHz, improvements of 14x and 5x, respectively, over the conventional design.
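The abstract centers on an approximate radix-4 Booth multiplier. The paper's specific approximation scheme is not reproduced in this record, but the exact radix-4 (modified) Booth recoding that such designs start from can be sketched as a minimal Python reference model; function names here are illustrative, and a hardware approximation would typically simplify the partial-product generation or compression stages that this sketch performs exactly:

```python
def booth_radix4_digits(y, bits=8):
    """Recode a signed `bits`-wide multiplier into radix-4 Booth digits in {-2,-1,0,1,2}.

    Digit i is derived from the overlapping bit triplet (y[2i+1], y[2i], y[2i-1]),
    with an implicit y[-1] = 0 appended below the LSB.
    """
    # Two's-complement encode y, then shift left to append the implicit 0 bit.
    y_ext = (y & ((1 << bits) - 1)) << 1
    table = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
             0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}
    # One digit per triplet, stepping two bits at a time (bits/2 digits total).
    return [table[(y_ext >> i) & 0b111] for i in range(0, bits, 2)]

def booth_multiply(x, y, bits=8):
    """Exact radix-4 Booth product: sum of shifted partial products x * d_i * 4^i."""
    return sum(x * d * (4 ** i) for i, d in enumerate(booth_radix4_digits(y, bits)))
```

Radix-4 recoding halves the number of partial products relative to a bit-by-bit multiplier, which is why it is the usual baseline for low-power approximate multipliers: an approximate variant can then truncate or simplify the least-significant partial-product columns at a small accuracy cost.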
Pages: 268-279 (12 pages)