Quantization aware approximate multiplier and hardware accelerator for edge computing of deep learning applications

Cited by: 7
Authors
Reddy, K. Manikantta [1 ]
Vasantha, M. H. [1 ]
Kumar, Y. B. Nithin [1 ]
Gopal, Ch. Keshava [2 ]
Dwivedi, Devesh [1 ]
Affiliations
[1] Natl Inst Technol Goa, Dept Elect & Commun Engn, Ponda 403401, Goa, India
[2] Xilinx India Technol Serv Pvt Ltd, Syst Integrat & Validat Grp, Hyderabad 500032, India
Keywords
Approximate computing; Approximate multiplier; Hardware accelerator; Edge computing; Matrix multiplication; Low power; Neural network; Compressors; Design; Adder
DOI
10.1016/j.vlsi.2021.08.001
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Approximate computing has emerged as an efficient design methodology for improving the performance and power efficiency of digital systems by allowing a negligible loss in output accuracy. Dedicated hardware accelerators built from approximate circuits can address the power-performance trade-off in computationally complex applications such as deep learning. This paper proposes an approximate radix-4 Booth multiplier and a hardware accelerator for deploying deep learning applications on power-restricted mobile/edge computing devices. The proposed accelerator uses parallel processing elements based on the approximate multiplier to accelerate its workloads. The accelerator is tested with matrix-vector multiplication (MVM) and matrix-matrix multiplication (MMM) workloads on a Zynq ZCU102 evaluation board. The experimental results show that the average power consumption of the proposed accelerator is reduced by 34% for MVM and 40% for MMM, compared to the conventional multiply-accumulate unit used in the literature for similar workloads. Moreover, the proposed accelerator achieves an average performance of 5 GOP/s for MVM and 42.5 GOP/s for MMM at 275 MHz, improvements of 14x and 5x, respectively, over the conventional design.
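The abstract centers on an approximate radix-4 Booth multiplier. The paper's specific approximation scheme is not reproduced in this record, but the exact radix-4 (modified) Booth recoding that such designs start from can be sketched as a minimal Python reference model; function names here are illustrative, and a hardware approximation would typically simplify the partial-product generation or compression stages that this sketch performs exactly:

```python
def booth_radix4_digits(y, bits=8):
    """Recode a signed `bits`-wide multiplier into radix-4 Booth digits in {-2,-1,0,1,2}.

    Digit i is derived from the overlapping bit triplet (y[2i+1], y[2i], y[2i-1]),
    with an implicit y[-1] = 0 appended below the LSB.
    """
    # Two's-complement encode y, then shift left to append the implicit 0 bit.
    y_ext = (y & ((1 << bits) - 1)) << 1
    table = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
             0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}
    # One digit per triplet, stepping two bits at a time (bits/2 digits total).
    return [table[(y_ext >> i) & 0b111] for i in range(0, bits, 2)]

def booth_multiply(x, y, bits=8):
    """Exact radix-4 Booth product: sum of shifted partial products x * d_i * 4^i."""
    return sum(x * d * (4 ** i) for i, d in enumerate(booth_radix4_digits(y, bits)))
```

Radix-4 recoding halves the number of partial products relative to a bit-by-bit multiplier, which is why it is the usual baseline for low-power approximate multipliers: an approximate variant can then truncate or simplify the least-significant partial-product columns at a small accuracy cost.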
Pages: 268-279 (12 pages)