Quantization aware approximate multiplier and hardware accelerator for edge computing of deep learning applications

Cited by: 7

Authors:

Reddy, K. Manikantta ^{[1]}

Vasantha, M. H. ^{[1]}

Kumar, Y. B. Nithin ^{[1]}

Gopal, Ch. Keshava ^{[2]}

Dwivedi, Devesh ^{[1]}

Affiliations:

[1] Natl Inst Technol Goa, Dept Elect & Commun Engn, Ponda 403401, Goa, India

[2] Xilinx India Technol Serv Pvt Ltd, Syst Integrat & Validat Grp, Hyderabad 500032, India

Source:

INTEGRATION-THE VLSI JOURNAL | 2021 / Vol. 81

Keywords:

Approximate computing; Approximate multiplier; Hardware accelerator; Edge computing; Matrix multiplication; LOW-POWER; NEURAL-NETWORK; COMPRESSORS; DESIGN; ADDER;

DOI:

10.1016/j.vlsi.2021.08.001

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Approximate computing has emerged as an efficient design methodology for improving the performance and power efficiency of digital systems at the cost of a negligible loss in output accuracy. Dedicated hardware accelerators built from approximate circuits can address the power-performance trade-off in computationally complex applications such as deep learning. This paper proposes an approximate radix-4 Booth multiplier and a hardware accelerator for deploying deep learning applications on power-restricted mobile/edge computing devices. The proposed accelerator uses parallel processing elements based on the approximate multiplier to accelerate the workloads. It is tested with matrix-vector multiplication (MVM) and matrix-matrix multiplication (MMM) workloads on a Zynq ZCU102 evaluation board. The experimental results show that the average power consumption of the proposed accelerator is 34% and 40% lower for MVM and MMM, respectively, than that of the conventional multiply-accumulate unit used in the literature to implement similar workloads. Moreover, the proposed accelerator achieves an average performance of 5 GOP/s and 42.5 GOP/s for MVM and MMM, respectively, at 275 MHz, which are 14x and 5x improvements, respectively, over the conventional design.
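For readers unfamiliar with the arithmetic named in the abstract, the following is a minimal Python sketch of the exact, textbook radix-4 Booth recoding and its use in a matrix-vector product. It is not the paper's approximate multiplier or accelerator: the abstract does not describe the approximation scheme, so none is modeled here. The function names (`booth_radix4_digits`, `booth_multiply`, `mvm`) and the default 8-bit operand width are assumptions chosen for illustration only.

```python
def booth_radix4_digits(multiplier: int, bits: int = 8) -> list[int]:
    """Recode a signed two's-complement multiplier into radix-4 Booth digits.

    Returns digits in {-2, -1, 0, 1, 2}, least significant first, such that
    sum(d * 4**i) equals the multiplier (valid for -2**(bits-1) <= value < 2**(bits-1)).
    """
    if bits % 2:
        bits += 1  # an odd width is sign-extended by one bit
    y = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # implicit bit y[-1] = 0
    for i in range(0, bits, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)  # overlapping 3-bit window
        prev = b1
    return digits


def booth_multiply(a: int, b: int, bits: int = 8) -> int:
    """Exact product a * b built from radix-4 Booth partial products of b."""
    product = 0
    for i, d in enumerate(booth_radix4_digits(b, bits)):
        product += (d * a) << (2 * i)  # each digit is weighted by 4**i
    return product


def mvm(matrix: list[list[int]], vector: list[int], bits: int = 8) -> list[int]:
    """Matrix-vector multiplication; each row is one multiply-accumulate chain."""
    return [
        sum(booth_multiply(w, x, bits) for w, x in zip(row, vector))
        for row in matrix
    ]
```

Radix-4 recoding halves the number of partial products relative to a bit-by-bit multiplier (four digits for an 8-bit operand), which is why Booth multipliers are a common starting point for low-power designs. In hardware, each row's accumulation in `mvm` would map naturally onto a separate processing element, although the actual organization of the paper's accelerator is not specified in this record.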


Pages: 268-279

Page count: 12
