Quantization aware approximate multiplier and hardware accelerator for edge computing of deep learning applications

Cited by: 7
Authors
Reddy, K. Manikantta [1]
Vasantha, M. H. [1]
Kumar, Y. B. Nithin [1]
Gopal, Ch. Keshava [2]
Dwivedi, Devesh [1]
Affiliations
[1] Natl Inst Technol Goa, Dept Elect & Commun Engn, Ponda 403401, Goa, India
[2] Xilinx India Technol Serv Pvt Ltd, Syst Integrat & Validat Grp, Hyderabad 500032, India
Keywords
Approximate computing; Approximate multiplier; Hardware accelerator; Edge computing; Matrix multiplication; Low-power; Neural network; Compressors; Design; Adder
DOI
10.1016/j.vlsi.2021.08.001
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Approximate computing has emerged as an efficient design methodology for improving the performance and power efficiency of digital systems by tolerating a negligible loss in output accuracy. Dedicated hardware accelerators built from approximate circuits can ease the power-performance trade-off in computationally complex applications such as deep learning. This paper proposes an approximate radix-4 Booth multiplier and a hardware accelerator for deploying deep learning applications on power-restricted mobile and edge computing devices. The accelerator uses parallel processing elements based on the approximate multiplier to accelerate the workloads. It is tested with matrix-vector multiplication (MVM) and matrix-matrix multiplication (MMM) workloads on a Zynq ZCU102 evaluation board. The experimental results show that the average power consumption of the proposed accelerator is 34% and 40% lower for MVM and MMM, respectively, than that of the conventional multiply-accumulate unit used in the literature to implement similar workloads. Moreover, the proposed accelerator achieves an average performance of 5 GOP/s and 42.5 GOP/s for MVM and MMM, respectively, at 275 MHz, which are 14x and 5x improvements, respectively, over the conventional design.
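For readers unfamiliar with the arithmetic named in the abstract, the following is a minimal Python sketch of radix-4 Booth multiplication and of a matrix-vector product built on it. The recoding itself is the standard textbook scheme. The approximation step is an assumption made purely for illustration: the `drop_lsbs` parameter simply zeroes the low bits of each partial product, which is not the approximate multiplier proposed in the paper (the abstract does not describe its internals). The names `booth_radix4_digits`, `booth_multiply`, `mvm`, and `drop_lsbs` are hypothetical.

```python
def booth_radix4_digits(y: int, bits: int = 8) -> list[int]:
    """Recode a signed `bits`-wide integer into radix-4 Booth digits in {-2..2}."""
    if bits % 2 != 0:
        raise ValueError("bits must be even")
    if not -(1 << (bits - 1)) <= y < (1 << (bits - 1)):
        raise ValueError("y does not fit in the given width")
    u = y & ((1 << bits) - 1)  # two's-complement bit pattern of y
    digits = []
    prev = 0  # implicit bit y[-1] = 0
    for i in range(0, bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        # Standard radix-4 Booth digit: -2*y[i+1] + y[i] + y[i-1]
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_multiply(x: int, y: int, bits: int = 8, drop_lsbs: int = 0) -> int:
    """Multiply x by y by summing Booth partial products.

    drop_lsbs > 0 applies an illustrative (assumed) approximation:
    the low `drop_lsbs` bits of every partial product are cleared.
    """
    mask = ~((1 << drop_lsbs) - 1)
    total = 0
    for i, d in enumerate(booth_radix4_digits(y, bits)):
        partial = (d * x) << (2 * i)  # digit i has weight 4**i
        total += partial & mask
    return total


def mvm(matrix, vector, bits: int = 8, drop_lsbs: int = 0):
    """Matrix-vector product using booth_multiply for every scalar product."""
    return [
        sum(booth_multiply(a, b, bits, drop_lsbs) for a, b in zip(row, vector))
        for row in matrix
    ]


if __name__ == "__main__":
    print(booth_radix4_digits(7))             # digits, least significant first
    print(booth_multiply(23, -45))            # exact product
    print(booth_multiply(23, -45, drop_lsbs=4))  # truncated partial products
    print(mvm([[1, 2], [3, 4]], [5, 6]))
```

With `drop_lsbs=0` the routine reproduces the exact product. With 8-bit operands there are four partial products, so clearing `k` low bits of each can lower the result by at most `4 * (2**k - 1)`. This shows the general accuracy-for-simplicity trade that approximate multipliers make, under the truncation assumption stated above.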
Pages: 268-279
Page count: 12