BitMAC: Bit-Serial Computation-Based Efficient Multiply-Accumulate Unit for DNN Accelerator

Cited by: 0

Authors:

Harsh Chhajed

Gopal Raut

Narendra Dhakad

Sudheer Vishwakarma

Santosh Kumar Vishvakarma

Affiliation:

[1] Indian Institute of Technology Indore, Department of Electrical Engineering

Source:

Circuits, Systems, and Signal Processing | 2022 / Vol. 41

Keywords:

ASIC; Bit-serial computing; DNN; MAC; Power-gating

DOI:

Not available

Abstract:

Contemporary hardware implementations of deep neural networks incur excess area because of resource-intensive elements such as multipliers. A semi-custom ASIC design of the multiply-accumulate (MAC) unit in a deep neural network is constrained by the available chip area. An area- and power-efficient MAC architecture is therefore needed to reduce this area burden. The present work addresses the challenge by proposing a MAC unit based on efficient processing and bit-serial computation. The proposed architecture is verified in simulation and synthesized using Synopsys Design Vision at the 180 nm and 45 nm technology nodes, and all physical parameters are extracted using Cadence Virtuoso. At 45 nm, the design shows a 34.35% lower area-delay product (ADP). It improves area by 25.94%, power dissipation by 35.65%, and latency by 14.30% with respect to the state-of-the-art MAC unit design. Leakage power dissipation is higher at smaller technology nodes. To reduce leakage power, a power-gated version of the proposed architecture is used. The coarse-grain power-gating technique saves 52.79% of leakage (static) power with minimal area overhead.
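The general idea of bit-serial multiply-accumulate mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the BitMAC architecture itself, whose details are not given here; it is a generic shift-and-add model that assumes unsigned operands and consumes one bit of one operand per step. The function name `bit_serial_mac` and the `width` parameter are hypothetical, chosen only for illustration.

```python
def bit_serial_mac(weights, activations, width=8):
    """Accumulate sum(w * a) by consuming one activation bit per step.

    Assumption (illustrative only): operands are unsigned integers that
    fit in `width` bits. Signed handling is omitted for brevity.
    """
    limit = 1 << width
    acc = 0
    for w, a in zip(weights, activations):
        if not (0 <= w < limit and 0 <= a < limit):
            raise ValueError("operands must fit in `width` unsigned bits")
        for bit in range(width):      # one step per serial bit of `a`
            if (a >> bit) & 1:        # the serial bit gates the partial product
                acc += w << bit       # a shifted add stands in for a multiplier
    return acc


if __name__ == "__main__":
    print(bit_serial_mac([3, 5], [4, 6]))
```

In this model a full multiplier is replaced by an adder and a shifter that are reused over `width` steps, which is the usual area-for-latency trade made by bit-serial designs.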

Pages: 2045-2060

Page count: 15
