FPGA-Based High-Speed Energy-Efficient 32-Bit Fixed-Point MAC Architecture for DSP Application in IoT Edge Computing

Cited by: 0

Authors:

Nagar, Mitul Sudhirkumar [1]

Patel, Sohan H. [1]

Engineer, Pinalkumar [1]

Affiliations:

[1] Sardar Vallabhbhai Natl Inst Technol, Dept Elect Engn, Surat 395007, India

Source:

JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS | 2024, Vol. 33, No. 14

Keywords:

Digital signal processing (DSP); multiply-accumulate (MAC) unit; 32-bit fixed-point signed MAC; DSP48; processing element (PE); edge computing; MULTIPLIER;

DOI:

10.1142/S0218126624502505

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Designing high-speed, energy-efficient blocks for image and digital signal processing (DSP) architectures is an evolving research field. This work designs a high-speed, energy-efficient multiply-accumulate (MAC) unit to improve the performance of field-programmable gate array (FPGA)-based accelerators and softcore processors. Three separate 32-bit fixed-point signed MAC architectures were designed in Verilog and synthesized for the Zynq 7000 ZedBoard to identify the most efficient one. The ultimate goal is a fast, energy-efficient MAC unit whose speed approaches that of the DSP48 block, reducing the latency of IoT edge computing. In the proposed Booth radix-4 Dadda (BR4D)-based MAC, energy efficiency was achieved in the partial product generation (PPG) and partial product addition (PPA) stages. In PPG, the width of the partial product (PP) terms was optimized with Bewick's sign extension to reduce power consumption. In PPA, Dadda-based reduction of the PP rows shortens the critical path delay (CPD). The proposed BR4D MAC unit reduces dynamic power, CPD, power-delay product (PDP) and energy-delay product (EDP) by 22%, 9%, 29% and 36%, respectively, compared to the standard Booth radix-4 Wallace tree (BR4WT)-based MAC. Furthermore, the hybrid MACs (BR4WT and BR4D) were compared with current state-of-the-art (SoA) designs, and the proposed BR4D MAC was found to be 47% faster than the same design in the SoA. The proposed BR4D MAC was also evaluated under frequency scaling for low-power applications by reducing the clock frequency in steps of 10 MHz from the maximum usable frequency (MUF) of 64 MHz down to 10 MHz. Reducing the clock frequency by 84% reduces power consumption in the same proportion and speed by 38%. Additionally, the proposed design helps extend the battery life of IoT end nodes, reducing energy consumption and EDP by 76% and 61%, respectively.
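
To make the PPG and accumulate steps described in the abstract more concrete, the following is a minimal behavioural sketch in Python of a signed MAC built on Booth radix-4 recoding. It is not the authors' Verilog design: the function names, the 64-bit accumulator width, and the wrap-around overflow behaviour are assumptions made for illustration. The Dadda reduction tree is replaced by a plain `sum`, and Bewick's sign extension is not modelled, since both are hardware-level optimizations that do not change the arithmetic result.

```python
def booth_radix4_digits(multiplier: int, width: int = 32) -> list[int]:
    """Recode a signed `width`-bit multiplier into width/2 digits in {-2..2}."""
    bits = multiplier & ((1 << width) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # implicit bit to the right of the LSB
    for i in range(0, width, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        # Booth radix-4 digit: -2*y[i+1] + y[i] + y[i-1]
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_partial_products(a: int, b: int, width: int = 32) -> list[int]:
    """PPG: one partial product per Booth digit, weighted by 4**i."""
    return [(d * a) << (2 * i) for i, d in enumerate(booth_radix4_digits(b, width))]


def to_signed(value: int, width: int) -> int:
    """Wrap an integer into the signed two's-complement range of `width` bits."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def mac(acc: int, a: int, b: int, width: int = 32, acc_width: int = 64) -> int:
    """acc + a*b; `sum` stands in for the Dadda tree and final adder (assumption)."""
    product = sum(booth_partial_products(a, b, width))
    return to_signed(acc + product, acc_width)


if __name__ == "__main__":
    acc = 0
    for x, w in [(3, 4), (-7, 9), (1 << 20, -5)]:
        acc = mac(acc, x, w)
    print(acc)
```

With a 32-bit multiplier the recoding yields 16 partial-product rows instead of 32, which is the reduction in row count that the PPA stage then compresses.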


Pages: 25

16 records in total

[1] High-Speed Energy-Efficient Fixed-Point Signed Multipliers for FPGA-Based DSP Applications
Nagar, Mitul Sudhirkumar
Mathuriya, Aditya
Patel, Sohan H.
Engineer, Pinalkumar J.
IEEE EMBEDDED SYSTEMS LETTERS, 2024, 16 (04) : 417 - 420
[2] Architecture design of a high-performance 32-bit fixed-point DSP
Chen, J
Xu, RH
Fu, YZ
ADVANCES IN COMPUTER SYSTEMS ARCHITECTURE, PROCEEDINGS, 2004, 3189 : 115 - 125
[3] Design of a high-speed FPGA-based 32-bit floating-point FFT processor
Mou, Shengmei
Yang, Xiaodong
SNPD 2007: EIGHTH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING, AND PARALLEL/DISTRIBUTED COMPUTING, VOL 1, PROCEEDINGS, 2007, : 84 - +
[4] Design and FPGA-based Implementation of a High Performance 32-bit DSP Processor
Ferdous, Tasnim
2012 15TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (ICCIT), 2012, : 484 - 489
[5] Empowering edge devices: FPGA-based 16-bit fixed-point accelerator with SVD for CNN on 32-bit memory-limited systems
Yanamala, Rama Muni Reddy
Pullakandam, Muralidhar
INTERNATIONAL JOURNAL OF CIRCUIT THEORY AND APPLICATIONS, 2024, 52 (09) : 4755 - 4782
[6] An extensible architecture of 32-bit ALU for high-speed computing in QCA technology
Nilesh Patidar
Namit Gupta
The Journal of Supercomputing, 2022, 78 : 19605 - 19627
[7] An extensible architecture of 32-bit ALU for high-speed computing in QCA technology
Patidar, Nilesh
Gupta, Namit
JOURNAL OF SUPERCOMPUTING, 2022, 78 (18): : 19605 - 19627
[8] High-speed, area-efficient FPGA-based floating-point multiplier
Aty, GA
Hussein, AI
Ashour, IS
Mones, M
ICM 2003: PROCEEDINGS OF THE 15TH INTERNATIONAL CONFERENCE ON MICROELECTRONICS, 2003, : 274 - 277
[9] Fixed-Point Analysis and FPGA Implementation of Deep Neural Network Based Equalizers for High-Speed PON
Kaneda, Noriaki
Chuang, Chun-Yen
Zhu, Ziyi
Mahadevan, Amitkumar
Farah, Bob
Bergman, Keren
Van Veen, Doutje
Houtsma, Vincent
JOURNAL OF LIGHTWAVE TECHNOLOGY, 2022, 40 (07) : 1972 - 1980
[10] An efficient design of FSM based 32-bit unsigned high-speed pipelined multiplier using Verilog HDL
Abdullah-Al-Kafi
Rahman, Atul
Mahjabeen, Bushra
Rahman, Mahmudur
2014 INTERNATIONAL CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (ICECE), 2014, : 164 - 167
