FPGA-Based High-Speed Energy-Efficient 32-Bit Fixed-Point MAC Architecture for DSP Application in IoT Edge Computing

Cited by: 0
Authors
Nagar, Mitul Sudhirkumar [1]
Patel, Sohan H. [1]
Engineer, Pinalkumar [1]
Affiliations
[1] Sardar Vallabhbhai Natl Inst Technol, Dept Elect Engn, Surat 395007, India
Keywords
Digital signal processing (DSP); multiply-accumulate (MAC) unit; 32-bit fixed-point signed MAC; DSP48; processing element (PE); edge computing; multiplier
DOI
10.1142/S0218126624502505
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Designing high-speed, energy-efficient blocks for image and digital signal processing (DSP) architectures is an evolving research field. This work designs a high-speed, energy-efficient multiply-accumulate (MAC) unit to improve the performance of field-programmable gate array (FPGA)-based accelerators and softcore processors. Three distinct 32-bit fixed-point signed MAC architectures were designed in Verilog and synthesized for the Zynq 7000 ZedBoard to identify an efficient MAC architecture. The ultimate goal is a fast, energy-efficient MAC unit whose speed approaches that of the DSP48 block, reducing the latency of IoT edge computing. In the proposed Booth radix-4 Dadda (BR4D)-based MAC, energy efficiency was achieved in partial product generation (PPG) and partial product addition (PPA). At the PPG stage, the width of the partial product (PP) terms was optimized with Bewick's sign extension to reduce power consumption. At the PPA stage, Dadda-based reduction of the PP rows reduces the critical path delay (CPD). Compared with a standard Booth radix-4 Wallace tree (BR4WT)-based MAC, the proposed BR4D MAC unit reduces dynamic power, CPD, power-delay product (PDP) and energy-delay product (EDP) by 22%, 9%, 29% and 36%, respectively. Furthermore, the hybrid MACs (BR4WT and BR4D) were compared with current state-of-the-art (SoA) designs, and the proposed BR4D MAC was found to be 47% faster than the same design in the SoA. The proposed BR4D was also evaluated under frequency scaling for low-power applications by reducing the clock frequency in steps of 10 MHz from a maximum usable frequency (MUF) of 64 MHz down to 10 MHz. Reducing the clock frequency by 84% reduces power consumption in the same proportion and speed by 38%. Additionally, the proposed design helps to improve the battery life of IoT end nodes, with reductions in energy consumption and EDP of 76% and 61%, respectively.
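As an illustration of the Booth radix-4 stage named in the abstract, the following is a minimal behavioral sketch in Python, not the authors' Verilog design. It recodes a signed 32-bit multiplier into 16 digits in {-2, -1, 0, 1, 2}, forms one partial product per digit, and accumulates the result. The function names and the 64-bit wrapping accumulator are assumptions made for this example. The hardware-level optimizations described in the abstract (Bewick's sign extension and the Dadda reduction tree) are not modeled; the partial products are simply summed.

```python
def booth_radix4_digits(multiplier: int, width: int = 32) -> list[int]:
    """Recode a signed `width`-bit multiplier into width/2 Booth radix-4 digits."""
    assert width % 2 == 0, "radix-4 recoding consumes two bits per digit"
    assert -(1 << (width - 1)) <= multiplier < (1 << (width - 1))
    bits = multiplier & ((1 << width) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # implicit bit to the right of the LSB
    for i in range(0, width, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)  # digit value in {-2, -1, 0, 1, 2}
        prev = b1
    return digits


def booth_radix4_multiply(a: int, b: int, width: int = 32) -> int:
    """Multiply signed operands by summing one partial product per Booth digit."""
    digits = booth_radix4_digits(b, width)
    # Digit i carries weight 4**i, i.e. a left shift by 2*i bits.
    partial_products = [(d * a) << (2 * i) for i, d in enumerate(digits)]
    return sum(partial_products)


def to_signed(value: int, width: int) -> int:
    """Wrap an integer to a signed two's-complement value of `width` bits."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def mac(acc: int, a: int, b: int, width: int = 32, acc_width: int = 64) -> int:
    """One MAC step: acc + a*b, wrapped to an assumed 64-bit accumulator."""
    return to_signed(acc + booth_radix4_multiply(a, b, width), acc_width)
```

A dot product can be formed by chaining `mac` calls, starting from `acc = 0`. Radix-4 recoding is what halves the number of partial-product rows (16 instead of 32 for a 32-bit operand) before any reduction tree is applied.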
Pages: 25
Related papers
16 in total
  • [1] High-Speed Energy-Efficient Fixed-Point Signed Multipliers for FPGA-Based DSP Applications
    Nagar, Mitul Sudhirkumar
    Mathuriya, Aditya
    Patel, Sohan H.
    Engineer, Pinalkumar J.
    IEEE EMBEDDED SYSTEMS LETTERS, 2024, 16 (04) : 417 - 420
  • [2] Architecture design of a high-performance 32-bit fixed-point DSP
    Chen, J
    Xu, RH
    Fu, YZ
    ADVANCES IN COMPUTER SYSTEMS ARCHITECTURE, PROCEEDINGS, 2004, 3189 : 115 - 125
  • [3] Design of a high-speed FPGA-based 32-bit floating-point FFT processor
    Mou, Shengmei
    Yang, Xiaodong
    SNPD 2007: EIGHTH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING, AND PARALLEL/DISTRIBUTED COMPUTING, VOL 1, PROCEEDINGS, 2007, : 84 - +
  • [4] Design and FPGA-based Implementation of a High Performance 32-bit DSP Processor
    Ferdous, Tasnim
    2012 15TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (ICCIT), 2012, : 484 - 489
  • [5] Empowering edge devices: FPGA-based 16-bit fixed-point accelerator with SVD for CNN on 32-bit memory-limited systems
    Yanamala, Rama Muni Reddy
    Pullakandam, Muralidhar
    INTERNATIONAL JOURNAL OF CIRCUIT THEORY AND APPLICATIONS, 2024, 52 (09) : 4755 - 4782
  • [6] An extensible architecture of 32-bit ALU for high-speed computing in QCA technology
    Nilesh Patidar
    Namit Gupta
    The Journal of Supercomputing, 2022, 78 : 19605 - 19627
  • [7] An extensible architecture of 32-bit ALU for high-speed computing in QCA technology
    Patidar, Nilesh
    Gupta, Namit
    JOURNAL OF SUPERCOMPUTING, 2022, 78 (18): : 19605 - 19627
  • [8] High-speed, area-efficient FPGA-based floating-point multiplier
    Aty, GA
    Hussein, AI
    Ashour, IS
    Mones, M
    ICM 2003: PROCEEDINGS OF THE 15TH INTERNATIONAL CONFERENCE ON MICROELECTRONICS, 2003, : 274 - 277
  • [9] Fixed-Point Analysis and FPGA Implementation of Deep Neural Network Based Equalizers for High-Speed PON
    Kaneda, Noriaki
    Chuang, Chun-Yen
    Zhu, Ziyi
    Mahadevan, Amitkumar
    Farah, Bob
    Bergman, Keren
    Van Veen, Doutje
    Houtsma, Vincent
    JOURNAL OF LIGHTWAVE TECHNOLOGY, 2022, 40 (07) : 1972 - 1980
  • [10] An efficient design of FSM based 32-bit unsigned high-speed pipelined multiplier using Verilog HDL
    Abdullah-Al-Kafi
    Rahman, Atul
    Mahjabeen, Bushra
    Rahman, Mahmudur
    2014 INTERNATIONAL CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (ICECE), 2014, : 164 - 167