Toward Optimal Softcore Carry-aware Approximate Multipliers on Xilinx FPGAs

Times cited: 3

Authors:

Awais, Muhammad [1]

Zahir, Ali [1]

Shah, Syed Ayaz Ali [1]

Reviriego, Pedro [2]

Ullah, Anees [3]

Ullah, Nasim [4]

Khan, Adam [3]

Ali, Hazrat [5]

Affiliations:

[1] COMSATS Univ Islamabad, Dept Elect & Comp Engn, Islamabad, Pakistan

[2] Univ Politecn Madrid, Dept Telemat Syst Engn, Madrid, Spain

[3] Univ Engn & Technol, Dept Elect Engn, Peshawar, Pakistan

[4] Taif Univ KSA, Dept Elect Engn, Coll Engn, Taif, Saudi Arabia

[5] Hamad Bin Khalifa Univ, Coll Sci & Engn, Doha, Qatar

Source:

ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS | 2023 / Vol. 22 / No. 4

Keywords:

Neural Network; RADIX-8 BOOTH MULTIPLIERS; LOW-POWER; DESIGN

DOI:

10.1145/3564243

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline code:

0812

Abstract:

Domain-specific accelerators for signal processing, image processing, and machine learning are increasingly implemented on SRAM-based field-programmable gate arrays (FPGAs). Because these applications are inherently error-tolerant, approximate arithmetic, and in particular the design of approximate multipliers, has become an important research problem. Truncation of the lower bits is a widely used approximation approach; however, how this approximation affects carry propagation, and how to limit that effect, has not yet been explored in detail. In this article, an optimized carry-aware approximate radix-4 Booth multiplier is presented that uses the built-in slice look-up tables (LUTs) and carry-chain resources in a novel configuration. The proposed multiplier simplifies the computation of the upper and lower bits and, compared with the latest state-of-the-art designs, provides significant benefits in FPGA resource usage (38.5% to 42.9% fewer LUTs), power-delay product (PDP reduced by 49.4% to 53%), a combined performance metric (LUTs × critical path delay (CPD) × PDP reduced by 68.9% to 73.1%), and accuracy (70% improvement in mean relative error distance). The proposed designs are therefore an attractive choice for implementing multiplication in FPGA-based accelerators.
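The abstract builds on radix-4 Booth recoding, lower-bit truncation, and the mean relative error distance (MRED) metric. The following Python sketch illustrates these three ideas under stated assumptions. It uses plain column truncation of every partial product (the hypothetical parameter `t` is the number of discarded least-significant columns), not the carry-aware LUT/carry-chain design proposed in the article. The helper names `booth_radix4_digits`, `booth_product`, and `mred` are hypothetical. It evaluates all signed 8-bit operand pairs, so its error figures are illustrative only and are unrelated to the results reported in the paper.

```python
from itertools import product


def booth_radix4_digits(b: int, n: int) -> list[int]:
    """Radix-4 Booth recoding of an n-bit two's-complement multiplier b (n even).

    Returns digits d_j in {-2, -1, 0, 1, 2} such that b == sum(d_j * 4**j).
    """
    assert n % 2 == 0, "radix-4 recoding pairs bits, so n must be even"
    bits = [(b >> i) & 1 for i in range(n)]  # two's-complement bits of b, LSB first
    digits = []
    prev = 0  # implicit bit b_{-1} = 0
    for i in range(0, n, 2):
        # Digit from the overlapping triplet (b_{i+1}, b_i, b_{i-1}).
        digits.append(-2 * bits[i + 1] + bits[i] + prev)
        prev = bits[i + 1]
    return digits


def booth_product(a: int, b: int, n: int, t: int = 0) -> int:
    """Sum of the radix-4 Booth partial products of a * b.

    Assumption for illustration: when t > 0, bit positions 0..t-1 of every
    partial product are cleared (plain truncation, which rounds each partial
    product toward negative infinity). This is NOT the paper's carry-aware scheme.
    """
    mask = ~((1 << t) - 1)  # all ones except the t least-significant bits
    total = 0
    for j, d in enumerate(booth_radix4_digits(b, n)):
        pp = (d * a) << (2 * j)  # partial product d_j * a * 4**j
        total += pp & mask
    return total


def mred(n: int, t: int) -> float:
    """Mean relative error distance over all signed n-bit pairs with a * b != 0."""
    lo, hi = -(1 << (n - 1)), 1 << (n - 1)
    errors = []
    for a, b in product(range(lo, hi), repeat=2):
        exact = a * b
        if exact != 0:
            errors.append(abs(booth_product(a, b, n, t) - exact) / abs(exact))
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # With no truncation the Booth partial products sum to the exact product.
    assert all(booth_product(a, b, 8) == a * b
               for a, b in product(range(-128, 128), repeat=2))
    print(mred(8, 0))  # 0.0
    print(mred(8, 4))  # nonzero: truncation introduces error
```

Because the mask only ever clears bits, every truncated partial product is less than or equal to its exact value, so in this sketch the approximate product never exceeds `a * b` and the errors accumulate with the same sign.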


Pages: 19
