Approximate Multiply-Accumulate Array for Convolutional Neural Networks on FPGA

Cited by: 0
Authors
Wang, Ziwei [1 ]
Trefzer, Martin A. [1 ]
Bale, Simon J. [1 ]
Tyrrell, Andy M. [1 ]
Affiliations
[1] Univ York, Dept Elect Engn, York, N Yorkshire, England
Source
2019 14TH INTERNATIONAL SYMPOSIUM ON RECONFIGURABLE COMMUNICATION-CENTRIC SYSTEMS-ON-CHIP (RECOSOC 2019), 2019
Keywords
FPGA; Approximate Computing; Convolutional Neural Networks; Neural Network Accelerator;
DOI
10.1109/recosoc48741.2019.9034956
Chinese Library Classification
TP301 [Theory and Methods]
Discipline Classification Code
081202
Abstract
Convolutional Neural Networks (CNNs) have been widely used in many computer applications. The growth in deep neural networks and machine learning applications has made state-of-the-art CNN architectures increasingly complex. Millions of multiply-accumulate (MACC) operations are needed for this kind of processing. To meet these massive computing requirements, accelerating CNNs on FPGAs has become a viable solution that balances power efficiency and processing speed. In this paper, we propose an approximate high-speed implementation of the convolution stage of a CNN computing architecture, the Approximate Multiply-Accumulate Array. Compared with the traditional multiply-accumulate operation, this implementation converts multiplications into additions and systolic accumulate operations. A key feature is the logarithmic addition with iterative residual error reduction stages, which, in principle, allows power, area, and speed to be traded off against accuracy for specific data using different configurations. Here, we present experiments in which we configure the approximate multiplier in different ways, changing the number of iteration stages as well as the bit width of the data, and investigate the impact on overall accuracy. We present initial experiments evaluating the architecture's error using random input data, and Sobel edge detection is used to investigate the proposed architecture with regard to its use in image-processing CNNs. The experimental results show that the proposed approximate architecture is up to 10.7% faster than a competitive FPGA implementation of an exact multiplier when running the convolution kernel over a test image, and that residual errors after two iterations reach 1.6% for 8-bit inputs and 0.001% for 12-bit inputs on average, based on 10,000 random samples.
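The abstract describes converting multiplications into additions in the log domain, with iterative stages that reduce the residual error. A common scheme of this kind (Mitchell-style logarithmic multiplication) writes each operand as a power of two plus a residue, approximates the product with shifts and adds, and then applies the same approximation recursively to the residue product; each iteration shrinks the error, and the recursion terminates exactly when a residue reaches zero. The sketch below is an illustrative software model only, under the assumption that the paper's iterative residual stages follow this decomposition; it is not the authors' hardware implementation, and the function name is hypothetical.

```python
def mitchell_approx_mult(a: int, b: int, iterations: int) -> int:
    """Approximate a*b with shifts/adds (Mitchell-style log multiplication).

    Decompose a = 2^k1 + r1 and b = 2^k2 + r2 (k = position of the
    leading one, r = remaining low bits). Then
        a*b = 2^(k1+k2) + r1*2^k2 + r2*2^k1 + r1*r2,
    and dropping the r1*r2 term leaves only shifts and additions.
    Each extra iteration approximates the dropped r1*r2 residue
    product the same way, reducing the error further.
    """
    if a == 0 or b == 0:
        return 0
    k1 = a.bit_length() - 1          # exponent of leading one of a
    k2 = b.bit_length() - 1          # exponent of leading one of b
    r1 = a - (1 << k1)               # residue of a below its leading one
    r2 = b - (1 << k2)               # residue of b below its leading one
    # Zeroth-order approximation: exact error is exactly r1*r2.
    approx = (1 << (k1 + k2)) + (r1 << k2) + (r2 << k1)
    if iterations > 0 and r1 and r2:
        # Residual error reduction stage: approximate r1*r2 recursively.
        approx += mitchell_approx_mult(r1, r2, iterations - 1)
    return approx
```

For example, `mitchell_approx_mult(200, 100, 0)` gives 17408 against the exact 20000 (the dropped residue product 72*36 = 2592), one iteration narrows the error to 32, and two iterations recover the exact product because the remaining residues vanish. This mirrors the trade-off the abstract describes: more iteration stages cost more logic and latency but leave a smaller residual error.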
Pages: 35 - 42
Page count: 8