Approximate Multiply-Accumulate Array for Convolutional Neural Networks on FPGA

Cited by: 0

Authors:

Wang, Ziwei [1]

Trefzer, Martin A. [1]

Bale, Simon J. [1]

Tyrrell, Andy M. [1]

Affiliations:

[1] Univ York, Dept Elect Engn, York, N Yorkshire, England

Source:

2019 14TH INTERNATIONAL SYMPOSIUM ON RECONFIGURABLE COMMUNICATION-CENTRIC SYSTEMS-ON-CHIP (RECOSOC 2019) | 2019

Keywords:

FPGA; Approximate Computing; Convolutional Neural Networks; Neural Network Accelerator;

DOI:

10.1109/recosoc48741.2019.9034956

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202 ;

Abstract:

Convolutional Neural Networks (CNNs) are widely used in many computer applications. The growth in deep neural networks and machine learning applications has made state-of-the-art CNN architectures increasingly complex, and this kind of processing requires millions of multiply-accumulate (MACC) operations. To deal with these massive computing requirements, accelerating CNNs on FPGAs has become a viable solution for balancing power efficiency and processing speed. In this paper, we propose an approximate high-speed implementation of the convolution stage of a CNN computing architecture, the Approximate Multiply-Accumulate Array. Compared with the traditional multiply-accumulate operation, this implementation converts multiplications into additions and systolic accumulate operations. A key feature is the logarithmic addition with iterative residual error reduction stages which, in principle, allows power, area and speed to be traded off against accuracy for specific data by using different configurations. We present experiments in which the approximate multiplier is configured in different ways, changing the number of iteration stages as well as the bit width of the data, and investigate the impact on overall accuracy. These initial experiments evaluate the architecture's error using random input data, and Sobel edge detection is used to investigate the proposed architecture with regard to its use in image-processing CNNs. The experimental results show that the proposed approximate architecture is up to 10.7% faster than a competitive FPGA implementation of an exact multiplier when running the convolution kernel over a test image, and that residual errors after two iterations reach 1.6% for 8-bit inputs and 0.001% for 12-bit inputs on average, based on 10,000 random samples.
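The abstract does not spell out the multiplier's internals, so the following is only a minimal software sketch of one well-known scheme that fits the description: a leading-one (logarithm-style) decomposition whose residual product is approximated again in each further stage. The function name `approx_log_mul`, the parameter `iterations`, and the way iterations are counted here are assumptions made for illustration; this is not the authors' hardware design.

```python
def approx_log_mul(a: int, b: int, iterations: int = 2) -> int:
    """Approximate a * b for non-negative integers using shifts and adds only.

    Illustrative assumption: each pass splits an operand into its leading
    power of two plus a residue, a = 2**ka + ra and b = 2**kb + rb, so that
    a * b = 2**(ka + kb) + ra * 2**kb + rb * 2**ka + ra * rb.
    The first three terms need only shifts and additions; the last term,
    ra * rb, is the residual error handed to the next pass.
    """
    if a < 0 or b < 0:
        raise ValueError("this sketch handles non-negative operands only")
    result = 0
    for _ in range(iterations):
        if a == 0 or b == 0:
            break  # residual product is zero, so the result is already exact
        ka = a.bit_length() - 1  # position of the leading one in a
        kb = b.bit_length() - 1  # position of the leading one in b
        ra = a - (1 << ka)       # residue of a below its leading one
        rb = b - (1 << kb)       # residue of b below its leading one
        result += (1 << (ka + kb)) + (ra << kb) + (rb << ka)
        a, b = ra, rb            # next pass approximates ra * rb
    return result


if __name__ == "__main__":
    print(approx_log_mul(5, 6, iterations=1))  # 28 (exact product is 30)
    print(approx_log_mul(5, 6, iterations=2))  # 30
```

In this sketch the result never exceeds the exact product, and each extra pass can only shrink the remaining error, which mirrors the accuracy-versus-cost trade-off the abstract describes when varying the number of iteration stages and the bit width.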


Pages: 35-42

Page count: 8

