Convolutional Neural Networks (CNNs) are widely used in many computing applications. The growth of deep neural networks and machine learning applications has made state-of-the-art CNN architectures increasingly complex, and this kind of processing requires millions of multiply-accumulate (MACC) operations. To meet these massive computing requirements, accelerating CNNs on FPGAs has become a viable way to balance power efficiency and processing speed. In this paper, we propose the Approximate Multiply-Accumulate Array, an approximate, high-speed implementation of the convolution stage of a CNN computing architecture. Compared with the traditional multiply-accumulate operation, this implementation converts multiplications into additions and systolic accumulate operations. A key feature is logarithmic addition with iterative residual-error-reduction stages, which, in principle, allows power, area, and speed to be traded off against accuracy for specific data through different configurations. We present experiments in which the approximate multiplier is configured in different ways, varying the number of iteration stages as well as the bit width of the data, and we investigate the impact on overall accuracy. Initial experiments evaluate the architecture's error using random input data, and Sobel edge detection is used to assess the proposed architecture with regard to its use in image-processing CNNs. The experimental results show that the proposed approximate architecture is up to 10.7% faster than a competitive FPGA implementation of an exact multiplier when running the convolution kernel over a test image, and that average residual errors after two iterations reach 1.6% for 8-bit inputs and 0.001% for 12-bit inputs, based on 10,000 random samples.
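The abstract does not specify how the multiplications are decomposed, but the description of converting products into additions with iterative residual-error-reduction stages is consistent with an iterative Mitchell-style logarithmic multiplier. The following Python sketch is only an illustrative software model under that assumption, not the proposed hardware: each iteration replaces a product with shifts and additions and passes the neglected residual term to the next stage. The function name `approx_mult` and the operand values are hypothetical.

```python
# Illustrative software model of an iterative logarithmic (Mitchell-style)
# approximate multiplier: products become shifts and additions, and each
# extra iteration adds a correction term that shrinks the residual error.

def approx_mult(a: int, b: int, iterations: int = 2) -> int:
    """Approximate a * b for non-negative integers using shifts and adds."""
    result = 0
    for _ in range(iterations):
        if a == 0 or b == 0:
            break
        ka = a.bit_length() - 1          # position of the leading one in a
        kb = b.bit_length() - 1          # position of the leading one in b
        ra = a - (1 << ka)               # residual of a after removing its leading one
        rb = b - (1 << kb)               # residual of b after removing its leading one
        # One stage: a*b ~= 2^(ka+kb) + rb*2^ka + ra*2^kb (the term ra*rb is dropped)
        result += (1 << (ka + kb)) + (rb << ka) + (ra << kb)
        # The dropped term ra*rb becomes the operand pair for the next iteration
        a, b = ra, rb
    return result

if __name__ == "__main__":
    # Compare against the exact product for one pair of 8-bit operands
    x, y = 183, 97
    for n in (1, 2, 3):
        approx = approx_mult(x, y, n)
        err = 100.0 * abs(x * y - approx) / (x * y)
        print(f"iterations={n}: approx={approx}, exact={x * y}, error={err:.3f}%")
```

Running this model shows the expected behavior of such a scheme: the relative error drops sharply with each added iteration stage, which is the trade-off between accuracy and the cost of extra stages that the abstract describes.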