Approximate Multiply-Accumulate Array for Convolutional Neural Networks on FPGA

Cited by: 0
Authors
Wang, Ziwei [1 ]
Trefzer, Martin A. [1 ]
Bale, Simon J. [1 ]
Tyrrell, Andy M. [1 ]
Affiliations
[1] Univ York, Dept Elect Engn, York, N Yorkshire, England
Source
2019 14TH INTERNATIONAL SYMPOSIUM ON RECONFIGURABLE COMMUNICATION-CENTRIC SYSTEMS-ON-CHIP (RECOSOC 2019) | 2019
Keywords
FPGA; Approximate Computing; Convolutional Neural Networks; Neural Network Accelerator;
DOI
10.1109/recosoc48741.2019.9034956
Chinese Library Classification
TP301 [Theory, Methods];
Subject Classification Code
081202;
Abstract
Convolutional Neural Networks (CNNs) have been widely used in many computer applications. The growth in deep neural networks and machine learning applications has made state-of-the-art CNN architectures increasingly complex. Millions of multiply-accumulate (MACC) operations are needed in this kind of processing. To meet these massive computing requirements, accelerating CNNs on FPGAs has become a viable solution for balancing power efficiency and processing speed. In this paper, we propose an approximate high-speed implementation of the convolution stage of a CNN computing architecture, the Approximate Multiply-Accumulate Array. Compared with the traditional multiply-accumulate operation, this implementation converts multiplications into additions and systolic accumulate operations. A key feature is the logarithmic addition with iterative residual error reduction stages which, in principle, allows power, area and speed to be traded off against accuracy for specific data using different configurations. Here, we present experiments where we configure the approximate multiplier in different ways, changing the number of iteration stages as well as the bit width of the data, and investigate the impact on overall accuracy. We present initial experiments evaluating the architecture's error using random input data, and Sobel edge detection is used to investigate the proposed architecture with regard to its use in image-processing CNNs. The experimental results show that the proposed approximate architecture is up to 10.7% faster than a competitive FPGA implementation of an exact multiplier when running the convolution kernel over a test image, and that residual errors after two iterations reach 1.6% for 8-bit inputs and 0.001% for 12-bit inputs on average, based on 10,000 random samples.
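The abstract describes converting multiplications into additions with iterative residual error reduction. A minimal Python sketch of that general idea is an iterative logarithmic multiplier: each operand is split into its leading power of two plus a residual, the cross-residual product is dropped to leave only shifts and adds, and each extra iteration approximates the dropped residual product the same way. All names and the exact decomposition here are our own illustration, not the paper's hardware design.

```python
def split(n):
    """Split a positive integer into (k, r) with n = 2**k + r, 0 <= r < 2**k."""
    k = n.bit_length() - 1          # position of the leading one
    return k, n - (1 << k)          # r is the residual below the leading one

def approx_mult(x, y, iterations=1):
    """Approximate x*y using only shifts and adds (iterative logarithmic style).

    With x = 2**k1 + r1 and y = 2**k2 + r2, the exact product is
    2**(k1+k2) + r1*2**k2 + r2*2**k1 + r1*r2; the basic approximation
    drops the r1*r2 term, and each further iteration approximates that
    residual product recursively, shrinking the error.
    """
    if x == 0 or y == 0:
        return 0
    k1, r1 = split(x)
    k2, r2 = split(y)
    p = (1 << (k1 + k2)) + (r1 << k2) + (r2 << k1)
    if iterations > 1 and r1 and r2:
        p += approx_mult(r1, r2, iterations - 1)
    return p
```

For example, `approx_mult(9, 9, 1)` underestimates 81 because the 1x1 residual product is dropped, while a second iteration adds it back; in hardware each iteration would correspond to one extra shift-and-add correction stage, which is the kind of accuracy/area trade-off the abstract refers to.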
Pages: 35 - 42
Page count: 8