Approximate Multiply-Accumulate Array for Convolutional Neural Networks on FPGA

Cited by: 0
Authors
Wang, Ziwei [1 ]
Trefzer, Martin A. [1 ]
Bale, Simon J. [1 ]
Tyrrell, Andy M. [1 ]
Affiliations
[1] Univ York, Dept Elect Engn, York, N Yorkshire, England
Source
2019 14TH INTERNATIONAL SYMPOSIUM ON RECONFIGURABLE COMMUNICATION-CENTRIC SYSTEMS-ON-CHIP (RECOSOC 2019) | 2019
Keywords
FPGA; Approximate Computing; Convolutional Neural Networks; Neural Network Accelerator;
DOI
10.1109/recosoc48741.2019.9034956
Chinese Library Classification
TP301 [Theory, Methods];
Subject Classification Code
081202;
Abstract
Convolutional Neural Networks (CNNs) have been widely used in many computer applications. The growth in deep neural networks and machine learning applications has made state-of-the-art CNN architectures increasingly complex. Millions of multiply-accumulate (MACC) operations are needed in this kind of processing. To meet these massive computing requirements, accelerating CNNs on FPGAs has become a viable solution for balancing power efficiency and processing speed. In this paper, we propose an approximate high-speed implementation of the convolution stage of a CNN computing architecture, the Approximate Multiply-Accumulate Array. Compared with the traditional multiply-accumulate operation, this implementation converts multiplications into additions and systolic accumulate operations. A key feature is the logarithmic addition with iterative residual error reduction stages which, in principle, allows power, area and speed to be traded off against accuracy for specific data using different configurations. Here, we present experiments where we configure the approximate multiplier in different ways, changing the number of iteration stages as well as the bit width of the data, and investigate the impact on overall accuracy. We present initial experiments evaluating the architecture's error using random input data, and Sobel edge detection is used to investigate the proposed architecture with regard to its use in image-processing CNNs. The experimental results show that the proposed approximate architecture is up to 10.7% faster than a competitive FPGA implementation of an exact multiplier when running the convolution kernel over a test image, and that residual errors after two iterations reach 1.6% for 8-bit inputs and 0.001% for 12-bit inputs on average, based on 10,000 random samples.
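The abstract describes converting multiplications into additions with iterative residual error reduction. A minimal Python sketch of that general idea is an iterative logarithmic multiplier: each operand is split into its leading power of two plus a residual, the cross-residual product is dropped to leave only shifts and adds, and each extra iteration approximates the dropped residual product the same way. All names and the exact decomposition here are our own illustration, not the paper's hardware design.

```python
def split(n):
    """Split a positive integer into (k, r) with n = 2**k + r, 0 <= r < 2**k."""
    k = n.bit_length() - 1          # position of the leading one
    return k, n - (1 << k)          # r is the residual below the leading one

def approx_mult(x, y, iterations=1):
    """Approximate x*y using only shifts and adds (iterative logarithmic style).

    With x = 2**k1 + r1 and y = 2**k2 + r2, the exact product is
    2**(k1+k2) + r1*2**k2 + r2*2**k1 + r1*r2; the basic approximation
    drops the r1*r2 term, and each further iteration approximates that
    residual product recursively, shrinking the error.
    """
    if x == 0 or y == 0:
        return 0
    k1, r1 = split(x)
    k2, r2 = split(y)
    p = (1 << (k1 + k2)) + (r1 << k2) + (r2 << k1)
    if iterations > 1 and r1 and r2:
        p += approx_mult(r1, r2, iterations - 1)
    return p
```

For example, `approx_mult(9, 9, 1)` underestimates 81 because the 1x1 residual product is dropped, while a second iteration adds it back; in hardware each iteration would correspond to one extra shift-and-add correction stage, which is the kind of accuracy/area trade-off the abstract refers to.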
Pages: 35 - 42
Page count: 8