Approximate Multiply-Accumulate Array for Convolutional Neural Networks on FPGA

Cited by: 0
Authors
Wang, Ziwei [1 ]
Trefzer, Martin A. [1 ]
Bale, Simon J. [1 ]
Tyrrell, Andy M. [1 ]
Affiliations
[1] Univ York, Dept Elect Engn, York, N Yorkshire, England
Source
2019 14TH INTERNATIONAL SYMPOSIUM ON RECONFIGURABLE COMMUNICATION-CENTRIC SYSTEMS-ON-CHIP (RECOSOC 2019), 2019
Keywords
FPGA; Approximate Computing; Convolutional Neural Networks; Neural Network Accelerator;
DOI
10.1109/recosoc48741.2019.9034956
Chinese Library Classification
TP301 [Theory and Methods]
Discipline Classification Code
081202
Abstract
Convolutional Neural Networks (CNNs) have been widely used in many computer applications. The growth in deep neural networks and machine learning applications has made state-of-the-art CNN architectures increasingly complex. Millions of multiply-accumulate (MACC) operations are needed for this kind of processing. To meet these massive computing requirements, accelerating CNNs on FPGAs has become a viable solution that balances power efficiency and processing speed. In this paper, we propose an approximate high-speed implementation of the convolution stage of a CNN computing architecture, the Approximate Multiply-Accumulate Array. Compared with the traditional multiply-accumulate operation, this implementation converts multiplications into additions and systolic accumulate operations. A key feature is the logarithmic addition with iterative residual error reduction stages, which, in principle, allows power, area, and speed to be traded off against accuracy for specific data using different configurations. Here, we present experiments in which we configure the approximate multiplier in different ways, changing the number of iteration stages as well as the bit width of the data, and investigate the impact on overall accuracy. We present initial experiments evaluating the architecture's error using random input data, and Sobel edge detection is used to investigate the proposed architecture with regard to its use in image-processing CNNs. The experimental results show that the proposed approximate architecture is up to 10.7% faster than a competitive FPGA implementation of an exact multiplier when running the convolution kernel over a test image, and that residual errors after two iterations reach 1.6% for 8-bit inputs and 0.001% for 12-bit inputs on average, based on 10,000 random samples.
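The abstract describes converting multiplications into additions in the log domain, with iterative stages that reduce the residual error. A common scheme of this kind (Mitchell-style logarithmic multiplication) writes each operand as a power of two plus a residue, approximates the product with shifts and adds, and then applies the same approximation recursively to the residue product; each iteration shrinks the error, and the recursion terminates exactly when a residue reaches zero. The sketch below is an illustrative software model only, under the assumption that the paper's iterative residual stages follow this decomposition; it is not the authors' hardware implementation, and the function name is hypothetical.

```python
def mitchell_approx_mult(a: int, b: int, iterations: int) -> int:
    """Approximate a*b with shifts/adds (Mitchell-style log multiplication).

    Decompose a = 2^k1 + r1 and b = 2^k2 + r2 (k = position of the
    leading one, r = remaining low bits). Then
        a*b = 2^(k1+k2) + r1*2^k2 + r2*2^k1 + r1*r2,
    and dropping the r1*r2 term leaves only shifts and additions.
    Each extra iteration approximates the dropped r1*r2 residue
    product the same way, reducing the error further.
    """
    if a == 0 or b == 0:
        return 0
    k1 = a.bit_length() - 1          # exponent of leading one of a
    k2 = b.bit_length() - 1          # exponent of leading one of b
    r1 = a - (1 << k1)               # residue of a below its leading one
    r2 = b - (1 << k2)               # residue of b below its leading one
    # Zeroth-order approximation: exact error is exactly r1*r2.
    approx = (1 << (k1 + k2)) + (r1 << k2) + (r2 << k1)
    if iterations > 0 and r1 and r2:
        # Residual error reduction stage: approximate r1*r2 recursively.
        approx += mitchell_approx_mult(r1, r2, iterations - 1)
    return approx
```

For example, `mitchell_approx_mult(200, 100, 0)` gives 17408 against the exact 20000 (the dropped residue product 72*36 = 2592), one iteration narrows the error to 32, and two iterations recover the exact product because the remaining residues vanish. This mirrors the trade-off the abstract describes: more iteration stages cost more logic and latency but leave a smaller residual error.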
Pages: 35 - 42
Page count: 8