Approximate Multiply-Accumulate Array for Convolutional Neural Networks on FPGA

Authors
Wang, Ziwei [1]
Trefzer, Martin A. [1]
Bale, Simon J. [1]
Tyrrell, Andy M. [1]
Affiliations
[1] Univ York, Dept Elect Engn, York, N Yorkshire, England
Source
2019 14TH INTERNATIONAL SYMPOSIUM ON RECONFIGURABLE COMMUNICATION-CENTRIC SYSTEMS-ON-CHIP (RECOSOC 2019) | 2019
Keywords
FPGA; Approximate Computing; Convolutional Neural Networks; Neural Network Accelerator
DOI
10.1109/recosoc48741.2019.9034956
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Convolutional Neural Networks (CNNs) are widely used in computer applications. The growth of deep neural networks and machine learning applications has made state-of-the-art CNN architectures increasingly complex, and this kind of processing requires millions of multiply-accumulate (MACC) operations. To meet these computing requirements, accelerating CNNs on FPGAs has become a viable way of balancing power efficiency and processing speed. In this paper, we propose an approximate high-speed implementation of the convolution stage of a CNN computing architecture, the Approximate Multiply-Accumulate Array. Compared with the traditional multiply-accumulate operation, this implementation converts multiplications into additions and systolic accumulate operations. A key feature is the logarithmic addition with iterative residual error reduction stages which, in principle, allows power, area and speed to be traded off against accuracy for specific data by using different configurations. We present initial experiments in which the approximate multiplier is configured in different ways, varying the number of iteration stages and the bit width of the data, and investigate the impact on overall accuracy. The architecture's error is evaluated using random input data, and Sobel edge detection is used to investigate the proposed architecture with regard to its use in image-processing CNNs. The experimental results show that the proposed approximate architecture is up to 10.7% faster than a competitive FPGA implementation of an exact multiplier when running the convolution kernel over a test image, and that residual errors after two iterations reach 1.6% for 8-bit inputs and 0.001% for 12-bit inputs on average, based on 10,000 random samples.
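The abstract does not spell out the multiplier's internals, so the following is only a minimal software sketch of one well-known way to turn a multiplication into shifts and additions with iterative residual correction (a Mitchell-style logarithmic multiplier with correction stages). It is an assumption made for illustration, not the authors' hardware design. The names `approx_mul`, `mean_relative_error`, and the `corrections` parameter are hypothetical, and the sketch is not expected to reproduce the error figures quoted in the abstract.

```python
import random


def approx_mul(a: int, b: int, corrections: int = 2) -> int:
    """Approximate a * b for non-negative integers using shifts and adds.

    Assumed scheme (not taken from the paper): write a = 2^k1 + r1 and
    b = 2^k2 + r2, so a*b = 2^(k1+k2) + r1*2^k2 + r2*2^k1 + r1*r2.
    Each stage keeps the first three terms and hands the dropped product
    r1*r2 to the next stage. corrections=0 is the basic approximation.
    """
    total = 0
    for _ in range(corrections + 1):
        if a == 0 or b == 0:
            break  # nothing left to correct: the result is already exact
        k1 = a.bit_length() - 1  # position of the leading one in a
        k2 = b.bit_length() - 1  # position of the leading one in b
        r1 = a - (1 << k1)       # a with its leading one removed
        r2 = b - (1 << k2)       # b with its leading one removed
        total += (1 << (k1 + k2)) + (r1 << k2) + (r2 << k1)
        a, b = r1, r2            # next stage approximates r1 * r2
    return total


def mean_relative_error(bits: int, corrections: int,
                        samples: int = 10_000, seed: int = 0) -> float:
    """Average relative error over random non-zero operands of a given width."""
    rng = random.Random(seed)  # fixed seed so runs are comparable
    err = 0.0
    for _ in range(samples):
        a = rng.randint(1, (1 << bits) - 1)
        b = rng.randint(1, (1 << bits) - 1)
        exact = a * b
        err += (exact - approx_mul(a, b, corrections)) / exact
    return err / samples


if __name__ == "__main__":
    for bits in (8, 12):
        for corrections in (0, 1, 2):
            e = mean_relative_error(bits, corrections)
            print(f"{bits}-bit, {corrections} correction(s): {e:.6%}")
```

For example, `approx_mul(5, 3, corrections=0)` returns 14, and a single correction stage recovers the exact product 15. In this sketch the approximation never exceeds the exact product, and each extra stage can only reduce the residual, which is the kind of accuracy-versus-cost knob the abstract describes.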
Pages: 35-42
Page count: 8