Approximate Multiply-Accumulate Array for Convolutional Neural Networks on FPGA

Cited by: 0
Authors
Wang, Ziwei [1 ]
Trefzer, Martin A. [1 ]
Bale, Simon J. [1 ]
Tyrrell, Andy M. [1 ]
Affiliations
[1] Univ York, Dept Elect Engn, York, N Yorkshire, England
Source
2019 14TH INTERNATIONAL SYMPOSIUM ON RECONFIGURABLE COMMUNICATION-CENTRIC SYSTEMS-ON-CHIP (RECOSOC 2019) | 2019
Keywords
FPGA; Approximate Computing; Convolutional Neural Networks; Neural Network Accelerator;
DOI
10.1109/recosoc48741.2019.9034956
Chinese Library Classification
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
Convolutional Neural Networks (CNNs) are widely used in many computer applications. The growth of deep neural networks and machine learning applications has made state-of-the-art CNN architectures increasingly complex, requiring millions of multiply-accumulate (MACC) operations. To meet these massive computing requirements, accelerating CNNs on FPGAs has become a viable way to balance power efficiency and processing speed. In this paper, we propose an approximate high-speed implementation of the convolution stage of a CNN computing architecture, the Approximate Multiply-Accumulate Array. Compared with the traditional multiply-accumulate operation, this implementation converts multiplications into additions and systolic accumulate operations. A key feature is the logarithmic addition with iterative residual error reduction stages, which, in principle, allows power, area and speed to be traded off against accuracy for specific data using different configurations. Here, we present experiments in which we configure the approximate multiplier in different ways, changing the number of iteration stages as well as the bit width of the data, and investigate the impact on overall accuracy. We present initial experiments evaluating the architecture's error using random input data, and use Sobel edge detection to investigate the proposed architecture with regard to its use in image-processing CNNs. The experimental results show that the proposed approximate architecture is up to 10.7% faster than a competitive FPGA implementation of an exact multiplier when running the convolution kernel over a test image, and that residual errors after two iterations average 1.6% for 8-bit inputs and 0.001% for 12-bit inputs, based on 10,000 random samples.
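The "logarithmic addition with iterative residual error reduction" described in the abstract follows the general idea of Mitchell-style approximate multiplication: each operand is split into a leading power of two plus a residue, the dominant terms of the product become shift-and-add operations, and the dropped cross-product of the residues can be re-approximated in further iterations to shrink the error. The sketch below is an illustrative software model of that general technique, not the paper's actual hardware datapath; the function name and the recursive formulation of the correction stages are assumptions made here for clarity.

```python
def mitchell_approx(a: int, b: int, iterations: int = 1) -> int:
    """Approximate a*b (unsigned ints) via Mitchell-style log-domain
    decomposition. Writing a = 2**k1 + x1 and b = 2**k2 + x2 gives
        a*b = 2**(k1+k2) + x1*2**k2 + x2*2**k1 + x1*x2,
    where the first three terms are pure shifts/adds. The basic
    approximation drops x1*x2; each extra iteration re-applies the
    same scheme to that residual product, reducing the error."""
    if a == 0 or b == 0:
        return 0
    k1 = a.bit_length() - 1          # floor(log2(a))
    k2 = b.bit_length() - 1          # floor(log2(b))
    x1 = a - (1 << k1)               # residue of a below its leading bit
    x2 = b - (1 << k2)               # residue of b below its leading bit
    # Shift-and-add terms of the exact product, cross term x1*x2 omitted.
    approx = (1 << (k1 + k2)) + (x1 << k2) + (x2 << k1)
    if iterations > 1 and x1 and x2:
        # Residual error reduction: approximate the dropped x1*x2 term.
        approx += mitchell_approx(x1, x2, iterations - 1)
    return approx

# One iteration underestimates 5*7 = 35 (drops the 1*3 residual);
# a second iteration recovers it exactly for these small operands.
one_iter = mitchell_approx(5, 7, 1)
two_iter = mitchell_approx(5, 7, 2)
```

Because each iteration only adds shifts and additions, a hardware realization can choose the iteration count per deployment, which is the power/area/speed-versus-accuracy trade-off the abstract refers to.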
Pages: 35-42 (8 pages)