Optimization of FPGA-based CNN accelerators using metaheuristics

Cited by: 3
Authors
Sait, Sadiq M. [1, 2]
El-Maleh, Aiman [1, 2]
Altakrouri, Mohammad [1 ]
Shawahna, Ahmad [1 ]
Affiliations
[1] King Fahd Univ Petr & Minerals, Dept Comp Engn, Dhahran 31261, Saudi Arabia
[2] King Fahd Univ Petr & Minerals, Interdisciplinary Res Ctr Intelligent Secure Syst, Dhahran 31261, Saudi Arabia
Source
JOURNAL OF SUPERCOMPUTING | 2023 / Vol. 79 / Issue 04
Keywords
Convolutional neural network; FPGA; Metaheuristics; Simulated annealing; Tabu search; Combinatorial optimization; NP-hard problems; ALGORITHMS;
DOI
10.1007/s11227-022-04787-8
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
In recent years, convolutional neural networks (CNNs) have demonstrated their ability to solve problems in many fields and with accuracy that was not possible before. However, this comes with extensive computational requirements, which made general central processing units (CPUs) unable to deliver the desired real-time performance. At the same time, field-programmable gate arrays (FPGAs) have seen a surge in interest for accelerating CNN inference. This is due to their ability to create custom designs with different levels of parallelism. Furthermore, FPGAs provide better performance per watt compared to other computing technologies such as graphics processing units (GPUs). The current trend in FPGA-based CNN accelerators is to implement multiple convolutional layer processors (CLPs), each of which is tailored for a subset of layers. However, the growing complexity of CNN architectures makes optimizing the resources available on the target FPGA device to deliver the optimal performance more challenging. This is because of the exponential increase in the design variables that must be considered when implementing a Multi-CLP accelerator as CNN's complexity increases. In this paper, we present a CNN accelerator and an accompanying automated design methodology that employs metaheuristics for partitioning available FPGA resources to design a Multi-CLP accelerator. Specifically, the proposed design tool adopts simulated annealing (SA) and tabu search (TS) algorithms to find the number of CLPs required and their respective configurations to achieve optimal performance on a given target FPGA device. Here, the focus is on the key specifications and hardware resources, including digital signal processors (DSPs), block random access memories (BRAMs), and off-chip memory bandwidth. Experimental results and comparisons using four well-known benchmark CNNs are presented demonstrating that the proposed acceleration framework is both encouraging and promising. 
The SA-/TS-based Multi-CLP achieves 1.31x - 2.37x higher throughput than the state-of-the-art Single-/Multi-CLP approaches in accelerating AlexNet, SqueezeNet 1.1, VGGNet, and GoogLeNet architectures on the Xilinx VC707 and VC709 FPGA boards.
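To make the idea of annealing over a Multi-CLP design space more concrete, here is a minimal sketch, not the authors' tool. It assumes a commonly used analytical cycle model for a CLP with unroll factors `(tn, tm)`, charges each CLP `tn * tm` compute units against a single budget, and ignores BRAM and off-chip bandwidth entirely. The layer shapes in `LAYERS`, the move probabilities, the cooling schedule, and every function name are hypothetical choices made for illustration. Tabu search is not shown.

```python
import math
import random

# Hypothetical layer shapes (N input maps, M output maps, R rows, C cols, K kernel);
# illustrative values only, not taken from the paper.
LAYERS = [(3, 48, 55, 55, 11), (48, 128, 27, 27, 5),
          (256, 192, 13, 13, 3), (192, 192, 13, 13, 3), (192, 128, 13, 13, 3)]

def layer_cycles(layer, tn, tm):
    """Assumed cycle model for one layer on a CLP unrolled by (tn, tm)."""
    n, m, r, c, k = layer
    return math.ceil(n / tn) * math.ceil(m / tm) * r * c * k * k

def cost(assign, clps, budget):
    """CLPs run concurrently, so the slowest CLP sets the pace (lower is better)."""
    if sum(tn * tm for tn, tm in clps) > budget:
        return math.inf  # over the compute-unit budget: infeasible
    load = [0] * len(clps)
    for layer, clp in zip(LAYERS, assign):
        load[clp] += layer_cycles(layer, *clps[clp])
    return max(load)

def neighbor(assign, clps, rng):
    """Move one layer to a random CLP, or nudge one unroll factor by +/-1."""
    assign, clps = list(assign), list(clps)
    if rng.random() < 0.3:
        assign[rng.randrange(len(assign))] = rng.randrange(len(clps))
    else:
        i = rng.randrange(len(clps))
        tn, tm = clps[i]
        delta = rng.choice([-1, 1])
        if rng.random() < 0.5:
            tn = max(1, tn + delta)
        else:
            tm = max(1, tm + delta)
        clps[i] = (tn, tm)
    return assign, clps

def anneal(budget, n_clps=2, steps=20000, alpha=0.9995, seed=0):
    """Plain simulated annealing with geometric cooling."""
    rng = random.Random(seed)
    assign, clps = [0] * len(LAYERS), [(1, 1)] * n_clps  # feasible if n_clps <= budget
    cur = cost(assign, clps, budget)
    best = (cur, assign, clps)
    temp = float(cur)  # start hot enough to accept most uphill moves
    for _ in range(steps):
        cand = neighbor(assign, clps, rng)
        c = cost(*cand, budget)
        # Accept improvements always, worse moves with Boltzmann probability.
        if c <= cur or rng.random() < math.exp((cur - c) / temp):
            assign, clps, cur = cand[0], cand[1], c
            if cur < best[0]:
                best = (cur, assign, clps)
        temp *= alpha
    return best
```

A call such as `best_cycles, assign, clps = anneal(budget=448)` returns the lowest cycle count found, the layer-to-CLP mapping, and each CLP's unroll factors. Treating `n_clps` as an upper bound and letting unused CLPs stay idle is one simple way to let the search pick the number of CLPs as well.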
Pages: 4493 - 4533
Number of pages: 41
Related papers
50 in total
  • [1] Optimization of FPGA-based CNN accelerators using metaheuristics
    Sadiq M. Sait
    Aiman El-Maleh
    Mohammad Altakrouri
    Ahmad Shawahna
    The Journal of Supercomputing, 2023, 79 : 4493 - 4533
  • [2] Energy Efficiency Optimization of FPGA-based CNN Accelerators with Full Data Reuse and VFS
    Jiang, Weixiong
    Yu, Heng
    Liu, Xinzhe
    Ha, Yajun
    2019 26TH IEEE INTERNATIONAL CONFERENCE ON ELECTRONICS, CIRCUITS AND SYSTEMS (ICECS), 2019, : 446 - 449
  • [3] A Multi-Cache System for On-Chip Memory Optimization in FPGA-Based CNN Accelerators
    Pacini, Tommaso
    Rapuano, Emilio
    Dinelli, Gianmarco
    Fanucci, Luca
    ELECTRONICS, 2021, 10 (20)
  • [4] Increasing Flexibility of FPGA-based CNN Accelerators with Dynamic Partial Reconfiguration
    Irmak, Hasan
    Ziener, Daniel
    Alachiotis, Nikolaos
    2021 31ST INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS (FPL 2021), 2021, : 306 - 311
  • [5] Detect and Replace: Efficient Soft Error Protection of FPGA-Based CNN Accelerators
    Gao, Zhen
    Qi, Yanmao
    Shi, Jinchang
    Liu, Qiang
    Ge, Guangjun
    Wang, Yu
    Reviriego, Pedro
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2025, 33 (01) : 66 - 74
  • [6] Automatic CNN Model Partitioning for GPU/FPGA-based Embedded Heterogeneous Accelerators using Geometric Programming
    Carballo-Hernandez, Walther
    Pelcat, Maxime
    Berry, Francois
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2023, 95 (10) : 1203 - 1218
  • [7] Automatic CNN Model Partitioning for GPU/FPGA-based Embedded Heterogeneous Accelerators using Geometric Programming
    Walther Carballo-Hernández
    Maxime Pelcat
    François Berry
    Journal of Signal Processing Systems, 2023, 95 : 1203 - 1218
  • [8] A Survey on FPGA-based Accelerators for CKKS
    Zhao, Wenpeng
    Chen, Qidong
    Wang, Yijie
    Zhang, Haichun
    Lu, Zhaojun
    Qu, Gang
    8TH INTERNATIONAL TEST CONFERENCE IN ASIA, ITC-ASIA 2024, 2024
  • [9] A Collaborative Framework for FPGA-based CNN Design Modeling and Optimization
    Mu, Jiandong
    Zhang, Wei
    Liang, Hao
    Sinha, Sharad
    2018 28TH INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL), 2018, : 139 - 146
  • [10] Computing Models for FPGA-Based Accelerators
    Herbordt, Martin C.
    Gu, Yongfeng
    VanCourt, Tom
    Model, Josh
    Sukhwani, Bharat
    Chiu, Matt
    COMPUTING IN SCIENCE & ENGINEERING, 2008, 10 (06) : 35 - 45