Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network

Cited by: 142
Authors
Zhang, Jialiang [1]
Li, Jing [1]
Affiliation
[1] Univ Wisconsin, Dept Elect & Comp Engn, Madison, WI 53706 USA
DOI
10.1145/3020078.3021698
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812 (Computer Science and Technology)
Abstract
OpenCL-based FPGA design has recently gained great popularity with the emerging need to accelerate workloads such as Convolutional Neural Networks (CNNs), the most popular deep learning architecture in computer vision. While OpenCL improves the code portability and programmability of FPGAs, it comes at the expense of performance. The key challenge is to optimize the OpenCL kernels so that they efficiently utilize the flexible hardware resources of the FPGA. Simply optimizing the OpenCL kernel code through various compiler options turns out to be insufficient to achieve the desired performance for workloads that are both compute-intensive and data-intensive, such as convolutional neural networks. In this paper, we first propose an analytical performance model and apply it to perform an in-depth analysis of the resource requirements of CNN classifier kernels and of the resources available on modern FPGAs. We identify that the key performance bottleneck is on-chip memory bandwidth. We propose a new kernel design that effectively addresses this bandwidth limitation and provides an optimal balance between computation, on-chip memory access, and off-chip memory access. As a case study, we further apply these techniques to design a CNN accelerator based on the VGG model. Finally, we evaluate the performance of our CNN accelerator on an Altera Arria 10 GX1150 board. We achieve 866 GOP/s floating-point performance at a 370 MHz working frequency and 1.79 TOP/s 16-bit fixed-point performance at 385 MHz. To the best of our knowledge, our implementation achieves the best power efficiency and performance density compared to existing work.
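The abstract's analysis weighs computation against on-chip and off-chip memory traffic. As a rough illustration only, the sketch below is a generic roofline-style bound, not the authors' analytical model: it assumes that computation and both memory paths overlap perfectly, so a kernel's run time is set by whichever of the three is slowest. The names `KernelProfile`, `Platform`, `attainable_throughput` and `ops_per_cycle`, and all workload and platform numbers in the usage example, are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class KernelProfile:
    """Hypothetical per-kernel workload description (illustrative, not from the paper)."""
    ops: float            # arithmetic operations performed by the kernel
    onchip_bytes: float   # bytes moved between on-chip buffers and compute units
    offchip_bytes: float  # bytes moved between external DRAM and the FPGA


@dataclass
class Platform:
    """Hypothetical FPGA limits; values used below are placeholders, not Arria 10 specs."""
    peak_ops_per_s: float  # peak arithmetic throughput (ops/s)
    onchip_bw: float       # on-chip memory bandwidth (bytes/s)
    offchip_bw: float      # off-chip memory bandwidth (bytes/s)


def attainable_throughput(k: KernelProfile, p: Platform) -> tuple[float, str]:
    """Return (ops/s, bottleneck), assuming compute and both memory paths fully overlap."""
    times = {
        "compute": k.ops / p.peak_ops_per_s,
        "on-chip memory": k.onchip_bytes / p.onchip_bw,
        "off-chip memory": k.offchip_bytes / p.offchip_bw,
    }
    # The slowest path determines the kernel's execution time.
    bottleneck = max(times, key=times.get)
    return k.ops / times[bottleneck], bottleneck


def ops_per_cycle(ops_per_s: float, clock_hz: float) -> float:
    """Average number of operations completed per clock cycle."""
    return ops_per_s / clock_hz


if __name__ == "__main__":
    # Placeholder kernel whose on-chip traffic dominates.
    kernel = KernelProfile(ops=2e9, onchip_bytes=4e9, offchip_bytes=5e7)
    fpga = Platform(peak_ops_per_s=1.5e12, onchip_bw=1e12, offchip_bw=2e10)
    print(attainable_throughput(kernel, fpga))
    # Work per cycle implied by the throughputs reported in the abstract.
    print(ops_per_cycle(866e9, 370e6), ops_per_cycle(1.79e12, 385e6))
```

With these placeholder numbers the on-chip path takes longest, so the bound is limited by on-chip bandwidth rather than by compute. Dividing the reported throughputs by their clock frequencies gives the implied work per cycle: 866 GOP/s at 370 MHz is about 2,341 operations per cycle, and 1.79 TOP/s at 385 MHz is about 4,649 operations per cycle.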
Pages: 25-34
Number of pages: 10
Related papers (50 in total)
  • [21] An FPGA-based Accelerator Platform Implements for Convolutional Neural Network
    Meng, Xiao
    Yu, Lixin
    Qin, Zhiyong
    2019 THE 3RD INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPILATION, COMPUTING AND COMMUNICATIONS (HP3C 2019), 2019: 25-28
  • [22] In-FPGA Instrumentation Framework for OpenCL-Based Designs
    Bensalem, Hachem
    Blaquiere, Yves
    Savaria, Yvon
    IEEE ACCESS, 2020, 8 (08): 212979-212994
  • [23] Evaluation of an OpenCL-Based FPGA Platform for Particle Filter
    Tatsumi, Shunsuke
    Hariyama, Masanori
    Ikoma, Norikazu
    JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2016, 20 (05): 743-754
  • [24] Performance Analysis of Adaptive Resource Allocation Scheme for OpenCL-based FPGA Virtualization System
    Le, Duc-Canh
    Oh, Eun-Young
    Cho, Gyu-Sang
    Lee, Kyung-Chae
    Kim, Sung-Hyun
    Youn, Chan-Hyun
    2019 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY CONVERGENCE (ICTC): ICT CONVERGENCE LEADING THE AUTONOMOUS FUTURE, 2019: 392-397
  • [25] Heterogeneous FPGA Based Convolutional Network Accelerator
    Zhou X.
    Zhong S.
    Zhang W.
    Wang J.
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2019, 32 (10): 927-935
  • [26] FPGA-based Accelerator for Convolutional Neural Network Application in Mobile Robotics
    Mazzetto, Lucas F. R.
    Castanho, Jose E. C.
    2023 LATIN AMERICAN ROBOTICS SYMPOSIUM, LARS, 2023 BRAZILIAN SYMPOSIUM ON ROBOTICS, SBR, AND 2023 WORKSHOP ON ROBOTICS IN EDUCATION, WRE, 2023: 433-438
  • [27] A FPGA-based Accelerator of Convolutional Neural Network for Face Feature Extraction
    Ding, Ru
    Su, Guangda
    Bai, Guoqiang
    Xu, Wei
    Su, Nan
    Wu, Xingjun
    2019 IEEE INTERNATIONAL CONFERENCE ON ELECTRON DEVICES AND SOLID-STATE CIRCUITS (EDSSC), 2019
  • [28] FPGA-Based Unified Accelerator for Convolutional Neural Network and Vision Transformer
    Li T.
    Zhang F.
    Wang S.
    Cao W.
    Chen L.
    Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology, 2024, 46 (06): 2663-2672
  • [29] FPGA-based Training Accelerator Utilizing Sparseness of Convolutional Neural Network
    Nakahara, Hiroki
    Sada, Youki
    Shimoda, Masayuki
    Sayama, Kouki
    Jinguji, Akira
    Sato, Shimpei
    2019 29TH INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS (FPL), 2019: 180-186
  • [30] An Efficient FPGA-Based Dilated and Transposed Convolutional Neural Network Accelerator
    Wu, Tsung-Hsi
    Shu, Chang
    Liu, Tsung-Te
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2024, 71 (11): 5178-5186