High-Throughput Convolutional Neural Network on an FPGA by Customized JPEG Compression

Cited: 20
Authors
Nakahara, Hiroki [1 ]
Que, Zhiqiang [2 ]
Luk, Wayne [2 ]
Affiliations
[1] Tokyo Inst Technol, Tokyo, Japan
[2] Imperial Coll London, London, England
Funding
UK Engineering and Physical Sciences Research Council;
Keywords
DOI
10.1109/FCCM48280.2020.00010
Chinese Library Classification
TP3 [computing technology; computer technology];
Subject Classification Code
0812 ;
Abstract
The growing interest in using FPGAs to accelerate convolutional neural network (CNN) workloads is driving the deployment of FPGAs on cloud services such as Amazon AWS and Microsoft Azure. However, current cloud-based FPGAs suffer from limited data-transfer bandwidth. In this paper, we compress the transferred image using customized JPEG coding and implement a customized image-decoder architecture, analyzing the trade-off between data-transfer speed-up and the drop in recognition accuracy. Based on this compression scheme, we design a high-throughput CNN inference engine. Almost all existing FPGA-based CNN accelerators follow the same design idea as their GPU counterparts, where operations from different network layers are mapped onto the same hardware units in a time-multiplexed manner. In contrast, our fully pipelined architecture maps all network layers on-chip and assigns the computation of each layer to its own hardware unit, with each unit optimized independently. We apply two CNN optimization techniques to a residual network: a channel shift with point-wise approximation, and binary weight quantization. We implement the proposed CNN inference accelerator on the Xilinx Virtex UltraScale+ XCVU9P FPGA. Our system achieves a peak performance of 2.41 TOPS. The compressed JPEG image transfer consumes only 4% of the system resources, costs 0.3 points of accuracy, and achieves 81,120 FPS, which is 65.27 times faster than conventional straightforward RGB data transfer. Thus, our proposed data-transfer architecture is sufficient to sustain system performance. In system throughput, our design is 3.84-34.41 times higher than existing FPGA implementations. Compared with a Xeon CPU, it achieves 138.38 times higher throughput while dissipating 1.2 times less power, for 177.12 times better efficiency. Compared with a Tesla V100 GPU, it achieves 9.48 times higher throughput while dissipating 3.9 times less power, for 37.52 times better efficiency.
Thus, our parallel architecture on an FPGA provides superior throughput for CNN acceleration.
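The two CNN optimizations named in the abstract can be illustrated as follows. This is a minimal numpy sketch of the general techniques (BinaryConnect-style binary weight quantization with a per-filter scale, and a ShiftNet-style channel shift that lets a subsequent 1x1 point-wise convolution capture spatial context), not the paper's FPGA implementation; the function names, the per-channel scaling, and the four-way shift grouping are illustrative assumptions.

```python
import numpy as np

def binarize_weights(w):
    """Binary weight quantization (illustrative): replace each weight with
    +alpha or -alpha, where alpha is the mean absolute value of its output
    filter. The weight tensor layout is assumed (out_ch, in_ch, kh, kw)."""
    alpha = np.abs(w).mean(axis=(1, 2, 3), keepdims=True)  # per-filter scale
    return alpha * np.sign(w)

def channel_shift(x):
    """Channel shift (illustrative): move four channel groups one pixel in
    four spatial directions so a following 1x1 (point-wise) convolution can
    mix spatial context without full 3x3 multiplies. Layout: (n, c, h, w)."""
    n, c, h, w = x.shape
    out = np.zeros_like(x)
    g = c // 4
    out[:, 0 * g:1 * g, :, 1:] = x[:, 0 * g:1 * g, :, :-1]  # shift right
    out[:, 1 * g:2 * g, :, :-1] = x[:, 1 * g:2 * g, :, 1:]  # shift left
    out[:, 2 * g:3 * g, 1:, :] = x[:, 2 * g:3 * g, :-1, :]  # shift down
    out[:, 3 * g:, :-1, :] = x[:, 3 * g:, 1:, :]            # shift up
    return out
```

After binarization, each multiply in a convolution collapses to a sign-controlled add of a shared scale, which is what makes binary-weight layers cheap in FPGA logic; the shift operation is pure wiring and costs no arithmetic at all.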
Pages: 1 / 9
Page count: 9