High-Throughput Convolutional Neural Network on an FPGA by Customized JPEG Compression

Cited: 20
Authors
Nakahara, Hiroki [1 ]
Que, Zhiqiang [2 ]
Luk, Wayne [2 ]
Affiliations
[1] Tokyo Inst Technol, Tokyo, Japan
[2] Imperial Coll London, London, England
Funding
UK Engineering and Physical Sciences Research Council;
Keywords
DOI
10.1109/FCCM48280.2020.00010
Chinese Library Classification
TP3 [computing technology; computer technology];
Subject Classification Code
0812 ;
Abstract
The growing interest in using FPGAs to accelerate convolutional neural network (CNN) workloads is driving the deployment of FPGAs on cloud services such as Amazon AWS and Microsoft Azure. However, current cloud-based FPGAs suffer from limited data-transfer bandwidth. In this paper, we compress the transferred image using customized JPEG coding and implement a customized image-decoder architecture, analyzing the trade-off between data-transfer speed-up and the drop in recognition accuracy. Based on this compression scheme, we design a high-throughput CNN inference engine. Almost all existing FPGA-based CNN accelerators follow the same design idea as their GPU counterparts, where operations from different network layers are mapped onto the same hardware units in a time-multiplexed manner. In contrast, our fully pipelined architecture maps all network layers on-chip and assigns the computation of each layer to its own hardware unit, with each unit optimized independently. We apply two CNN optimization techniques to a residual network: a channel shift with point-wise approximation, and binary weight quantization. We implement the proposed CNN inference accelerator on the Xilinx Virtex UltraScale+ XCVU9P FPGA. Our system achieves a peak performance of 2.41 TOPS. The compressed JPEG image transfer consumes only 4% of the system resources, costs 0.3 points of accuracy, and achieves 81,120 FPS, which is 65.27 times faster than conventional straightforward RGB data transfer. Thus, our proposed data-transfer architecture is sufficient to sustain system performance. In system throughput, our design is 3.84-34.41 times higher than existing FPGA implementations. Compared with a Xeon CPU, it achieves 138.38 times higher throughput while dissipating 1.2 times less power, for 177.12 times better efficiency. Compared with a Tesla V100 GPU, it achieves 9.48 times higher throughput while dissipating 3.9 times less power, for 37.52 times better efficiency.
Thus, our parallel architecture on an FPGA provides superior throughput for CNN acceleration.
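The two CNN optimizations named in the abstract can be illustrated as follows. This is a minimal numpy sketch of the general techniques (BinaryConnect-style binary weight quantization with a per-filter scale, and a ShiftNet-style channel shift that lets a subsequent 1x1 point-wise convolution capture spatial context), not the paper's FPGA implementation; the function names, the per-channel scaling, and the four-way shift grouping are illustrative assumptions.

```python
import numpy as np

def binarize_weights(w):
    """Binary weight quantization (illustrative): replace each weight with
    +alpha or -alpha, where alpha is the mean absolute value of its output
    filter. The weight tensor layout is assumed (out_ch, in_ch, kh, kw)."""
    alpha = np.abs(w).mean(axis=(1, 2, 3), keepdims=True)  # per-filter scale
    return alpha * np.sign(w)

def channel_shift(x):
    """Channel shift (illustrative): move four channel groups one pixel in
    four spatial directions so a following 1x1 (point-wise) convolution can
    mix spatial context without full 3x3 multiplies. Layout: (n, c, h, w)."""
    n, c, h, w = x.shape
    out = np.zeros_like(x)
    g = c // 4
    out[:, 0 * g:1 * g, :, 1:] = x[:, 0 * g:1 * g, :, :-1]  # shift right
    out[:, 1 * g:2 * g, :, :-1] = x[:, 1 * g:2 * g, :, 1:]  # shift left
    out[:, 2 * g:3 * g, 1:, :] = x[:, 2 * g:3 * g, :-1, :]  # shift down
    out[:, 3 * g:, :-1, :] = x[:, 3 * g:, 1:, :]            # shift up
    return out
```

After binarization, each multiply in a convolution collapses to a sign-controlled add of a shared scale, which is what makes binary-weight layers cheap in FPGA logic; the shift operation is pure wiring and costs no arithmetic at all.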
Pages: 1 / 9
Page count: 9