FPGA Design of Transposed Convolutions for Deep Learning Using High-Level Synthesis

Cited by: 1
Authors
Sestito, Cristian [1 ,2 ]
Perri, Stefania [3 ]
Stewart, Robert [4 ]
Affiliations
[1] Univ Calabria, Dept Informat Modeling Elect & Syst Engn, I-87036 Arcavacata Di Rende, Italy
[2] Univ Edinburgh, Ctr Elect Frontiers, Sch Engn, Edinburgh EH9 3BF, Scotland
[3] Univ Calabria, Dept Mech Energy & Management Engn, I-87036 Arcavacata Di Rende, Italy
[4] Heriot Watt Univ, Dept Comp Sci, Edinburgh EH14 4AS, Scotland
Source
JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY | 2023, Vol. 95, No. 10
Keywords
Transposed Convolution; Deep Learning; FPGA; High-Level Synthesis; Quantization; Parallelism; NETWORK; ARCHITECTURE
DOI
10.1007/s11265-023-01883-7
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Deep Learning (DL) is pervasive across a wide variety of domains. Convolutional Neural Networks (CNNs) are often used for image processing DL applications. Modern CNN models are growing to meet the needs of more sophisticated tasks, e.g. using Transposed Convolutions (TCONVs) for image decompression and image generation. Such state-of-the-art DL models often target GPU-based high-performance architectures, due to the high computational and hardware resource needs of TCONV layers. To avoid prohibitive GPU energy costs, CNNs are increasingly deployed to decentralized embedded autonomous devices, such as Field Programmable Gate Arrays (FPGAs). However, this poses challenges for designing efficient hardware implementations of TCONV layers. This paper presents a parameterized design and implementation of a new TCONV module, which is synthesizable onto FPGAs. It is implemented using High-Level Synthesis (HLS), with a C++ template that parameterizes its functional and non-functional properties. These parameters allow users to vary kernel sizes, image sizes, quantization, and parallelism. Through a systematic exploration of this design space, we find an optimal instance of this TCONV module that achieves 6.25 Giga Outputs per Second (Gout/s) using just 1.53 W of power. We then use our TCONV layer in two neural networks for image decompression and image generation. Image decompression achieves a throughput of more than 30K frames per second (fps) using only 16% of the resources on average, while image generation achieves an energy efficiency of 324 fps/W and outperforms comparable state-of-the-art models by at least 7.3x.
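The abstract's core idea, a single C++ template whose parameters fix kernel size, image size, quantization, and parallelism at synthesis time, can be illustrated with a minimal single-channel sketch. This is not the authors' code: the function name tconv2d, the stride parameter S, and the array layout are assumptions, and T stands in for whatever fixed-point type the quantization parameter would select (e.g. an ap_fixed type in an HLS flow). Parallelism would be introduced by HLS pragmas such as #pragma HLS UNROLL on the inner loops.

// Minimal single-channel sketch of a template-parameterized transposed
// convolution (scatter form); illustrative only, not the paper's module.
#include <cstddef>

// T: data type (quantization knob), K: kernel size, IN: input image size,
// S: upsampling stride. The output size follows (IN - 1) * S + K.
template <typename T, std::size_t K, std::size_t IN, std::size_t S>
void tconv2d(const T in[IN][IN], const T kernel[K][K],
             T out[(IN - 1) * S + K][(IN - 1) * S + K]) {
    constexpr std::size_t OUT = (IN - 1) * S + K;
    // Clear the output feature map before accumulation.
    for (std::size_t r = 0; r < OUT; ++r)
        for (std::size_t c = 0; c < OUT; ++c)
            out[r][c] = T(0);
    // Scatter each input pixel onto the output, weighted by the kernel.
    for (std::size_t ih = 0; ih < IN; ++ih)
        for (std::size_t iw = 0; iw < IN; ++iw)
            for (std::size_t kh = 0; kh < K; ++kh)
                for (std::size_t kw = 0; kw < K; ++kw)
                    // #pragma HLS UNROLL  // hypothetical parallelism knob
                    out[ih * S + kh][iw * S + kw] += in[ih][iw] * kernel[kh][kw];
}

Instantiating, say, tconv2d<float, 3, 16, 2> upsamples a 16x16 map to 33x33; in an HLS flow the same template would be re-instantiated per layer with the bit-widths and unroll factors found during design-space exploration.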
Pages: 1245-1263 (19 pages)