Efficient Implementation of 2D and 3D Sparse Deconvolutional Neural Networks with a Uniform Architecture on FPGAs

Cited by: 10

Authors:

Wang, Deguang [1]

Shen, Junzhong [1]

Wen, Mei [1]

Zhang, Chunyuan [1]

Affiliations:

[1] Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China

Source:

ELECTRONICS | 2019 / Vol. 8 / Issue 07

Keywords:

DCNN; FPGA; pruning; sparsity; acceleration; 2D; 3D; uniform architecture

DOI:

10.3390/electronics8070803

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Three-dimensional (3D) deconvolution is widely used in many computer vision applications. However, most previous work has focused only on accelerating two-dimensional (2D) deconvolutional neural networks (DCNNs) on Field-Programmable Gate Arrays (FPGAs), while the acceleration of 3D DCNNs has not been studied in depth, as they have higher computational complexity and sparsity than 2D DCNNs. In this paper, we focus on the acceleration of both 2D and 3D sparse DCNNs on FPGAs by proposing efficient schemes for mapping 2D and 3D sparse DCNNs onto a uniform architecture. First, a pruning method is used to prune unimportant network connections and increase the sparsity of the weights. After pruning, the number of DCNN parameters is reduced significantly without accuracy loss. Second, the remaining non-zero weights are encoded in coordinate (COO) format, reducing the memory demands of the parameters. Finally, to demonstrate the effectiveness of our work, we implement our accelerator design on the Xilinx VC709 evaluation platform for four real-life 2D and 3D DCNNs. After the first two steps, the storage required by the DCNNs is reduced by up to 3.9x. Results show that our method on the accelerator outperforms our prior work by 2.5x to 3.6x in latency.
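
The first two steps of the abstract (pruning, then COO encoding) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which pruning criterion is used, so a simple magnitude threshold is assumed here, and the function names `prune_by_magnitude`, `encode_coo`, and `decode_coo` are hypothetical. A 2D kernel is treated as a 3D kernel of depth 1, so one code path covers both cases.

```python
from typing import List, Tuple

Kernel3D = List[List[List[float]]]          # indexed as [depth][row][col]
CooEntry = Tuple[int, int, int, float]      # (depth, row, col, value)


def prune_by_magnitude(kernel: Kernel3D, threshold: float) -> Kernel3D:
    """Zero every weight whose absolute value is below `threshold`.

    Assumption: magnitude thresholding stands in for the paper's
    (unspecified here) pruning method.
    """
    return [[[w if abs(w) >= threshold else 0.0 for w in row]
             for row in plane]
            for plane in kernel]


def encode_coo(kernel: Kernel3D) -> List[CooEntry]:
    """Keep only the non-zero weights, each with its coordinates."""
    return [(d, r, c, w)
            for d, plane in enumerate(kernel)
            for r, row in enumerate(plane)
            for c, w in enumerate(row)
            if w != 0.0]


def decode_coo(entries: List[CooEntry],
               shape: Tuple[int, int, int]) -> Kernel3D:
    """Rebuild the dense kernel from COO entries (zeros elsewhere)."""
    depth, rows, cols = shape
    dense = [[[0.0] * cols for _ in range(rows)] for _ in range(depth)]
    for d, r, c, w in entries:
        dense[d][r][c] = w
    return dense


# A toy 2x2x2 kernel; a 2D kernel would simply have depth 1.
kernel = [[[0.5, -0.01], [0.02, 0.0]],
          [[-0.8, 0.03], [0.0, 0.3]]]

pruned = prune_by_magnitude(kernel, threshold=0.1)
coo = encode_coo(pruned)
restored = decode_coo(coo, shape=(2, 2, 2))
```

In this toy case only 3 of the 8 weights survive pruning. Because each COO entry also carries its coordinates, the format only saves memory when the kernel is sparse enough that the stored indices cost less than the zeros they replace.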


Pages: 13

