Efficient Implementation of 2D and 3D Sparse Deconvolutional Neural Networks with a Uniform Architecture on FPGAs

Cited by: 10
Authors
Wang, Deguang [1]
Shen, Junzhong [1]
Wen, Mei [1]
Zhang, Chunyuan [1]
Affiliation
[1] National University of Defense Technology, College of Computer, Changsha 410073, Hunan, People's Republic of China
Keywords
DCNN; FPGA; pruning; sparsity; acceleration; 2D; 3D; uniform architecture
DOI
10.3390/electronics8070803
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Three-dimensional (3D) deconvolution is widely used in many computer vision applications. However, most previous works have focused only on accelerating two-dimensional (2D) deconvolutional neural networks (DCNNs) on Field-Programmable Gate Arrays (FPGAs), while the acceleration of 3D DCNNs has not been studied in depth, as they have higher computational complexity and sparsity than 2D DCNNs. In this paper, we focus on the acceleration of both 2D and 3D sparse DCNNs on FPGAs by proposing efficient schemes for mapping 2D and 3D sparse DCNNs onto a uniform architecture. Firstly, a pruning method is used to prune unimportant network connections and increase the sparsity of weights. After pruning, the number of parameters of the DCNNs is reduced significantly without accuracy loss. Secondly, the remaining non-zero weights are encoded in coordinate (COO) format, reducing the memory demands of the parameters. Finally, to demonstrate the effectiveness of our work, we implement our accelerator design on the Xilinx VC709 evaluation platform for four real-life 2D and 3D DCNNs. After the first two steps, the storage required by the DCNNs is reduced by up to 3.9x. Results show that our method on the accelerator outperforms our prior work by 2.5x to 3.6x in latency.
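The first two steps described in the abstract, pruning followed by COO encoding, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the pruning criterion, so a simple magnitude threshold is assumed here, and the function names, the 3x3 kernel values, and the threshold of 0.1 are all hypothetical.

```python
def prune_by_magnitude(weights, threshold):
    """Zero out weights whose magnitude is below `threshold`.

    Assumption: magnitude-based pruning; the abstract only says that
    "unimportant" connections are pruned.
    """
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]


def to_coo(weights):
    """Encode the non-zero entries of a 2D kernel as (row, col, value) triples."""
    return [
        (r, c, w)
        for r, row in enumerate(weights)
        for c, w in enumerate(row)
        if w != 0.0
    ]


def from_coo(triples, shape):
    """Rebuild the dense kernel from COO triples (used to check the encoding)."""
    rows, cols = shape
    dense = [[0.0] * cols for _ in range(rows)]
    for r, c, w in triples:
        dense[r][c] = w
    return dense


# Hypothetical 3x3 deconvolution kernel, made up for illustration.
kernel = [
    [0.02, -0.80, 0.01],
    [0.50, 0.00, -0.03],
    [0.04, 0.90, -0.60],
]

pruned = prune_by_magnitude(kernel, threshold=0.1)
coo = to_coo(pruned)

print(coo)  # four triples remain out of nine original weights
```

Because each COO entry stores coordinates alongside the value, the format only saves memory when the kernel is sparse enough that the index overhead is outweighed by the zeros it omits. A 3D kernel would carry one additional coordinate per entry.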
Pages: 13