An Efficient FPGA-based Depthwise Separable Convolutional Neural Network Accelerator with Hardware Pruning

Cited by: 4
Authors
Liu, Zhengyan [1]
Liu, Qiang [1]
Yan, Shun [1]
Cheung, Ray C. C. [2]
Affiliations
[1] Tianjin Univ, Sch Microelect, 92nd Rd, Tianjin 300072, Nankai, Peoples R China
[2] City Univ Hong Kong, Dept Elect Engn, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
CNN accelerator; depthwise-separable convolution; bottleneck; model compression
DOI
10.1145/3615661
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Convolutional neural networks (CNNs) have been widely deployed in computer vision tasks. However, the computation- and resource-intensive nature of CNNs hinders their deployment on embedded systems. This article proposes an efficient inference accelerator on a Field Programmable Gate Array (FPGA) for CNNs with depthwise separable convolutions. To improve the accelerator's efficiency, we make four contributions: (1) an efficient convolution engine, with multiple strategies for exploiting parallelism and a configurable adder tree, is designed to support three types of convolution operations; (2) a dedicated architecture combined with input buffers is designed for the bottleneck network structure to reduce data transmission time; (3) a hardware padding scheme is proposed to eliminate invalid padding operations; and (4) a hardware-assisted pruning method is developed to support an online tradeoff between model accuracy and power consumption. Experimental results show that, for MobileNetV2, the accelerator achieves 10x and 6x energy efficiency improvements over the CPU and GPU implementations, respectively, and delivers 302.3 frames per second and 181.8 GOPS, the best performance among several existing single-engine accelerators on FPGAs. The proposed hardware-assisted pruning method can reduce power consumption by 59.7% with an accuracy loss within 5%.
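For readers unfamiliar with why depthwise separable convolutions suit resource-constrained hardware, the following minimal sketch compares multiply-accumulate (MAC) counts for a standard convolution and its depthwise separable replacement (a depthwise step followed by a 1x1 pointwise step). It is not taken from the paper: the function names and the layer shape are hypothetical, chosen only for illustration, and stride 1 with "same" padding is assumed so that the output keeps the input's spatial size.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a standard k x k convolution with an h x w x c_out output."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h, w, c_in, c_out, k):
    """MACs for a k x k depthwise convolution followed by a 1x1 pointwise one."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1x1 convolution mixes the channels
    return depthwise + pointwise


# Hypothetical layer shape (an assumption, not a layer quoted from the paper).
h, w, c_in, c_out, k = 56, 56, 64, 128, 3

std = standard_conv_macs(h, w, c_in, c_out, k)
dws = depthwise_separable_macs(h, w, c_in, c_out, k)

# The ratio simplifies algebraically to 1/c_out + 1/k**2.
ratio = dws / std

print(f"standard conv MACs:       {std}")
print(f"depthwise separable MACs: {dws}")
print(f"ratio (separable / standard): {ratio:.4f}")
```

The reduction factor depends only on the output channel count and the kernel size, which is why the saving is close to 1/k² for wide layers.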
Pages: 20