An FPGA-Based Approach for Compressing and Accelerating Depthwise Separable Convolution

Authors
Yang, Ruiheng [1 ]
Chen, Zhikun [1 ]
Hu, Lingtong [1 ]
Cui, Xihang [1 ]
Guo, Yunfei [1 ]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Automation, Sch Artificial Intelligence, Hangzhou 310018, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Convolution; Optimization; Throughput; Resource management; Quantization (signal); Parallel processing; Hardware acceleration; CLIP-Q; DSC; FPGA; hardware accelerator; CNN;
DOI
10.1109/LSP.2024.3425286
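Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification codes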
0808 ; 0809 ;
Abstract
The rapid progress of deep learning has increased the parameter count and computational requirements of convolutional neural networks (CNNs), making it difficult to deploy networks on hardware platforms with constrained resources. Although depthwise separable convolution (DSC) is one method used to tackle this issue, it still retains numerous redundant parameters. Meanwhile, the compression learning by in-parallel pruning-quantization (CLIP-Q) method is an efficient approach to network compression, but it includes no additional optimization for DSC. This study proposes a method named DSC-CLIP-Q, which is derived from the CLIP-Q approach and is designed specifically to address the parameter distribution characteristics of DSC. Furthermore, the study develops a highly energy-efficient and reconfigurable hardware accelerator designed for this approach. Additional storage optimizations tailored to the hardware features of DSC-CLIP-Q are introduced, in conjunction with a reconfigurable processing element (PE) array designed for the convolutional characteristics of DSC. The experimental results indicate that the proposed DSC accelerator attains high throughput and energy efficiency while also enhancing network accuracy.
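As a rough illustration of why DSC reduces parameters and how its parameters are distributed between its two stages, the following sketch counts the weights of a standard convolution and of a DSC layer. The function names and the layer sizes (3x3 kernel, 32 input channels, 64 output channels) are assumptions chosen for illustration and do not come from the paper; bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def dsc_params(k, c_in, c_out):
    """Weights in a depthwise separable convolution (bias ignored).

    Returns (depthwise, pointwise): one k x k filter per input channel,
    followed by a 1 x 1 convolution that mixes channels.
    """
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise, pointwise


# Hypothetical layer sizes, chosen only for illustration.
k, c_in, c_out = 3, 32, 64
std = standard_conv_params(k, c_in, c_out)
dw, pw = dsc_params(k, c_in, c_out)

print("standard:", std)
print("depthwise:", dw, "pointwise:", pw, "total:", dw + pw)
print("DSC / standard ratio:", (dw + pw) / std)
print("pointwise share of DSC weights:", pw / (dw + pw))
```

For these sizes the pointwise stage holds most of the DSC weights, which is one sense in which the two stages have different parameter distributions. Whether this is the specific characteristic that DSC-CLIP-Q exploits is not stated in the abstract.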
Pages: 2590-2594
Page count: 5