An FPGA-Based Approach for Compressing and Accelerating Depthwise Separable Convolution

Times cited: 0

Authors:

Yang, Ruiheng [1]

Chen, Zhikun [1]

Hu, Lingtong [1]

Cui, Xihang [1]

Guo, Yunfei [1]

Affiliation:

[1] Hangzhou Dianzi Univ, Sch Automation, Sch Artificial Intelligence, Hangzhou 310018, Peoples R China

Source:

IEEE SIGNAL PROCESSING LETTERS | 2024 / Vol. 31

Funding:

National Natural Science Foundation of China

Keywords:

Convolution; Optimization; Throughput; Resource management; Quantization (signal); Parallel processing; Hardware acceleration; CLIP-Q; DSC; FPGA; hardware accelerator; CNN

DOI:

10.1109/LSP.2024.3425286

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronics and Communication Technology]

Discipline classification codes:

0808; 0809

Abstract:

The rapid progress of deep learning has increased the parameter count and computational requirements of convolutional neural networks (CNNs), making these networks difficult to deploy on hardware platforms with constrained resources. Depthwise separable convolution (DSC) is one way to tackle this issue, but it still retains many redundant parameters. Meanwhile, compression learning by in-parallel pruning-quantization (CLIP-Q) is an efficient approach to network compression; however, it includes no optimization specific to DSC. This study proposes DSC-CLIP-Q, a method derived from CLIP-Q and designed specifically for the parameter distribution characteristics of DSC. Furthermore, a highly energy-efficient, reconfigurable hardware accelerator is developed specifically for this method. It introduces additional storage optimizations tailored to the hardware features of DSC-CLIP-Q, together with a reconfigurable processing element (PE) array designed for the convolutional characteristics of DSC. Experimental results indicate that the proposed DSC accelerator achieves high throughput and energy efficiency while also improving network accuracy.
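The abstract rests on two premises: DSC removes most of the parameters of a standard convolution, yet enough redundancy remains for joint pruning and quantization to exploit. The Python sketch below illustrates both under stated assumptions. It is not the authors' DSC-CLIP-Q and not a faithful reimplementation of CLIP-Q: the helper names `conv_params`, `dsc_params`, and `prune_quantize` are hypothetical, kernels are assumed square with no bias, pruning is plain one-shot magnitude pruning with no retraining, and the surviving weights are quantized into 2^b - 1 equal-width bins (each weight replaced by its bin mean), with the remaining code reserved for zero. The pruning rate and bit width are fixed by hand here.

```python
"""Sketch: why DSC cuts parameters, and a simplified CLIP-Q-style
prune-then-quantize step. Assumptions (not from the paper): square k x k
kernels, no bias terms, one-shot magnitude pruning, equal-width bins."""


def conv_params(c_in: int, c_out: int, k: int) -> int:
    # Standard convolution: one k x k x c_in kernel per output channel.
    return k * k * c_in * c_out


def dsc_params(c_in: int, c_out: int, k: int) -> int:
    # Depthwise stage: one k x k filter per input channel.
    # Pointwise stage: one 1 x 1 x c_in kernel per output channel.
    return k * k * c_in + c_in * c_out


def prune_quantize(weights: list[float], prune_rate: float, bits: int) -> list[float]:
    """Zero the smallest-magnitude fraction `prune_rate` of the weights, then
    replace each surviving weight by the mean of its bin, using 2**bits - 1
    equal-width bins over the surviving values (the remaining code is zero)."""
    n = len(weights)
    n_prune = int(prune_rate * n)
    # Indices ordered by |w|; the first n_prune of them are pruned to zero.
    order = sorted(range(n), key=lambda i: abs(weights[i]))
    keep = order[n_prune:]
    out = [0.0] * n
    if not keep:
        return out

    kept = [weights[i] for i in keep]
    n_bins = 2 ** bits - 1
    lo, hi = min(kept), max(kept)
    width = (hi - lo) / n_bins
    # Bin index of each surviving weight; the maximum is clamped into the last bin.
    ids = [0 if width == 0 else min(int((v - lo) / width), n_bins - 1) for v in kept]

    # Accumulate per-bin sums and counts, then write back the bin means.
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for v, b in zip(kept, ids):
        sums[b] += v
        counts[b] += 1
    for i, b in zip(keep, ids):
        out[i] = sums[b] / counts[b]
    return out


if __name__ == "__main__":
    c_in, c_out, k = 32, 64, 3
    std, dsc = conv_params(c_in, c_out, k), dsc_params(c_in, c_out, k)
    print(std, dsc, round(dsc / std, 4))  # 18432 2336 0.1267
    print(prune_quantize([0.1, -0.5, 0.9, 0.05, -0.02, 0.7], 0.5, 2))
```

With 32 input channels, 64 output channels, and 3 x 3 kernels, the DSC layer keeps about 12.7% of the standard layer's weights, matching the ratio 1/c_out + 1/k^2. The counts also show that the pointwise stage holds most of the remaining weights (2048 of 2336), so the depthwise and pointwise weights are distributed very differently; a DSC-aware compression scheme could plausibly treat the two stages separately, though the exact policy used by DSC-CLIP-Q is not described in this record.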


Pages: 2590-2594

Number of pages: 5
